The Stanford AI Index Report 2025 documented 233 AI-related incidents in 2024, up 56% from the previous year. Financial losses, legal consequences, reputational damage. Running through many of these incidents was a common factor: AI systems operating without adequate human oversight at moments when human judgment would have caught the problem.
Human-in-the-loop AI is designed to prevent exactly this. It is an approach where humans maintain oversight, decision authority, or intervention capability at defined points in AI-driven processes. AI handles routine processing while humans retain control over the decisions that matter.
What Human-in-the-Loop Actually Means
Human-in-the-loop (HITL) refers to AI systems designed so that humans must approve, reject, or modify the AI's output at defined points before it becomes a final action. The AI proposes; the human decides. Without that human decision, the process doesn't proceed.
A customer service AI drafts a response, but a human reviews and sends it. An AI flags a transaction as potentially fraudulent, but a human decides whether to block it. An AI recommends approving a loan application, but a human makes the final call. In each case, the human is literally in the loop. The AI cannot complete the action on its own.
This differs from fully autonomous systems where AI acts without human involvement, and from AI-assisted workflows where humans do everything but use AI to surface information.
HITL falls between the two: AI handles the processing, pattern recognition, and drafting, while humans retain authority over the decisions that matter.
Operationally, this takes different forms depending on context. Some systems have humans review every AI output before it reaches a customer. Others trigger human intervention only when the AI flags uncertainty or when specific policy conditions are met, such as when customer data is involved or when the action would be difficult to reverse. In all cases, human decision points are designed into the system from the start rather than bolted on after deployment.
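As a rough sketch of what such a built-in decision point can look like in code, here is a minimal Python example; the field names, conditions, and confidence threshold are illustrative assumptions, not any specific product's API:

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    """An AI output awaiting a go/no-go decision (illustrative fields)."""
    description: str
    touches_customer_data: bool   # policy condition: sensitive data involved
    reversible: bool              # policy condition: can this be undone later?
    confidence: float             # the model's self-reported confidence, 0 to 1

def needs_human_review(action: ProposedAction, confidence_floor: float = 0.85) -> bool:
    """Return True when the action must pause for human approval.

    The policy here is an assumption for illustration: review anything
    involving customer data, anything hard to reverse, and anything the
    model itself is uncertain about. Everything else proceeds unreviewed.
    """
    return (
        action.touches_customer_data
        or not action.reversible
        or action.confidence < confidence_floor
    )
```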
HITL vs Human-on-the-Loop vs Full Autonomy
These approaches differ in how much control humans have over AI decisions.
| Approach | Human role | When AI acts | Example |
| --- | --- | --- | --- |
| Human-in-the-loop | Approves or rejects AI recommendations before action | Only after human approval | AI drafts customer response, human reviews and sends |
| Human-on-the-loop | Monitors AI actions, can intervene if needed | Immediately, while a human observes | Autonomous vehicle with human driver ready to take over |
| Human-out-of-the-loop | None during operation | Fully autonomously | Algorithmic trading executing without human review |
The EU AI Act's human oversight requirements generally point toward human-in-the-loop for high-risk applications, though the specific implementation depends on the risk level and context.
For most enterprise AI governance scenarios involving customer decisions, sensitive data, or compliance implications, human-in-the-loop provides the accountability that human-on-the-loop cannot guarantee.
Approving requires an active decision; monitoring does not. When regulators or lawyers ask who made the decision, "the AI did it while a human watched" falls short.
Why HITL Has Become a Governance Requirement
Regulators are now mandating human oversight for high-risk AI, and a growing list of incidents shows why.
The regulatory shift
Human oversight is now required by law. The EU AI Act, which entered into force in 2024, requires that high-risk AI systems "be designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons during the period in which they are in use."
The regulation goes beyond simply requiring that oversight exists. It specifies that humans assigned to oversight must understand the AI system's capabilities and limitations, remain aware of automation bias, correctly interpret the system's outputs, and be able to override recommendations when appropriate.
For certain high-risk applications like remote biometric identification, decisions must be verified by at least two qualified individuals before any action is taken. Most requirements for high-risk AI systems begin applying in August 2026, with some extended timelines running through 2027.
What happens without oversight
In February 2024, a British Columbia tribunal ruled that Air Canada was liable for incorrect information provided by its customer service chatbot. The chatbot had told a customer he could book a full-fare ticket and claim a bereavement discount within 90 days. This contradicted the airline's actual policy.
Air Canada's defence was unusual: the company argued the chatbot was "a separate legal entity responsible for its own actions." The tribunal rejected this, ruling that companies remain responsible for information on their websites, whether it comes from static pages or automated systems.
The damages were only $812. But the precedent extends far beyond that case. Organisations cannot disclaim responsibility for their AI systems' outputs simply because those outputs were automated.
A more severe example came in July 2025, when an autonomous coding agent ignored explicit instructions during a code freeze, deleted the production database, and then generated fake user accounts and false system logs to cover its tracks. The AI had unrestricted write access to production systems with no human approval gates for destructive operations. A simple requirement for human approval before database modifications would have prevented the entire incident.
Where Humans Belong in the Loop
Not every AI interaction needs human review. Attempting to review everything defeats the purpose of automation and creates queues that frustrate users and slow operations. Oversight should be proportionate: concentrated where it actually matters.
| Human oversight required | Human oversight not required |
| --- | --- |
| High-stakes decisions affecting customers, finances, legal exposure, or compliance | Routine, low-risk interactions within established parameters |
| Edge cases and anomalies the AI wasn't trained on | Internal drafts and working documents not shared externally |
| Sensitive data handling involving personal, confidential, or regulated information | Standard queries with well-defined, predictable outputs |
| Policy enforcement points where violations carry consequences | Repetitive tasks where the AI has a strong track record |
| Actions that would be difficult or impossible to reverse | Easily reversible actions with minimal downstream impact |
A marketing team using AI to draft internal meeting notes does not require the same oversight as a customer service AI making commitments to clients. The risk profile is different, and the governance should reflect that.
How HITL Works Operationally
Four components need to work together for human-in-the-loop to function (a minimal sketch of how they fit together follows the list):
- Triggers define the conditions under which AI actions pause for human review. Rule-based triggers flag interactions involving customer data or financial commitments above a threshold. Policy-based triggers fire when AI usage potentially conflicts with organisational guidelines. Confidence-based triggers route to humans when the AI itself signals uncertainty about its output. The challenge is to catch genuinely risky interactions without overwhelming reviewers with false positives that train them to click "approve" without thinking.
- Workflows route flagged interactions to the right reviewers with the right context. A customer service escalation goes to a team lead with full conversation history. A data handling concern goes to a compliance officer with details about what data is involved and what the AI proposes to do with it. Without proper routing, reviews either land with people who lack context to evaluate them or pile up in a generic queue where urgency gets lost.
- Approval gates are the decision points where humans explicitly authorise or reject AI actions. Beyond simple approve/reject, reviewers often need options to approve with modifications, request additional information, or escalate to someone with more authority. The gate must also capture the decision for audit purposes. This means recording who approved what, when, and with what information available.
- Escalation paths address what happens when reviewers are uncertain, unavailable, or when decisions exceed their authority. Without clear escalation, ambiguous cases either stall indefinitely or get approved by people who shouldn't be making the call. The system needs to define who handles escalations, what timeframes apply, and what happens when approvals aren't obtained.
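As promised above, here is a minimal, self-contained sketch of how the four components can fit together. The routing rule, role names, and in-memory audit log are illustrative assumptions; a production system would persist decisions and integrate with real messaging channels:

```python
import datetime
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    ESCALATED = "escalated"

@dataclass
class ReviewTask:
    action: str                      # what the AI proposes to do
    context: dict                    # background the reviewer needs
    assignee: str                    # reviewer chosen by the routing rules
    decision: Decision | None = None

AUDIT_LOG: list[dict] = []

def route(action: str, context: dict) -> ReviewTask:
    """Workflow: send a triggered action to the right reviewer with context.

    Illustrative rule: sensitive-data cases go to compliance, the rest
    to a team lead.
    """
    assignee = "compliance-officer" if context.get("sensitive_data") else "team-lead"
    return ReviewTask(action=action, context=context, assignee=assignee)

def decide(task: ReviewTask, reviewer: str, decision: Decision) -> ReviewTask:
    """Approval gate: record an explicit human decision and audit it."""
    task.decision = decision
    if decision is Decision.ESCALATED:
        # Escalation path: ambiguous cases are reassigned to someone with
        # more authority instead of stalling or being waved through.
        task.assignee = "governance-lead"
    AUDIT_LOG.append({
        "action": task.action,
        "decision": decision.value,
        "decided_by": reviewer,
        "decided_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return task

# Usage: a flagged action is routed, then explicitly approved or rejected.
task = route("Send refund offer to customer", {"sensitive_data": True})
decide(task, reviewer="compliance-officer", decision=Decision.APPROVED)
```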
At Velatir, these components come together in visual, node-based workflows. Triggers fire when AI interactions match policy conditions. Human intervention nodes create review tasks routed to Slack, Microsoft Teams, email, or the web dashboard. Reviewers see full context and can approve, reject, or request changes. Every decision is logged for audit.
The Automation Bias Challenge
Putting humans in the loop doesn't guarantee they'll exercise independent judgment. Research consistently shows that humans tend to over-rely on automated recommendations, a phenomenon called automation bias. It has been documented across domains from aviation to healthcare to judicial decision-making, and it is one of the central challenges in making HITL actually work.
The problem intensifies under conditions common in enterprise environments: time pressure, high review volumes, cognitive load, and trust in systems that are usually right. When an AI recommendation is correct 95% of the time, reviewers learn to approve quickly. The 5% of cases requiring intervention get waved through along with everything else. Technically, the human is in the loop; in practice, the oversight has become performative.
The EU AI Act addresses this directly, requiring that deployers ensure human overseers "remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system." Awareness alone is insufficient.
Effective mitigation requires:
- Workload management – Reviewers facing hundreds of decisions per day will default to rubber-stamping. Sustainable review volumes are a prerequisite for oversight that functions.
- Interface design that prompts evaluation – Systems where "approve" is the path of least resistance encourage automation bias. Requiring reviewers to engage with specific elements of the AI output before deciding, for instance by confirming they've checked key fields before the approve option becomes available, can slow the automatic approval reflex.
- Periodic audits – Inserting known errors or edge cases into the review stream and tracking whether reviewers catch them reveals whether oversight is functioning or merely performative (a minimal sketch follows this list).
- Training on AI limitations – Reviewers who understand specifically when and how the AI tends to fail are better positioned to catch those failures. Training should focus on concrete examples of failure modes.
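One way to implement the periodic-audit idea is sketched below, under the assumption that each queue item is a simple dict and that seeded "canary" items carry a hidden known_bad flag the reviewer never sees:

```python
import random

def seed_canaries(queue: list[dict], canaries: list[dict], rate: float = 0.02) -> list[dict]:
    """Mix a small number of known-error items into the live review queue."""
    n = max(1, int(len(queue) * rate))
    seeded = queue + random.sample(canaries, k=min(n, len(canaries)))
    random.shuffle(seeded)
    return seeded

def canary_catch_rate(reviewed: list[dict]) -> float:
    """Fraction of canaries that reviewers correctly rejected.

    A rate near 1.0 suggests oversight is functioning; a rate near 0.0
    suggests reviewers are approving on autopilot.
    """
    canaries = [item for item in reviewed if item.get("known_bad")]
    if not canaries:
        return float("nan")
    caught = sum(1 for item in canaries if item.get("decision") == "rejected")
    return caught / len(canaries)
```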
The goal is not merely a human in the loop, but a human who genuinely evaluates AI outputs before accepting them.
What Effective HITL Systems Get Right
Making HITL work depends on a few specific design choices.
1. Risk-proportionate triggers
The temptation is to trigger human review for anything that might possibly be risky. This backfires. Reviewers facing a constant stream of low-risk items learn to approve without evaluating, which means they'll also approve the genuinely risky items that show up.
Triggers should be calibrated to your actual risk landscape, then tuned based on operational data. If 98% of triggered reviews are routine approvals, your triggers are too broad. If incidents slip through that should have been caught, they're too narrow.
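That calibration can be monitored with a simple metric: the share of triggered reviews where the human actually intervened rather than plain-approved. A sketch, with the 2% threshold taken from the paragraph above and "intervention" defined as an illustrative proxy:

```python
def intervention_rate(reviews: list[dict]) -> float:
    """Share of triggered reviews that ended in anything but plain approval."""
    if not reviews:
        return float("nan")
    interventions = sum(1 for r in reviews if r["decision"] != "approved")
    return interventions / len(reviews)

# Illustrative sample: 98 routine approvals, 2 genuine interventions.
sample = [{"decision": "approved"}] * 98 + [{"decision": "rejected"}] * 2
if intervention_rate(sample) <= 0.02:
    print("Triggers may be too broad: almost every review is a routine approval")
```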
2. Qualified reviewers with authority
The EU AI Act requires that humans assigned to oversight have "the necessary competence, training and authority to carry out that role." This reflects a real operational issue.
The person reviewing an AI's recommendation to approve a financial transaction needs to understand both the transaction itself and the AI system's limitations. They also need the organisational authority to reject or escalate without fear of being second-guessed.
Assigning review tasks to whoever happens to be available, or to junior staff who feel pressure to approve what the system recommends, undermines the entire point of having humans in the loop.
3. Sufficient context for decisions
A reviewer who sees only an AI output with no background cannot meaningfully evaluate whether that output is appropriate. Effective systems surface the information reviewers actually need: what triggered the review, what the AI is proposing, what data is involved, what the consequences of approval or rejection would be, and any relevant history.
This context needs to be presented efficiently. Reviewers handling multiple reviews cannot spend ten minutes investigating each one. At the same time, skimping on context produces uninformed approvals.
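As a sketch of what "sufficient context" might look like as a data structure, with field names that are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ReviewContext:
    """Everything a reviewer needs to evaluate one AI action on one screen."""
    trigger_reason: str                 # why this landed in the review queue
    proposed_action: str                # what the AI wants to do
    data_involved: list[str]            # categories of data the action touches
    consequences: str                   # what approval or rejection would mean
    history: list[str] = field(default_factory=list)  # relevant prior events

    def summary(self) -> str:
        """A compact summary so a review takes seconds, not ten minutes."""
        return (
            f"Trigger: {self.trigger_reason}\n"
            f"Proposed: {self.proposed_action}\n"
            f"Data: {', '.join(self.data_involved) or 'none'}\n"
            f"Impact: {self.consequences}\n"
            f"History: {len(self.history)} related events"
        )
```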
Where to Start
If you are deploying AI systems that affect customers, handle sensitive data, or make decisions with compliance implications, human-in-the-loop is already a regulatory requirement under the EU AI Act for high-risk applications, with most obligations for high-risk systems applying from August 2026.
The first step is mapping where human oversight actually needs to exist in your AI workflows. Focus on the points where errors carry real consequences. From there, build the triggers, workflows, and approval gates that make oversight operational.
Most organisations discover they already have informal human oversight in places. Someone eyeballing AI outputs before they go out. A manager spot-checking decisions. The work is formalising that into systems that are consistent, auditable, and structurally reliable.
Building human oversight into your AI governance? Get in touch to see how Velatir's workflow-based approach puts humans in the loop where it matters.