Most AI governance still lives in policy documents. A PDF outlines acceptable use. A spreadsheet tracks which tools are approved. Someone reviews it quarterly.
This does not work when AI agents are sending messages in Slack, processing customer data, and executing code in real time. A policy document on SharePoint does nothing to prevent an agent from leaking proprietary code into a public channel or hallucinating a discount for a customer.
The organisations doing this well have shifted from policies to measurement. They treat governance as a data problem. They capture signals from AI workflows and use that data to detect problems, enforce rules, and demonstrate compliance.
Why Approved Tools Lists Failed
For years, the primary mechanism for AI governance was the approved tools list. If a tool was not on the list, employees could not use it. This model has largely collapsed.
Employees find workarounds. They use personal accounts. They access tools through mobile devices on personal networks. The tools they need are available instantly, while the approval process takes weeks or months.
More importantly, the risk is no longer just about which tool is being used. It is about how it is being used. An approved LLM can still be used in problematic ways: feeding it sensitive PII, allowing it to execute unauthorised transactions, or letting it access data it should not see.
Monitoring behaviour is more effective than blocking URLs. This is why organisations are adopting tools that sit alongside AI workflows, monitoring message patterns, data flows, and agent actions in real time.
What Organisations Actually Measure
Practical governance is defined by what gets measured. The metrics that matter fall into three categories: adoption, risk, and behaviour.
Adoption and Resource Allocation
Leadership needs visibility into which AI agents are actually delivering value. Measuring adoption is about understanding engagement and utility.
Agent utilisation rates: Which automated workflows are most active? This identifies where AI is providing the most value and where investments are not paying off.
Cost per task: Organisations are moving beyond cost per token to measure what it actually costs to complete a specific business process via AI. Assuming similar token usage per attempt, a workflow that costs $0.02 per token but requires 50 attempts to complete a task is far more expensive per task than one that costs $0.05 per token and succeeds on the first try.
Success and failure latency: How long does it take for an agent to complete a task? How long does it take to fail? Fast failures are better than slow failures. Slow failures consume resources and delay human intervention.
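The cost-per-task point is easiest to see with arithmetic. A minimal sketch, using the hypothetical per-token prices above and assuming each attempt consumes roughly the same number of tokens:

```python
def cost_per_task(cost_per_token: float, tokens_per_attempt: int, attempts: int) -> float:
    """Total spend to complete one business task, including retries."""
    return cost_per_token * tokens_per_attempt * attempts

# Hypothetical workloads: assume each attempt consumes ~1,000 tokens.
cheap_but_flaky = cost_per_task(0.02, 1_000, attempts=50)      # cheap per token, 50 retries
pricier_but_reliable = cost_per_task(0.05, 1_000, attempts=1)  # dearer per token, first try

print(f"cheap-but-flaky:      ${cheap_but_flaky:,.2f} per task")
print(f"pricier-but-reliable: ${pricier_but_reliable:,.2f} per task")
```

Under these assumptions the "cheap" workflow costs twenty times more per completed task, which is the number that actually matters to the business.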
Risk and Security Triggers
This is where governance becomes operational. Organisations set specific triggers to flag high-risk behaviour before it escalates.
- PII and sensitive string detection. Monitoring for specific patterns (credit card numbers, national ID formats, internal project codenames) within messages sent by AI agents to platforms like Slack or Teams. When a pattern matches, the system flags it for review or blocks it automatically.
- Prompt injection attempts. Tracking unauthorised attempts to override agent instructions or bypass safety controls. These attempts indicate either malicious intent or a gap in how the agent handles adversarial inputs.
- Data outflow volume. Monitoring for unusual spikes in the amount of data being processed or exported by an agent. A sudden increase could indicate a data exfiltration event or a misconfigured workflow.
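The first of these triggers, sensitive-string detection, is essentially pattern matching over outbound messages. A minimal sketch; the patterns and project codenames here are hypothetical stand-ins, and a production system would use far more robust detectors:

```python
import re

# Hypothetical patterns: card-number-like digit runs, a national-ID-style
# format, and internal project codenames (all invented for illustration).
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "national_id": re.compile(r"\b\d{6}-\d{4}\b"),  # assumed DDMMYY-XXXX style
    "codename": re.compile(r"\bPROJECT[ _-]?(?:ORION|ATLAS)\b", re.IGNORECASE),
}

def scan_message(text: str) -> list[str]:
    """Return the names of any sensitive patterns found in an outbound message."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

hits = scan_message("Shipping notes for project orion, card 4111 1111 1111 1111")
# A non-empty result would flag the message for review or block it outright.
```

The same scan runs on every message an agent sends to Slack or Teams; what changes per trigger is only the pattern set and the response.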
Behaviour and Anomalies
Beyond explicit risks, governance involves monitoring for patterns that indicate something is wrong.
Sentiment drift: In customer-facing agents, shifts in tone or sentiment can signal model degradation or misalignment with how the organisation wants to communicate. If an agent starts responding more curtly or formally than intended, that is a signal worth investigating.
Loop detection: Identifying when an agent gets stuck in repetitive logic. This wastes resources and can lead to system instability. It also indicates a gap in the agent's error handling.
Unauthorised integrations: Detecting when agents begin interacting with third-party APIs or data sources that were not part of the original workflow. This can happen when agents are given too much autonomy or when someone modifies a workflow without proper review.
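Loop detection in particular lends itself to a simple heuristic: flag an agent that repeats the same action signature too many times within a sliding window of recent actions. A minimal sketch with hypothetical window and threshold values:

```python
from collections import deque

class LoopDetector:
    """Flag an agent that repeats the same action signature too often
    within a sliding window of recent actions (a hypothetical heuristic)."""

    def __init__(self, window: int = 20, threshold: int = 5):
        self.recent = deque(maxlen=window)  # oldest actions fall off automatically
        self.threshold = threshold

    def record(self, action_signature: str) -> bool:
        """Record one action; return True if it looks like a loop."""
        self.recent.append(action_signature)
        return self.recent.count(action_signature) >= self.threshold

detector = LoopDetector(window=10, threshold=3)
stuck = any(detector.record("GET /orders?page=1") for _ in range(3))
# Three identical calls inside the window trips the detector.
```

Real agents rarely repeat byte-identical actions, so the action signature would typically normalise away timestamps and request IDs before comparison.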
How This Data Gets Used
The value of measurement is in what you do with the data.
Automated enforcement: When a trigger fires, the system can block the action, require human approval, or log the event for later review. The response is proportionate to the risk. A minor policy deviation gets logged. A potential data breach gets blocked and escalated.
Trend analysis: Viewing triggers over time reveals patterns. If one department consistently fires more PII triggers than others, that indicates either a training gap or a workflow that needs redesign. If prompt injection attempts spike after a new agent deployment, that agent may need additional safeguards.
Compliance documentation: Auditors and regulators ask what AI systems you are using, how they process data, and what controls you have implemented. Real-time monitoring provides the evidence. You can show exactly which triggers fired, how they were resolved, and who was notified.
Feedback loops: Real-world data informs policy. If a particular trigger fires constantly but never results in actual violations, the trigger is too sensitive and should be adjusted. If incidents slip through that should have been caught, the triggers need to be tightened. Governance improves over time based on what actually happens.
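The "proportionate response" idea in automated enforcement reduces to a severity-to-action mapping. A minimal sketch of such a policy; the severity levels and response strings are hypothetical:

```python
from enum import Enum

class Severity(Enum):
    LOW = 1       # minor policy deviation
    MEDIUM = 2    # needs a human in the loop
    HIGH = 3      # potential data breach

def enforce(trigger_name: str, severity: Severity) -> str:
    """Map a fired trigger to a proportionate response (hypothetical policy)."""
    if severity is Severity.HIGH:
        return f"BLOCK+ESCALATE: {trigger_name}"  # stop the action, page security
    if severity is Severity.MEDIUM:
        return f"HOLD: {trigger_name}"            # pause until a human approves
    return f"LOG: {trigger_name}"                 # record for later review

print(enforce("pii_credit_card", Severity.HIGH))
```

Keeping the mapping in one place also makes the feedback loop concrete: tuning an over-sensitive trigger is a one-line change to its severity, not a redesign.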
Making Governance Operational
The distinction between governance and operations is narrowing. In organisations doing this well, governance is built into the infrastructure.
This means:
Triggers are defined in workflows. When someone builds an AI workflow, they also define what gets monitored and what triggers human review. Governance is part of the design process.
Alerts go to the right people. A PII trigger in a customer service workflow routes to the customer service lead. The people closest to the work are the first to respond.
Dashboards show current state. Leadership can see what is happening across the AI ecosystem. This supports faster decisions and earlier intervention.
Policies evolve based on data. The governance framework changes based on what the monitoring reveals. New risks get new triggers. False positives get tuned out.
Where to Start
If you are still relying on approved tools lists and quarterly reviews, the gap between your governance and your actual AI usage is probably wider than you realise.
Start by identifying where AI is already in use. Browser-level discovery can reveal tools and workflows that IT does not know about. This gives you a baseline.
Then define what you want to monitor. Start with the obvious risks: PII exposure, unauthorised data access, prompt injection attempts. Add behavioural monitoring as you learn what patterns matter in your environment.
Build the triggers into your workflows. When someone deploys an AI agent, the monitoring should be part of the deployment. Governance is part of how AI gets used.
Measure, review, adjust. Governance is an ongoing process of calibration based on real-world data.
Want to see what your AI governance data actually looks like? Get in touch to see how Velatir provides the monitoring, triggers, and workflows to make governance operational.