AI Agent Governance in 2026: What the EU AI Act Actually Requires and How to Build for It

The EU AI Act isn’t a future problem anymore. The obligations for Annex III high-risk systems apply from 2 August 2026, just over two months from now, and if you’re deploying AI agents that affect people in EU member states in certain domains, you need to be ready. Not “working on it” ready. Actually ready, with documentation you could put in front of a national market surveillance authority today.

Here’s the thing: most of the teams I’ve spoken to who are building agentic systems are either vaguely aware of the deadline and not doing much, or convinced their use case doesn’t qualify as high-risk. Both of those stances carry more risk than they seem to.

What “High-Risk” Actually Means

The AI Act defines high-risk AI systems by the context of their deployment, not just their technical architecture. An LLM is not inherently high-risk. An agent that uses an LLM to rank job applications, assess loan eligibility, set insurance premiums, or triage patients in a healthcare setting? That’s high-risk by definition, regardless of how sophisticated or unsophisticated the underlying model is.

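The full list in Annex III of the Act covers: biometric identification, critical infrastructure management, education and vocational training decisions, employment and worker management, access to essential services (credit, social benefits, insurance), law enforcement, migration, asylum and border control, and administration of justice and democratic processes. If your agent touches any of these categories and its output influences a decision about a person, assume you’re in scope unless you can document that one of the narrow exemptions in Article 6(3) applies (for example, the system only performs a narrow procedural task). Those exemptions never apply to systems that profile individuals.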

This catches more teams than expected. A CV screening tool that a recruiter uses to shortlist candidates — high-risk. An agent that flags invoices for manual review in a credit decision workflow — worth checking. An AI assistant that recommends treatment options for clinical staff — high-risk. The test isn’t whether the agent makes the final decision. It’s whether the agent’s output materially influences a human decision that affects someone’s access to employment, services, or safety.

What the Act Actually Requires

For high-risk systems, the obligations cluster into four areas.

Technical documentation. You need to produce and maintain a technical file that describes the system’s intended purpose, the data used to develop and validate it, the risk management measures you’ve implemented, and the accuracy, robustness, and cybersecurity properties. This isn’t a one-time document — it needs to stay current. If you deploy a new model version, the documentation updates.
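One way to keep the “documentation updates with the model” rule honest is to check it mechanically. The sketch below is a minimal illustration, not a prescribed format: the `TechnicalFile` fields, the system names, and the version strings are all hypothetical, and a real technical file holds far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalFile:
    """Hypothetical summary of one system's technical documentation."""
    system_name: str
    documented_model_version: str
    intended_purpose: str


def stale_technical_files(files, deployed_versions):
    """Return names of systems whose deployed model differs from the documented one.

    `deployed_versions` maps system name -> model version running in production.
    Systems without a deployment entry are skipped, since nothing is running.
    """
    stale = []
    for tf in files:
        deployed = deployed_versions.get(tf.system_name)
        if deployed is not None and deployed != tf.documented_model_version:
            stale.append(tf.system_name)
    return stale


# Illustrative data only.
files = [
    TechnicalFile("cv-screener", "model-2026-01", "Shortlist job applicants"),
    TechnicalFile("invoice-flagger", "model-2025-11", "Flag invoices for review"),
]
deployed = {"cv-screener": "model-2026-04", "invoice-flagger": "model-2025-11"}

print(stale_technical_files(files, deployed))  # ['cv-screener']
```

Run as a CI step or a scheduled job, a check like this turns “the documentation drifted” from something an auditor finds into something a build failure finds.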

Transparency and auditability. High-risk systems must log enough information to reconstruct and audit decisions after the fact. For an agent, this means logging: the inputs to each LLM call, the model version and parameters, the tool calls made and their results, and the final output. You need to be able to show, for any decision that affected a person, exactly what the agent received and what it produced. Langfuse, Arize Phoenix, and similar observability tools are the practical implementation layer here, but they need to be configured to retain data for the required period. The Act expects logging capability across the lifetime of the system and requires logs to be kept for at least six months unless other Union or national law says otherwise. In practice, keep them at least as long as the decisions they support remain legally significant.
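To make the logging list above concrete, here is a minimal sketch of a per-decision audit record. It does not use the Langfuse or Phoenix APIs; the `DecisionRecord` and `ToolCall` shapes and every field name are assumptions for illustration. The SHA-256 digest only helps detect accidental or naive modification of a stored line, and it is not a substitute for access controls or write-once storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolCall:
    name: str
    arguments: dict
    result: str


@dataclass
class DecisionRecord:
    """Hypothetical record of everything needed to reconstruct one decision."""
    decision_id: str
    model_version: str
    model_parameters: dict
    llm_inputs: list
    tool_calls: list
    final_output: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def _canonical(obj):
    # Sorted keys give a stable byte representation to hash.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False)


def serialize_record(record):
    """Return one JSON line containing the record and a digest of its content."""
    body = asdict(record)  # recurses into the nested ToolCall dataclasses
    digest = hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
    return _canonical({"record": body, "sha256": digest})


def verify_record(line):
    """Recompute the digest of a stored line and compare it to the stored one."""
    envelope = json.loads(line)
    digest = hashlib.sha256(_canonical(envelope["record"]).encode("utf-8")).hexdigest()
    return digest == envelope["sha256"]


# Illustrative data only.
record = DecisionRecord(
    decision_id="dec-0001",
    model_version="model-2026-04",
    model_parameters={"temperature": 0.0},
    llm_inputs=["Rank this application against the role description."],
    tool_calls=[ToolCall("fetch_cv", {"candidate": "c-17"}, "cv text")],
    final_output="shortlist",
)
line = serialize_record(record)
print(verify_record(line))
```

Writing one such line per decision to append-only storage gives you something you can hand over for a single affected person without exporting an entire trace database.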

Human oversight mechanisms. High-risk agents must be designed so that a human can effectively oversee, intervene in, and override the system’s output. This isn’t just a “human in the loop” checkbox — you need to demonstrate that the human reviewing the output has enough information and capacity to actually exercise meaningful judgement. Rubber-stamping an agent’s output isn’t oversight. The interface and workflow design matters as much as the technical safeguards.
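A workflow can at least make rubber-stamping harder. The sketch below is one hedged way to do that, under assumptions of my own: the reviewer must be shown evidence, must record a rationale, and can override the agent with an explicit outcome. All names are hypothetical, and the minimum rationale length is an arbitrary, crude proxy. It shows that a reviewer wrote something, not that the judgement was sound.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewAction(Enum):
    ACCEPT = "accept"
    OVERRIDE = "override"


@dataclass(frozen=True)
class AgentRecommendation:
    subject_id: str
    recommendation: str
    evidence: tuple  # what the reviewer is shown alongside the recommendation


@dataclass(frozen=True)
class ReviewDecision:
    reviewer_id: str
    action: ReviewAction
    rationale: str
    final_outcome: str


MIN_RATIONALE_CHARS = 20  # arbitrary illustrative threshold


def finalize(rec, reviewer_id, action, rationale, override_outcome=None):
    """Turn an agent recommendation into a final outcome only via a human review."""
    if not rec.evidence:
        raise ValueError("no evidence shown to the reviewer")
    if len(rationale.strip()) < MIN_RATIONALE_CHARS:
        raise ValueError("rationale too short to document the reviewer's judgement")
    if action is ReviewAction.OVERRIDE:
        if not override_outcome:
            raise ValueError("an override needs an explicit outcome")
        outcome = override_outcome
    else:
        outcome = rec.recommendation
    return ReviewDecision(reviewer_id, action, rationale.strip(), outcome)


# Illustrative data only.
rec = AgentRecommendation(
    subject_id="c-17",
    recommendation="reject",
    evidence=("No degree listed", "Five years of relevant experience"),
)
decision = finalize(
    rec,
    reviewer_id="recruiter-3",
    action=ReviewAction.OVERRIDE,
    rationale="Experience clearly offsets the missing degree for this role.",
    override_outcome="shortlist",
)
print(decision.final_outcome)  # shortlist
```

The design choice worth copying is that there is no code path from recommendation to outcome that skips the reviewer. Override rates and rationale text then become data you can show when asked whether the oversight is real.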

Conformity assessment. Before deployment, most high-risk systems require a conformity assessment, either self-assessment (for most categories) or third-party assessment (for biometric systems and a few others). Self-assessment still requires real documentation that you’ve worked through the required checks. ISO/IEC 42001, the AI management system standard published in late 2023, is a framework many teams use to structure this process, though certification to it is not in itself a conformity assessment under the Act.

The Practical Gap Between Compliance and Competence

Reading the obligations is one thing. The tricky part is that a lot of what the AI Act requires — meaningful human oversight, accurate technical documentation, data governance for training data — exposes structural problems in how many agentic systems are actually built.

Teams that assembled their agent quickly, using whatever LLM produced the best outputs at the time, often can’t accurately document what training data was used or how the model’s behaviour was validated for their specific use case. The audit trail that Langfuse generates is useful, but only if you’ve configured it to capture everything that matters and if you’re retaining it in a way that satisfies data residency requirements.
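A quick way to find out whether your traces “capture everything that matters” is to test exported traces against the fields you have decided you need. The check below is a minimal sketch that assumes traces have been exported as plain dictionaries. The field names are hypothetical and are not the export schema of Langfuse or any other tool.

```python
# Hypothetical field names; map them to whatever your tooling actually exports.
REQUIRED_FIELDS = (
    "llm_inputs",
    "model_version",
    "model_parameters",
    "tool_calls",
    "final_output",
)


def missing_audit_fields(trace):
    """Return the required fields that are absent or None in one exported trace.

    An empty list for `tool_calls` is accepted: the agent may not have called a tool.
    """
    return [name for name in REQUIRED_FIELDS if trace.get(name) is None]


# Illustrative data only.
trace = {
    "llm_inputs": ["Assess this invoice."],
    "model_version": "model-2025-11",
    "tool_calls": [],
    "final_output": "flag for manual review",
}
print(missing_audit_fields(trace))  # ['model_parameters']
```

Sampling a few hundred production traces through a check like this is usually enough to show whether the gap is an occasional dropped field or a field that was never captured at all.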

Fair enough if you’re a small team and this is catching you off-guard. The NIST AI Risk Management Framework 1.1, released in March 2026, is a more practical starting point than the Act itself for building the internal processes you need. It maps well onto the Act’s requirements and is written for practitioners rather than lawyers. If you’re working in a regulated industry, your legal team will want you talking to them before August anyway, but the AI RMF gives you the operational vocabulary to have that conversation productively.

What to Do in the Next Two Months

Start by classifying your deployments. Make a list of every agent your organisation has in production or near-production, and map each one against the Annex III use case list. Most won’t be high-risk. Find the ones that are.
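The classification pass can start as a very small inventory script. The sketch below is a first-pass filter only, assuming the two-part test described earlier in this article (an Annex III area plus influence on a decision about a person). The deployment names are hypothetical, and a flag here means “take this to legal review”, not a legal determination.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnnexIIIArea(Enum):
    BIOMETRICS = "biometric identification"
    CRITICAL_INFRASTRUCTURE = "critical infrastructure management"
    EDUCATION = "education and vocational training"
    EMPLOYMENT = "employment and worker management"
    ESSENTIAL_SERVICES = "access to essential services"
    LAW_ENFORCEMENT = "law enforcement"
    MIGRATION = "migration and border control"
    JUSTICE = "administration of justice"


@dataclass(frozen=True)
class Deployment:
    name: str
    area: Optional[AnnexIIIArea]  # None if no Annex III area applies
    influences_decision_about_person: bool


def needs_high_risk_review(deployment):
    """First-pass filter: an Annex III area and influence on a decision about a person."""
    return deployment.area is not None and deployment.influences_decision_about_person


# Illustrative inventory only.
inventory = [
    Deployment("cv-screener", AnnexIIIArea.EMPLOYMENT, True),
    Deployment("docs-search-bot", None, False),
    Deployment("invoice-flagger", AnnexIIIArea.ESSENTIAL_SERVICES, True),
    Deployment("hr-policy-faq", AnnexIIIArea.EMPLOYMENT, False),
]

flagged = [d.name for d in inventory if needs_high_risk_review(d)]
print(flagged)  # ['cv-screener', 'invoice-flagger']
```

Keeping the inventory in version control alongside the code gives you a dated record of when each system was classified and by what reasoning.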

For the ones that qualify, run a gap assessment against the four requirement areas above. Technical documentation probably needs work. Logging and retention is often incomplete. Human oversight mechanisms are frequently cosmetic rather than substantive.

Then prioritise. The Act’s enforcement is handled by national market surveillance authorities, and early attention is likely to go to the categories with the highest potential harm: employment, credit, healthcare. If your high-risk deployments are in those sectors, that’s where to start.

The teams that come out of August in good shape aren’t the ones who read the most compliance blog posts. They’re the ones who actually sat down, mapped their systems, and did the documentation work. That’s the unglamorous thing — but it’s the thing that actually needs doing.