The people building the systems are increasingly being governed by people who misunderstand what they’re governing. The goal is safety, but the result isn't a lower risk profile. It’s just paralysis.
Think about a typical day for a dev right now. They wire up a simple internal AI assistant. It reads a few dashboards, drafts a report, and uses pre-approved connectors. Identity propagates correctly. Permissions are scoped. It’s a clean bit of work.
Five minutes of engineering. Two weeks of paperwork.
Why? Because someone in a different building decided that every "agent" must be registered, described, risk-assessed, and sponsored—as if a Markdown file were a new employee joining the firm.
This is where language quietly hijacks architecture. We used the word "agent," and governance teams heard "autonomous actor." They built a bureaucracy around a metaphor.

The Category Error
Most "agents" today are just a prompt, a routing configuration, and a thin wrapper around an LLM call. They aren't autonomous entities with self-directed goals. They don't wake up with ambitions. They execute bounded instructions within scoped tools on behalf of a human.
The real agent—legally and operationally—is the human whose intent flows through the system.
But once we anthropomorphise a router, we start governing the wrong layer. We create registries for text files. We assign executive sponsors to configurations. We write lifecycle policies for prompts.
None of this touches the actual risk surface. It’s security theatre, and it’s expensive.
Where Risk Actually Lives
You don’t secure an orchestra by registering the sheet music. You secure the musicians and their instruments.
In AI systems, risk lives in three specific places: tool permissions, identity propagation, and state changes. It doesn’t live in the existence of a routing configuration.
If an LLM can write to a database, trigger a payment, or modify infrastructure, then governance has to exist at the boundary where those actions occur. That’s it.
Central agent registries feel comforting because they create the appearance of order. Every agent has an ID and a sponsor. But unless those permissions are enforced at runtime, the registry is useless. Static approval isn't dynamic control. A router approved six months ago tells you nothing about the data classification at the moment of execution, or whether the human’s intent actually aligns with policy.
Documentation doesn't stop damage. Architecture does.
A Different Path
The anxiety from governance teams is rational. Tool use does change the failure mode. When an LLM answers incorrectly, you get a bad sentence; when it calls a tool incorrectly, you get a bad action.
The mistake is governing the router instead of the capability.
As we have been building Qi, an intelligent cooperating system over shared state, our goal wasn't to make a better registry. It was to build a system of governed cooperation. In this model, the unit of governance isn't the "agent"; it's the capability invocation.
Every action is capability-scoped. Every tool call is mediated through declared permissions. If a human doesn't have the right to perform an action, the flow simply cannot execute it. Identity propagates through execution, not through a spreadsheet.
This isn't just a nuance; it’s a structural shift. You can build a thousand flows, but if they all execute against the same governed capability layer, your risk doesn't explode. Only your productivity does.
Stop thinking like a gatekeeper and start thinking like a platform engineer.
The goal is to provide safe defaults and paved paths so that "doing it the right way" is actually the path of least resistance.
Here’s how we have approached this:
1. Separate "The What" from "The How"
Developers should define what a tool does (the schema, the inputs, the outputs). The platform should define how it is secured (the auth injection, the logging, the rate limiting).
The Principle: Don't make devs write security logic inside their tool code.
The Reality: Use a middleware layer or a wrapper. If a dev writes a function to "Fetch Invoice," the system should automatically wrap it in an identity check and an audit log, without the dev writing a single "if authorised" check themselves.
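As a minimal sketch of that separation (names, scopes, and the identity shape are all illustrative assumptions, not a real platform API), a decorator can hold the security logic while the tool holds only business logic:

```python
import functools

AUDIT_LOG = []  # stand-in for the platform's audit sink

def governed(scope):
    """Platform-side wrapper: identity check and audit logging live here,
    not inside the tool the developer writes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(identity, *args, **kwargs):
            # Does the calling identity hold the required scope?
            if scope not in identity.get("scopes", []):
                raise PermissionError(f"{identity['user']} lacks scope {scope!r}")
            result = fn(*args, **kwargs)
            # Record who did what, automatically.
            AUDIT_LOG.append({"user": identity["user"],
                              "tool": fn.__name__,
                              "args": args})
            return result
        return wrapper
    return decorator

@governed(scope="invoices:read")
def fetch_invoice(invoice_id):
    # The dev writes only the "what" -- no security code in sight.
    return {"id": invoice_id, "amount": 120.0}
```

Calling `fetch_invoice({"user": "alice", "scopes": ["invoices:read"]}, "INV-1")` succeeds and leaves an audit entry; an identity without the scope is refused before the tool body ever runs.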
2. Move from "Allow-lists" to "Capability Delegation"
In a registry-heavy world, you maintain a list of who can do what. This doesn't scale. Instead, use cryptographically signed, user-controlled authorisation tokens (UCANs) that carry human- and AI-readable intent statements and travel with the execution flow.
The Principle: The request carries its own permission.
The Reality: When a user triggers a Flow, the system issues a short-lived token that says: "This execution has the authority of User X, restricted to Scope Y, for 60 seconds." The tool doesn't need to call a central database to check permissions; it just validates the token.
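A toy version of that pattern, using a symmetric HMAC signature for brevity (a real deployment would use asymmetric keys and a proper token format such as UCAN; every name here is an assumption):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"platform-signing-key"  # placeholder for a real key

def issue_token(user, scope, ttl=60):
    """Mint a short-lived capability token that travels with the execution."""
    claims = {"sub": user, "scope": scope, "exp": time.time() + ttl}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def validate_token(token, required_scope):
    """The tool validates locally -- no round trip to a central store."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if claims["scope"] != required_scope:
        raise PermissionError("scope mismatch")
    return claims
```

The point is structural: the request carries its own proof of authority, so the permission check is a local signature validation rather than a lookup in a registry.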
3. The "Instruction vs. Action" Divide
Never pass raw user prompts directly to a shell or a database. Ever.
The Principle: The LLM is the translator, not the executor.
The Reality: The LLM produces a structured payload (JSON) based on a strict schema. A separate, non-LLM "Executor" validates that JSON against the schema and then performs the action. If the LLM tries to hallucinate a new parameter like delete_all: true, the schema validation kills the request before it hits the database.
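A bare-bones executor along these lines (the schema and tool are hypothetical; a production system would use a full schema validator):

```python
# Hypothetical schema for an "update invoice" tool: parameter names and types.
SCHEMA = {"invoice_id": str, "status": str}

def execute(payload):
    """Non-LLM executor: validate the structured payload before acting."""
    unknown = set(payload) - set(SCHEMA)
    if unknown:
        # A hallucinated parameter like delete_all dies here,
        # before any side effect occurs.
        raise ValueError(f"rejected unknown parameters: {sorted(unknown)}")
    missing = set(SCHEMA) - set(payload)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    for key, typ in SCHEMA.items():
        if not isinstance(payload[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    # Only a fully validated payload reaches the real action.
    return f"invoice {payload['invoice_id']} set to {payload['status']}"
```

The LLM's output is just data until the executor has checked it; anything outside the schema never touches the database.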
4. Explicit Side-Effect Classification
Not all tools are created equal. You need to categorise capabilities by their impact on the world.
Read-Only: Low friction. Minimal audit.
Idempotent Write: Medium friction. Requires identity propagation.
Critical State Change: High friction. Requires human-in-the-loop (HITL) or multi-party authorisation (e.g., moving money, changing permissions).
The Principle: Map the friction to the risk. Don't make a "Read Dashboard" tool jump through the same hoops as a "Transfer Funds" tool.
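The three tiers above can be encoded as a simple policy table, so friction is attached to impact rather than to individual tools (the tool names and policy fields are illustrative):

```python
from enum import Enum

class Impact(Enum):
    READ_ONLY = "read_only"
    IDEMPOTENT_WRITE = "idempotent_write"
    CRITICAL = "critical_state_change"

# Friction is declared once per impact class, not per tool.
POLICY = {
    Impact.READ_ONLY:        {"audit": "minimal", "identity": False, "hitl": False},
    Impact.IDEMPOTENT_WRITE: {"audit": "full",    "identity": True,  "hitl": False},
    Impact.CRITICAL:         {"audit": "full",    "identity": True,  "hitl": True},
}

# Each tool declares its impact on the world.
TOOLS = {
    "read_dashboard": Impact.READ_ONLY,
    "update_invoice": Impact.IDEMPOTENT_WRITE,
    "transfer_funds": Impact.CRITICAL,
}

def required_friction(tool):
    """Look up the governance requirements for a tool's impact class."""
    return POLICY[TOOLS[tool]]
```

"Transfer Funds" picks up human-in-the-loop automatically; "Read Dashboard" stays frictionless, without anyone writing per-tool policy.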
5. Observability as Governance
In a distributed, agentic system, you cannot predict every failure. You must optimise for detectability and reversibility.
The Principle: If you can’t prevent it, you must be able to see it and undo it.
The Reality: Every capability invocation must be logged with its full context: the prompt that triggered it, the human identity behind it, the tool's response, and the resulting state change. This turns your audit log into a "Time Machine" rather than just a graveyard of text.
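A sketch of such a record, with a matching undo (field names are assumptions; the key design choice is capturing the before/after state so the log supports reversal, not just forensics):

```python
import time

def log_invocation(log, *, prompt, user, tool, response,
                   state_before, state_after):
    """Append one fully contextualised record per capability invocation."""
    log.append({
        "ts": time.time(),
        "prompt": prompt,        # what triggered the call
        "user": user,            # the human identity behind it
        "tool": tool,
        "response": response,
        "state": {"before": state_before, "after": state_after},
    })

def undo_last(log):
    """Reversibility: recover the prior state from the most recent entry."""
    entry = log.pop()
    return entry["state"]["before"]
```

Because each entry stores the pre-invocation state, the log can roll an action back, which is what makes it a "Time Machine" rather than a graveyard of text.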
6. Smallest Functional Unit (The "Object Capability")
Avoid building "God Tools" that can do everything.
The Principle: Decompose power.
The Reality: Instead of an "Admin Tool," build a "User Password Reset Tool" and a "User Role Update Tool." It is much easier to govern, audit, and revoke access to a specific, narrow skill than to a broad, powerful one.
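In code, the decomposition might look like this (purely illustrative names; each function holds exactly one power and can be granted or revoked independently):

```python
# Two narrow skills instead of one "Admin Tool".

def reset_user_password(user_id):
    """Holds exactly one power: trigger a password reset."""
    return {"action": "password_reset", "user": user_id}

def update_user_role(user_id, role):
    """Holds exactly one power: change a role, within an enumerable set."""
    if role not in {"viewer", "editor"}:
        # The skill's authority is bounded; escalation isn't expressible.
        raise ValueError("role outside this skill's authority")
    return {"action": "role_update", "user": user_id, "role": role}
```

Revoking the role-update skill has no effect on password resets, and neither skill can do anything an "Admin Tool" could beyond its single declared action.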
The Strategy for Velocity
If you want devs to actually adopt this, you provide a CLI or SDK that scaffolds these "Qi Skill Capsules" automatically.
They run qi create-skill.
It generates the schema, the wrapper, and the test suite.
The "governance" happens automatically during the deploy command, where the platform checks if the requested scopes are allowed for that specific team.
This moves governance from a meeting to a linter.
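As a sketch of what that linter step might check at deploy time (the team names, manifest shape, and scope strings are all assumptions, not the actual Qi tooling):

```python
# Hypothetical allow-set: which scopes each team may request.
TEAM_SCOPES = {"billing-team": {"invoices:read", "invoices:write"}}

def lint_skill(team, manifest):
    """Fail the deploy if a skill requests scopes its team isn't granted."""
    requested = set(manifest["scopes"])
    allowed = TEAM_SCOPES.get(team, set())
    excess = requested - allowed
    if excess:
        # Governance failure surfaces like a lint error, not a meeting.
        raise SystemExit(f"deploy blocked: {sorted(excess)} not allowed for {team}")
    return "ok"
```

A dev whose skill stays inside the team's granted scopes never sees governance at all; one who reaches further gets an immediate, specific failure instead of a review queue.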
The Hard Truth
Agent-management platforms such as Microsoft's Agent 365 try to map HR metaphors onto distributed systems: agents treated like employees, IDs like staff badges, sponsors like managers. It feels intuitive, but it's technically misaligned.
We don’t need to govern routers like people. We need to govern capabilities like power.
Power to move value. Power to issue claims. Power to change state.
If you’re leading an enterprise right now, you have a choice. You can treat every AI configuration as a new hire and drown in overhead, or you can architect runtime-enforced, capability-scoped cooperation.
The first path feels safe, but the second one is actually safer. It binds governance to the physics of the system rather than the speed of a committee.
The future isn't more paperwork. It’s better architecture.
A couple of questions to self-reflect:
- If my "agent registry" disappeared tomorrow, would our system actually be less secure, or just less documented?
- Are we governing markdown files, or the API keys and capabilities those agents actually hold?
