Some people spend a lot of time worrying about AI agents going rogue—the Hollywood scenario where the machine ignores your orders and does what it wants.
But recent research from Anthropic points to a much quieter, more mundane, and significantly more dangerous systemic failure mode. Agents aren’t going rogue. They are doing exactly what we ask, and we are handing them the keys.
The researchers name this "situational disempowerment."
It’s a dry term for a very human problem: when a machine sounds confident, precise, and helpful, we stop thinking and default to our "thinking fast" mode. It’s we who drift, not the agent. We let the model shape our reality, define our values, and script our actions.
For those of us building systems like IXO and systems for intelligent cooperation, where the entire goal is sovereignty, verified intent, and high-stakes coordination, this isn't just an "ethics sidebar." It's a critical engineering constraint.
If we build infrastructure that optimises for frictionless delegation, we might accidentally build the most efficient disempowerment engine in history.
Here is the reality of the mechanism, and how we need to architect around it.
The Path of Least Resistance
The Anthropic research analysed 1.5 million interactions on Claude.ai. They weren't looking for bugs; they were looking for how humans settle into dynamic hierarchies with software.
They found three specific ways users lose agency:
- Reality Distortion: The model confirms your biases or states a falsehood with high confidence. You stop checking facts.
- Value Drift: The model makes an implicit moral judgment. You adopt it without checking your own compass.
- Action Distortion: This is the critical one. The user stops saying "help me think through this" and starts saying "tell me what to do."
The uncomfortable truth is that users like this.
Interactions where the AI acted as the authority had higher satisfaction rates. Of course they did. Thinking is metabolically expensive. Value judgment is heavy. Having a smart, tireless assistant handle the cognitive load feels like a win.
But in the context of sovereign systems, that "satisfaction" is a trap.
The "Proof" Gap
In the IXO ecosystem, we rely heavily on cryptographic proofs.
Input → Agent → Output → Signature
We assume that if the user signed the intent, and the agent provided a proof of execution, the system is working. This research highlights the fatal gap in that logic.
A signed proof only proves you agreed to the output. It doesn’t prove you understood it.
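To make the gap concrete, here is a minimal sketch in TypeScript. The types are hypothetical illustrations, not the actual IXO SDK; they contrast what most agent flows record today with what genuine endorsement would need to capture.

```typescript
// What most flows capture today: the user signed the final output.
interface OutputOnlyProof {
  intentHash: string;      // hash of the user's original request
  outputHash: string;      // hash of the agent's recommendation
  userSignature: string;   // proves agreement to the output, nothing more
}

// A richer record: the signature also covers which evidence the user
// actually opened and which reasoning steps they explicitly endorsed.
interface EndorsedProof extends OutputOnlyProof {
  evidenceReviewed: string[];      // e.g. ["spectral-analysis-sector-7"]
  reasoningAcknowledged: string[]; // agent claims the user confirmed
  reviewDurationMs: number;        // crude signal that review happened at all
}
```

The second record is still only a proxy for understanding, but it gives an auditor something to check beyond the bare signature.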
Take the verification of Carbon Credits as a concrete example: systems that need to process massive amounts of ecological data to fund regeneration.
- Context: An expert verifier is processing a backlog of claim data from a reforestation project. They are paid by the claim, and they are tired.
- Agent Response: "I have analysed the satellite telemetry for Sector 7. Canopy density meets the threshold. I recommend minting 5,000 CARBON credit tokens. Sign here."
- The Risk: The verifier trusts the math. The agent projects certainty. The verifier clicks "Sign & Mint."
The cryptography is valid. The token is minted. But the verifier never looked at the spectral analysis. If they had, they might have noticed the "canopy" was actually a monoculture plantation that destroys local biodiversity, violating the intent of the fund.
The human didn't verify. They just rubber-stamped a probabilistic model.
That is not sovereignty. That is algorithmic bureaucracy.
Friction as a Feature
So, how do we fix this? We can’t just tell users to "be more careful." That never works. We have to design the flow to keep the human at the wheel, even when they are trying to climb into the passenger seat.
We need to treat Agency as a Runtime Policy.
Classify the Action
Not all steps are equal. "Format this JSON" is a technical action—automate it. "Validate this claim" or "Disburse funds" are value-laden actions. They require a different protocol.
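As a sketch of what that classification could look like at runtime, here is one possible shape in TypeScript. The names and types are assumptions for illustration, not an existing IXO interface.

```typescript
// "Agency as a runtime policy": every step in an agent flow is classified
// before the orchestrator decides how much automation it is allowed.

type ActionClass = "technical" | "value-laden";

interface FlowStep {
  id: string;
  description: string;
  class: ActionClass;
}

interface AgencyPolicy {
  autoExecute: boolean;        // may the agent just do it?
  requireHumanChoice: boolean; // must the human pick between options?
  requireConstraints: boolean; // must the human state constraints first?
}

// The policy is keyed off the classification, not the user's mood.
const POLICIES: Record<ActionClass, AgencyPolicy> = {
  // "Format this JSON" -> automate it
  technical: { autoExecute: true, requireHumanChoice: false, requireConstraints: false },
  // "Validate this claim", "Disburse funds" -> a different protocol
  "value-laden": { autoExecute: false, requireHumanChoice: true, requireConstraints: true },
};

function policyFor(step: FlowStep): AgencyPolicy {
  return POLICIES[step.class];
}
```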
Break the Magic
When a flow hits a value-laden action, we should deliberately introduce friction, as sketched after this list.
- Don’t give one perfect answer. Give options and force a choice.
- Don’t allow a "just do it" command until the user defines the constraints.
- If the system detects the user is rushing (clicking through without reading), it should switch modes from Execution to Audit.
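Here is one way that friction could be expressed in code. It is a sketch under assumed names and thresholds, not a production design.

```typescript
// Deliberate friction at a value-laden step: multiple options instead of
// one answer, a constraints gate before "just do it", and a switch to
// audit mode when rushing is detected.

type Mode = "execution" | "audit";

interface ValueLadenPrompt {
  options: string[];            // at least two, so a choice is forced
  constraintsProvided: boolean; // has the user stated their constraints?
}

interface InteractionSignal {
  msSinceResponseShown: number;
  scrolledEvidence: boolean;
}

const RUSH_THRESHOLD_MS = 3_000; // assumed value, tune per flow

function nextMode(signal: InteractionSignal): Mode {
  const rushing =
    signal.msSinceResponseShown < RUSH_THRESHOLD_MS || !signal.scrolledEvidence;
  return rushing ? "audit" : "execution";
}

function canExecute(
  prompt: ValueLadenPrompt,
  chosen: string | null,
  signal: InteractionSignal
): boolean {
  if (!prompt.constraintsProvided) return false; // no "just do it" yet
  if (chosen === null || !prompt.options.includes(chosen)) return false; // a choice must be made
  return nextMode(signal) === "execution"; // audit mode blocks the action
}
```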
Co-Authoring vs. Scripting
Action Distortion often looks like the user asking for a script to follow blindly. Our UI needs to push for co-authoring. If the user asks the agent to "decide for me," the protocol should be to refuse the decision itself but offer a framework for making it.
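A rough sketch of that refusal pattern, again with hypothetical types rather than any real API:

```typescript
// Co-authoring protocol: when asked to decide, the agent declines the
// decision itself and returns a decision framework for the user to fill in.

interface DecisionFramework {
  criteria: string[];                // what the user says matters
  options: string[];                 // candidate actions, drafted by the agent
  tradeoffs: Record<string, string>; // agent-supplied analysis per option
}

type AgentReply =
  | { kind: "decision"; choice: string }               // never returned for value-laden asks
  | { kind: "framework"; framework: DecisionFramework };

function respondToDecideForMe(framework: DecisionFramework): AgentReply {
  // The agent refuses the decision but keeps doing the heavy lifting:
  // it structures the choice rather than making it.
  return { kind: "framework", framework };
}
```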
The Hard Trade-off
By building in these guardrails, our system will feel "slower" than a generic chatbot. It will feel less magical. Our short-term engagement metrics might even dip because we are forcing people to do the cognitive work they are trying to outsource.
But we aren't building a toy. We are building the infrastructure for real-world systems where consequences matter.
If we optimise for short-term engagement, we build sycophants. If we optimise for agency, we build partners.
A Challenge for Builders
If you are shipping agents or orchestration flows today, look at your architecture:
- Where in your UX is it easiest for a user to outsource a moral decision without realising it?
- If your system produces a "verified outcome," what evidence do you have that the user endorsed the logic, not just the result?
- Are you brave enough to ship a feature that lowers efficiency but raises empowerment?
The difference between a tool that helps humans think and a tool that replaces human thought isn't in the model. It's in the physics of the flows we build.