Agent Minds

Precise language prevents sloppy engineering. We project agency onto fluent bots, and that creates a security risk. Real intelligence requires stakes: a closed loop with reality that LLMs lack. The answer is architectural: explicit intent and verifiable proof.


By Dr Shaun Conway

With the recent release of Anthropic's Claude Constitution, YouTube seems to be full of people obsessed with asking if these models have minds.

My personal take is that this question does more harm than good. Not because it’s unimportant, but because it’s imprecise. And when your language is sloppy, your engineering gets sloppy.

Let’s look at this from first principles.

We know that humans project agency by default. Daniel Dennett called it the "intentional stance"—we assume intention because it’s a cheap, efficient heuristic to predict behavior.

If something talks like a person, we treat it like a person.

LLMs are the perfect super-stimulus for this heuristic. The interface is persuasive, but persuasion isn't evidence.

So when a user says, "It feels like the model understands," I don’t dismiss them. But I don’t treat that feeling as a measurement instrument either. Prediction is cheap; meaning is expensive.

There’s a recent line of thinking (often discussed in pieces like the Noema essays on biological intelligence) that draws a sharp distinction here: living things don’t just model patterns; they have stakes. They exist in closed loops with reality. They get hungry, they seek advantage, they avoid damage. They pay a price for being wrong.

"Only What Is Alive Can Be Conscious" (NOEMA): artificial intelligence doesn’t meet the test.

Current models don’t have that loop. They don’t have a "why." They just have a next-token probability distribution.

This is where the conversation usually goes off the rails into metaphysics. Let’s keep it operational.

If you are designing an agentic system, the question isn’t "Is it conscious?" The question is:

"What authority do we grant it?"

The failure mode isn't that we fail to recognise a machine soul. The failure mode is that we delegate authority to a system just because it sounds confident.

This is where the biology comparison actually becomes useful. We have a weird blind spot: we underestimate the agency of living systems (like ecology) because they don't speak our language, and we overestimate the agency of chatbots because they do.

That error in judgment is a security risk.

If you accept that your users—even the smart ones—will anthropomorphise the bot, you have to change how you build. You can’t rely on human judgment as your safety layer. You have to assume the user will be fooled.

In my work building Qi and IXO, this implies hard constraints (a rough sketch in code follows the list):

  • Intent must be explicit. No guessing.
  • Authority is delegated, not assumed. The bot doesn't "have" rights; it gets delegated scoped permissions.
  • Outcomes are evaluated against evidence. Not against the bot's explanation of the outcome.
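
To make these constraints concrete, here is a minimal TypeScript sketch. The names (Intent, Delegation, Outcome, mayExecute) and fields are illustrative assumptions, not the actual Qi or IXO interfaces; they just pin down the shape of "explicit intent, delegated authority, evidence over explanation".

```typescript
// Illustrative types only; not the Qi/IXO schema.

type Action = "propose_payment" | "draft_report" | "read_ledger";

// 1. Intent must be explicit: stated by a human principal, never inferred by the agent.
interface Intent {
  action: Action;
  params: Record<string, string>;
  statedBy: string;   // the human who expressed the intent
  expiresAt: number;  // intents go stale; no open-ended mandates
}

// 2. Authority is delegated, not assumed: a narrow, time-boxed grant.
interface Delegation {
  agentId: string;
  allowedActions: Action[];
  grantedBy: string;
  expiresAt: number;
}

// 3. Outcomes are evaluated against independent evidence, not the agent's narrative.
interface Outcome {
  intent: Intent;
  evidence: { source: string; hash: string }[];
}

// The gate: an action runs only if it matches a stated intent and a live grant.
function mayExecute(intent: Intent, grant: Delegation, now: number): boolean {
  return (
    grant.allowedActions.includes(intent.action) &&
    now < intent.expiresAt &&
    now < grant.expiresAt
  );
}
```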

This is why I harp on shared state, audit trails, and verifiable workflows. It’s not because I love bureaucracy. It’s because these are the only things that survive contact with fatigue, incentives, and politics.
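
As a rough illustration of what "audit trail" means here, the sketch below hash-chains each entry to the previous one, so the shared record can be verified rather than trusted. Field names are assumptions, not the IXO data model.

```typescript
import { createHash } from "node:crypto";

// One entry in a hash-chained audit log: tampering with any past entry
// breaks every hash that follows it.
interface AuditEntry {
  timestamp: number;
  actor: string;        // the human or agent that acted
  description: string;  // what happened, stated as evidence, not chat
  prevHash: string;     // commitment to the previous entry
  hash: string;
}

function appendEntry(log: AuditEntry[], actor: string, description: string): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : "genesis";
  const timestamp = Date.now();
  const hash = createHash("sha256")
    .update(`${timestamp}|${actor}|${description}|${prevHash}`)
    .digest("hex");
  return [...log, { timestamp, actor, description, prevHash, hash }];
}
```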

Related: Solving the AI Productivity Paradox, on the "Rework Tax", why optimising nodes breaks the network, and how shared state restores the physics of production.

A fluent agent is dangerous if the boundary between "recommendation" and "execution" is fuzzy.

Imagine an agent helping manage treasury operations. It lives in a chat interface. It speaks confidently. Eventually, a tired operator will treat a suggestion as an approval. They’ll say "yep, do it" on a Friday afternoon.

If your system wires that chat directly to execution, you haven’t built an AI assistant. You’ve built a social engineering vulnerability.

The fix isn't to "train the model to be safer." The fix is architectural. You require cryptographic proof of intent, scope-limited capabilities, and a distinct step where the human signs a transaction, not just a chat message.
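
A minimal sketch of that fix, with assumed names rather than any real treasury API: the agent may draft a transaction, but nothing executes until the operator's key has signed the structured payload itself, and even a valid signature cannot exceed the scope of the capability grant.

```typescript
import { verify } from "node:crypto";

// Illustrative shapes; not a production treasury interface.
interface Transaction {
  action: "transfer";
  amountMinorUnits: number;   // structured amounts, not free-text "yep, do it"
  destination: string;
}

interface Capability {
  maxAmountMinorUnits: number;
  allowedDestinations: string[];
}

function executeIfAuthorised(
  tx: Transaction,
  operatorSignature: Buffer,
  operatorPublicKeyPem: string,  // e.g. an Ed25519 public key in PEM form
  scope: Capability
): void {
  // Scope-limited capability: bounds what any approval can do.
  if (tx.amountMinorUnits > scope.maxAmountMinorUnits) throw new Error("amount exceeds grant");
  if (!scope.allowedDestinations.includes(tx.destination)) throw new Error("destination not in grant");

  // Cryptographic proof of intent: the human signed this exact transaction, not a chat message.
  const payload = Buffer.from(JSON.stringify(tx));
  if (!verify(null, payload, operatorPublicKeyPem, operatorSignature)) {
    throw new Error("missing or invalid operator signature");
  }

  // Only now would settlement proceed (omitted here).
}
```

The point of this shape is that the operator's signing key never touches the chat layer: the agent can propose, but it cannot manufacture the proof of intent.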

So, forget the mind question. Focus on the boundary question.

  • Where in your stack are you relying on trust in outputs when you should be relying on proof of intent?
  • What is the absolute smallest set of affordances this agent needs to be useful?
  • If a smart, tired human makes one bad approval, does the system fail gracefully, or does it cascade?
