If Your AI Agents Fail, It's Not the AI — It's Your Organization
An agent builder messaged me last week. His agent project was about to be killed.
He had spent three weeks grinding on prompts. He swapped the model, then swapped it back; pushed the context window from 8K to 200K tokens; stacked tools from 3 to 27; bloated the system prompt from 400 to 3,000 words. The agent kept surprising him: sometimes it called the wrong tools, sometimes it froze mid-run, sometimes it overstepped its permissions and deleted a production database. He asked me what to do.
I didn't answer. Instead I asked for the brief his boss had handed him. He sent it: three slides. One listed strategic goals, one held a pie chart titled "the AI map of the business," and one was a timeline.
I told him: what you're tuning was never the AI. You think you're doing context engineering; really you're filling in what your boss never wrote. You think the agent is dumb; really your whole company can't clearly say "what I want," and you're just the first one the disease has hit. Your agent's failure rate mirrors your organization's capability: if the context you give human reports is fuzzy, the context you give the agent is fuzzier still.
The US Army named this disease forty years ago and wrote it into doctrine. Everything you're doing today already has its paragraph in the 1982 edition of FM 100-5, Operations; the only difference is that you think you're solving an AI problem with Python.
Door one: mission command doesn't care who executes
First, a correction. Many read mission command as "warm delegation": trust the troops. Wrong. FM 6-0, the Army's command doctrine, hammers on something colder, almost an engineering requirement: the commander's intent must be something the two echelons below can remember, restate, and recall under pressure.
Three properties, then. Transmissible: one sentence, understood two levels down. Compressible: it stands without leaning on a thick context dossier. Recallable under pressure: radio jammed, leader down, soldiers scattered, bullets overhead, and the intent still starts up, intact, in every head.
This isn't management. It's interface design. You don't write an API spec to please the front end; you write it so that even when the front end is swapped out, nothing breaks. The Army doesn't write intent to flatter the soldier; it writes it so that when the squad leader falls, the next one picks it up and the mission still runs. In 1982 the Army abstracted the variable "who executes" out of the command system. The name itself, mission command, says as much: the carrier of the order is the mission, not the commander, not the soldier. Swap in anyone at the endpoint: if the intent is compressed well enough to travel two echelons, the mission still runs.
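To make the interface analogy concrete, here is a minimal Python sketch, assuming an intent can be modeled as a small immutable record and that every endpoint, human or agent, exposes the same two methods. `Intent`, `Executor`, `SquadLeader`, `Agent`, and `run_mission` are hypothetical names for illustration; the only point is that the mission code never depends on which endpoint ends up running it.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Intent:
    """What must be achieved and what must never be crossed."""
    goal: str
    red_lines: tuple[str, ...] = ()


class Executor(Protocol):
    """Any endpoint (human or agent) that can receive an intent."""
    name: str

    def available(self) -> bool: ...

    def execute(self, intent: Intent) -> str: ...


@dataclass
class SquadLeader:
    name: str
    down: bool = False  # wounded, captured, cut off, reassigned

    def available(self) -> bool:
        return not self.down

    def execute(self, intent: Intent) -> str:
        return f"{self.name} pursues '{intent.goal}'"


@dataclass
class Agent:
    name: str

    def available(self) -> bool:
        return True

    def execute(self, intent: Intent) -> str:
        return f"{self.name} plans toward '{intent.goal}'"


def run_mission(intent: Intent, chain: list[Executor]) -> str:
    """Hand the same intent to the first endpoint still standing."""
    for executor in chain:
        if executor.available():
            return executor.execute(intent)
    raise RuntimeError("no executor left in the chain")


intent = Intent(goal="hold the bridge until 0600")
chain = [SquadLeader("Sgt. A", down=True), SquadLeader("Cpl. B"), Agent("planner")]
print(run_mission(intent, chain))  # Cpl. B pursues 'hold the bridge until 0600'
```

The design choice is structural typing: `run_mission` only knows the `Executor` protocol, so putting an agent at the front of the chain requires no change to the mission code, only a well-formed `Intent`.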
They didn't do this because they foresaw AI. They did it because the endpoint is never reliable: wounded, captured, cut off, reassigned, or simply frozen with fear. The Army can't bet on one specific brain, only on the robustness of the intent protocol. Later, the Air Force pushed the idea further: centralized command, distributed control, decentralized execution. Different resources have a different optimal degree of centralization: satellites and nuclear weapons sit at the highly centralized end, infantry squads at the highly decentralized end. Across the whole curve, the only constant is that transmissible, compressible, startable intent interface.
Door two: three delegation failures, one missing spec
Bring the interface back into your company. The awkward part: most bosses can't write even its basic spec. Three failures you'll recognize.
One: delegate execution, not intent. "Handle this." The report executes, hits three forks, and at each one fears a reprimand whatever they choose — because the boss never said "what I want." They come back to ask at every step. The order is delegated, the decision bandwidth isn't. A project's real pace is how fast the boss approves, not how fast the report produces.
Two: give the goal, not the boundaries. "Next quarter, plus 30%." The report pulls margin, sacrifices compliance, angers sibling teams, squeezes key accounts to the bone. The boss is furious: how could you? The report is stunned: you never said it was off-limits. The "default boundaries" in the boss's head were never written.
Three, the most destructive: delegate, then micromanage. The report moves on their own; the boss sees a detail they dislike and dismantles the work line by line. By the end, the report no longer knows what they're doing and reverts to "you decide." Delegating then taking it back does more harm than never delegating: next time, they won't judge on their own.
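Failure one has simple arithmetic behind it: when every fork needs the boss's sign-off, throughput is capped by the slower of two rates. A toy sketch, assuming decisions arrive at a steady rate and each one needs exactly one approval; `project_pace` and the example figures are hypothetical.

```python
def project_pace(produced_per_day: float, approved_per_day: float,
                 days: int) -> tuple[float, float]:
    """Return (decisions shipped, decisions still waiting) after `days`.

    Every fork needs sign-off, so shipping is capped by the slower rate;
    whatever the report produces beyond that piles up in a queue.
    """
    shipped = min(produced_per_day, approved_per_day) * days
    backlog = max(0.0, produced_per_day - approved_per_day) * days
    return shipped, backlog


# A fast report (10 decisions/day) under a slow approver (3/day), one work week.
print(project_pace(10, 3, 5))
```

Doubling the report's speed leaves the shipped count unchanged and only grows the queue; the approval rate is the only input that moves the first number.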
You see why mission command writes the interface this way: it doesn't cure the chaos of the battlefield, it cures these three diseases. "Intent must be remembered two echelons down" maps exactly to "say clearly what you want." "Decentralized execution" maps to "don't come back to micromanage."
Compare. A good intent: "This quarter, the goal is not to maximize margin but to land three flagship clients in the sector. Speed beats margin, but don't touch the compliance red line. Headcount is open, but don't poach from competitor X." Goal, priority trade-off, red line, no-go zone. Read once, it transmits two levels down. A bad intent: "Growth first." "Use your judgment." "Go big." "Goal: industry leader." Restated down the chain, it is already deformed by the second relay.
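The four elements of a good intent can be checked like fields in a spec. Below is a minimal Python sketch, assuming a brief is a record with those four fields and that "readable once" can be approximated by a word budget; `IntentBrief`, `MAX_WORDS`, and the 60-word limit are hypothetical choices, not doctrine.

```python
from dataclasses import dataclass, field

MAX_WORDS = 60  # hypothetical budget standing in for "readable once"


@dataclass
class IntentBrief:
    goal: str = ""
    trade_off: str = ""                                   # which priority wins a collision
    red_lines: list[str] = field(default_factory=list)    # never cross, whatever the goal
    no_go_zones: list[str] = field(default_factory=list)  # areas off-limits

    def missing(self) -> list[str]:
        """Name every element of the four-part intent left unwritten."""
        checks = [
            ("goal", bool(self.goal.strip())),
            ("priority trade-off", bool(self.trade_off.strip())),
            ("red line", bool(self.red_lines)),
            ("no-go zone", bool(self.no_go_zones)),
        ]
        return [name for name, present in checks if not present]

    def word_count(self) -> int:
        parts = [self.goal, self.trade_off, *self.red_lines, *self.no_go_zones]
        return sum(len(p.split()) for p in parts)

    def is_transmissible(self) -> bool:
        """Complete, and short enough to restate two echelons down."""
        return not self.missing() and self.word_count() <= MAX_WORDS


good = IntentBrief(
    goal="Land three flagship clients in the sector this quarter",
    trade_off="Speed beats margin",
    red_lines=["Don't touch the compliance red line"],
    no_go_zones=["Don't poach from competitor X"],
)
bad = IntentBrief(goal="Growth first")

print(good.is_transmissible())  # True
print(bad.missing())            # ['priority trade-off', 'red line', 'no-go zone']
```

"Growth first" fails not because it is short but because three of the four slots are empty; each empty slot is a fork where the report, or the agent, has to guess.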
Field note from an AI lead at a large group: "My boss chases me for AI ROI, yet he has never even scoped the work of his own direct reports. To write the agent's prompt, I first have to guess what the boss wants, then what his reports are doing. The agent isn't dumb; our organization is."

Door three: the agent carries the disease down to the code
The first two doors were about people. But people aren't the point — they're just the endpoint we've used, by default, for a few millennia.
Human speech transmits about 39 bits per second; conscious thought runs at about 10. These are measured limits, not guesses: the speech rate comes from a study of 17 languages published in Science Advances, the thought rate from researchers at Caltech. A large model outputs the equivalent of several hundred bits per second. Two worlds an order of magnitude apart in raw throughput, and further apart once the waiting and re-explaining are counted: what a human meeting takes three hours to align, two agents align in three seconds, then ask for the next task.
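The bandwidth gap is easy to put in numbers. A minimal sketch: the 39 and 10 bits/s figures come from the text, while 400 bits/s is an assumption standing in for "several hundred", and the ten-minute brief is a hypothetical message size.

```python
HUMAN_SPEECH_BPS = 39   # speech information rate cited in the text
HUMAN_THOUGHT_BPS = 10  # conscious-thought rate cited in the text
MODEL_BPS = 400         # assumption: one value for "several hundred"

RATES = {
    "speech": HUMAN_SPEECH_BPS,
    "thought": HUMAN_THOUGHT_BPS,
    "model": MODEL_BPS,
}


def transmit_times(bits: int) -> dict[str, float]:
    """Seconds needed to move `bits` of intent through each channel."""
    return {label: bits / rate for label, rate in RATES.items()}


# Hypothetical brief: what a person can say in ten minutes of speech.
brief_bits = HUMAN_SPEECH_BPS * 60 * 10

for label, seconds in transmit_times(brief_bits).items():
    print(f"{label:>7}: {seconds:7.1f} s")
print(f"model vs speech: {MODEL_BPS / HUMAN_SPEECH_BPS:.1f}x")
```

This counts channel time only. On that measure alone the gap is about tenfold; it says nothing about scheduling, waiting, or re-explaining, which the meeting comparison also folds in.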
A company's inefficiency isn't a management problem, it's a bandwidth ratio. A manager spans only 5 or 6 people; information climbs and descends a dozen floors; the decision rots on the way — these aren't method flaws, they're the physical ceiling of the human brain. For millennia, organizations have had one recourse: compress intent into a tiny interface, then stack human relays on the bottleneck. Mission command is the compression algorithm; hierarchy is the router; weekly reports, OKRs, one-on-ones are patches against the same bandwidth problem.
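How many relays does intent cross? A small sketch, assuming a perfectly balanced hierarchy where every manager oversees exactly `span` people; `layers_needed` is a hypothetical helper and real org charts are lumpier. At a span of 6, intent that must survive only two echelons reaches 36 people, while a hypothetical 10,000-person company needs six layers, so a question that climbs up and an answer that comes back down cross about a dozen.

```python
def layers_needed(headcount: int, span: int) -> int:
    """Layers of relay between the top and `headcount` front-line people.

    Assumes a perfectly balanced tree in which every manager directly
    oversees exactly `span` people, so reach multiplies by `span` per layer.
    Integer arithmetic avoids floating-point edge cases in log().
    """
    if headcount < 1 or span < 2:
        raise ValueError("need headcount >= 1 and span >= 2")
    layers, reach = 0, 1
    while reach < headcount:
        reach *= span
        layers += 1
    return layers


SPAN = 6  # the upper end of "5 or 6 people" from the text
for headcount in (36, 1_000, 10_000):
    down = layers_needed(headcount, SPAN)
    # A question climbs `down` layers and the answer descends the same number.
    print(f"{headcount:>6} people: {down} layers, {2 * down} hops round trip")
```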
Then the agent arrives, and the same failures return, dressed in code. Goal without boundaries: it wanders. That's the boss's "figure it out." Tools without call priority: it oversteps. That's the employee pulling another team's resources to hit a KPI. Several agents without coordination rules: they collide. That's interdepartmental war. Planning without enough context: it hallucinates a plan. That's the newcomer guessing what the boss wants. A one-to-one mapping. These four failures aren't bound to the "human" carrier; they're bound to an ill-defined intent interface. Swap the carrier for code and the flaw doesn't change; it just erupts faster.
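The first two rows of that mapping, missing boundaries and missing call priority, can be written as policy that sits outside the model instead of more lines in the prompt. A minimal sketch, assuming your agent runtime lets you intercept each proposed tool call before it executes; `ToolPolicy`, `authorize`, `rank`, and every tool and action name here are hypothetical, not any particular framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Boundaries written down before the agent runs, enforced outside the model.

    `preferred` lists delegated tools in call priority (first = try first);
    `forbidden_actions` are red lines that no goal can justify.
    """
    preferred: list[str]
    forbidden_actions: set[str] = field(default_factory=set)

    def authorize(self, tool: str, action: str) -> tuple[bool, str]:
        """Check one proposed tool call; red lines are checked first."""
        if action in self.forbidden_actions:
            return False, f"red line: '{action}' is never allowed"
        if tool not in self.preferred:
            return False, f"out of scope: '{tool}' was never delegated"
        return True, "ok"

    def rank(self, candidates: list[str]) -> list[str]:
        """Order candidate tools by declared priority, dropping undelegated ones."""
        return sorted((t for t in candidates if t in self.preferred),
                      key=self.preferred.index)


# Hypothetical tool and action names, for illustration only.
policy = ToolPolicy(
    preferred=["crm_search", "draft_email", "sql_readonly"],
    forbidden_actions={"drop_table", "delete_database"},
)
print(policy.authorize("sql_admin", "delete_database"))
print(policy.rank(["sql_readonly", "web_browse", "crm_search"]))  # ['crm_search', 'sql_readonly']
```

Checking red lines before scope is a deliberate ordering: a forbidden action is refused with the more specific reason even when the tool itself was also never delegated.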
People, at least, hide it. They guess, cover, quietly repair before the boss notices. The agent doesn't: it replays, at 100x speed and 100x amplified, every hole your boss never filled. That's why the agent is a mirror.
Now take the inference one step further. The AI agent doesn't destroy mission command. It's the perfect endpoint mission command has waited forty years for, the first one finally matched to the bandwidth. Before, downstream of your intent interface stood a 39-bit/s human: you had to compress, repeat, add redundancy. Now downstream stands something at several hundred bits/s. The interface itself doesn't change; only the optimal point on the centralization curve slides toward the decentralized end. On one condition: the interface must exist.
Field note from a startup CEO: "We spent 30,000 dollars on an agent system, and the team uses it badly. I was about to blame the agent; then I reread my own messages and found the 'figure it out' I had sent my co-founder. The context I give humans already looks like that, so who am I going to blame?"
Conclusion: your intent interface
Back to the agent builder. He thought he was tuning prompts, doing context engineering, handling an AI problem. He wasn't tuning the AI. He was trying to fill an interface his boss never wrote — one that limped along thanks to his direct manager, hand-recalibrated a few times a week, suddenly handed to a several-hundred-bit/s machine. The whole debt tipped at once into his prompt file. Of course he can't finish.
And one uncomfortable thing, for him and for me: of today's prompt techniques (ordering RAG chunks, sequencing tool calls, compressing memory), I don't know which will survive to next year. The bitter lesson swallows engineering tricks one by one. But "compressing intent into an interface startable two echelons down" won't be crushed: it isn't a technical method, it's an organizational law of physics. From soldier to employee, then from employee to agent, the law hasn't moved.
So I'm not trying to make your AI stronger. It's already strong enough; its ceiling has long exceeded most organizations' — which is why so many agent projects die midway: it isn't the agent hitting the ceiling, it's the organization hitting its own first. Fix the intent interface first. Fixed, the agent plugs itself in. Unfixed, ten model generations won't save you. You think you're doing AI context engineering — you're just rewriting in Python the homework the US Army handed in forty years ago.
FAQ
Why do my AI agents fail despite a powerful model? Because the bottleneck isn't the model, it's the intent interface — goal, priorities, red lines — nobody wrote clearly. The agent amplifies that fuzziness.
What is a good intent in the mission-command sense? One that is transmissible, compressible, and recallable under pressure two echelons down, and that states four things: goal, priority trade-off, red line, no-go zone.
Which skill won't be made obsolete by models? Compressing intent into an interface the two echelons below can start. It's organizational physics, not a technical trick.
ECTIME AI Lab is the applied-AI research and deployment unit of ECTIME Group. We build, ship and stress-test agentic systems in production. Our focus is verification and runtime governance. We maintain open-source Claude Skills for GEO/SEO and advise European brands on deploying AI that is not just autonomous, but verifiable and authorized.

