Beyond the Copilot Seat
You bought AI for every department. Somebody's still approving every turn of the wheel.
You're the COO. Last year's budget included seats for a coding copilot in engineering, a chatbot in support, a content generator in marketing, and a scoring model in sales. Six months in, your support manager tells you every AI-drafted response is still being read and edited by a human. Your marketing lead reports they "rewrite everything the AI produces." Your engineers are merging AI-assisted PRs, but the review queue is longer, not shorter. On paper, you've adopted AI across four functions. In practice, you've added a second author to every document your team was already going to write.
Renewal's coming up. The CFO wants to know what the money bought. The honest answer: a workflow that looks exactly like last year's, with a more expensive middle step.
The Thing Nobody Set
Go back to the week the tools were deployed. Somebody in each function made a quiet decision, probably without realizing it was a decision: where AI sits in the workflow. The answer was always the same. AI proposes, a human approves, the human sends. It felt responsible. It matched the onboarding docs. It got past legal.
It also capped the gains at whatever a human can read, edit, and send in a day.
This is the part the vendor demos skip. The demos show the model producing output. They don't show the review queue forming behind it. An AI that writes a reply in two seconds still waits six hours for a human to press send. The model's speed and the system's speed are not the same number, and the gap between them is the entire economic story of most enterprise AI rollouts so far.
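The gap is easy to put in numbers. What follows is a minimal sketch using the two figures above (a two-second draft, a six-hour wait for a human); the function names and the 10x scenario are illustrative assumptions, not measurements from any rollout.

```python
def end_to_end_seconds(generation_s: float, review_wait_s: float) -> float:
    """Elapsed time from request to sent reply when a human must press send."""
    return generation_s + review_wait_s


GENERATION_S = 2.0            # from the text: the model drafts in two seconds
REVIEW_WAIT_S = 6 * 60 * 60   # from the text: six hours waiting for a human

total_s = end_to_end_seconds(GENERATION_S, REVIEW_WAIT_S)

# Share of the elapsed time that the model is actually responsible for.
generation_share = GENERATION_S / total_s


def speedup_from_faster_model(factor: float) -> float:
    """End-to-end speedup if only the model gets `factor` times faster."""
    faster_total = end_to_end_seconds(GENERATION_S / factor, REVIEW_WAIT_S)
    return total_s / faster_total


if __name__ == "__main__":
    print(f"total: {total_s:.0f} s")
    print(f"generation share: {generation_share:.4%}")
    print(f"speedup from a 10x faster model: {speedup_from_faster_model(10):.5f}x")
```

Under these assumptions, generation accounts for less than a hundredth of a percent of the elapsed time, and a model ten times faster improves the end-to-end number by less than 0.01%. The only term worth negotiating is the wait.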
Simon Willison has written repeatedly about how much of the real value of these tools comes from shortening the loop between intent and output — and how much evaporates when the loop gets rewrapped in the old approval chain. Gergely Orosz at The Pragmatic Engineer has documented the same pattern in engineering: clear tool-level speedups that never show up in team throughput, because the bottleneck isn't where the tool is pointed.
The Demo and the Inbox
We've watched this from the inside of enough companies to see the shape of it. A support team runs a pilot where the model handles a narrow slice of tickets end-to-end — password resets, order lookups, shipping status. Resolution time collapses. Leadership gets excited. Then the rollout happens, and in the interest of "safety," every other ticket category goes into a human-in-the-loop queue. Within a quarter, AI touches every ticket and resolves almost none of them autonomously. The seat cost is higher. The queue is the same.
Finance runs a similar arc. Invoice matching under a small threshold — crisp rules, bounded downside — is exactly the work AI handles well. But the controls people, reasonably cautious, extend the same review gate they'd use for a seven-figure vendor contract to a $180 utility bill. The gate wasn't designed for the volume.
Marketing is the cleanest example, because the waste is visible. Technical SEO, meta descriptions, internal linking, alt text at scale — none of this requires taste. It requires completion. But in most content teams, the same human who should be shaping the brand narrative spends the morning editing AI-written meta tags one page at a time. The tool didn't fail. The workflow allocated the wrong humans to the wrong layer.
The Supervision Default
Call it the supervision default. When a new capability arrives and the organization doesn't explicitly decide where it sits, it sits behind a human. That's the safe answer. It's also the answer that guarantees you will not see the gains the vendor promised, because the vendor's numbers assumed you'd rebuild the workflow around the tool. Almost nobody does.
The default isn't random. It comes from a genuine risk instinct — "if this goes wrong, I want a human to have been in the loop" — layered onto a workflow whose original purpose was to coordinate humans who couldn't scale. The workflow solved the old problem and now obstructs the new one. The instinct is right. Its target is wrong.
Eliyahu Goldratt's Theory of Constraints is the cleanest lens here. Optimize any step that isn't the binding constraint and you get no throughput gain. AI made generation cheap across every function: drafting replies, writing code, writing proposals, matching invoices. Generation was almost never the binding constraint. Review was. Approval was. Context-gathering was. By dropping a faster generator into a review-bound system, you've made the queue worse, not better. The relief isn't adding more tools. It's deciding, deliberately, which work stops going through the queue at all.
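A rough way to see the constraint argument is to model the workflow as a serial pipeline whose throughput is the capacity of its slowest stage. The sketch below is a simplification under stated assumptions: the stage names, the capacities, and the demand figure are all hypothetical, chosen only to show the shape of the effect.

```python
def binding_constraint(capacity: dict[str, float]) -> tuple[str, float]:
    """Return (stage, items per day) for the slowest stage of a serial pipeline."""
    stage = min(capacity, key=capacity.get)
    return stage, capacity[stage]


def review_queue_growth(demand: float, capacity: dict[str, float]) -> float:
    """Items per day piling up in front of review.

    Generation hands review whatever it can draft, up to demand;
    review clears what it can, and the remainder queues.
    """
    drafted = min(demand, capacity["generation"])
    return max(0.0, drafted - capacity["review"])


DEMAND = 200.0  # hypothetical incoming items per day

# Hypothetical capacities (items per day) before and after adding AI.
before = {"generation": 80.0, "review": 60.0, "approval": 90.0}
after = {**before, "generation": 5000.0}  # only generation got faster

if __name__ == "__main__":
    for label, capacity in (("before", before), ("after", after)):
        stage, rate = binding_constraint(capacity)
        growth = review_queue_growth(DEMAND, capacity)
        print(f"{label}: constraint={stage}, throughput={rate:.0f}/day, "
              f"review queue grows by {growth:.0f}/day")
```

With these made-up numbers, throughput stays at 60 items a day in both cases because review is the constraint in both, while the pile in front of review grows by 140 items a day instead of 20. Raising throughput means either widening the review stage or routing work around it.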
The Decision You Didn't Know You Were Making
Here's the reframe. You didn't buy an AI tool. You bought a forcing function for an organizational design question you'd been avoiding: which work actually requires human judgment, and which work requires human judgment because that's how we've always done it?
Those are different questions and they have different answers. Klarna's well-publicized AI customer service rollout and subsequent partial reversal are instructive precisely because they did the work. They actually pushed judgment questions out of the queue and into the model, and then discovered which categories had to come back. That's a governance loop with real signal. Most companies never leave the "everything gets reviewed" starting position, so they never generate the data that would tell them where the model is trustworthy and where it isn't.
The AI-DLC 2026 methodology describes this distinction inside software delivery, but the same structure holds across functions. Supervised mode for high-stakes, novel, judgment-heavy work; autonomous mode within clear success criteria for well-defined, high-volume, recoverable work. The framework is the easy part. The hard part is admitting that "review everything" is a decision, not a safeguard.
What Actually Earns the Seat
If you want the AI line item to justify itself at renewal, the shift isn't tooling. It's a small number of honest answers, per function:
- Which work has a clear definition of "done," a bounded downside, and a cheap way to catch errors after the fact?
That's the list of candidates for autonomous operation. Everything else stays supervised — and stays supervised on purpose, not by default. You measure the autonomous lane against its error rate and its recovery speed. You measure the supervised lane against whether the human review is actually changing the output or just signing it.
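The three-part test and the two lane metrics can be written down in a few lines. This is a sketch under assumptions, not a prescribed implementation: the `WorkType` fields, the yes/no flags assigned to each example, and the sample review pairs are hypothetical, and a real version would need your own definitions of "bounded" and "cheap."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkType:
    name: str
    clear_done: bool            # is there a clear definition of "done"?
    bounded_downside: bool      # is the worst-case cost of an error small?
    cheap_to_catch_later: bool  # can errors be caught cheaply after the fact?


def lane(work: WorkType) -> str:
    """Autonomous only if all three answers are yes; otherwise supervised."""
    if work.clear_done and work.bounded_downside and work.cheap_to_catch_later:
        return "autonomous"
    return "supervised"


def error_rate(errors: int, completed: int) -> float:
    """Autonomous-lane metric: share of completed items later found wrong."""
    return errors / completed if completed else 0.0


def review_change_rate(reviews: list[tuple[str, str]]) -> float:
    """Supervised-lane metric: share of (draft, sent) pairs the human changed."""
    if not reviews:
        return 0.0
    changed = sum(1 for draft, sent in reviews if draft.strip() != sent.strip())
    return changed / len(reviews)


# Illustrative classifications; the flags are assumptions, not findings.
password_reset = WorkType("password reset", True, True, True)
vendor_contract = WorkType("seven-figure vendor contract", False, False, False)

if __name__ == "__main__":
    for work in (password_reset, vendor_contract):
        print(f"{work.name}: {lane(work)}")
    sample = [("Hi A", "Hi A"), ("Hi B", "Hello B"), ("Hi C", "Hi C "), ("Hi D", "Dear D")]
    print(f"review change rate: {review_change_rate(sample):.0%}")
```

A review change rate near zero suggests the human is signing rather than reviewing, which makes that category a candidate for the autonomous lane. A high rate suggests the review is doing real work and should stay.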
Across our engagements — GigSmart, ToolWatch, Oxen.ai — the work that moved fastest was the work where somebody stopped asking "where can we add AI" and started asking "what is this approval step actually for?" That second question is the one the org chart doesn't want you to ask, because it exposes work that exists to reassure somebody, not to catch something.
The AI didn't cap your gains. The approval queue did. And the approval queue is a design choice — one that somebody, eventually, has to make on purpose.
If that's the conversation you're about to have with your CFO, your board, or your own leadership team, that's exactly the work we do. Not more tools. A workflow actually shaped for the tools you already bought.
Ready to Transform Your Organization?
Let's discuss how The Bushido Collective can help you build efficient, scalable technology.