The 90 to 100 Percent Automation Gap
Why 95% is worse than it sounds
The intuitive model of automation value is linear: 50% automation saves half the work, 95% saves almost all of it. This is wrong. The actual value curve stays nearly flat until close to 100%, then jumps discontinuously.
At 95% reliability, a human must:
- Monitor every execution to detect the 5% failure case
- Maintain enough context to intervene when failure occurs
- Keep the skill to handle the task manually (skill atrophy is real)
- Bear the cognitive load of vigilance — knowing failure is possible but unpredictable
The monitoring cost is roughly constant regardless of the failure rate. Whether failures happen 5% or 20% of the time, someone must watch. The only rate that eliminates monitoring is 100%.
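The monitoring floor can be made concrete with a toy cost model. The minutes-per-task numbers below are illustrative assumptions, not from the source: checking every output costs a fixed amount regardless of reliability, so improving from 80% to 99% barely moves total human time, while 100% removes it entirely.

```python
# Toy model: human time spent on n_tasks under partial automation.
# Hypothetical numbers: monitoring one automated output takes 3 min,
# fixing a failure takes 12 min, doing a task by hand takes 10 min.

def human_hours(n_tasks, reliability, t_monitor=3.0, t_fix=12.0):
    """Hours of human attention required when the automation is <100% reliable."""
    if reliability >= 1.0:
        return 0.0  # only 100% eliminates the monitoring loop
    monitoring = n_tasks * t_monitor                # constant: every output is checked
    fixing = n_tasks * (1.0 - reliability) * t_fix  # scales with the failure rate
    return (monitoring + fixing) / 60.0

manual = 1000 * 10.0 / 60.0  # ~167 h to do 1000 tasks fully by hand

for r in (0.80, 0.95, 0.99, 1.00):
    print(f"reliability {r:.0%}: {human_hours(1000, r):.0f} h (manual: {manual:.0f} h)")
```

At these assumed numbers, human attention falls only from 90 h at 80% reliability to 52 h at 99%, because the constant monitoring term dominates; it drops to zero only at 100%.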
Cat Wu at Anthropic frames this as a first principle: push automations to 100% or accept that the process remains manual. The middle ground — “mostly automated” — captures the worst of both worlds: the complexity of automation plus the overhead of human monitoring.
Implications for agent workflows
This principle applies directly to AI agent harnesses and workflows:
- An agent that handles 95% of code reviews still requires a human to check every review, because the 5% failure case (approving buggy code) is catastrophic. The human saves no time.
- An email triage agent at 90% accuracy means 1 in 10 emails is misrouted. The user must scan all triage decisions to catch errors. Net time saved: near zero.
- A deployment pipeline that’s 99% reliable still requires on-call engineers. Only at 100% (or with automated rollback that itself is 100%) does the human overhead disappear.
The actionable implication: before automating a workflow, decide whether 100% is achievable. If not, the investment may not pay off. Partial automation that requires full monitoring is often worse than a simple manual process, because it adds system complexity without removing human attention.
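One way to apply that rule is a back-of-envelope payback calculation before building anything. This is a sketch under assumed numbers (the 40-hour build effort and the per-task minutes are hypothetical), not a formula from the source:

```python
def payback_tasks(build_hours, t_task, t_monitor, reliability, t_fix=0.0):
    """Executions needed before the build effort pays back, or None when the
    per-task overhead (monitoring + expected fixes) eats the entire saving."""
    overhead = 0.0 if reliability >= 1.0 else t_monitor + (1.0 - reliability) * t_fix
    saved_per_task = t_task - overhead  # minutes of human time saved per run
    if saved_per_task <= 0:
        return None  # never pays off: watching the automation costs as much as the task
    return build_hours * 60.0 / saved_per_task

# 10-minute task, 40-hour build effort:
print(payback_tasks(40, t_task=10, t_monitor=3, reliability=0.95, t_fix=12))   # quick spot-checks
print(payback_tasks(40, t_task=10, t_monitor=10, reliability=0.95, t_fix=12))  # review ~= doing it yourself
print(payback_tasks(40, t_task=10, t_monitor=0, reliability=1.0))              # fully trusted
```

When checking an output takes about as long as doing the task (the code-review case), the saving is negative and no volume of use recovers the build cost; at 100% the monitoring term disappears and payback is fastest.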
The anti-pattern: over-customizing agent workflows
Wu identifies a related failure mode: spending excessive time building MCPs, customizing skills, and perfecting agent configurations. This can become procrastination disguised as productivity. If the automation being built will plateau at 90-95% reliability, the time spent configuring it may exceed the time it ever saves. The tool should serve the work, not become the work.
Related Notes
- Cat Wu - Head of Product, Claude Code & Cowork, at Anthropic — source (YouTube, 1h25m)
- Agent Harnesses — parent topic; the 90-100% gap is a design constraint for harness builders
- Harness Simplification as Models Improve — as models improve, some automations that were stuck at 95% may reach 100%, changing the calculus