The 90 to 100 Percent Automation Gap
Why 95% is worse than it sounds
The intuitive model of automation value is linear: 50% automation saves half the work, 95% saves almost all of it. This is wrong. The actual value curve stays nearly flat until close to 100%, then jumps discontinuously.
At 95% reliability, a human must:
- Monitor every execution to detect the 5% failure case
- Maintain enough context to intervene when failure occurs
- Keep the skill to handle the task manually (skill atrophy is real)
- Bear the cognitive load of vigilance — knowing failure is possible but unpredictable
The monitoring cost is roughly constant regardless of the failure rate. Whether failures happen 5% or 20% of the time, someone must watch. The only rate that eliminates monitoring is 100%.
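The monitoring floor can be made concrete with a toy cost model. The minutes-per-task numbers below are illustrative assumptions, not from the source: checking every output costs a fixed amount regardless of reliability, so improving from 80% to 99% barely moves total human time, while 100% removes it entirely.

```python
# Toy model: human time spent on n_tasks under partial automation.
# Hypothetical numbers: monitoring one automated output takes 3 min,
# fixing a failure takes 12 min, doing a task by hand takes 10 min.

def human_hours(n_tasks, reliability, t_monitor=3.0, t_fix=12.0):
    """Hours of human attention required when the automation is <100% reliable."""
    if reliability >= 1.0:
        return 0.0  # only 100% eliminates the monitoring loop
    monitoring = n_tasks * t_monitor                # constant: every output is checked
    fixing = n_tasks * (1.0 - reliability) * t_fix  # scales with the failure rate
    return (monitoring + fixing) / 60.0

manual = 1000 * 10.0 / 60.0  # ~167 h to do 1000 tasks fully by hand

for r in (0.80, 0.95, 0.99, 1.00):
    print(f"reliability {r:.0%}: {human_hours(1000, r):.0f} h (manual: {manual:.0f} h)")
```

At these assumed numbers, human attention falls only from 90 h at 80% reliability to 52 h at 99%, because the constant monitoring term dominates; it drops to zero only at 100%.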
Cat Wu at Anthropic frames this as a first principle: push automations to 100% or accept that the process remains manual. The middle ground — “mostly automated” — captures the worst of both worlds: the complexity of automation plus the overhead of human monitoring.
Implications for agent workflows
This principle applies directly to AI agent harnesses and workflows:
- An agent that handles 95% of code reviews still requires a human to check every review, because the 5% failure case (approving buggy code) is catastrophic. The human saves no time.
- An email triage agent at 90% accuracy means 1 in 10 emails is misrouted. The user must scan all triage decisions to catch errors. Net time saved: near zero.
- A deployment pipeline that’s 99% reliable still requires on-call engineers. Only at 100% (or with automated rollback that itself is 100%) does the human overhead disappear.
The actionable implication: before automating a workflow, decide whether 100% is achievable. If not, the investment may not pay off. Partial automation that requires full monitoring is often worse than a simple manual process, because it adds system complexity without removing human attention.
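One way to apply that rule is a back-of-envelope payback calculation before building anything. This is a sketch under assumed numbers (the 40-hour build effort and the per-task minutes are hypothetical), not a formula from the source:

```python
def payback_tasks(build_hours, t_task, t_monitor, reliability, t_fix=0.0):
    """Executions needed before the build effort pays back, or None when the
    per-task overhead (monitoring + expected fixes) eats the entire saving."""
    overhead = 0.0 if reliability >= 1.0 else t_monitor + (1.0 - reliability) * t_fix
    saved_per_task = t_task - overhead  # minutes of human time saved per run
    if saved_per_task <= 0:
        return None  # never pays off: watching the automation costs as much as the task
    return build_hours * 60.0 / saved_per_task

# 10-minute task, 40-hour build effort:
print(payback_tasks(40, t_task=10, t_monitor=3, reliability=0.95, t_fix=12))   # quick spot-checks
print(payback_tasks(40, t_task=10, t_monitor=10, reliability=0.95, t_fix=12))  # review ~= doing it yourself
print(payback_tasks(40, t_task=10, t_monitor=0, reliability=1.0))              # fully trusted
```

When checking an output takes about as long as doing the task (the code-review case), the saving is negative and no volume of use recovers the build cost; at 100% the monitoring term disappears and payback is fastest.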
The anti-pattern: over-customizing agent workflows
Wu identifies a related failure mode: spending excessive time building MCPs, customizing skills, and perfecting agent configurations. This can become procrastination disguised as productivity. If the automation being built will plateau at 90-95% reliability, the time spent configuring it may exceed the time it ever saves. The tool should serve the work, not become the work.
Related Notes
- Cat Wu - Head of Product, Claude Code & Cowork, at Anthropic — source (YouTube, 1h25m)
- Agent Harnesses — parent topic; the 90-100% gap is a design constraint for harness builders
- Harness Simplification as Models Improve — as models improve, some automations that were stuck at 95% may reach 100%, changing the calculus