10. Cascading failure: one bad step should not become everyone's problem

~15 min read. AI pipelines often fail like falling dominoes: a small mistake early becomes a polished disaster later.

Built on the ELI5 in 00-eli5.md. The sealed ward matters most when failure starts spreading, because cascading failure is what happens when one infected component reaches the rest of the hospital.

1) First picture: local success can still produce global failure

A pipeline is not healthy just because most steps return something. What matters is whether the whole workflow remains trustworthy.

step 1: classify intent wrong
step 2: tool chosen correctly for the wrong intent
step 3: tool returns valid data
step 4: model writes polished answer

all later steps look healthy
end result is still wrong

The simple version: Cascading failure is sneaky. Later steps may be technically correct. They are just correct on poisoned input. That is why end-to-end reliability needs isolation, validation, and stop conditions. The triage desk must ask, "Is this input trustworthy enough to continue?"

2) Common cascade paths in AI systems

AI systems have several repeating cascade shapes.

wrong classification → wrong tool path,
stale retrieval → confident hallucinated answer,
bad tool output → unsafe agent action,
provider slowness → queue growth → timeout storm,
repeated retries → rate-limit wave → broader outage.

cascade families
┌────────────────────────────────────┐
│ semantic cascade  = wrong meaning  │
│ data cascade      = wrong facts    │
│ control cascade   = wrong actions  │
│ resource cascade  = shared overload│
└────────────────────────────────────┘

Not every cascade is semantic. Some are resource cascades. A slow provider can exhaust threads, which slows unrelated features. The sealed ward is for both.

3) Isolation patterns stop blast radius

Now what should we build to contain spread? At least these patterns matter.

per-step validation,
bounded queues,
separate worker pools,
breaker per dependency,
context sanitization before next step,
stop-the-line rules for high-risk mismatches.

bad pipeline
all requests share one worker pool
       │
       └── slow model saturates everything

good pipeline
separate pools for chat, tools, and heavy research
       │
       └── one sick path does not starve all others
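The "good pipeline" picture can be sketched with one thread pool per kind of work. This is a minimal sketch, not a production design: the pool names, the pool sizes, and the `submit` helper are assumptions made for illustration, and the slow research call is simulated with an event instead of a real provider.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical pool sizes; a real system would tune these per workload.
POOLS = {
    "chat": ThreadPoolExecutor(max_workers=4, thread_name_prefix="chat"),
    "tools": ThreadPoolExecutor(max_workers=2, thread_name_prefix="tools"),
    "research": ThreadPoolExecutor(max_workers=1, thread_name_prefix="research"),
}

def submit(kind, fn, *args):
    """Route work to the pool that owns this kind of request."""
    return POOLS[kind].submit(fn, *args)

release = threading.Event()

def slow_research():
    # Stands in for a hung provider call that occupies its worker.
    release.wait(timeout=5)
    return "research done"

def quick_chat(message):
    return f"echo: {message}"

# The research pool is fully occupied, yet chat still answers promptly
# because it never shares workers with research.
stuck = submit("research", slow_research)
chat_result = submit("chat", quick_chat, "hi").result(timeout=1)

release.set()  # let the simulated slow call finish
research_result = stuck.result(timeout=5)

for pool in POOLS.values():
    pool.shutdown(wait=True)
```

The design choice is the point, not the numbers: a sick path can only exhaust the workers it owns.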

For example, a customer-support suite has:

billing chat,
order chat,
FAQ assistant.

All share one model queue. Billing tool latency spikes. Billing requests accumulate. Now FAQ requests wait behind them. Soon the whole suite looks down. That is a resource cascade.

Separate queues would have contained it. The sealed ward should have isolated billing pressure from FAQ traffic.
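Here is one way the separate, bounded queues could look. It is a sketch under assumptions: the `BoundedLane` class, the lane names, and the depth limit of 3 are hypothetical, and a real system would attach workers and metrics to each lane.

```python
import queue

class BoundedLane:
    """A per-feature queue that rejects new work when full instead of growing."""

    def __init__(self, name, max_depth):
        self.name = name
        self._q = queue.Queue(maxsize=max_depth)
        self.rejected = 0

    def offer(self, request):
        """Try to enqueue; return False (shed load) if the lane is full."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def depth(self):
        return self._q.qsize()

# Hypothetical lanes with a small depth limit for illustration.
lanes = {
    "billing": BoundedLane("billing", max_depth=3),
    "faq": BoundedLane("faq", max_depth=3),
}

# Billing tool latency spikes: requests pile up, but only in the billing lane.
billing_accepted = [lanes["billing"].offer(f"billing-{i}") for i in range(5)]

# FAQ traffic is unaffected because it never waits behind billing requests.
faq_accepted = lanes["faq"].offer("faq-0")
```

Rejecting the fourth and fifth billing requests looks harsh, but a fast "try again later" is cheaper than a queue that silently grows until everything times out.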

4) Validate handoffs between steps, not only final output

Every step handoff is a danger point. One step may output something plausible but low trust. The next step should not consume it blindly.

planner output
{"tool":"refund_order","confidence":0.41}
      │
      ├── bad design  → execute anyway
      └── good design → require more evidence or human review

The simple version: Low-trust intermediate state should not silently advance. For example, an agent classifies a message as "customer requests cancellation," but its confidence is only 0.48.

If the cancellation tool runs anyway, a dangerous cascade begins. If the handoff validator blocks low-confidence action, you contain the error early. The senior doctor may then review. Later components never see poisoned state.
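A handoff validator for this case can be very small. The sketch below assumes hypothetical thresholds (0.90 for high-risk tools, 0.60 otherwise) and hypothetical tool names `cancel_order` and `lookup_faq`; only `refund_order` comes from the planner output shown earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposal:
    tool: str
    confidence: float

# Hypothetical policy values; real thresholds need calibration on real data.
HIGH_RISK_TOOLS = {"refund_order", "cancel_order"}
MIN_CONFIDENCE_HIGH_RISK = 0.90
MIN_CONFIDENCE_DEFAULT = 0.60

def gate(proposal: Proposal) -> str:
    """Return 'execute' if the proposal may advance, else 'review'."""
    if proposal.tool in HIGH_RISK_TOOLS:
        threshold = MIN_CONFIDENCE_HIGH_RISK
    else:
        threshold = MIN_CONFIDENCE_DEFAULT
    return "execute" if proposal.confidence >= threshold else "review"

# The two low-confidence examples from the text are held for review.
refund_decision = gate(Proposal("refund_order", 0.41))
cancel_decision = gate(Proposal("cancel_order", 0.48))

# A low-risk, read-only lookup passes at moderate confidence.
faq_decision = gate(Proposal("lookup_faq", 0.75))
```

Note that the gate returns "review" instead of raising an error. The request is not lost. It is routed to more evidence or to a human.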

5) Resource cascades often start with timeouts and retries

Now what is the operational version of dominoes? One service slows. Clients wait longer. Concurrency rises. Queues deepen. Retries multiply traffic. More services slow.

slow model
   │
   ▼
more in-flight requests
   │
   ▼
queue builds
   │
   ▼
clients timeout and retry
   │
   ▼
even more traffic arrives

See the loop. This is why timeouts, retry budgets, and circuit breakers are not separate topics. They are anti-cascade tools.

For example, a research assistant shares one outbound proxy for web fetches. One slow source website causes fetch calls to hang. Agent workers stay occupied. Queued requests time out. Clients retry the same research tasks. Soon model capacity is wasted on stale retries too.

That is a classic cascade. The vitals monitor should catch queue growth, not only request errors.
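A retry budget is one way to break the loop on the client side. The sketch below is a simplification under stated assumptions: the `RetryBudget` class and its 10 percent limit are hypothetical, and the counters are cumulative, whereas a production budget would usually use a sliding time window.

```python
class RetryBudget:
    """Allow retries only while they stay under a fraction of first attempts."""

    def __init__(self, retry_percent=10, min_retries=1):
        self.retry_percent = retry_percent  # hypothetical limit: 10% of traffic
        self.min_retries = min_retries      # always allow a small floor
        self.first_attempts = 0
        self.retries_spent = 0

    def record_first_attempt(self):
        self.first_attempts += 1

    def try_spend_retry(self):
        """Return True and consume budget if a retry is allowed right now."""
        allowed = max(
            self.min_retries,
            self.first_attempts * self.retry_percent // 100,
        )
        if self.retries_spent >= allowed:
            return False
        self.retries_spent += 1
        return True

budget = RetryBudget(retry_percent=10)

# 50 original requests were sent while the provider was slowing down.
for _ in range(50):
    budget.record_first_attempt()

# 20 of them time out and want to retry; the budget grants only 10% of 50.
granted = sum(budget.try_spend_retry() for _ in range(20))
```

With the budget in place, 20 timeouts add 5 extra requests instead of 20. The remaining callers fail fast, which keeps the slow dependency from being buried under its own retries.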

6) Context poisoning is the AI-specific cascade risk

Traditional systems worry about resource spread. AI systems also worry about meaning spread. One wrong intermediate fact can enter prompt context, then get repeated as if it were truth.

retrieval fetched stale policy
      │
      ▼
model cites stale policy
      │
      ▼
verifier accepts citation format only
      │
      ▼
user receives confidently outdated answer

Look carefully. Every step behaved "normally." The data itself was poisoned.

So what should we do? Validate provenance. Check freshness. Tag trust level on intermediate artifacts. Do not let low-trust context flow freely. The triage desk should classify artifact trust, not only request type.
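Trust tagging can be sketched as a small filter that runs before prompt assembly. Everything named here is an assumption for illustration: the `Artifact` fields, the `policy-db` and `forum-scrape` source names, and the 30-day freshness limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Artifact:
    text: str
    source: str              # provenance: where the text came from
    last_verified: datetime  # when the content was last confirmed current

# Hypothetical policy values.
TRUSTED_SOURCES = {"policy-db"}
MAX_AGE = timedelta(days=30)

def trust_level(artifact: Artifact, now: datetime) -> str:
    """Classify an artifact as 'high', 'stale', or 'low' trust."""
    if artifact.source not in TRUSTED_SOURCES:
        return "low"
    if now - artifact.last_verified > MAX_AGE:
        return "stale"
    return "high"

def build_context(artifacts, now):
    """Only high-trust artifacts reach the prompt; the rest are held back."""
    kept, held = [], []
    for artifact in artifacts:
        (kept if trust_level(artifact, now) == "high" else held).append(artifact)
    return kept, held

now = datetime(2025, 1, 31, tzinfo=timezone.utc)
fresh = Artifact("Refunds within 14 days.", "policy-db",
                 datetime(2025, 1, 20, tzinfo=timezone.utc))
stale = Artifact("Refunds within 60 days.", "policy-db",
                 datetime(2024, 6, 1, tzinfo=timezone.utc))
unknown = Artifact("Refunds are unlimited.", "forum-scrape",
                   datetime(2025, 1, 30, tzinfo=timezone.utc))

kept, held = build_context([fresh, stale, unknown], now)
```

The held artifacts are not deleted. They can be refreshed, re-verified, or shown to a reviewer, but they do not enter the prompt as if they were truth.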

7) Worked design pattern: compartmentalized agent pipeline

A safer agent pipeline might do this.

intent step
   │ validates confidence
   ▼
read-only evidence step
   │ validates freshness + entity match
   ▼
action proposal step
   │ validates policy + idempotency
   ▼
execution step or human approval

See the containment. Each boundary has a gate. Each gate can stop the line. That is how you prevent one mistake from becoming a polished catastrophe. The sealed ward can exist inside one workflow, not only between external services.
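The gated pipeline above might be wired together roughly like this. It is a sketch with stubbed steps: the classifier, the order lookup, the thresholds, and the `StopTheLine` exception are all hypothetical stand-ins for real components.

```python
class StopTheLine(Exception):
    """Raised when a gate refuses to pass state to the next step."""

def intent_step(state):
    # Stub: a real system would call a classifier here.
    return {**state, "intent": "refund", "confidence": state["classifier_confidence"]}

def intent_gate(state):
    return state["confidence"] >= 0.8, "intent confidence too low"

def evidence_step(state):
    # Stub read-only lookup keyed by order id.
    orders = {"A1": {"order_id": "A1", "age_days": 2}}
    return {**state, "evidence": orders.get(state["order_id"])}

def evidence_gate(state):
    evidence = state["evidence"]
    ok = evidence is not None and evidence["age_days"] <= 30
    return ok, "evidence missing or stale"

def proposal_step(state):
    action = {"tool": "refund_order",
              "idempotency_key": f"refund-{state['order_id']}"}
    return {**state, "action": action}

def proposal_gate(state):
    return bool(state["action"].get("idempotency_key")), "action lacks idempotency key"

STEPS = [
    ("intent", intent_step, intent_gate),
    ("evidence", evidence_step, evidence_gate),
    ("proposal", proposal_step, proposal_gate),
]

def run(request):
    """Run each step, then its gate; stop at the first gate that fails."""
    state = dict(request)
    for name, step, gate in STEPS:
        state = step(state)
        ok, reason = gate(state)
        if not ok:
            raise StopTheLine(f"{name}: {reason}")
    return state  # ready for the execution step or human approval

result = run({"classifier_confidence": 0.95, "order_id": "A1"})
```

A low-confidence intent never reaches the evidence step, and an unknown order never reaches the proposal step. Each failure stays in the compartment where it started.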

Where this lives in the wild

GitHub Copilot workspace actions — agent safety engineer: isolates repository-modifying steps behind validation gates so a bad planning step cannot directly trigger large code edits.
Intercom Fin — support automation architect: separates policy retrieval, draft generation, and action-taking workers so a slow account tool does not stall every simple FAQ response.
Perplexity — answer pipeline engineer: validates citation freshness and source availability before synthesis so stale retrieval does not cascade into confident but outdated answers.
Klarna assistant — workflow reliability lead: keeps payment-affecting tools on isolated queues with stricter approval rules to prevent one unstable financial connector from degrading all chat traffic.
Cursor — orchestration engineer: limits auto-edit loops so one bad verification signal cannot trigger repeated broken patch attempts across many files.

Pause and recall

Why can a pipeline suffer cascading failure even when later steps are technically healthy?
What are the main cascade families in AI systems?
Why do separate queues or worker pools reduce blast radius?
How does context poisoning create an AI-specific cascade risk?

Interview Q&A

Q: Why should intermediate handoffs be validated instead of only checking the final output?
A: Early validation prevents poisoned state from propagating into later steps that may amplify the original mistake.
Common wrong answer to avoid: "Because final output checks are too expensive." Cost may matter, but containment is the key point.

Q: Why are bounded queues and separate worker pools reliability features, not just performance features?
A: They limit shared-resource contagion, preventing one degraded path from starving unrelated workflows.
Common wrong answer to avoid: "Because faster systems always isolate better." Isolation is about containment, not raw speed.

Q: Why do retries often worsen cascading failures during overload incidents?
A: Retries inject extra work into a system already struggling, which deepens queues and spreads latency outward.
Common wrong answer to avoid: "Because retries are mathematically unstable." The issue is operational amplification.

Q: Why is context poisoning especially dangerous in AI pipelines?
A: Later model steps often treat earlier artifacts as trusted context, so one wrong fact can become repeated and legitimized.
Common wrong answer to avoid: "Because AI models cannot handle long prompts." Prompt length is not the core failure here.

Apply now (5 min)

Exercise. Take one multi-step AI workflow. Mark where semantic, data, control, and resource cascades could begin. For each spot, write one containment mechanism.

Sketch from memory. Draw a domino chain. Then redraw it with gates, queues, and a sealed ward that stops spread after the first bad step.

Bridge. We now know how failures spread. Next we should trigger some of them on purpose, before real users do, through chaos testing. → 11-chaos-testing-ai.md