04. Circuit breakers — close the ward before infection spreads

~14 min read. A circuit breaker protects the whole hospital by refusing more bad traffic to a sick dependency.

Built on the ELI5 in 00-eli5.md. The sealed ward — isolating a dangerous path — is what stops one failing model or tool from infecting everything around it.


1) First picture: three states, one purpose

A circuit breaker is a traffic decision. Not a fix. It does not heal the model. It prevents more damage while the model is sick.

closed ── normal traffic flows
  ├── too many failures ──→ open
  │                         │
  │                         └── no traffic, use fallback
  └── after cool-down ───→ half-open
                            ├── probe succeeds ─→ closed
                            └── probe fails ───→ open

The simple version: Closed means healthy enough. Open means stop sending traffic. Half-open means test carefully. That is the whole idea. The sealed ward protects shared resources, user latency, and your retry budget.
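The three states above can be sketched as a small state machine. This is a minimal illustration, not a production implementation: the class name `CircuitBreaker`, its parameters (`failure_threshold`, `cooldown_s`, `clock`), and the consecutive-failure trigger are all assumptions chosen for the example, and the sketch is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitBreaker:
    """Illustrative three-state breaker that opens on consecutive failures."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.cooldown_s = cooldown_s                # hypothetical default
        self.clock = clock                          # injectable so tests can fake time
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        """Return True if a call may go to the dependency right now."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.cooldown_s:
                self.state = State.HALF_OPEN  # cool-down over: let one probe through
                return True
            return False  # still open: the caller should use a fallback
        if self.state is State.HALF_OPEN:
            return False  # a probe is already in flight
        return True  # closed: normal traffic flows

    def record_success(self):
        """A success (including a successful probe) closes the breaker."""
        self.state = State.CLOSED
        self.failures = 0

    def record_failure(self):
        """A failed probe, or too many consecutive failures, opens the breaker."""
        self.failures += 1
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

The caller asks `allow_request()` before every call and reports the outcome with `record_success()` or `record_failure()`. When `allow_request()` returns False, the request goes to a fallback instead of the sick dependency.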

2) When should a breaker open?

The production problem: If you open too early, you throw away useful capacity. If you open too late, you waste users on a dying path. So breaker triggers should follow real failure signals.

Common signals:

  • consecutive failures,
  • rolling error rate,
  • timeout rate,
  • malformed output rate,
  • business-rule failure rate.

Picture the decision.

last 30 requests to model-x

success success timeout timeout timeout parse_fail timeout
                └── threshold crossed → breaker opens

See the nuance. A breaker can watch more than transport errors. If a model keeps returning broken JSON, that path may be operationally unhealthy, even with 200 OK.

For example, suppose your JSON tool-calling model handles 100 checkout-agent requests in one minute. Of those:

  • 18 requests time out,
  • 9 requests return invalid tool arguments,
  • 4 requests succeed only after retries.

If your rule is, "Open if hard failure rate exceeds 20%," and both timeouts and invalid tool arguments count as hard failures, that is 27 of 100 requests, or 27%, so you trip the breaker. That is correct. The path is no longer trustworthy. The vitals monitor feeds the sealed ward here.
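The checkout-agent arithmetic can be reproduced with a rolling window. This is a sketch under stated assumptions: the outcome labels, the `RollingFailureRate` class, and the choice to count timeouts and invalid outputs as hard failures (but not retried successes) are illustrative, not taken from a specific library.

```python
from collections import deque

# Hypothetical outcome labels; which ones count as "hard" is a policy choice.
HARD_FAILURES = {"timeout", "invalid_output"}


class RollingFailureRate:
    """Tracks the hard-failure rate over the most recent `window` outcomes."""

    def __init__(self, window=100, threshold=0.20):
        self.outcomes = deque(maxlen=window)  # old outcomes fall off automatically
        self.threshold = threshold

    def record(self, outcome):
        self.outcomes.append(outcome)

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        bad = sum(1 for o in self.outcomes if o in HARD_FAILURES)
        return bad / len(self.outcomes)

    def should_open(self):
        return self.failure_rate() > self.threshold


monitor = RollingFailureRate(window=100, threshold=0.20)
minute = (["timeout"] * 18 + ["invalid_output"] * 9
          + ["retried_success"] * 4 + ["success"] * 69)
for outcome in minute:
    monitor.record(outcome)

print(monitor.failure_rate())  # 0.27
print(monitor.should_open())   # True
```

Note that an invalid output is recorded as a failure regardless of HTTP status, which is how a 200 OK response can still push the breaker toward open.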

3) Per-model and per-feature breakers beat one global breaker

Not all traffic is equal. Not all dependencies fail together. A global breaker is too blunt for many AI systems. You often want per-model, per-provider, per-tool, or even per-feature breakers.

breaker map

chat-summary model    → breaker A
code-edit model       → breaker B
refund tool           → breaker C
web-search provider   → breaker D

The simple version: If the code-edit model is sick, your summarization feature may still be fine. If the refund tool is timing out, your FAQ assistant should not be disabled.

For example, an AI suite uses:

  • model A for cheap summaries,
  • model B for code generation,
  • tool T for order lookup.

Model B starts returning many 503 errors. If you trip a global breaker, all AI features degrade. If you trip only breaker B, summaries still work, order lookup still works, and only code generation falls back. That is better patient care. The sealed ward should isolate the infected room, not close the whole hospital.
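One way to picture the breaker map in code is a dictionary with one breaker per dependency. The `SimpleBreaker` class and the key names below are hypothetical and deliberately stripped down (no cool-down or half-open state), so that only the isolation idea is visible.

```python
from dataclasses import dataclass


@dataclass
class SimpleBreaker:
    """Minimal stand-in breaker: open once consecutive failures hit the threshold."""
    failure_threshold: int = 3
    failures: int = 0

    @property
    def is_open(self):
        return self.failures >= self.failure_threshold

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0


# One breaker per dependency, keyed by illustrative names.
breakers = {
    "chat-summary-model": SimpleBreaker(),
    "code-edit-model": SimpleBreaker(),
    "refund-tool": SimpleBreaker(),
    "web-search-provider": SimpleBreaker(),
}

# The code-edit model returns three errors in a row; only its breaker trips.
for _ in range(3):
    breakers["code-edit-model"].record_failure()

available = sorted(name for name, b in breakers.items() if not b.is_open)
print(available)
# ['chat-summary-model', 'refund-tool', 'web-search-provider']
```

Because each dependency has its own failure counter, a sick code-edit model never touches the state that gates summaries or tool calls.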

4) Half-open state is a controlled probe, not a full reopening

Teams often misuse half-open state. They send full traffic back immediately. Then the dependency collapses again. Half-open should be cautious.

open for 30 s
allow 1 probe request
   ├── success with good latency and valid output → close breaker
   └── failure or bad output                     → reopen breaker

Half-open is a medical recheck. Not discharge paperwork. For example, a provider outage settles after two minutes. Your breaker cool-down is 30 seconds.

  • At 30 seconds, a half-open probe still sees timeouts. Breaker reopens.
  • At 60 seconds, the probe succeeds but latency is 12 seconds, far above the agreed 4-second ceiling. Keep it open.
  • At 90 seconds, the probe succeeds in 2.3 seconds with valid JSON. Now close.

So what counts as recovery? Not mere success. Healthy-enough success.
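The "healthy-enough" rule can be written as a single predicate over the probe result. This is a sketch: the function name `probe_is_healthy` and its arguments are assumptions, the 4-second ceiling comes from the example above, and "valid output" is simplified here to "parses as JSON".

```python
import json

MAX_PROBE_LATENCY_S = 4.0  # the agreed latency ceiling from the example


def probe_is_healthy(ok, latency_s, body):
    """Return True only for a successful, fast, parseable probe response."""
    if not ok:
        return False  # transport error or timeout
    if latency_s > MAX_PROBE_LATENCY_S:
        return False  # technically a success, but too slow to trust
    try:
        json.loads(body)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return False  # malformed output counts as unhealthy
    return True


# The three probes from the timeline: (seconds since open, ok, latency, body).
probes = [
    (30, False, 10.0, ""),         # still timing out
    (60, True, 12.0, '{"a": 1}'),  # succeeds, but far above the ceiling
    (90, True, 2.3, '{"a": 1}'),   # fast and valid
]
for at_s, ok, latency_s, body in probes:
    verdict = "close" if probe_is_healthy(ok, latency_s, body) else "stay open"
    print(at_s, verdict)
# 30 stay open
# 60 stay open
# 90 close
```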

5) Breakers must trigger fallbacks, not dead ends

A breaker that only says "no" is incomplete. Users still need help. So the sealed ward should point somewhere. Usually to the backup ambulance, the stability kit, or the senior doctor.

breaker open on premium model
      ├── fallback small model
      ├── cached answer
      ├── partial workflow
      └── human queue for high-risk cases

For example, a coding assistant uses a premium model for multi-file edits. Breaker opens after three consecutive 503 responses. Fallback policy says:

  • single-file autocomplete → local cached completion model,
  • complex refactor → queue for retry later,
  • destructive changes → require human approval.

See the design. The breaker does not solve everything. It hands the request to the next safe path.
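The fallback policy above amounts to a lookup table consulted only when the breaker is open. The request-kind labels, the `route` function, and the default for unknown request kinds are assumptions made for this sketch.

```python
# Hypothetical request kinds mapped to the fallback paths from the example.
FALLBACKS = {
    "single_file_autocomplete": "local_cached_completion_model",
    "complex_refactor": "queue_for_retry",
    "destructive_change": "require_human_approval",
}


def route(request_kind, breaker_open):
    """Pick the path for a request, given the premium model's breaker state."""
    if not breaker_open:
        return "premium_model"
    # Assumption: unknown kinds are queued rather than dropped.
    return FALLBACKS.get(request_kind, "queue_for_retry")


print(route("complex_refactor", breaker_open=False))    # premium_model
print(route("destructive_change", breaker_open=True))   # require_human_approval
```

Keeping the table separate from the breaker makes the policy easy to review: the breaker decides whether the main path is usable, and the table decides where the request goes instead.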

6) Common breaker mistakes

Now what should you avoid? First, do not share one breaker across unrelated tenants if tenant behavior differs sharply. Second, do not ignore silent failures, such as malformed output behind a 200 OK. Third, do not close the breaker on one lucky success if error rate remains high. Fourth, do not hide breaker state from dashboards. Fifth, do not let the retry dose keep hammering after the breaker has opened and the ward is sealed.

bad pattern
retry helper ignores breaker state
open breaker exists, but retries still hit model

That is self-sabotage. The breaker is supposed to be a hard gate. Not a polite suggestion.
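A retry helper that treats the breaker as a hard gate might look like the sketch below. `CountingBreaker`, `BreakerOpenError`, and `call_with_retries` are hypothetical names, and the helper omits backoff and cool-down for brevity. The point is only that the breaker is checked before every attempt, including retries.

```python
class BreakerOpenError(Exception):
    """Raised when the breaker is open and the caller should use a fallback."""


class CountingBreaker:
    """Minimal stand-in breaker: open once consecutive failures hit the threshold."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.failure_threshold

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0


def call_with_retries(fn, breaker, max_attempts=5):
    """Retry fn, but stop immediately once the breaker opens (max_attempts >= 1)."""
    last_error = None
    for _ in range(max_attempts):
        if breaker.is_open:
            # Hard gate: no further attempts reach the dependency.
            raise BreakerOpenError("breaker open; use fallback") from last_error
        try:
            result = fn()
        except Exception as exc:
            breaker.record_failure()
            last_error = exc
        else:
            breaker.record_success()
            return result
    raise last_error  # retry budget exhausted while the breaker stayed closed
```

With a threshold of three and a budget of five attempts, a dependency that always fails is called exactly three times. The fourth attempt is refused by the breaker instead of hitting the model again.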


Where this lives in the wild

  • GitHub Copilot — inference platform engineer: trips a per-model breaker after consecutive 503 errors from the main endpoint and shifts lightweight completions to a cached local model for a short cool-down window.
  • Cursor — agent runtime lead: maintains separate breakers for code-edit generation and repository-indexing tools so one sick dependency does not freeze the entire agent loop.
  • Perplexity — search infrastructure engineer: uses provider-specific breakers because one web-search vendor can degrade while the answer-generation model stays healthy.
  • Intercom Fin — support workflow owner: opens a breaker on malformed structured outputs when citation JSON failure crosses threshold, then falls back to plain-text answers with less automation.
  • Klarna assistant — risk systems engineer: keeps a dedicated breaker for payment-affecting tools so checkout chat can continue while refund actions remain sealed off.

Pause and recall

  • What are the three circuit-breaker states and what does each mean?
  • Why can malformed output justify opening a breaker even when HTTP status is 200?
  • Why are per-model or per-feature breakers usually better than one global breaker?
  • What makes half-open probing safe instead of reckless?

Interview Q&A

Q: Why open a circuit breaker instead of relying only on retries during a dependency outage?
A: Retries still consume traffic and budget, while a breaker stops further pressure and activates safer alternate paths.
Common wrong answer to avoid: "Because retries are slower than breakers." Speed matters, but containment is the main reason.

Q: Why should breaker thresholds consider output validity and latency, not just status codes?
A: A dependency that answers slowly or returns unusable payloads is operationally unhealthy even without transport errors.
Common wrong answer to avoid: "Because status codes are deprecated for AI systems." They are still useful, just incomplete.

Q: Why is a per-feature breaker often superior to a global breaker in AI products?
A: It preserves unaffected capabilities and limits blast radius when only one model or tool path degrades.
Common wrong answer to avoid: "Because global breakers are impossible to implement." They are possible, but often too coarse.

Q: Why should half-open success require healthy behavior rather than any successful response?
A: A lucky slow or malformed success can mislead the system into reopening traffic before the dependency is truly usable.
Common wrong answer to avoid: "Because half-open probes should maximize throughput." The goal is validation, not throughput.


Apply now (5 min)

Exercise. Choose one AI workflow with at least two dependencies. Define a breaker for each dependency. Write the open threshold, probe rule, cool-down duration, and fallback action.

Sketch from memory. Draw the closed → open → half-open loop. Add one arrow from the vitals monitor into the sealed ward, and one arrow from the sealed ward to the backup ambulance.


Bridge. A breaker tells us when to stop using the main path. Next we need strong alternate paths, which is the work of the backup ambulance. → 05-fallback-strategies.md