
03. Severity and blast radius — classify the fire before pulling the lever

~12 min read. Severity is not a feeling. In AI systems it is a judgment about harm, spread, reversibility, and whether the system is still making things worse.

Continues from 02-first-fifteen-minutes.md. The fire captain has a channel, a snapshot, and a first update. Now the captain must decide how large the fire is and which firebreak is justified.

The previous chapter gave the first response order: declare, assign, snapshot, contain, communicate, investigate. That solved chaos, but it left a judgment problem: not every bad answer deserves the same lever. This chapter teaches how to size the incident before the team either underreacts or shuts down too much.


1) The wall — "bad answer" is too vague for incident command

"The model gave a bad answer" is not an incident severity. It is a complaint shape.

The same phrase can mean a harmless typo, a hallucinated legal instruction, a leaked document, a wrong medical recommendation, an unauthorized refund, or a runaway tool loop. Those do not deserve the same response.

Severity starts with four questions:

harm:          what bad outcome can happen?
spread:        how many users/tenants/workflows can still hit it?
reversibility: can we undo the effect?
momentum:      is the system still making it worse?

If harm is high, spread is broad, effects are hard to reverse, and momentum is active, the fire captain pulls a strong firebreak. If harm is low, spread is narrow, effects are reversible, and momentum has stopped, the response can be calmer.
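One way to keep the four questions from collapsing back into a gut feeling is to record the answers as data. The sketch below is illustrative only: the `Level` scale, the `Assessment` fields, and the score thresholds are assumptions made for this example, not a standard formula.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class Assessment:
    harm: Level             # what bad outcome can happen?
    spread: Level           # how many users/tenants/workflows can still hit it?
    irreversibility: Level  # HIGH means the effect is hard to undo
    momentum_active: bool   # is the system still making it worse?


def firebreak_strength(a: Assessment) -> str:
    """Illustrative rule of thumb; the thresholds are assumptions."""
    # High harm that is still growing gets a strong firebreak regardless of the rest.
    if a.harm == Level.HIGH and a.momentum_active:
        return "strong"
    score = a.harm + a.spread + a.irreversibility + (2 if a.momentum_active else 0)
    if score >= 5:
        return "strong"
    if score >= 3:
        return "targeted"
    return "calm"


worst = Assessment(Level.HIGH, Level.HIGH, Level.HIGH, momentum_active=True)
mild = Assessment(Level.LOW, Level.LOW, Level.LOW, momentum_active=False)
print(firebreak_strength(worst), firebreak_strength(mild))
```

The point is not the arithmetic. Writing the four answers down forces the team to say which one is still unknown.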


2) A practical severity ladder for AI products

Use this as a starting point, then adapt it to your product.

| Severity | AI-specific shape | Default response |
|----------|-------------------|------------------|
| Sev-1 | active privacy leak, unauthorized money movement, unsafe high-stakes advice, runaway tool execution, public harmful behavior | disable feature or tool path, page leadership, customer/security comms |
| Sev-2 | broad correctness regression, stale policy answers, safety classifier bypass in limited scope, major cost spike | rollback or targeted disable, active war room, frequent updates |
| Sev-3 | narrow bad-answer slice, degraded quality, non-critical eval regression, isolated latency fallback | assign owner, snapshot, patch, monitor |
| Sev-4 | internal-only issue, typo-level output, low-risk prompt bug, no customer impact | normal bug process with evidence retained |

This ladder is intentionally operational. It tells the team what to do, not merely what to call the incident.

Mini-FAQ. "Can severity change?" Yes. Start with the most severe plausible classification, which is the safest assumption, then downgrade when evidence proves a smaller blast radius. Under-declaring early is usually more expensive than downgrading later.
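The ladder and the downgrade rule can be sketched as a small lookup plus a guard. The response lists are copied from the ladder. The names `Severity`, `DEFAULT_RESPONSE`, `initial_severity`, and `reclassify` are hypothetical, and the rule that a downgrade needs a non-empty evidence note is an assumption about how a team might enforce "downgrade when evidence proves it".

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # lower number = more severe
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Default responses copied from the ladder.
DEFAULT_RESPONSE = {
    Severity.SEV1: ["disable feature or tool path", "page leadership", "customer/security comms"],
    Severity.SEV2: ["rollback or targeted disable", "active war room", "frequent updates"],
    Severity.SEV3: ["assign owner", "snapshot", "patch", "monitor"],
    Severity.SEV4: ["normal bug process with evidence retained"],
}


def initial_severity(plausible):
    """Open at the most severe classification that is still plausible."""
    return min(plausible)


def reclassify(current, proposed, evidence):
    """Allow upgrades freely; allow downgrades only with a recorded evidence note."""
    is_downgrade = proposed > current
    if is_downgrade and not evidence.strip():
        return current  # no evidence, no downgrade
    return proposed


opened_at = initial_severity([Severity.SEV3, Severity.SEV2])
print(opened_at.name, DEFAULT_RESPONSE[opened_at])
```

Upgrades pass without evidence on purpose: raising severity on suspicion is cheap, while lowering it on suspicion is how under-declaring happens.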


3) Worked example — refund agent severity

The refund assistant is wrong. Which severity?

The first classification depends on what the agent can do.

agent only answered text
  -> correctness + financial guidance risk
  -> likely sev-2 until scope known

agent called refund approval tool
  -> unauthorized money movement risk
  -> sev-1 if active or broad

agent affected one internal test tenant
  -> contained quality bug
  -> sev-3 or sev-4

The same visible sentence moves severity when the system boundary changes. That is why lead engineers ask about tool authority, tenant scope, and reversibility before debating model behavior.

For the running incident, assume the agent recommended a refund but did not execute it. The first response can be sev-2: customer-visible financial guidance may be wrong, blast radius unknown, and enterprise refund flows are still active.

The firebreak should match that risk: disable refund recommendations or degrade to policy excerpts, not necessarily take the entire support assistant offline.
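The three variants can be written as a small decision function. This is a sketch under stated assumptions: the argument names and string values are invented for illustration, and the fallback to sev-2 for an executed refund that is neither active nor broad is an assumption, since the example above only covers the active-or-broad case.

```python
def classify_refund_incident(authority: str, tenant_scope: str,
                             active_or_broad: bool = False) -> str:
    """Return a first-pass severity for the refund assistant incident.

    authority:    "text_only" or "executed_tool" (hypothetical labels)
    tenant_scope: "internal_test" or "customer" (hypothetical labels)
    """
    # An internal test tenant is a contained quality bug, whatever the agent did.
    if tenant_scope == "internal_test":
        return "sev-3 or sev-4"
    if authority == "executed_tool":
        # Unauthorized money movement risk.
        # Assumption: a contained, narrow execution opens at sev-2.
        return "sev-1" if active_or_broad else "sev-2"
    if authority == "text_only":
        # Correctness and financial guidance risk, until scope is known.
        return "sev-2"
    raise ValueError(f"unknown authority: {authority!r}")


# The running incident: a recommendation was shown, nothing was executed.
print(classify_refund_incident("text_only", "customer"))
```

Tenant scope is checked before tool authority because containment to an internal tenant caps the harm regardless of what the agent was allowed to do.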


4) Blast radius map

Blast radius is the set of places the bug can still hurt.

one trace
  └─ same user?
      └─ same tenant?
          └─ same workflow?
              └─ same prompt version?
                  └─ same model route?
                      └─ same retrieval index?
                          └─ all traffic?

Every containment decision is a bet about this map. A narrow firebreak protects availability but risks missing spread. A broad firebreak protects users but hurts product continuity.

The snapshot room should help narrow the map quickly: prompt version, model route, tool path, tenant filters, index version, feature flag, and deployment time.
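A minimal sketch of narrowing the map: compare the snapshot fields of every known-bad trace and keep the values they all share. The field names and trace values are made up for illustration, and a real trace store would have its own schema.

```python
# Dimensions from the blast radius map, narrowest to broadest (names are assumptions).
DIMENSIONS = ("user", "tenant", "workflow", "prompt_version", "model_route", "retrieval_index")


def shared_dimensions(bad_traces):
    """Return the dimension values that every known-bad trace has in common."""
    shared = {}
    for dim in DIMENSIONS:
        values = {trace[dim] for trace in bad_traces}
        if len(values) == 1:
            shared[dim] = values.pop()
    return shared


bad_traces = [
    {"user": "u1", "tenant": "acme", "workflow": "refund",
     "prompt_version": "v17", "model_route": "route-a", "retrieval_index": "idx-3"},
    {"user": "u2", "tenant": "globex", "workflow": "refund",
     "prompt_version": "v17", "model_route": "route-a", "retrieval_index": "idx-3"},
]

# Users and tenants differ, so the bug is not confined to one tenant.
# Everything still shared is a candidate boundary for the firebreak.
print(shared_dimensions(bad_traces))
```

A dimension that drops out of the result is evidence of spread. A dimension that stays in is only a candidate, because two traces cannot prove that traffic outside it is safe.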


5) Why not always pull the biggest kill switch

The tempting alternative is to disable the whole AI feature whenever severity is unclear. That can be correct for privacy, safety, or money movement. It can also train the organization to fear every incident and avoid declaring them.

The better rule is proportional containment. Pull the smallest firebreak that stops credible harm while preserving enough product function and evidence.

Examples:

  • Disable one tool, not the whole assistant.
  • Route one tenant to safe fallback, not all tenants.
  • Roll back one prompt version, not the model provider.
  • Change from "take action" to "show cited policy only."
  • Rate-limit a runaway flow while investigation continues.

The firebreak is a product decision under uncertainty. The fire captain owns the call; the postmortem evaluates whether it was too narrow or too broad.
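Proportional containment can be sketched as a selection problem: among the firebreaks that stop every credibly harmful path, pick the one that costs the least product function. The `Firebreak` type, the `breadth` ranking, and the path names below are hypothetical; in practice the fire captain makes this call with incomplete information.

```python
from typing import NamedTuple, Optional


class Firebreak(NamedTuple):
    name: str
    breadth: int      # assumed ranking: lower means less product function lost
    stops: frozenset  # paths this firebreak cuts off


def smallest_firebreak(options, harmful_paths) -> Optional[Firebreak]:
    """Pick the narrowest option that stops every credibly harmful path."""
    covering = [f for f in options if harmful_paths <= f.stops]
    return min(covering, key=lambda f: f.breadth) if covering else None


options = [
    Firebreak("disable refund tool", 1, frozenset({"refund_tool"})),
    Firebreak("refund answers show cited policy only", 2,
              frozenset({"refund_tool", "refund_recommendation"})),
    Firebreak("disable whole assistant", 3,
              frozenset({"refund_tool", "refund_recommendation", "general_answers"})),
]

choice = smallest_firebreak(options, frozenset({"refund_recommendation"}))
print(choice.name if choice else "no option covers the harm; escalate")
```

If no option covers the harmful paths, the function returns `None`, which is the signal that the team lacks a fine-grained lever and needs a broader one.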


6) Production signals — severity should become more precise

The first metric is remaining exposed traffic: requests per minute still hitting the suspected path.

The misleading metric is number of complaints. Many AI failures are underreported because users cannot tell the answer is wrong.

The expert signal is a blast-radius table:

| Slice | Exposed? | Evidence | Action |
|-------|----------|----------|--------|
| enterprise refund prompt v17 | yes | bad trace matches | degraded mode |
| consumer refund prompt v9 | no | different policy corpus | monitor |
| refund approval tool | no execution found | tool logs clear | keep disabled until review |
| old retrieval index | possible | candidate mismatch | compare exact traces |

The severity classification becomes more precise as this table fills in.
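The slice table and the exposed-traffic metric can be kept together in one structure. This is a sketch only: the `SliceRow` fields, the three-value `status`, and the requests-per-minute figures are invented for illustration, and counting "possible" slices as exposed until they are cleared is an assumption that follows the rule of starting from the more severe reading.

```python
from dataclasses import dataclass


@dataclass
class SliceRow:
    name: str
    status: str    # "exposed", "possible", or "clear" (simplified from the table)
    evidence: str
    action: str
    rpm: float     # requests per minute still reaching this slice (made-up numbers)


def remaining_exposed_rpm(rows):
    """Sum traffic on slices that have not been ruled out."""
    return sum(r.rpm for r in rows if r.status in ("exposed", "possible"))


def unresolved(rows):
    """Slices that still need evidence before severity can be downgraded."""
    return [r.name for r in rows if r.status == "possible"]


rows = [
    SliceRow("enterprise refund prompt v17", "exposed", "bad trace matches", "degraded mode", 40.0),
    SliceRow("consumer refund prompt v9", "clear", "different policy corpus", "monitor", 120.0),
    SliceRow("refund approval tool", "clear", "tool logs clear", "keep disabled until review", 0.0),
    SliceRow("old retrieval index", "possible", "candidate mismatch", "compare exact traces", 15.0),
]

print(remaining_exposed_rpm(rows), unresolved(rows))
```

Tracking this number over time gives the incident channel something better than complaint counts: it should fall as firebreaks land and as "possible" rows are resolved.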


7) Boundary — severity is not blame

Severity is about response urgency, not engineer guilt. A sev-1 can come from a reasonable change that interacted with stale retrieval. A sev-3 can come from a careless prompt edit with limited blast radius.

The pathology is severity negotiation as ego protection. The lead move is to classify risk first, then assign root cause later.


Recall checkpoint

  • What four questions drive severity?
  • Why can the same bad answer be sev-1, sev-2, or sev-4?
  • What is proportional containment?
  • Why is complaint count a weak severity metric?

Interview Q&A

Q: How do you classify severity for an AI incident? A: Start with harm, spread, reversibility, and momentum. Then map the incident to product authority: did the system only answer, recommend, execute, leak, or spend?

Common wrong answer to avoid: "Use HTTP error rate and affected users only." AI incidents can be severe with low error rate and unknown affected count.

Q: Why not always disable the whole AI product during uncertainty? A: Sometimes you should, especially for privacy, safety, or money movement. But often a targeted firebreak stops harm while preserving both product function and evidence.

Common wrong answer to avoid: "Availability always matters more." Trust and safety can dominate availability in AI systems.

Q: What proves blast radius is shrinking? A: A slice table showing exposed traffic by tenant, workflow, prompt version, model route, index version, and tool path, with containment action per slice.

Common wrong answer to avoid: "Fewer complaints means smaller blast radius." Complaints lag and many users never report wrong AI behavior.


Apply now (10 min)

Model the exercise. Classify the refund incident under three variants: text-only answer, automatic refund execution, and internal test tenant only.

Your turn. Pick one AI feature and define sev-1 through sev-4 examples for that product.

Reproduce from memory. Explain why severity is harm plus spread plus reversibility plus momentum.


What you should remember

This chapter explained AI severity and blast radius. The important idea is that severity is an operational decision about risk, not a label for emotional intensity.

Carry this diagnostic forward: before choosing a firebreak, ask what harm is possible, how far it can spread, whether effects are reversible, and whether the system is still making it worse.

Remember:

  • "Bad answer" is not precise enough for incident command.
  • Tool authority changes severity.
  • Blast radius is sliced by tenant, workflow, prompt, model, index, and tool path.
  • Pull the smallest firebreak that stops credible harm.

Bridge. Severity tells us how hard to contain, but containment can destroy evidence if we move too quickly. Next we build the snapshot room. → 04-snapshot-the-system.md