01. What counts as an AI incident — when green dashboards still mean red product

~11 min read. A backend incident usually announces itself with errors. An AI incident may announce itself with one plausible answer that should never have shipped.

Built on 00-eli5.md. The alarm bell is not always an exception. Sometimes it is a support ticket, a bad eval slice, a policy escalation, a sudden cost spike, or one customer screenshot that proves the system crossed a product boundary.

The ELI5 gave us the fire station picture: alarm, captain, snapshot, firebreak, status, and after-action lock. Before any of those moves can happen, the team needs one decision: is this actually an incident or just a bug? This chapter turns "something feels wrong" into an incident boundary a lead can defend.


1) The wall — uptime is not the incident boundary

An enterprise support agent answers, "Yes, refund the customer," for an account that is explicitly ineligible. The API returned 200. The tool call succeeded. The output was valid JSON. The user saw a confident answer, and the operations dashboard stayed green.

That is still an incident.

The reason is that AI products fail on more axes than availability. They fail through correctness, grounding, authorization, safety, cost, latency, privacy, policy, and user trust. A service can be technically healthy while the product is operationally unsafe.

The fire captain needs a broader definition:

ordinary incident             AI incident
error rate ↑                  wrong-but-plausible answer
latency ↑                     missed refusal of an unsafe request
database down                 cross-tenant retrieval leak
deployment crash              model quality regression
queue backlog                 tool loop spends $2k/hour

The incident boundary is not "did software throw an error?" It is "did the AI system create unacceptable user, business, safety, privacy, legal, or cost risk?"


2) The incident classes a lead must name

A useful AI incident taxonomy starts with user impact, not implementation layer.

Class          Example                                               First question
Correctness    agent gives stale policy                              How many users saw wrong advice?
Grounding      answer cites a document it did not use                Did evidence support the answer?
Privacy        retrieved another tenant's document                   Was restricted data exposed?
Authorization  tool ran outside user scope                           Did the system perform an action it was not allowed to perform?
Safety         harmful request was answered                          Did policy boundaries fail?
Cost           loop or prompt expansion spikes spend                 Is spend still increasing?
Latency        degraded answer path times out                        Are users blocked or receiving fallback?
Evaluation     release passed broad evals but failed critical slice  Which slice regressed?

This table matters because every class implies a different firebreak. A privacy leak may require immediate feature disable. A cost runaway may require rate limiting. A quality regression may require model rollback. A policy failure may require guardrail tightening and customer communication.

Mini-FAQ. "Can one incident have multiple classes?" Yes. A prompt regression can cause a safety miss, which triggers a tool call, which creates a privacy exposure. Classify the dominant risk first so containment is fast, then preserve the multi-cause story for the postmortem.
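The class-to-firebreak idea and the "dominant risk first" rule can be sketched as a lookup plus a priority order. This is a minimal sketch, not a prescribed policy: the `PRIORITY` ordering, the mapping of "quality regression" to the Evaluation class, and the fallback message are all assumptions made for illustration. Only the four firebreaks named in the text are encoded.

```python
from enum import Enum


class HarmClass(Enum):
    PRIVACY = "privacy"
    AUTHORIZATION = "authorization"
    SAFETY = "safety"
    CORRECTNESS = "correctness"
    GROUNDING = "grounding"
    COST = "cost"
    LATENCY = "latency"
    EVALUATION = "evaluation"


# Hypothetical priority order, highest risk first. A real team would set its own.
PRIORITY = [
    HarmClass.PRIVACY,
    HarmClass.AUTHORIZATION,
    HarmClass.SAFETY,
    HarmClass.CORRECTNESS,
    HarmClass.GROUNDING,
    HarmClass.COST,
    HarmClass.LATENCY,
    HarmClass.EVALUATION,
]

# Only the firebreaks the text names; other classes are left out rather than guessed.
FIREBREAKS = {
    HarmClass.PRIVACY: "disable the affected feature",
    HarmClass.COST: "apply rate limiting",
    HarmClass.EVALUATION: "roll back the model",  # assumed home of "quality regression"
    HarmClass.SAFETY: "tighten guardrails and prepare customer communication",
}

# Assumed fallback for classes without a named firebreak.
DEFAULT_FIREBREAK = "escalate to the fire captain for a containment decision"


def dominant_class(observed):
    """Return the highest-priority harm class among those observed."""
    observed = list(observed)
    if not observed:
        raise ValueError("at least one harm class is required")
    return min(observed, key=PRIORITY.index)


def first_firebreak(observed):
    """Pick the dominant class and the firebreak to try first."""
    dominant = dominant_class(observed)
    return dominant, FIREBREAKS.get(dominant, DEFAULT_FIREBREAK)
```

For the chained example in the Mini-FAQ, `first_firebreak({HarmClass.SAFETY, HarmClass.PRIVACY})` selects the privacy firebreak under this assumed ordering. The full set of observed classes should still be recorded for the postmortem.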


3) Worked example — the bad refund answer

Take the running incident for this module.

User complaint. "Your enterprise support bot told my customer success manager to approve a refund after 90 days, but our enterprise renewal policy says refunds are only allowed for 30 days."

At first glance, this sounds like a simple answer-quality bug. A lead does not stop there.

complaint
  ├─ correctness risk: wrong policy
  ├─ financial risk: refund approval
  ├─ authorization risk: did a tool approve it or only suggest it?
  ├─ scope risk: all customers or one tenant?
  └─ recurrence risk: new prompt/model/retrieval change?

The smallest useful incident statement is:

"Possible sev-2 AI correctness incident: refund assistant may be recommending ineligible enterprise refunds. Unknown blast radius. No evidence yet of automatic refund execution. Snapshot and containment in progress."

Notice what this statement does not claim. It does not name root cause. It does not blame the model. It does not promise the customer that the problem is fixed. It gives the fire captain enough language to move.
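One way to keep incident statements this disciplined is to give them a fixed shape. The sketch below is illustrative only: `IncidentStatement` and its field names are hypothetical, not part of any tool the chapter assumes. It deliberately has no root-cause field, so the statement cannot claim more than the team knows.

```python
from dataclasses import dataclass


@dataclass
class IncidentStatement:
    severity: str                      # e.g. "sev-2"
    harm_class: str                    # e.g. "correctness"
    summary: str                       # what the system may be doing wrong
    blast_radius: str = "Unknown blast radius"
    known_facts: tuple[str, ...] = ()  # only what evidence supports so far
    status: str = "Snapshot and containment in progress"

    def render(self) -> str:
        """Join the parts into the short statement a lead can post."""
        parts = [
            f"Possible {self.severity} AI {self.harm_class} incident: {self.summary}.",
            f"{self.blast_radius}.",
        ]
        parts.extend(f"{fact}." for fact in self.known_facts)
        parts.append(f"{self.status}.")
        return " ".join(parts)


statement = IncidentStatement(
    severity="sev-2",
    harm_class="correctness",
    summary="refund assistant may be recommending ineligible enterprise refunds",
    known_facts=("No evidence yet of automatic refund execution",),
)
print(statement.render())
```

Rendering this instance reproduces the statement quoted above. Updating `blast_radius` or `known_facts` as evidence arrives keeps later status posts in the same form.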


4) Why not wait for statistical proof

The tempting alternative is to wait until dashboards or evals prove the regression. That feels disciplined because it avoids overreacting to one screenshot.

It fails when the first screenshot is the only visible signal before harm spreads. AI incidents often start as qualitative evidence: one wrong answer, one leaked snippet, one suspicious tool call. Waiting for aggregate metrics can turn a small blast radius into a public incident.

The mature rule is to separate declaration from certainty. Declaring an incident means "we need coordinated containment and evidence preservation." It does not mean "we already know the root cause."


5) Production signals — deciding whether the alarm is real

The first artifact to inspect is the user-visible output plus the trace that produced it. Without both, the team argues from screenshots and memory.

Watch these early signals:

  • user-visible harm or policy violation
  • repeated complaints from the same slice
  • eval failures on a critical slice
  • sudden tool-call, token, or cost spike
  • cross-tenant or unauthorized retrieval candidates
  • new deployment, prompt, model, index, or guardrail change in the window

The misleading metric is global availability. A 99.99% healthy API says nothing about whether the answers are safe, grounded, authorized, or affordable.

The expert signal is a slice comparison: affected tenant, workflow, prompt version, model version, retrieval index version, tool path, and time window.
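A slice comparison can be sketched as a group-by over graded traces. This is a minimal sketch under stated assumptions: each trace is a dict, the field names (`tenant`, `prompt_version`) are hypothetical, and the `bad` label is assumed to come from human review or an eval grader.

```python
from collections import defaultdict


def failure_rate_by_slice(traces, slice_keys):
    """Group graded traces by slice_keys; return {slice: (bad, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])  # slice -> [bad, total]
    for trace in traces:
        key = tuple(trace[k] for k in slice_keys)
        counts[key][1] += 1
        if trace["bad"]:
            counts[key][0] += 1
    return {key: (bad, total, bad / total) for key, (bad, total) in counts.items()}


# Hypothetical graded traces: the only bad answers sit in one tenant on one prompt version.
traces = [
    {"tenant": "acme", "prompt_version": "v7", "bad": True},
    {"tenant": "acme", "prompt_version": "v7", "bad": True},
    {"tenant": "acme", "prompt_version": "v6", "bad": False},
    {"tenant": "acme", "prompt_version": "v6", "bad": False},
    {"tenant": "globex", "prompt_version": "v7", "bad": False},
    {"tenant": "globex", "prompt_version": "v7", "bad": False},
    {"tenant": "globex", "prompt_version": "v6", "bad": False},
    {"tenant": "globex", "prompt_version": "v6", "bad": False},
]

overall = failure_rate_by_slice(traces, [])
by_slice = failure_rate_by_slice(traces, ["tenant", "prompt_version"])
```

In this toy data the overall rate is 2 of 8, while the `("acme", "v7")` slice is 2 of 2. The aggregate hides exactly the concentration the slice view exposes, which is why the comparison should cut by the dimensions listed above before anyone argues about root cause.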


6) Boundary — incident or normal bug?

Treat it as an incident when harm can spread, evidence can disappear, customer trust is at risk, money can move, private data may have leaked, safety policy may have failed, or spend is still rising.

Treat it as a normal bug when the issue is isolated, low-risk, reproducible, already contained, and does not need cross-functional coordination.

The pathology is false calm. Teams under-declare AI incidents because the system "looks up." The better failure mode is a lightweight incident that closes quickly after a snapshot and a blast-radius check.
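The boundary above can be written as a small triage rule. This is a sketch, not a policy: the trigger and condition names are hypothetical labels for the phrases in this section, and the tie-break (leaning toward a lightweight incident when neither list clearly applies) is an assumption drawn from the false-calm warning.

```python
# Any one of these is enough to declare (labels are hypothetical).
INCIDENT_TRIGGERS = frozenset({
    "harm_can_spread",
    "evidence_can_disappear",
    "customer_trust_at_risk",
    "money_can_move",
    "private_data_may_have_leaked",
    "safety_policy_may_have_failed",
    "spend_still_rising",
})

# All of these must hold to treat the issue as a normal bug.
BUG_CONDITIONS = frozenset({
    "isolated",
    "low_risk",
    "reproducible",
    "already_contained",
    "no_cross_functional_coordination_needed",
})


def triage(observed_triggers, confirmed_bug_conditions):
    """Return 'incident', 'normal bug', or 'lightweight incident'."""
    triggers = set(observed_triggers)
    confirmed = set(confirmed_bug_conditions)
    unknown = (triggers - INCIDENT_TRIGGERS) | (confirmed - BUG_CONDITIONS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if triggers:
        return "incident"
    if BUG_CONDITIONS <= confirmed:
        return "normal bug"
    # Assumption: when nothing is confirmed either way, avoid false calm.
    return "lightweight incident"
```

For the refund complaint, `triage({"money_can_move"}, set())` returns `"incident"` even though no dashboard has moved.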


7) Design review checklist

  1. What harm class is possible: correctness, grounding, privacy, authorization, safety, cost, latency, or evaluation?
  2. What artifact proves the user-visible output and the trace behind it?
  3. What slice might be affected?
  4. What recent change could have shifted behavior?
  5. What containment action would stop spread without destroying evidence?
  6. Who is the fire captain?

Recall checkpoint

  • Why can a green API still be an AI incident?
  • What is the difference between incident declaration and root-cause certainty?
  • Why does the first artifact need both output and trace?
  • Which harm classes imply immediate containment?

Interview Q&A

Q: A customer reports one bad AI answer, but dashboards are green. Do you declare an incident?

A: If the answer implies user harm, financial action, privacy exposure, safety failure, or broad regression risk, yes. Declaration starts coordination and evidence preservation; it does not claim root cause is known.

Common wrong answer to avoid: "Wait until error rate rises." Error rate may never rise for plausible-but-wrong AI behavior.

Q: What makes AI incident severity different from normal backend severity?

A: AI severity includes semantic harm: wrong advice, unsafe answer, ungrounded citation, unauthorized tool action, privacy leak, or runaway cost. Availability is only one dimension.

Common wrong answer to avoid: "If the service is up, it is not an incident." Product harm can happen while every endpoint returns 200.

Q: What is the first evidence package you want?

A: The user-visible output, full trace, prompt, model/version, retrieval candidates, tool calls, guardrail decisions, config flags, and timestamp.

Common wrong answer to avoid: "Ask the model to reproduce it." Reproduction can mutate behavior and does not preserve the original scene.
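The evidence package from the answer above can be sketched as a record that reports what has not been captured yet. The class name, field names, and types here are hypothetical. The one design choice worth noting is that `None` means "not captured" while an empty list means "captured, and there were none".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidencePackage:
    user_visible_output: Optional[str] = None
    full_trace: Optional[dict] = None
    prompt: Optional[str] = None
    model_version: Optional[str] = None
    retrieval_candidates: Optional[list] = None
    tool_calls: Optional[list] = None        # [] means captured, no tool calls made
    guardrail_decisions: Optional[list] = None
    config_flags: Optional[dict] = None
    timestamp: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of fields that have not been captured yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical starting point: only the customer screenshot has been transcribed.
package = EvidencePackage(
    user_visible_output="Yes, refund the customer",
    tool_calls=[],
)
print(package.missing())
```

Listing the gaps explicitly gives the team a collection checklist built from the original records, rather than from a reproduction attempt.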


Apply now (10 min)

Model the exercise. Take the refund complaint and classify it as a correctness plus financial-risk incident until tool execution is ruled out.

Your turn. Pick one AI feature you know. Write three incidents that would have green uptime but red product behavior.

Reproduce from memory. Explain why incident declaration is about coordinated containment, not root-cause certainty.


What you should remember

This chapter explained why AI incident response starts with harm, not exceptions. The important idea is that an AI product can be operationally dangerous while the backend is technically healthy.

Carry this diagnostic forward: if a bad answer can spread, leak, spend, act, or damage trust, ring the alarm bell and preserve evidence before debating root cause.

Remember:

  • Green uptime does not prove AI product safety.
  • Incident declaration buys coordination; it is not a root-cause claim.
  • The first artifact is output plus trace.
  • Severity starts with harm class and blast radius.

Bridge. Once the alarm is real enough to coordinate, the first minutes decide whether evidence survives and harm spreads. Next we cover the first fifteen minutes. → 02-first-fifteen-minutes.md