05. Error and recovery flows

Explainability builds trust through transparency. Errors are inevitable; the recovery flows are what determine whether users abandon or persist when things go wrong. The discipline is to make errors recoverable, not just visible.

A platform engineer at a Pune SaaS company audits the AI feature's error UX. The finding: whenever the AI fails (timeout, refusal, blank response), the user sees a generic "Something went wrong. Please try again." Users either retry mechanically (same prompt, same failure) or abandon. The team redesigns the flow: a timeout shows a specific message with the option to try a simpler question or escalate; a refusal shows the reason with a path forward; a blank response shows "I couldn't generate a complete response for that question; try rephrasing or contact support." Recovery rates improve substantially.

This chapter covers the discipline of error UX.

The categories of AI error

Five common categories:

Category	What happens	Recovery
Timeout	The AI did not respond within its time budget	Retry with a simpler question, escalate, or move to background
Refusal	The AI declined (policy, safety, capability boundary)	Explain why; suggest alternatives
Empty / incomplete	The AI returned nothing useful	Suggest rephrase; offer to try again
Wrong (detected by user)	The AI returned a wrong answer the user notices	Correction (chapter 08); escalation
System error	Infrastructure failure	Apologise; suggest retry; offer escalation

Each has a different UX response.
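Before the UI can respond per category, the failure has to be sorted into one. The sketch below shows one way to do that. The `AIResult` envelope and all of its field names are assumptions made for illustration, not part of any particular SDK; a real integration would map its own client's signals onto the same five categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    TIMEOUT = "timeout"
    REFUSAL = "refusal"
    EMPTY = "empty_or_incomplete"
    WRONG = "wrong_user_detected"
    SYSTEM = "system_error"


@dataclass
class AIResult:
    """Hypothetical result envelope; every field name here is an assumption."""
    text: str = ""
    timed_out: bool = False
    refused: bool = False
    finish_reason: str = "stop"          # assumed values: "stop" or "length"
    transport_error: Optional[str] = None
    user_flagged_wrong: bool = False


def classify(result: AIResult) -> Optional[ErrorCategory]:
    """Return the error category, or None when the response looks usable."""
    if result.transport_error is not None:
        return ErrorCategory.SYSTEM       # infrastructure failed before any answer
    if result.timed_out:
        return ErrorCategory.TIMEOUT
    if result.refused:
        return ErrorCategory.REFUSAL
    if not result.text.strip() or result.finish_reason == "length":
        return ErrorCategory.EMPTY        # blank, or cut off before completion
    if result.user_flagged_wrong:
        return ErrorCategory.WRONG        # only the user can raise this one
    return None
```

The order of the checks is a design choice: infrastructure and timing failures are tested first because they make the response text meaningless.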

Timeout recovery

The AI is slow or unresponsive. The UX:

Acknowledge the wait: "This is taking longer than expected..."
Offer alternatives: "Try a simpler question" or "Connect with support".
Offer cancellation.
For long-running tasks: move to background; notify when ready.

Avoid: generic "timeout" message with no options.
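The acknowledge-then-offer pattern above can be sketched with two budgets: a soft one that triggers the "taking longer than expected" notice, and a hard one that ends the wait and presents options. This is a minimal sketch using Python's `asyncio`; the function name `ask_with_budget`, the `TimeoutNotice` type, and the budget values are assumptions for illustration, and moving long-running work to the background is not covered.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Union


@dataclass
class TimeoutNotice:
    """What the UI shows once the hard budget is exhausted (hypothetical type)."""
    message: str
    actions: List[str] = field(default_factory=list)


async def ask_with_budget(
    call: Callable[[], Awaitable[str]],
    soft_seconds: float,
    hard_seconds: float,
    notify: Callable[[str], None],
) -> Union[str, TimeoutNotice]:
    task = asyncio.ensure_future(call())
    # Phase 1: wait quietly up to the soft budget.
    done, _ = await asyncio.wait({task}, timeout=soft_seconds)
    if not done:
        # Phase 2: acknowledge the wait, then keep waiting up to the hard budget.
        notify("This is taking longer than expected...")
        remaining = max(0.0, hard_seconds - soft_seconds)
        done, _ = await asyncio.wait({task}, timeout=remaining)
    if done:
        return task.result()
    # Hard budget exhausted: stop the call and give the user real options.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return TimeoutNotice(
        message="I couldn't answer that in time.",
        actions=["Try a simpler question", "Connect with support", "Cancel"],
    )
```

A fast response never triggers the notice, so the acknowledgement only appears when the wait is actually noticeable.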

Refusal recovery

The AI declines. The UX:

Explain the reason clearly: "I can't help with that because [specific reason]."
Suggest alternatives where possible: "I can help with similar questions about your account, though."
Provide escalation path for legitimate needs the AI refuses.

Avoid: vague "I can't help with that" with no explanation.

Refusal categories:

Policy refusal. The AI cannot help due to platform policy.
Safety refusal. The query cannot be answered safely.
Capability refusal. The AI does not have the capability or data.
Scope refusal. The query is outside the feature's scope.

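Each has a slightly different framing. A policy refusal states the rule and does not invite a retry; a capability refusal says what the AI lacks and points to an alternative.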
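One way to keep the four framings distinct is a small template table keyed by refusal kind. The wording of each template below is illustrative, not prescribed by this chapter, and `refusal_message` is a hypothetical helper. The one rule it enforces comes straight from the guidance above: a refusal without a specific reason is rejected.

```python
from enum import Enum
from typing import Optional


class RefusalKind(Enum):
    POLICY = "policy"
    SAFETY = "safety"
    CAPABILITY = "capability"
    SCOPE = "scope"


# Illustrative framings only; real copy should match the product's tone.
_FRAMING = {
    RefusalKind.POLICY: "I can't help with that because {reason}.",
    RefusalKind.SAFETY: "I can't answer that safely because {reason}.",
    RefusalKind.CAPABILITY: "I'm not able to do that because {reason}.",
    RefusalKind.SCOPE: "That's outside what this assistant covers because {reason}.",
}


def refusal_message(
    kind: RefusalKind, reason: str, alternative: Optional[str] = None
) -> str:
    """Build a refusal that states why, suggests an alternative, and offers escalation."""
    reason = reason.strip().rstrip(".")
    if not reason:
        # A vague refusal is the failure mode; refuse to build one.
        raise ValueError("a refusal needs a specific reason")
    parts = [_FRAMING[kind].format(reason=reason)]
    if alternative:
        parts.append(f"I can help with {alternative}, though.")
    parts.append("If you still need help with this, you can connect with support.")
    return " ".join(parts)
```

For example, `refusal_message(RefusalKind.SCOPE, "it only handles billing questions", alternative="questions about your invoices")` yields a message that names the boundary, offers the nearby task, and ends with the escalation path.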

Empty / incomplete response recovery

The AI returned nothing or an obviously incomplete response:

Surface it directly: "I couldn't generate a complete answer for that."
Suggest the user rephrase: "Try asking about a specific [...] or rephrase your question."
Offer to retry: "Try again" button.
Offer escalation.

Avoid: silent failure (the user does not know if the AI failed or is still working).
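Silent failure is easiest to avoid when the UI always renders one of three explicit states: still working, answered, or failed. The sketch below assumes a hypothetical `ResponseView` type and treats a response as empty when it has fewer than `min_chars` non-whitespace-trimmed characters. Detecting a response that is non-empty but incomplete needs product-specific checks that are out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResponseView:
    """Hypothetical view model: the UI always renders exactly one state."""
    state: str            # "working", "ok", or "failed"
    text: str
    actions: List[str]


def render_response(
    text: Optional[str], finished: bool, min_chars: int = 1
) -> ResponseView:
    if not finished:
        # Still generating: say so, and let the user cancel.
        return ResponseView("working", "Working on it...", ["Cancel"])
    cleaned = (text or "").strip()
    if len(cleaned) < min_chars:
        # Finished but nothing usable: surface it directly with next steps.
        return ResponseView(
            "failed",
            "I couldn't generate a complete answer for that.",
            ["Rephrase your question", "Try again", "Connect with support"],
        )
    return ResponseView("ok", cleaned, [])
```

Because "working" and "failed" are separate states, the user can always tell whether the AI is still running or has given up.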

Wrong-answer recovery (user-detected)

The user notices the AI was wrong. The UX:

Provide a correction affordance: "This isn't right" button, edit option, "Tell us what went wrong" prompt.
Acknowledge the correction: "Thanks; we'll do better."
Use the correction for the immediate response (recompute) and for feedback (chapter 08).
Offer escalation if the correction does not resolve.

Avoid: no correction path; user has to leave the feature to get the right answer.
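The correction flow has two jobs: keep the user's signal and try to fix the answer immediately. A minimal sketch of that follows, assuming a hypothetical `Correction` record, a caller-supplied `recompute` function, and an in-memory list standing in for whatever feedback store chapter 08 describes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class Correction:
    """Hypothetical record of a user saying 'this isn't right'."""
    response_id: str
    original_answer: str
    user_note: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def handle_correction(
    correction: Correction,
    recompute: Callable[[Correction], Optional[str]],
    feedback_log: List[Correction],
) -> Dict[str, object]:
    # Always keep the signal, even if the retry below does not help.
    feedback_log.append(correction)
    revised = recompute(correction)
    if revised and revised != correction.original_answer:
        return {
            "message": "Thanks; here is a revised answer.",
            "answer": revised,
            "actions": ["This isn't right", "Connect with support"],
        }
    # The correction did not resolve it: acknowledge and escalate.
    return {
        "message": "Thanks; we'll do better. I couldn't produce a better answer here.",
        "answer": None,
        "actions": ["Connect with support"],
    }
```

The revised answer keeps its own "This isn't right" action, so the user is never left without a correction path.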

System error recovery

Infrastructure failure (the AI service is down, the gateway is failing):

Apologise specifically: "We're having trouble reaching the AI right now."
Suggest retry: "Try again in a minute" with a retry button.
Offer escalation: "Need help now? [Connect with support]"

Avoid: stack traces; error codes the user cannot interpret; silent failures.

System errors are infrequent for well-operated platforms. When they do happen, the user can do nothing about the cause, so the UX is the whole of the recovery.
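Keeping stack traces out of the UI does not mean discarding them. The sketch below logs the full exception for operators and returns only a plain message and actions for the user. The function name, logger name, and the idea of passing a short reference along on escalation (rather than displaying it) are assumptions for illustration.

```python
import logging
import uuid
from typing import Dict

logger = logging.getLogger("ai_feature")  # hypothetical logger name


def system_error_view(exc: BaseException) -> Dict[str, object]:
    """Log the technical detail; show the user only what they can act on."""
    reference = uuid.uuid4().hex[:8]
    # Full exception and traceback go to the logs, never to the screen.
    logger.error("AI service failure ref=%s", reference, exc_info=exc)
    return {
        "message": "We're having trouble reaching the AI right now.",
        "actions": ["Try again", "Connect with support"],
        # Attached to the support handoff so staff can find the log entry;
        # not rendered in the message itself.
        "support_reference": reference,
    }
```

The user-facing message is fixed text, so nothing from the exception can leak into it by accident.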

The four properties of good error UX

Specific. "Something went wrong" is the failure mode. "I couldn't find your order; could you double-check the order number?" is specific.

Actionable. The user knows what to do next. Retry, rephrase, escalate, or accept.

Honest. The error is acknowledged; the AI does not pretend things are fine when they are not.

Brand-appropriate. The tone matches the product: clinical for enterprise, friendly for consumer.
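Two of these properties, specific and actionable, can be partly checked mechanically, for instance in a review script that runs over the product's error copy. The sketch below is one such check under stated assumptions: the list of generic phrases and the rule names are made up for illustration, and honesty and tone still need human review.

```python
from typing import List

# Assumed examples of generic wording; extend for the product's own copy.
GENERIC_PHRASES = ("something went wrong", "an error occurred", "unknown error")


def lint_error_message(message: str, actions: List[str]) -> List[str]:
    """Return a list of problems found in one error message; empty means it passed."""
    problems: List[str] = []
    lowered = message.lower()
    if any(phrase in lowered for phrase in GENERIC_PHRASES):
        problems.append("not specific: generic wording")
    if not actions:
        problems.append("not actionable: no next step offered")
    if "traceback" in lowered or "exception" in lowered:
        problems.append("leaks internals: stack trace or exception text")
    if lowered.count("sorry") > 1:
        problems.append("over-apologising: more than one 'sorry'")
    return problems
```

Run against the chapter's own examples, "Something went wrong. Please try again." with no actions fails on two counts, while "I couldn't find your order; could you double-check the order number?" with a retry and an escalation action passes.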

Recovery paths per category

Error	Primary recovery	Fallback
Timeout	Retry simpler / cancel	Escalate to human
Refusal (capability)	Suggest alternative	Escalate to human
Refusal (policy)	Explain, no retry	Escalate if user disagrees
Empty / incomplete	Rephrase prompt	Retry / escalate
Wrong	Correction affordance	Escalate
System	Retry	Escalate

The escalation to human (chapter 07) is the universal fallback; it is available across error categories.
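The table translates directly into a lookup that the UI can use to choose which actions to show. The sketch below copies the table's rows as data; the key names and the `recovery_options` helper are assumptions. Escalation is added as the only option for any category the table does not know, which is one way to honour the universal-fallback rule.

```python
from typing import Dict, List, Tuple

# (primary recovery, fallback), copied from the table above.
RECOVERY: Dict[str, Tuple[str, str]] = {
    "timeout": ("Retry simpler / cancel", "Escalate to human"),
    "refusal_capability": ("Suggest alternative", "Escalate to human"),
    "refusal_policy": ("Explain, no retry", "Escalate if user disagrees"),
    "empty_incomplete": ("Rephrase prompt", "Retry / escalate"),
    "wrong": ("Correction affordance", "Escalate"),
    "system": ("Retry", "Escalate"),
}

UNIVERSAL_FALLBACK = "Escalate to human"


def recovery_options(error: str) -> List[str]:
    """Return recovery steps in order; escalation is always reachable."""
    steps = list(RECOVERY.get(error, ()))
    if not any("escalate" in step.lower() for step in steps):
        # Unknown category, or a row without escalation: add the fallback.
        steps.append(UNIVERSAL_FALLBACK)
    return steps
```

Keeping the mapping as data rather than scattered conditionals makes it easy to review against the table when either changes.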

What error UX does not solve

The underlying error. UX presents; the cause must still be fixed.
User patience. Repeated errors exhaust users, and no amount of UX polish restores that patience.
Trust loss after errors. Each error chips at trust; good UX limits the chip but does not eliminate it.

Common mistakes

Generic error messages. "Something went wrong" produces abandonment.

No correction affordance. Users know the AI is wrong but have no way to say so.

Buried escalation. The path to human is hidden; users get frustrated before finding it.

Silent failure. The user does not know the AI failed.

Apologising endlessly. Repeated "sorry" feels insincere and produces frustration.

Interview Q&A

Q1. The team's AI shows "Something went wrong" on errors. Why is that the wrong default?

Because it is not actionable. The user does not know what kind of error happened, what they can do, or whether to retry. Generic errors produce mechanical retries (same failure) or abandonment. Specific errors with actionable recovery, such as "I couldn't find your order; please check the number", produce successful recovery in many cases. The discipline is per-category error UX, not a uniform fallback.

Wrong-answer notes: "errors are exceptional" misses that error UX is core, not an edge case.

Q2. Walk through the right UX for a refusal.

Explain the reason specifically: "I can't help with that because [reason]." Suggest alternatives when applicable: "I can help with [related thing]." For legitimate user needs that the AI refuses, provide an escalation path: "If you need help with this, [connect with support]." The user then knows why the refusal happened and what they can do next. Generic refusals ("I can't help with that") are unhelpful.

Wrong-answer notes: vague refusal text damages trust.

Q3. When the user detects the AI was wrong, what should the UX offer?

A correction affordance: a "This isn't right" button, an edit option, or a "Tell us what's wrong" prompt. The correction is captured (chapter 08) and used both for the immediate response (recompute if possible) and as feedback for iteration. Offer an escalation path if the correction does not resolve the problem. The user's signal is acted on, not just acknowledged.

Wrong-answer notes: "let the user start over" loses the signal and produces frustration.

Q4. The team is debating whether to apologise for every error. What is your view?

Apologise sincerely for errors that warrant it (system failures, or a clear mistake by the AI). Skip the apology for situations that are not errors (a legitimate refusal, a clarification request). Repeated "sorry" feels insincere, and users tune it out. The discipline is honest acknowledgement of what happened, with action toward recovery. The tone matches the product, and the apology is proportional.

Wrong-answer notes: "always apologise" produces fatigue.

What to do differently after reading this

Design specific error UX per category.
Make every error actionable (retry, rephrase, escalate, accept).
Provide correction affordances for wrong-answer cases.
Make escalation paths visible across error categories.
Avoid generic "something went wrong" defaults.

Bridge. Error recovery handles when things go wrong. Progressive disclosure handles how much to show when things go right. The next chapter is the discipline of revealing AI capabilities and information in stages. → 06-progressive-disclosure.md