13. Honest admission¶
Twelve chapters of discipline. None of them solves the problem entirely. This chapter lists the limits a thoughtful lead is transparent about with their team, their stakeholders, and themselves.
The release discipline raises the floor. It does not eliminate the possibility of bad releases. The admissions below mark where the discipline reaches its limits.
1 — Gates measure samples, not populations¶
The eval gate measures the change against the eval set; the set is a sample of inputs. The change may regress on cases not in the set. The feedback gate measures the canary's response; the canary is a sample of users. Both gates raise the floor; both have known coverage limits.
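The coverage limit can be given a rough number. The sketch below is illustrative only: the function name is hypothetical, and it assumes the eval cases are independent draws from the same distribution as production traffic, which is optimistic for a curated set. It computes the highest true failure rate that is still consistent with a fully passing eval run.

```python
def max_undetected_failure_rate(n_cases: int, confidence: float = 0.95) -> float:
    """Upper bound on the true failure rate when all n_cases passed.

    If the true failure rate is p, the chance that every one of n
    independent cases passes is (1 - p) ** n. Setting that equal to
    (1 - confidence) and solving for p gives the bound returned here.
    Assumption: cases are i.i.d. samples of production inputs.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_cases)


if __name__ == "__main__":
    for n in (100, 300, 1000):
        bound = max_undetected_failure_rate(n)
        print(f"{n} passing cases: failure rate could still be up to {bound:.2%}")
```

Under these assumptions, a clean run on 300 cases still leaves room for a true failure rate of just under 1%. The same reasoning applies to the canary sample of users.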
2 — Canary catches the visible, not the slow-developing¶
A canary at 5% for 24 hours catches issues that emerge within that window. Slow-developing patterns (user reactions over weeks; cumulative drift) may not surface in canary. Post-promotion monitoring catches some; the long tail still escapes.
3 — Rollback restores, but does not undo customer impact¶
The rollback restores the system to the prior version. The harm during the regression window (the conversations that received the bad responses, the actions taken on bad outputs) is not undone. The faster the rollback, the smaller the harm; "harmless rollback" is not achievable.
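The relationship between rollback speed and harm can be sketched as simple arithmetic. This is a back-of-envelope model, not a measurement method from the module: the function name, parameters, and traffic figures are hypothetical, and it assumes constant traffic and that every exposed request during the window is affected.

```python
def affected_requests(
    requests_per_minute: float,
    exposed_fraction: float,
    minutes_to_detect: float,
    minutes_to_rollback: float,
) -> float:
    """Estimate requests served by the bad version before rollback completes.

    Assumptions (illustrative): traffic is constant, and the regression
    window is detection time plus rollback time.
    """
    if not 0.0 <= exposed_fraction <= 1.0:
        raise ValueError("exposed_fraction must be between 0 and 1")
    window_minutes = minutes_to_detect + minutes_to_rollback
    return requests_per_minute * exposed_fraction * window_minutes


if __name__ == "__main__":
    # Hypothetical numbers: 1200 requests/minute, 30 minutes to detect.
    canary = affected_requests(1200, 0.05, 30, 5)
    full = affected_requests(1200, 1.0, 30, 5)
    print(f"5% canary, 5-minute rollback: about {canary:.0f} affected requests")
    print(f"full rollout, 5-minute rollback: about {full:.0f} affected requests")
```

The estimate shrinks as exposure or the window shrinks, but it reaches zero only if one of them is zero, which is the point of this admission.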
4 — Version retention has a window¶
Previous versions are retained for the rollback window (typically 30-90 days). Older versions are archived; rolling back to them requires more work. A regression discovered six months later, when a different change interacts with a baseline behaviour, cannot be rolled back to the original version; the discipline at that point is to roll forward.
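The retention rule can be expressed as a small decision helper. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, the retention clock is assumed to start on the day a version was superseded, and the 90-day default is simply the upper end of the range quoted above.

```python
from datetime import date


def recovery_path(superseded_on: date, today: date, retention_days: int = 90) -> str:
    """Return "rollback" if the old version is still inside the retention
    window, otherwise "roll-forward".

    Assumption: retention is counted from the day the version was superseded.
    """
    if today < superseded_on:
        raise ValueError("today cannot be earlier than superseded_on")
    age_days = (today - superseded_on).days
    return "rollback" if age_days <= retention_days else "roll-forward"


if __name__ == "__main__":
    # A version superseded a month ago is still directly restorable.
    print(recovery_path(date(2024, 6, 1), date(2024, 7, 1)))
    # A version superseded six months ago has aged out of the window.
    print(recovery_path(date(2024, 1, 1), date(2024, 7, 1)))
```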
5 — Communication respects audiences differently¶
Engineering channels get fast updates; customer-facing teams get translated summaries; customers get material-only notices. Each audience has different needs. The discipline can do all three well; it cannot ensure every customer reads every notice. Some customers will be surprised regardless.
6 — Change windows produce backlog¶
A narrow window produces release backlog when the team's velocity exceeds the window's capacity. Wider windows reduce backlog but increase off-hour risk. The trade-off is operational; no window is universally right.
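The backlog effect is plain queue arithmetic. The sketch below is illustrative: the function name and the weekly figures are hypothetical, and it assumes a constant rate of release-ready changes and a fixed window capacity.

```python
def backlog_after(
    weeks: int,
    changes_ready_per_week: int,
    window_capacity_per_week: int,
    starting_backlog: int = 0,
) -> int:
    """Simulate how many release-ready changes are waiting after `weeks`.

    Each week, newly ready changes join the queue and the change window
    ships up to its capacity. The backlog never drops below zero.
    """
    backlog = starting_backlog
    for _ in range(weeks):
        backlog = max(0, backlog + changes_ready_per_week - window_capacity_per_week)
    return backlog


if __name__ == "__main__":
    # Hypothetical team: 6 changes ready per week, window fits 4.
    print(backlog_after(8, 6, 4))  # 16 changes waiting after 8 weeks
    # Widening the window to 7 per week clears the queue.
    print(backlog_after(8, 6, 7))  # 0
```

Whenever velocity exceeds capacity the backlog grows linearly; widening the window removes it but moves releases into riskier hours, which the model does not capture.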
7 — Emergency bypass has cost¶
Even a disciplined emergency bypass produces risk: skipped gates, a compressed canary, potential regressions discovered post-promotion. The discipline reduces the risk; it does not eliminate it. A platform with frequent emergencies carries compounding risk from frequent bypasses.
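The compounding can be illustrated with basic probability. This is a sketch under strong assumptions, not a risk model from the module: the function name is hypothetical, the per-bypass regression probability is an invented input, and bypasses are treated as independent with the same probability each time.

```python
def prob_at_least_one_regression(per_bypass_probability: float, bypass_count: int) -> float:
    """Probability that at least one of `bypass_count` bypasses ships a regression.

    Assumption: each bypass independently ships a regression with the
    same probability. Real bypasses are rarely this uniform.
    """
    if not 0.0 <= per_bypass_probability <= 1.0:
        raise ValueError("per_bypass_probability must be between 0 and 1")
    if bypass_count < 0:
        raise ValueError("bypass_count must be non-negative")
    return 1.0 - (1.0 - per_bypass_probability) ** bypass_count


if __name__ == "__main__":
    # Hypothetical: a 5% chance per bypass, at different yearly frequencies.
    for count in (1, 4, 12):
        risk = prob_at_least_one_regression(0.05, count)
        print(f"{count} bypasses per year: {risk:.0%} chance of at least one regression")
```

With these illustrative inputs, a monthly bypass habit turns a small per-event risk into something close to a coin flip over a year.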
8 — Coordinated releases amplify risk¶
Multi-change coordinated releases concentrate risk: several changes are in flight at once. Feature flags help, but the underlying risk is real. Sequencing the releases reduces that risk but does not always satisfy the need for user-facing coherence.
9 — Postmortems improve the discipline; they do not prevent recurrence of every kind¶
A new failure pattern can occur that no prior postmortem addressed. The discipline catches the patterns the team has seen; novel patterns appear and require new action. The postmortem culture is necessary; it is not sufficient.
10 — Cross-team release coordination is people work¶
The technical discipline can be perfect; if cross-team coordination is informal, multi-change releases will encounter coordination failures. The discipline includes the documentation and the explicit ownership; the actual coordination is human work.
What this module does not teach¶
- The specifics of feature flag systems (LaunchDarkly, etc.)
- The internals of routing policies (covered in 02_ai_infrastructure/01)
- The eval discipline in depth (covered in 00_ai_evals_release_gates, 01_dataset_golden_set_operations)
- The feedback capture in depth (covered in 02_telemetry_feedback_loops)
- General software release engineering (covered in industry references)
This module is the AI-specific layer; the neighbours fill in around it.
How to use this module after reading it¶
- Audit against chapter 12. Identify the top three reds.
- Eval gate CI-enforced (item 4) first. Most leverage.
- Canary by default (item 8) second. Catches what gates miss.
- Rollback tested (item 10) third. Recovery when canary or production reveals issues.
- Build the rest over months. Communication, change windows, postmortem discipline as the platform matures.
- Re-read this honest admission quarterly.
Closing¶
AI release management is the production engineering of shipping AI changes safely. The discipline this module taught (recognise change types, gate on eval and feedback, canary gradually, roll back fast, version artefacts, communicate honestly, operate with windows and freezes, run a postmortem when things go wrong) gives the team a defensible release posture.
It is not a guarantee. It is the most concrete form the team has of "ship what passes; canary what we're not sure of; roll back what fails; learn from what surprised us."
That is what production-grade AI release management looks like.
Bridge. This module covered AI release management. The next module, 18_human_ai_product_experience, is the UX side: how the AI's behaviour reaches users, what UX patterns make AI products trustworthy, and the human factors of working with non-deterministic systems. → ../../01_ai_engineering/18_human_ai_product_experience/00-eli5.md