00. AI release management: First-principles overview
Modules 01 and 02 of this category taught you to build evals and feedback loops. This module is the discipline of using them, together with prompts, models, and agent code as versioned artefacts, to ship AI changes safely. It covers canaries, rollbacks, communication, and change windows: the production engineering of releasing AI changes that touch real users.
A platform engineer at a Bengaluru SaaS company ships a prompt update at 14:00 IST on a Friday. By 16:00, customer-support tickets are spiking. The prompt change was reviewed; the eval passed; the canary was at 50%; the rollback flag was prepared but not tested. The rollback runs into a configuration issue; reverting takes 90 minutes; in that window, ten thousand customer conversations get the regressed prompt. The postmortem identifies the discipline gaps: the canary should have been at 5%, not 50%, for a change of this surface area; the rollback flag should have been rehearsed; the release should not have shipped late on Friday with reduced staffing.
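The postmortem's point about canary size can be checked with simple arithmetic. The sketch below assumes traffic is uniform over the 90-minute rollback and backs out the implied total conversation rate from the incident's own figures; the function name `exposed_conversations` is illustrative, not part of any tool.

```python
def exposed_conversations(total_per_minute: float, canary_fraction: float, minutes: float) -> float:
    """Conversations served by the new version while it stays live."""
    return total_per_minute * canary_fraction * minutes


# From the incident: 10,000 conversations hit the regressed prompt during the
# 90-minute rollback at a 50% canary. Assuming uniform traffic, back out the
# implied total conversation rate across all users.
ROLLBACK_MINUTES = 90
implied_total_per_minute = 10_000 / (0.50 * ROLLBACK_MINUTES)

at_50_percent = exposed_conversations(implied_total_per_minute, 0.50, ROLLBACK_MINUTES)
at_5_percent = exposed_conversations(implied_total_per_minute, 0.05, ROLLBACK_MINUTES)

print(round(at_50_percent), round(at_5_percent))  # 10000 1000
```

Under these assumptions, the same slow rollback at a 5% canary would have exposed roughly a tenth of the conversations. A rehearsed rollback shrinks the `minutes` term as well.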
This is the AI release management problem. The system is non-deterministic, the changes are often to prompts or models rather than code, the blast radius is large, and the conventional software-release disciplines (canary, rollback, freeze) need adaptation. This module teaches that adaptation.
What AI release management is
AI release management is the production discipline of shipping prompt, model, agent, and data changes safely — with eval gates, feedback monitoring, canary rollouts, rehearsed rollbacks, version control, and stakeholder communication.
Six surfaces.
| Surface | One-liner | Pressure it answers |
|---|---|---|
| Change types | Recognising prompt, model, agent code, eval, data as distinct change classes | targeted discipline per type |
| Release gates | Eval and feedback signals as preconditions to ship | quality: ship only what passes |
| Canary | Gradual rollout with monitoring at each step | blast control: detect before wide |
| Rollback | Tested, fast reversal when canary or production shows regression | recovery: undo cleanly |
| Versioning | Prompts, models, agents as versioned artefacts | traceability: know what shipped when |
| Communication | Stakeholders informed before, during, after | trust: surprise erodes confidence |
A seventh concern — incident response when a release goes wrong — runs across the surfaces and is its own chapter (10).
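As a rough illustration of how the surfaces compose, the sketch below walks one change through an eval gate, a sequence of canary steps, and a rollback on the first failed feedback check. It is a minimal model under stated assumptions, not a real deployment API: `run_release`, its parameters, and the step percentages are all hypothetical, and the feedback check is reduced to a single boolean callback.

```python
from typing import Callable, Sequence


def run_release(
    eval_gate_passed: bool,
    canary_steps: Sequence[int],
    feedback_gate_holds: Callable[[int], bool],
) -> str:
    """Walk one change through the gates and return the outcome as a string."""
    # Release gate: nothing reaches users unless the eval gate passed.
    if not eval_gate_passed:
        return "blocked: eval gate failed"
    # Canary: widen traffic one step at a time, checking feedback at each step.
    for percent in canary_steps:
        if not feedback_gate_holds(percent):
            # Rollback: flip traffic back to the prior version.
            return f"rolled back at {percent}%"
    return "promoted"


# Hypothetical run: feedback degrades once the canary reaches 25% of traffic.
outcome = run_release(True, [5, 25, 50, 100], lambda percent: percent < 25)
print(outcome)  # rolled back at 25%
```

Versioning and communication do not appear in the control flow, but each return value is a point where the shipped version should be recorded and stakeholders told.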
The recurring vocabulary
| Name | Surface | What it is |
|---|---|---|
| the change type | Change types | Prompt, model, agent code, eval, data |
| the eval gate | Release gates | The eval check that must pass before promotion |
| the feedback gate | Release gates | The feedback profile that must hold during canary |
| the canary step | Canary | A specific traffic percentage in the rollout |
| the rollback flag | Rollback | The flag that flips traffic back to the prior version |
| the change window | Communication | The agreed time period for shipping; respects freezes |
| the freeze period | Communication | A time period during which non-critical changes do not ship |
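The last two terms can be made concrete with a small check. The sketch below assumes a weekly change window and a list of freeze periods, uses naive local datetimes, and treats a critical change as overriding both. These are illustrative choices, and `ChangeWindow`, `FreezePeriod`, and `may_ship` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class ChangeWindow:
    weekdays: FrozenSet[int]  # Monday is 0, as in datetime.weekday()
    start: time
    end: time


@dataclass(frozen=True)
class FreezePeriod:
    start: datetime
    end: datetime


def may_ship(now: datetime, window: ChangeWindow,
             freezes: Sequence[FreezePeriod], critical: bool = False) -> bool:
    # Assumption: a critical change overrides both the window and any freeze.
    if critical:
        return True
    # The change window respects freezes: a freeze blocks non-critical changes.
    if any(f.start <= now < f.end for f in freezes):
        return False
    return now.weekday() in window.weekdays and window.start <= now.time() < window.end


# Hypothetical policy: ship Monday to Thursday, 10:00 to 16:00 local time.
window = ChangeWindow(frozenset({0, 1, 2, 3}), time(10, 0), time(16, 0))
freezes = [FreezePeriod(datetime(2024, 1, 8), datetime(2024, 1, 10))]

friday_afternoon = datetime(2024, 1, 5, 14, 0)  # a Friday, outside the window
tuesday_morning = datetime(2024, 1, 2, 11, 0)   # a Tuesday, inside the window
frozen_tuesday = datetime(2024, 1, 9, 11, 0)    # inside the window, but frozen

print(may_ship(friday_afternoon, window, freezes))  # False
print(may_ship(tuesday_morning, window, freezes))   # True
print(may_ship(frozen_tuesday, window, freezes))    # False
```

A real policy would also need time zones and an approval trail for the critical override.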
The journey
This module has three acts.
Act 1: Recognise (files 01–03). Why AI releases differ; the types of changes; the gates that precede shipping.
Act 2: Roll out (files 04–07). Canary discipline, rollback discipline, versioning, communication.
Act 3: Operate (files 08–11). Change windows, coordinated multi-change releases, emergency changes, postmortems.
Synthesis (files 12–13). Architect checklist and honest admission.
Memory map
| # | File | What it adds |
|---|---|---|
| 01 | why-ai-releases-are-different | the reframe: prompts/models as releases, not just deploys |
| 02 | the-change-types | prompt, model, agent code, eval, data — distinct disciplines |
| 03 | release-gates | eval + feedback as preconditions |
| — milestone: gates exist — | ||
| 04 | canary-rollouts | traffic shifts with monitoring at each step |
| 05 | rollback-discipline | tested, fast reversal |
| 06 | versioning-prompts-and-models | the artefact discipline |
| 07 | release-communication | stakeholders before, during, after |
| — milestone: rollouts work — | ||
| 08 | change-windows-and-freezes | when shipping happens; when it does not |
| 09 | coordinated-multi-change | when several changes ship together |
| 10 | emergency-changes | the discipline for the urgent override |
| 11 | release-postmortem | when something went wrong |
| — milestone: operations are mature — | ||
| 12 | architect-checklist | 20 items |
| 13 | honest-admission | what release management cannot solve |
How this module relates to its neighbours
- 00_ai_evals_release_gates: evals as gates; this module is how the gates fit into the release flow.
- 01_dataset_golden_set_operations: the golden set is the substrate for the eval gate.
- 02_telemetry_feedback_loops: feedback is the post-canary signal; this module reads it.
- 13_prompt_lifecycle_operations: prompts as versioned artefacts; this module ships them.
- 02_ai_infrastructure/01_model_gateway_provider_ops: the gateway is the runtime that enforces routing; this module operates the route changes.
- 05_ai_incident_operations: release incidents follow the broader AI incident discipline.
Top resources
- Google SRE — release engineering — https://sre.google/sre-book/release-engineering/
- Netflix — canary analysis — https://netflixtechblog.com/automated-canary-analysis-at-netflix-with-kayenta-3260bc7acc69
- Anthropic — model deprecations — https://docs.anthropic.com/en/docs/about-claude/model-deprecations
- OpenAI — production best practices — https://platform.openai.com/docs/guides/production-best-practices
What's coming
- 01-why-ai-releases-are-different.md — Why prompt and model changes are releases, not just deploys.
- 02-the-change-types.md — Prompt, model, agent code, eval, data; distinct disciplines.
- 03-release-gates.md — Eval and feedback gates.
- 04-canary-rollouts.md — Traffic shifts with monitoring.
- 05-rollback-discipline.md — Tested, fast reversal.
- 06-versioning-prompts-and-models.md — Artefact versioning.
- 07-release-communication.md — Stakeholders before, during, after.
- 08-change-windows-and-freezes.md — When to ship.
- 09-coordinated-multi-change.md — Several changes together.
- 10-emergency-changes.md — The urgent override.
- 11-release-postmortem.md — When it went wrong.
- 12-architect-checklist.md — Twenty items.
- 13-honest-admission.md — Limits.
Bridge. Before designing the gates or the canary, we feel why AI releases are not the same as software deploys. The first chapter is that reframe. → 01-why-ai-releases-are-different.md