
01. Why AI releases are different

Before designing the gates or the canary, we need to see why AI releases are not the same as software deploys. The reframe is the load-bearing piece; the rest of the module follows from it.


A platform engineer at a Pune SaaS company treats a prompt change like a code change. The PR is reviewed, CI passes the unit tests, the change is merged, and the next deploy ships it to 100% of traffic. Three hours later, customer-impact metrics drop. The team rolls back, but the prompt revision lives in source, so reverting it requires another deploy. The window between detection and rollback is 90 minutes. The team's discipline is good for software; it is wrong for AI artefacts whose behaviour is non-deterministic and whose blast radius is every call across many users.

This chapter is the reframe. Four properties of AI releases differ from software releases in ways that matter for the discipline.


The four differences

1. The artefact is data, not code. A prompt is text. A model is a versioned reference. A retrieval index is data. Each "release" changes data the system uses, not the system's compiled behaviour. The discipline of "ship the code; the deploy is the release" misses that data-as-artefact has its own lifecycle.

2. The behaviour is non-deterministic. Two calls with the same prompt and model can produce different outputs. A change's effect is therefore a distribution shift, not a binary on/off. Detecting it requires statistical signal: eval scores across a set, feedback rates across a population. The "deploy and watch" habit of software releases is insufficient.

3. The blast radius is per-call across users. A prompt change affects every conversation from the moment it ships. A changed code path may be exercised by only a fraction of users, but the prompt is used on every call. The user-facing impact is immediate and broad.

4. The rollback is not always a deploy. For code, rollback means redeploying the previous version. For a prompt or model managed via a registry or routing policy, rollback means flipping a routing weight or a registry pointer. That is faster, but only if the registry or routing infrastructure supports it.

Together, the four properties demand release discipline tuned for AI: gates that measure distributions, canaries that observe per-call effects, rollbacks that flip pointers fast, communication that recognises per-call user impact.
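To make "gates that measure distributions" concrete, here is a minimal sketch of a statistical gate. It assumes per-example eval scores in [0, 1] for the current and the candidate prompt. The function names, the 0.02 regression tolerance, and the choice of a percentile bootstrap are illustrative assumptions, not a method this module prescribes.

```python
import random
import statistics


def bootstrap_mean_diff_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(candidate) - mean(baseline).

    Both inputs are lists of per-example eval scores (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed so the gate is reproducible in CI
    diffs = []
    for _ in range(n_resamples):
        # Resample each score set with replacement, keeping its original size.
        b = [rng.choice(baseline) for _ in baseline]
        c = [rng.choice(candidate) for _ in candidate]
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def gate_passes(baseline, candidate, max_regression=0.02):
    """Pass only if the interval rules out a drop larger than max_regression."""
    lower, _ = bootstrap_mean_diff_ci(baseline, candidate)
    return lower >= -max_regression
```

A caller would run `gate_passes(current_scores, candidate_scores)` on the regression set before any traffic sees the candidate. The point of the design is that the gate compares two score distributions with an uncertainty estimate, rather than asserting a single pass/fail test.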


What this means for the discipline

Each chapter of this module addresses one or more of the four differences:

  • Chapter 02 (change types) — recognising the data-as-artefact difference; different changes need different disciplines.
  • Chapter 03 (release gates) — eval as the statistical gate; feedback as the user-perception gate.
  • Chapter 04 (canary) — gradual rollout to observe distributional effects on subsets.
  • Chapter 05 (rollback) — fast flip rather than redeploy.
  • Chapter 06 (versioning) — prompt and model as first-class versioned artefacts.
  • Chapters 07-10 — operational disciplines (communication, windows, coordinated changes, emergencies).
  • Chapter 11 (postmortem) — when something went wrong.

The module is the AI-specific layer on top of the broader software release discipline; the broader discipline still applies (CI, code review, deploy hygiene), with AI-specific additions.


What stays the same as software releases

  • Code review. Prompt changes are reviewed by humans before shipping.
  • CI. Schema checks, well-formedness, version bump rules all apply to prompt/model registries.
  • Audit. Every release is logged with what shipped when.
  • Rollback as a first-class concern. Rollback is rehearsed and fast.
  • Postmortem culture. Incidents are blameless and produce systemic improvement.

The discipline borrows from software; it adapts for AI-specific properties.
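The CI item above can be sketched as a small check that runs on every change to a prompt registry entry. This is a minimal illustration under stated assumptions: the entry shape (`name`, `version`, `template`), the MAJOR.MINOR.PATCH version format, and the function names are all hypothetical, not a schema defined by this module.

```python
import re

# Hypothetical schema for a prompt registry entry.
REQUIRED_FIELDS = {"name", "version", "template"}
VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so versions compare correctly."""
    match = VERSION_PATTERN.match(version)
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_prompt_entry(new, previous=None):
    """Return a list of problems; an empty list means the entry passes."""
    missing = REQUIRED_FIELDS - set(new)
    if missing:
        return [f"missing fields: {sorted(missing)}"]

    problems = []
    if not new["template"].strip():
        problems.append("template is empty")
    try:
        new_version = parse_version(new["version"])
    except ValueError as err:
        return problems + [str(err)]

    # Version bump rule: a changed template must carry a higher version.
    if previous is not None and new["template"] != previous["template"]:
        if new_version <= parse_version(previous["version"]):
            problems.append("template changed without a version bump")
    return problems
```

A check like this catches malformed entries, but it is still a binary test. It says nothing about whether the new prompt behaves better or worse.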


What is genuinely new

  • Statistical gates. Eval scores and feedback rates as distributions, not binary tests.
  • Gradual canary based on quality signals. Not just "no errors"; the quality must hold.
  • Pointer-based rollback. Registry/routing weight changes, not redeploys.
  • Cross-artefact coordination. A prompt change may require a model change; chapter 09 covers this.

These are the AI-specific additions the module builds.
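Pointer-based rollback is easiest to see in miniature. The sketch below assumes an in-memory registry where the "active" prompt is simply the last entry in a history of version ids; `PromptRegistry` and its methods are hypothetical names for illustration, and a real registry would persist this state and control who may change it.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical registry: stored versions plus a history of active pointers."""
    versions: dict = field(default_factory=dict)  # version id -> prompt text
    history: list = field(default_factory=list)   # activated version ids, oldest first

    def publish(self, version_id, text):
        # Publishing stores the artefact; it does not change what is served.
        self.versions[version_id] = text

    def activate(self, version_id):
        if version_id not in self.versions:
            raise KeyError(version_id)
        self.history.append(version_id)

    def active(self):
        # Callers resolve the prompt at request time, so a pointer change
        # takes effect on the next call without a deploy.
        return self.versions[self.history[-1]]

    def rollback(self):
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]
```

Because serving code reads `active()` on each call, `rollback()` changes behaviour immediately. The previous version is never deleted, which is what makes the flip safe to rehearse.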


What the chapter-opening incident reveals

Returning to the prompt change shipped to 100% and rolled back via code:

  • The change was statistical (a prompt revision); the gate was binary (CI passed).
  • The rollout was all-at-once (100%); the canary discipline was bypassed.
  • The rollback was a deploy (slow); the pointer-based rollback was not in place.
  • The communication was internal (the team noticed via metrics); the user notification was reactive.

The chapters of this module address these gaps in turn.


Common mistakes from the misframe

Treating prompt changes as code changes. No statistical gates; binary CI; deploy-and-watch.

Treating model migrations as configuration changes. No canary; weight flipped from 0 to 100; no feedback monitoring.

Treating data updates as out-of-scope for release management. Retrieval-index updates can change agent behaviour materially; treating them as data plumbing misses the release-discipline need.
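The alternative to flipping a weight from 0 to 100 can be sketched as a stepped ramp that advances only while a quality signal holds. The step schedule, the 0.02 tolerance, the meaning of the quality numbers, and the function names below are all assumptions made for illustration.

```python
import hashlib

# Hypothetical ramp schedule, in percent of traffic.
RAMP_STEPS = [1, 5, 25, 50, 100]


def in_canary(user_id, percent):
    """Bucket a user into 0..99 by hash so the assignment is sticky across calls."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def next_weight(current, canary_quality, baseline_quality, tolerance=0.02):
    """Advance one ramp step if quality holds; drop to 0 if it regresses."""
    if canary_quality < baseline_quality - tolerance:
        return 0  # pull the canary rather than hold it at the current weight
    later_steps = [step for step in RAMP_STEPS if step > current]
    return later_steps[0] if later_steps else current
```

Hashing the user id, rather than picking randomly on each call, keeps a given user on one variant for the whole ramp, so feedback from that user can be attributed to a single version.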


Interview Q&A

Q1. What is the single biggest difference between an AI release and a software release?

There is no single difference; four combine. The artefact is data, not code; the behaviour is non-deterministic; the blast radius is per-call across users; the rollback is a pointer flip, not a redeploy. The combination means the discipline needs statistical gates (not binary CI), gradual canaries (not all-at-once deploys), fast pointer rollback (not redeploy rollback), and per-call user-impact awareness. Each matters individually; the combination requires the AI-specific layer of release management.

Wrong-answer notes: "it uses AI" is surface; the methodological consequences are the substance.

Q2. Walk through the chapter-opening incident through the four differences.

The change was statistical (a prompt revision; effect is distributional). The gate was binary (CI passed); the statistical gate (eval against the regression set) was either skipped or insufficient. The rollout was all-at-once (100%); the canary was bypassed. The rollback was a code deploy (slow); the pointer-based rollback through a prompt registry was not in place. The user impact was per-call (every conversation affected immediately); the team noticed reactively, not proactively. Each of the four AI-release-difference axes was mishandled.

Wrong-answer notes: "the team didn't test enough" is the surface diagnosis; the four-axes framing reveals the systemic gaps.

Q3. The team's discipline for software releases is mature. What is the gap when they extend it to AI?

The software discipline assumes the artefact is code, the behaviour is deterministic, the blast radius is per-code-path (often a fraction of users), and the rollback is a redeploy. Extending it to AI without adaptation produces all-at-once deploys of prompts, deploy-based rollbacks, no statistical eval gates, and no feedback monitoring during canary. The gap is not that the team is undisciplined; it is that the discipline needs the AI-specific additions of this module.

Wrong-answer notes: "their discipline isn't good enough" misreads the situation. The discipline is fine for what it was built for; the extension to AI is the work.

Q4. What kind of "change" should be governed by AI release management?

Prompt changes, model migrations, agent code changes that affect AI behaviour, eval changes that affect the gate's meaning, data changes that affect what the agent retrieves (e.g., a new retrieval index). Each is an AI release. The bar: if it changes what the user experiences from the AI, it is an AI release. Pure infrastructure changes (a database optimisation, a network reroute) are software releases that may not need the AI-specific layer.

Wrong-answer notes: "only prompt changes" misses model and data; "everything" misses the distinction with pure infrastructure.
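The bar in Q4 can be written down as a small classification rule. This is one possible encoding, assuming the change author declares the kind of change and, for agent code or infrastructure, whether it affects AI behaviour; the enum values and function name are hypothetical.

```python
from enum import Enum


class ChangeKind(Enum):
    PROMPT = "prompt"
    MODEL = "model"
    EVAL = "eval"
    RETRIEVAL_DATA = "retrieval_data"
    AGENT_CODE = "agent_code"
    INFRASTRUCTURE = "infrastructure"


# Kinds the answer above treats as AI releases without further qualification.
ALWAYS_AI_RELEASE = frozenset({
    ChangeKind.PROMPT,
    ChangeKind.MODEL,
    ChangeKind.EVAL,
    ChangeKind.RETRIEVAL_DATA,
})


def is_ai_release(kind, affects_ai_behaviour=False):
    """Apply the bar: does the change alter what the user experiences from the AI?"""
    if kind in ALWAYS_AI_RELEASE:
        return True
    # Agent code and infrastructure qualify only when declared to affect AI behaviour.
    return bool(affects_ai_behaviour)
```

For example, `is_ai_release(ChangeKind.INFRASTRUCTURE)` is false for a database optimisation, while `is_ai_release(ChangeKind.AGENT_CODE, affects_ai_behaviour=True)` routes the change through the AI-specific gates.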


What to do differently after reading this

  • Stop treating prompt and model changes as "configuration changes."
  • Recognise the four differences explicitly; design discipline for each.
  • Borrow from software release engineering; do not assume it covers AI.
  • Identify which changes are AI releases and apply the discipline.

Bridge. With the reframe in place, the next move is to recognise that AI releases come in several types, each with its own discipline. The next chapter classifies the change types. → 02-the-change-types.md