12. Architect checklist
Twenty items in six groups: recognise, gate, roll out, version, communicate, operate. If you can answer every item with an artefact, your AI release management is defensible.
Recognise (1–3)
1. Change types classified. Every AI change identified as prompt, model, agent code, eval, or data; the type's discipline applied. (Chapter 02.)
2. AI vs software distinction. The team treats AI releases as distinct from pure software deploys; the four differences (data not code, non-determinism, per-call blast radius, pointer rollback) are understood. (Chapter 01.)
3. Material vs non-material. Each release classified for communication impact. (Chapter 07.)
Gate (4–7)
4. Eval gate CI-enforced. Prompt and model changes cannot merge without the eval running and passing. (Chapter 03.)
5. Feedback gate during canary. At each canary step the feedback profile is compared to baseline; promotion requires the profile to hold. (Chapter 03.)
6. Type-specific gates. Cost gate for models; schema-drift for data; integration tests for agent code. (Chapter 03.)
7. Bypass discipline. Documented bypass path with approval; bypass rate tracked. (Chapter 03.)
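Items 4, 6, and 7 can be sketched as a single merge decision. This is a minimal illustration, not the platform's implementation: the gate names, the `REQUIRED_GATES` mapping, and `decide_merge` are hypothetical, and the "eval" change type is left out because the checklist does not name its gate.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical mapping of change type to the gates it must pass,
# following items 4 and 6 (eval for prompts and models, cost for models,
# schema-drift for data, integration tests for agent code).
REQUIRED_GATES = {
    "prompt": {"eval"},
    "model": {"eval", "cost"},
    "data": {"schema_drift"},
    "agent_code": {"integration_tests"},
}


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    bypassed: bool  # True when the merge went through on an approved bypass
    reason: str


def decide_merge(change_type: str,
                 passed_gates: Iterable[str],
                 bypass_approver: Optional[str] = None) -> GateDecision:
    """Allow a merge only if every required gate passed or a bypass is approved."""
    if change_type not in REQUIRED_GATES:
        raise ValueError(f"no gate policy defined for change type {change_type!r}")
    missing = REQUIRED_GATES[change_type] - set(passed_gates)
    if not missing:
        return GateDecision(True, False, "all required gates passed")
    if bypass_approver:
        # Item 7: the bypass is explicit and attributable, so it can be counted.
        return GateDecision(
            True, True,
            f"bypass approved by {bypass_approver}; missing: {sorted(missing)}")
    return GateDecision(False, False, f"missing gates: {sorted(missing)}")
```

Recording `bypassed` on every decision is what makes the bypass rate measurable later; a bypass that leaves no record cannot be tracked.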
Roll out (8–11)
8. Canary by default. Every AI change canaries; exceptions are documented emergencies. (Chapter 04.)
9. Canary steps tuned. Step percentages and durations appropriate per change type and blast radius. (Chapter 04.)
10. Rollback tested. Rollback verified per release; periodic drills in staging. (Chapter 05.)
11. Rollback fast. Targets per change type met (e.g., < 5 min for prompts, < 10 min for models). (Chapter 05.)
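A rough sketch of how the rollout items could fit together, assuming a single feedback metric (a positive-feedback rate) and a fixed tolerance. The step percentages, durations, tolerance, and function names are hypothetical; only the rollback targets come from the example figures in item 11.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CanaryStep:
    traffic_pct: int   # share of traffic on the new version
    min_minutes: int   # minimum soak time before promotion is considered


# Hypothetical plan for a prompt change; real values are tuned per change type.
PROMPT_PLAN = [CanaryStep(1, 30), CanaryStep(10, 60),
               CanaryStep(50, 120), CanaryStep(100, 0)]

# Example targets from item 11: rollback must take strictly less than this.
ROLLBACK_TARGET_MINUTES = {"prompt": 5, "model": 10}


def next_action(plan: List[CanaryStep], step_index: int, minutes_at_step: float,
                baseline_rate: float, canary_rate: float,
                tolerance: float = 0.02) -> str:
    """Return 'rollback', 'hold', 'promote', or 'done' for the current step."""
    if canary_rate < baseline_rate - tolerance:
        return "rollback"   # feedback no longer holds against baseline
    if minutes_at_step < plan[step_index].min_minutes:
        return "hold"       # feedback holds, but the step has not soaked long enough
    if step_index == len(plan) - 1:
        return "done"       # final step reached and feedback holds
    return "promote"


def rollback_within_target(change_type: str, measured_minutes: float) -> bool:
    """Check a measured rollback time against the target for its change type."""
    return measured_minutes < ROLLBACK_TARGET_MINUTES[change_type]
```

The feedback comparison runs before the duration check on purpose: a degraded canary should roll back immediately rather than wait out its soak time.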
Version (12–14)
12. Prompts in registry. Versioned per semver; previous versions retained for rollback window. (Chapter 06.)
13. Models pinned in routing policy. Versioned; previous mapping retained as fallback. (Chapter 06.)
14. Full version context captured. Parameters, eval reference, dependencies, metadata. (Chapter 06.)
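The versioning items can be illustrated with a small in-memory registry, assuming immutable versions and an "active" pointer per prompt. `PromptVersion`, `PromptRegistry`, and every field name are hypothetical; a real registry would persist its state and enforce the rollback retention window.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str          # semver string, e.g. "1.2.0"
    template: str
    parameters: dict      # e.g. temperature, max tokens
    eval_reference: str   # identifier of the eval run that gated this version
    dependencies: tuple = ()
    metadata: dict = field(default_factory=dict)


class PromptRegistry:
    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, str], PromptVersion] = {}
        self._history: Dict[str, List[str]] = {}  # activation order, newest last

    def publish(self, pv: PromptVersion) -> None:
        key = (pv.name, pv.version)
        if key in self._versions:
            raise ValueError(f"{pv.name} {pv.version} already published")
        self._versions[key] = pv

    def activate(self, name: str, version: str) -> None:
        if (name, version) not in self._versions:
            raise KeyError(f"{name} {version} is not published")
        self._history.setdefault(name, []).append(version)

    def active(self, name: str) -> PromptVersion:
        return self._versions[(name, self._history[name][-1])]

    def rollback(self, name: str) -> PromptVersion:
        """Move the active pointer back one activation; nothing is redeployed."""
        history = self._history[name]
        if len(history) < 2:
            raise RuntimeError("no previous version retained for rollback")
        history.pop()
        return self.active(name)
```

Because rollback only moves a pointer, it stays cheap as long as the previous version is still retained, which is why items 12 and 13 insist on keeping it.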
Communicate (15–17)
15. Pre-release notice. Engineering channel always; customer-facing teams and customers for material changes. (Chapter 07.)
16. During-release status. Internal updates at canary steps; customer notice for material rollbacks. (Chapter 07.)
17. Post-release summary. Internal summary; customer notice for material releases. (Chapter 07.)
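The communication rules in items 15 to 17 reduce to a small routing function. The sketch below assumes three phase labels and uses hypothetical audience names; it encodes only the rules stated in the items.

```python
from typing import List


def audiences(phase: str, material: bool, rolled_back: bool = False) -> List[str]:
    """Return who must be notified for a release phase ('pre', 'during', 'post')."""
    if phase == "pre":
        out = ["engineering"]                       # always notified
        if material:
            out += ["customer_facing_teams", "customers"]
    elif phase == "during":
        out = ["internal"]                          # updates at each canary step
        if material and rolled_back:
            out.append("customers")                 # only material rollbacks reach customers
    elif phase == "post":
        out = ["internal"]                          # internal summary
        if material:
            out.append("customers")
    else:
        raise ValueError(f"unknown phase {phase!r}")
    return out
```

The material flag from item 3 is the only input that widens the audience, so an unclassified release cannot be routed correctly.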
Operate (18–20)
18. Change window. Standard window defined; respected by default. (Chapter 08.)
19. Freeze periods. Holidays, end-of-quarter, marketing campaigns; documented. (Chapter 08.)
20. Release postmortem. Blameless postmortem for every release that went wrong; action items tracked to closure. (Chapter 11.)
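Items 18 and 19 can be enforced with a pre-release check like the one below. The window (Tuesday to Thursday, 10:00 to 16:00) and the freeze dates are invented for illustration; the checklist only requires that a window and freeze periods are defined and documented.

```python
from datetime import date, datetime, time
from typing import Tuple

# Hypothetical standard window: Tuesday to Thursday, 10:00 to 16:00 local time.
WINDOW_DAYS = {1, 2, 3}                 # datetime.weekday(): Monday is 0
WINDOW_START, WINDOW_END = time(10, 0), time(16, 0)

# Hypothetical documented freeze periods: (first day, last day, label).
FREEZES = [(date(2025, 12, 20), date(2026, 1, 2), "year-end holidays")]


def release_allowed(when: datetime, emergency: bool = False) -> Tuple[bool, str]:
    """Return (allowed, reason). Emergencies override the window and freezes."""
    for start, end, label in FREEZES:
        if start <= when.date() <= end:
            return emergency, f"freeze: {label}"
    in_window = (when.weekday() in WINDOW_DAYS
                 and WINDOW_START <= when.time() < WINDOW_END)
    if in_window:
        return True, "inside standard window"
    return emergency, "outside standard window"
```

For example, `release_allowed(datetime(2025, 6, 6, 18, 0))` is refused because that date is a Friday evening, outside the assumed window.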
How to use the checklist
At platform setup: walk the items; most are red on day one. Schedule the path to green.
At three months: items 1–7 (recognise and gate) are green or near-green. Items 8–11 (roll out) are operational.
At six months: items 12–14 (version) are routine. Items 15–17 (communicate) are habitual.
At twelve months: items 18–20 (operate) are mature; cross-postmortem patterns inform platform discipline.
Common postmortem-to-checklist mappings
- "Eval not run before merge" → item 4
- "Canary too fast" → item 9
- "Rollback took 45 minutes" → items 10, 11
- "Customer surprised by change" → items 3, 15
- "Same eval gap appeared again" → item 20 (pattern review)
- "Friday-evening incident" → items 18, 19
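The mappings above can be kept as data so that findings from several postmortems are tallied against checklist items. A minimal sketch: the dictionary mirrors the list exactly, while `implicated_items` and the choice to ignore unmapped findings are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Postmortem finding -> checklist items, copied from the mappings above.
FINDING_TO_ITEMS = {
    "Eval not run before merge": [4],
    "Canary too fast": [9],
    "Rollback took 45 minutes": [10, 11],
    "Customer surprised by change": [3, 15],
    "Same eval gap appeared again": [20],
    "Friday-evening incident": [18, 19],
}


def implicated_items(findings: Iterable[str]) -> List[Tuple[int, int]]:
    """Count how often each checklist item is implicated, most frequent first."""
    counts: Counter = Counter()
    for finding in findings:
        # Findings without a mapping are skipped here; a real review
        # would surface them as candidates for a new mapping.
        for item in FINDING_TO_ITEMS.get(finding, []):
            counts[item] += 1
    return counts.most_common()
```

Items that recur at the top of this tally are the ones the pattern review under item 20 should examine first.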
Interview Q&A
Q1. Which three items would you build first on an immature platform?
Item 4 (eval gate CI-enforced): prevents the chapter-1 regression-without-gate pattern. Item 8 (canary by default): gradual rollout catches what gates miss. Item 10 (rollback tested): when canary or production reveals issues, recovery is fast. These three are the operational floor; the rest builds on them. Wrong-answer notes: starting with communication (15–17) without gates and canary produces well-communicated regressions.
Q2. Most under-appreciated item?
Item 5 (feedback gate during canary). Most platforms have eval gates pre-merge but skip the production feedback comparison during canary. The feedback gate catches what the eval misses; it ensures the actual user reaction holds before promotion. The discipline is operational (active monitoring during canary, not just dashboards) and often skipped. Wrong-answer notes: any item is defensible; the reasoning matters.
Q3. The team's bypass rate on eval gates is 10%. What does that suggest?
Either the gate is too strict (producing false positives that justify routine bypass) or the discipline has decayed (bypassing for convenience). Investigate: Are the bypasses justified? Do they cluster on the same type of change? Are the false positives the issues described in chapter 11 of 01_dataset_golden_set_operations? A 10% rate is a signal to investigate, not a number to accept. Tracking the bypass rate (item 7) is what makes the question askable. Wrong-answer notes: "tighten the bypass policy" without diagnosing why risks treating the wrong cause.
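One way to start that investigation is to break the bypass rate down by change type. The sketch assumes each merge is logged as a `(change_type, bypassed)` pair; the record shape and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bypass_rates(merges: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of merges that bypassed the eval gate, per change type."""
    totals: Dict[str, int] = defaultdict(int)
    bypassed: Dict[str, int] = defaultdict(int)
    for change_type, was_bypassed in merges:
        totals[change_type] += 1
        if was_bypassed:
            bypassed[change_type] += 1
    return {ct: bypassed[ct] / totals[ct] for ct in totals}


# Twenty merges, two bypasses: 10% overall, but every bypass is a prompt change.
log = ([("prompt", True)] * 2 + [("prompt", False)] * 8
       + [("model", False)] * 10)
rates = bypass_rates(log)
```

In this made-up log the overall rate is 10%, yet the per-type view shows 20% for prompts and 0% for models, which points at the prompt eval or the prompt workflow rather than the gate in general.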
Q4. The platform has all 20 items green. What still goes wrong?
The items raise the floor; they do not eliminate risk. Possible failures: long-tail rare cases the eval set does not cover (chapter 13's honest admission); slow-developing patterns the canary duration misses; cross-platform interactions not captured by any individual discipline; provider behaviour shifts; bugs in the discipline infrastructure itself. The discipline provides a defensible posture; the actual incidents that still occur should be smaller and recoverable. The 20 items prevent the systemic patterns; they do not promise zero incidents. Wrong-answer notes: "if all green, no incidents" is unrealistic; the discipline is about probability, not certainty.
Bridge. Twenty items. The last chapter is the honest opposite — what release management cannot solve. → 13-honest-admission.md