12. Architect checklist

Twenty items. Audit, stabilise, instrument, migrate, operate. If you can answer all of them with an artefact, the modernisation is defensible. If you cannot, the gaps are the work.


This is the checklist a lead uses in week one and again at days 30, 60, and 90. Each item maps to a chapter; the question to ask is stated here.


Audit (1–5)

1. Twelve-section audit. Has the day-one audit been written, shared, and revised? Are the system's purpose, inputs, outputs, prompts, models, code, observability, eval coverage, failure modes, scale, owners, and risks documented? (Chapter 02.)

2. State identified. Has the inherited system been classified as frozen, observable, eval-backed, or modular? Is the progression plan to the next state explicit? (Chapter 01.)

3. Top failure modes. Are the top 5–10 failure shapes from customer complaints catalogued with frequencies? (Chapter 02.)

4. Deprecation calendar. Are all models in production listed with their retirement dates? Is the earliest retirement scheduled for migration with margin? (Chapters 02, 08.)

5. Data-side audit. Has the data and retrieval side been audited: sources, freshness, ownership, shape guarantees? (Chapter 09.)
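Item 4 can be kept as a small data structure rather than a wiki page. The following is a minimal sketch, assuming a hypothetical ModelEntry record and a 60-day safety margin; the model names, dates, and margin are illustrative and not taken from the chapters.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ModelEntry:
    model: str          # hypothetical model identifier
    retires_on: date    # retirement date published by the provider
    used_by: tuple      # features that call this model


def migration_deadlines(calendar, margin_days=60):
    """Return (model, start_by) pairs, earliest retirement first.

    start_by is the retirement date minus the safety margin. The
    60-day default is an assumption, not a figure from the text.
    """
    margin = timedelta(days=margin_days)
    ordered = sorted(calendar, key=lambda entry: entry.retires_on)
    return [(entry.model, entry.retires_on - margin) for entry in ordered]


def overdue(calendar, today, margin_days=60):
    """Models whose migration should already have started."""
    return [
        model
        for model, start_by in migration_deadlines(calendar, margin_days)
        if start_by <= today
    ]
```

The first entry of migration_deadlines is the retirement that item 4 asks to have scheduled with margin; a non-empty overdue list is a red item.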


Stabilise and instrument (6–11)

6. Eval backstop. Is the eval set live with 50+ cases covering failure modes and happy path? Does it run on every change? Is it CI-gated against regression? (Chapter 03.)

7. Bleeding fixes scheduled. Have the top bleeding symptoms been patched on a timeline? Is each patch's structural replacement also scheduled? (Chapter 04.)

8. Prompt registry. Are all production prompts in a versioned registry, decoupled from code, with owner and intent fields? (Chapter 05.)

9. Per-call audit. Does every model call emit a structured audit record with model_used, tokens, cost, prompt version, tenant, feature, latency? (Chapter 06.)

10. Dashboards live. Are dashboards in place for per-feature latency, error rates, cost, prompt-version distribution, drift signals? Is the on-call reading them as first stop on incidents? (Chapter 06.)

11. Context capture. Does the audit capture the assembled context (sources, freshness, size, hash) per call? Does the sample store keep full context (with redaction) for review? (Chapter 09.)
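Items 9 and 11 describe one record per model call. A minimal sketch of such a record follows, assuming JSON lines as the output format and SHA-256 as the context hash; the field names beyond those listed in item 9, the build_record and emit helpers, and the hash choice are all assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    # Fields named in item 9.
    model_used: str
    prompt_version: str
    tenant: str
    feature: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    # Context fields from item 11 (names are hypothetical).
    context_sources: tuple
    context_oldest_age_s: float
    context_bytes: int
    context_hash: str


def build_record(*, context, context_sources, context_oldest_age_s,
                 started_at, finished_at, **call_fields):
    """Assemble one audit record; call_fields must supply the item 9 fields."""
    raw = context.encode("utf-8")
    return AuditRecord(
        latency_ms=round((finished_at - started_at) * 1000, 3),
        context_sources=tuple(context_sources),
        context_oldest_age_s=context_oldest_age_s,
        context_bytes=len(raw),
        context_hash=hashlib.sha256(raw).hexdigest(),
        **call_fields,
    )


def emit(record, sink=print):
    """Write the record as one JSON line; sink stands in for a log pipeline."""
    sink(json.dumps(asdict(record), sort_keys=True))
```

The record stores only the hash and size of the context, not the text itself. The full, redacted context that item 11 asks for would live in a separate sample store keyed by that hash.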


Migrate (12–16)

12. Gateway-mediated model calls. Are all model calls routed through the gateway with intent-named aliases? Has every hardcoded model string been removed from product code? (Chapter 08.)

13. Model aliases pinned. Is every alias pinned to a concrete model version (never "latest")? Are migrations canary-rolled with eval gates? (Chapter 08; module 02_ai_infrastructure/01 chapter 9.)

14. First strangler boundary. Has a strangler boundary been picked, the interface defined, the modern implementation built, and shadow traffic running? (Chapter 07.)

15. Comparison and cutover discipline. Are shadow runs analysed by a comparator and cut over only on matched eval scores? Is the legacy kept alive for a stability window after cutover? (Chapter 07.)

16. Data-side modernisation track. Are upstream data sources getting freshness and shape SLAs with their owners? Is the context builder being modernised through strangler? (Chapter 09.)
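Items 12 and 13 can be checked mechanically once aliases live in one place. The sketch below assumes a plain dictionary as the alias table and treats a small set of tags as "floating"; the alias names, model strings, and tag list are hypothetical, and a real gateway would likely have its own configuration format.

```python
import re

# Hypothetical alias table: intent-named alias -> concrete model version.
ALIASES = {
    "summarise-default": "vendor-model-large-2024-06-01",
    "classify-fast": "vendor-model-small-2024-03-15",
}

# Assumed set of tags that mean "whatever the provider serves today".
FLOATING_TAGS = ("latest", "stable", "current")


def unpinned_aliases(aliases):
    """Return the aliases whose target contains a floating tag."""
    bad = []
    for alias, target in aliases.items():
        parts = re.split(r"[-_:@/]", target.lower())
        if any(tag in parts for tag in FLOATING_TAGS):
            bad.append(alias)
    return sorted(bad)


def resolve(alias, aliases):
    """Map an intent-named alias to its pinned model version."""
    try:
        return aliases[alias]
    except KeyError:
        raise KeyError(f"unknown model alias: {alias!r}") from None
```

Running unpinned_aliases in CI against the alias table gives item 13 an artefact: an empty list means every alias is pinned. Product code would call resolve with the alias and never mention a model string directly.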


Operate (17–20)

17. Stakeholder cadence. Are weekly updates flowing to PM, customer success, and on-call, and monthly updates to executives, all referencing the same plan? (Chapter 10.)

18. 30-60-90 plan. Is the plan posted, updated at column boundaries, and used as the reference for every weekly update? Are slippages communicated as soon as visible? (Chapter 11.)

19. Long-tail plan. Has the post-90-day plan (months 4-9) been mapped: continuing strangler boundaries, eval coverage growth, production-traffic eval, training-data work if applicable? (Chapter 11.)

20. Incident readiness. Does the on-call have a runbook for AI-specific incidents? Have you had at least one fire drill testing the dashboards, the rollback flags, and the model fallback chain? (Chapters 06, 10.)


How to use the checklist

In week one: walk the items with the team. Most are red on day one (that is normal). Schedule the path to green.

At day 30: items 1, 2, 3, 6, 7 should be green; items 4, 5, 17, 18 should be at least yellow.

At day 60: items 8, 9, 10, 11, 13 should be green or near-green. Item 12 may be in progress.

At day 90: items 12, 14, 15, 16, 19, 20 should be green or in clear progress.

At month 6 and beyond: every item is green or in active improvement.

The checklist is a triage tool, not a binary done/not-done test. Reds are the current work; yellows are the watching work; greens are the achievements you defend with artefacts.
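The milestone expectations above can be encoded so that a review produces a list of late items rather than a discussion. This is a sketch under stated assumptions: "near-green" and "in clear progress" are both approximated as yellow, targets are cumulative across milestones, and an item with no recorded status counts as red.

```python
RANK = {"red": 0, "yellow": 1, "green": 2}

# (due day, item numbers, minimum acceptable colour), from the schedule above.
# Treating "near-green" / "in clear progress" as yellow is an assumption.
TARGETS = [
    (30, (1, 2, 3, 6, 7), "green"),
    (30, (4, 5, 17, 18), "yellow"),
    (60, (8, 9, 10, 11, 13), "yellow"),
    (90, (12, 14, 15, 16, 19, 20), "yellow"),
]


def behind_schedule(statuses, day):
    """Return (item, actual, expected_minimum) for every late item.

    statuses maps item number to "red", "yellow", or "green".
    Items missing from statuses are counted as red.
    """
    late = []
    for due_day, items, minimum in TARGETS:
        if day < due_day:
            continue  # this milestone has not been reached yet
        for item in items:
            actual = statuses.get(item, "red")
            if RANK[actual] < RANK[minimum]:
                late.append((item, actual, minimum))
    return sorted(late)
```

For example, at day 30 a team with item 3 at yellow and item 5 at red gets both back as late, each with the colour it was expected to reach.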


Common postmortem-to-checklist mappings

When something goes wrong during or after modernisation, walk the checklist with the question "which item, if green, would have prevented or shortened this?"

  • "A prompt change broke a customer use case" → item 6 (eval coverage), item 8 (registry with review)
  • "A model retirement caught us by surprise" → item 4 (deprecation calendar), item 18 (30-60-90 communication)
  • "We made a structural change and could not tell if behaviour stayed the same" → item 6 (eval), item 9 (audit)
  • "An incident took an hour to diagnose" → item 10 (dashboards)
  • "A migration produced a regression nobody noticed for two weeks" → item 6 (eval coverage), item 15 (comparator)
  • "A customer asked when modernisation will be done; we didn't have an answer" → item 18 (plan), item 17 (cadence)

When the checklist is overkill

Two cases.

Small system, single product, low stakes. A system serving an internal tool with no customer exposure can launch with items 1, 6, 8, 9, 12 green and the rest tracked as yellow. The discipline is to know what is missing, not to demand all twenty.

Brand-new system being designed. This checklist is for inherited systems. A greenfield project does not "inherit"; it builds. The architecture checklists in modules 01 and 02_ai_infrastructure/01 apply instead.

The exceptions are explicit. Quietly skipping items without reasoning is the failure mode.


Interview Q&A

Q1. You inherit a system. Walk the first three items you would address.

Item 1: write the audit. Without it, every other item is guesswork. Item 6: build the eval backstop. Without it, no change is verifiable. Item 4: check the deprecation calendar. The retirement deadlines force scheduling decisions that affect every other item. These three take days, not weeks, and they unlock the rest.

Wrong-answer notes: starting with refactoring or rewrites without the audit and eval is the chapter-1 anti-pattern.

Q2. The team argues item 6 (eval backstop) is "nice to have, not blocking." How do you respond?

The eval is the line between frozen and eval-backed, and the precondition for every change after it. Without it, the modernisation is gambling: changes ship without verification, regressions are detected by customers, and trust degrades. A 50-case eval takes about a week to build; the alternative is a multi-quarter modernisation without a safety net. The cost-benefit is overwhelming. The argument against is usually "we don't have time"; the response is "we don't have time not to."

Wrong-answer notes: agreeing to defer the eval produces the slow-motion failure mode.

Q3. At day 90 you have items 1-11 green, items 12-13 in progress, items 14-16 not started. Is the modernisation on track?

Yes, by the standard of "sustainable modernisation, not modernisation complete." The first eleven items are the audit and the stabilise-and-instrument phase; their being green means the system is operable, observable, and eval-backed. Items 12-13 (gateway migration) being in progress is on track for completion by day 105 or so. Items 14-16 (strangler, comparator, data side) are the long-tail work mapped for months 4-9. The 30-60-90 plan at day 90 is intentionally not the end; it is the transition to a sustainable cadence.

Wrong-answer notes: "off track because items 14-16 are not green" misreads the day-90 outcome as completion.

Q4. The audit revealed a stop-the-bleeding patch shipped six months ago that is still in place. The structural replacement was never built. Which item catches this?

Item 7: "Is each patch's structural replacement also scheduled?" The original patch was a stop-the-bleeding fix (chapter 04); the structural follow-up was supposed to retire it. The patch is still in place because the follow-up was never scheduled. The remediation: identify the patch, schedule the structural fix now, and retire the patch when the structural fix lands. The checklist catches the gap; the schedule closes it.

Wrong-answer notes: ignoring legacy patches is what produces the accumulating-debt failure mode the modernisation is meant to solve.


Bridge. The checklist is the engineer's defence. The last chapter is the honest opposite — what modernisation cannot fix, where the discipline is young, and the limits a thoughtful lead should be transparent about. → 13-honest-admission.md