12. Architect checklist
Twenty items. Design, build, launch, operate. If you can answer all of them with a gateway artefact, the platform is defensible. If you cannot, the gaps are the work.
This is the checklist a tech lead uses in design review, at launch, and at the first incident postmortem. Each item maps to one or more of the preceding chapters; the rationale is in those chapters, the question is here.
Design (1–6)
1. Boundary. Is there exactly one model gateway that every caller passes through, owned by a platform team, deployed as a service? (Chapters 01, 02.)
2. Aliases. Are intent-named aliases (fast-summariser, smart-reasoner) the only model identifiers callers see? Are concrete model versions pinned and never the provider's "latest"? (Chapters 03, 09.)
3. Routing policy. Is routing data, not code — readable in twenty seconds per alias? Does the route key include workload class, latency budget, privacy zone, and cost ceiling? (Chapter 03.)
4. Fallback chains. Is the fallback chain defined per alias with explicit degrade-vs-refuse policy? Is the worst-case latency of each step accounted for in budget math? (Chapter 04.)
5. Privacy zones. Is the privacy zone a property of the tenant (not the request)? Does the routing plane filter candidates by zone before any other policy? (Chapter 10.)
6. Cost ceilings. Does every alias have a per-call cost ceiling enforced at routing, and does every tenant have a per-period budget enforced before the call? (Chapter 07.)
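As a rough illustration of items 2 to 6 working together, the sketch below treats one alias's routing policy as plain data and plans a fallback chain from it. The class names, field names, zones, prices, and model identifiers are all hypothetical, and the budget rule (sum of worst-case step latencies must fit the alias budget) is one reasonable reading of "accounted for in budget math", not necessarily the one the earlier chapters use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    model: str            # pinned concrete version, never a provider's "latest"
    zone: str             # privacy zone the model is served from
    est_cost_usd: float   # estimated per-call cost (hypothetical figure)
    worst_case_ms: int    # timeout-inclusive latency used for budget math


@dataclass(frozen=True)
class AliasPolicy:
    alias: str
    cost_ceiling_usd: float
    latency_budget_ms: int
    on_exhaustion: str               # "degrade" or "refuse"
    chain: Tuple[Candidate, ...]     # ordered fallback chain


def plan_route(policy: AliasPolicy, tenant_zone: str) -> List[Candidate]:
    """Return the fallback steps that are legal, affordable, and fit the budget."""
    # The zone filter runs first, before any other policy is consulted.
    in_zone = [c for c in policy.chain if c.zone == tenant_zone]
    affordable = [c for c in in_zone if c.est_cost_usd <= policy.cost_ceiling_usd]
    plan, spent_ms = [], 0
    for step in affordable:
        # Assume every earlier step ran to its worst case before failing over.
        if spent_ms + step.worst_case_ms > policy.latency_budget_ms:
            break
        plan.append(step)
        spent_ms += step.worst_case_ms
    return plan


# Hypothetical policy for one alias; readable at a glance because it is data.
FAST_SUMMARISER = AliasPolicy(
    alias="fast-summariser",
    cost_ceiling_usd=0.01,
    latency_budget_ms=3000,
    on_exhaustion="refuse",
    chain=(
        Candidate("vendor-a-small-2024-06-01", "eu", 0.004, 1200),
        Candidate("vendor-b-mini-2024-05-13", "us", 0.003, 900),
        Candidate("vendor-a-medium-2024-06-01", "eu", 0.008, 1500),
        Candidate("vendor-c-large-2024-04-09", "eu", 0.050, 1000),
    ),
)
```

For a tenant in the hypothetical "eu" zone, `plan_route(FAST_SUMMARISER, "eu")` keeps the two in-zone candidates under the cost ceiling. A tenant in a zone with no candidates gets an empty plan, at which point the alias's `on_exhaustion` policy decides between degrading and refusing.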
Build (7–13)
7. Unified request shape. Is the internal request shape decoupled from any provider's native API, with provider-specific transform layers? (Chapter 02.)
8. Quota plane. Are token buckets kept per tenant, per alias, and per feature, and operated against a centralised store? Is the per-provider bucket capped below the provider's published limit? (Chapter 05.)
9. Credential isolation. Do provider keys exist only in the gateway's vault? Do callers hold gateway-issued credentials with explicit scope (aliases, tenants, environment)? (Chapter 06.)
10. Cost attribution. Are cost_usd and price_book_version stamped on every audit record? Is per-tenant, per-feature, and per-agent attribution available in real time? (Chapter 07.)
11. Caching. Is exact-match caching implemented with a cache key that includes the concrete model version? Are error responses excluded from the cache? (Chapter 08.)
12. Audit emission. Does the gateway emit one structured audit record per call with all the fields from chapter 11 of module 19 plus model-specific fields (alias, model_used, usage, cost, cache_status, fallback_step, region)? (Chapter 11 of this module.)
13. Observability dashboards. Are there per-provider, per-region, per-alias, and per-tenant dashboards, with baselines and alarms? (Chapter 11.)
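Items 10 to 12 can be made concrete with a small sketch: a cache key that hashes the concrete model version together with the request, a rule that keeps error responses out of the cache, and an audit record stamped with cost_usd and price_book_version. The price book contents, function names, and the subset of audit fields shown are assumptions for illustration; the full field list lives in the chapters the items cite.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical price book: USD per 1,000 tokens, keyed by concrete model version.
PRICE_BOOK = {
    "version": "pb-example-1",
    "rates": {"vendor-a-small-2024-06-01": {"input": 0.0005, "output": 0.0015}},
}


def cache_key(model_version: str, request: dict) -> str:
    """Exact-match key; the concrete model version is part of the hashed material."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{model_version}\n{canonical}".encode("utf-8")).hexdigest()


def should_cache(status_code: int) -> bool:
    """Only successful responses are cacheable; errors are never stored."""
    return 200 <= status_code < 300


@dataclass(frozen=True)
class AuditRecord:
    tenant: str
    alias: str
    model_used: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    price_book_version: str
    cache_status: str
    fallback_step: int
    region: str


def build_audit(tenant: str, alias: str, model_used: str, input_tokens: int,
                output_tokens: int, cache_status: str, fallback_step: int,
                region: str) -> AuditRecord:
    """Stamp cost and the price book version at emission time."""
    rates = PRICE_BOOK["rates"][model_used]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000
    return AuditRecord(tenant, alias, model_used, input_tokens, output_tokens,
                       round(cost, 6), PRICE_BOOK["version"],
                       cache_status, fallback_step, region)
```

Keying on the concrete version rather than the alias means that promoting a new model behind an alias cannot serve responses cached from the old one. Recording the price book version alongside the cost lets a later reconciliation recompute the figure from the same rates.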
Launch (14–17)
14. Tier-zero discipline. Is the gateway deployed multi-AZ at minimum, with regional deployment for relevant zones? Does it have its own SLO and on-call rotation? (Chapter 02.)
15. Pact tests. Are pact tests against each provider running on a schedule? Do they exercise happy path, error codes, idempotency, and tenant isolation? (Chapter 09.)
16. Migration playbook. Is the canary rollout flow documented and rehearsed? Can a new model version be promoted from 0% to 100% without product code changes? (Chapter 09.)
17. Compliance integrity. Do zone violations raise a paging alarm at zero tolerance? Is the standing violation query for any regional-strict tenant proven to return zero rows? (Chapter 10.)
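The standing query in item 17 might look something like the sketch below, which uses an in-memory SQLite database so it can be run as-is. The table names, column names, and the idea of a single home_region per tenant are assumptions made for the example; a real audit store will have its own schema and a richer zone model.

```python
import sqlite3

# Hypothetical schema: one row per tenant, one audit row per gateway call.
STANDING_QUERY = """
SELECT a.call_id, a.tenant, a.region
FROM audit AS a
JOIN tenants AS t ON t.tenant = a.tenant
WHERE t.zone_policy = 'regional-strict'
  AND a.region <> t.home_region
ORDER BY a.call_id
"""


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory audit store with no violations in it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tenants (tenant TEXT PRIMARY KEY, zone_policy TEXT, home_region TEXT);
        CREATE TABLE audit (call_id TEXT, tenant TEXT, region TEXT);
    """)
    conn.executemany("INSERT INTO tenants VALUES (?, ?, ?)", [
        ("acme", "regional-strict", "eu-west"),
        ("globex", "flexible", "us-east"),
    ])
    conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", [
        ("c1", "acme", "eu-west"),     # strict tenant served in its home region
        ("c2", "globex", "eu-west"),   # flexible tenant, any region is allowed
    ])
    return conn


def zone_violations(conn: sqlite3.Connection) -> list:
    """Rows returned here are violations; the compliant answer is an empty list."""
    return conn.execute(STANDING_QUERY).fetchall()
```

In this sketch, any non-empty result from `zone_violations` is what would trigger the page. Proving the query returns zero rows also means showing it returns a row when a violation is deliberately seeded in a test environment.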
Operate (18–20)
18. Drift review. Is the drift panel reviewed weekly: eval scores, output-token shifts, refusal rates, UPSTREAM_UNCLASSIFIED rates, postcondition violations? (Chapters 09, 11.)
19. Deprecation calendar. Is every provider's retirement schedule tracked centrally, with migrations completed at least 7 days before retirement? (Chapter 09.)
20. Reconciliation. Is monthly cost reconciliation against provider invoices passing within tolerance? Is the price book updated promptly when providers change prices? (Chapter 07.)
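Items 19 and 20 both reduce to simple arithmetic that is easy to automate. The sketch below assumes a relative tolerance of 1% for reconciliation, which is an illustrative default rather than a figure from Chapter 07, and takes the 7-day buffer from item 19. The function names and the calendar shape are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def reconcile(audited_usd: float, invoiced_usd: float,
              tolerance: float = 0.01) -> Tuple[bool, float]:
    """Compare summed audit cost against a provider invoice.

    Returns (passes, relative_gap). The 1% default tolerance is an assumption.
    """
    if invoiced_usd == 0:
        gap = 0.0 if audited_usd == 0 else float("inf")
    else:
        gap = abs(audited_usd - invoiced_usd) / invoiced_usd
    return gap <= tolerance, gap


def overdue_migrations(calendar: Dict[str, Tuple[date, bool]], today: date,
                       buffer_days: int = 7) -> List[str]:
    """List models whose migration is unfinished past the buffer deadline.

    `calendar` maps a concrete model version to (retirement_date, migrated).
    """
    late = []
    for model, (retirement, migrated) in sorted(calendar.items()):
        deadline = retirement - timedelta(days=buffer_days)
        if not migrated and today > deadline:
            late.append(model)
    return late
```

For example, an audited total of 1012.40 against an invoice of 1020.00 is a gap of about 0.75% and passes at the assumed tolerance, while 900.00 against the same invoice does not. A model retiring on 31 March with an unfinished migration becomes overdue on 25 March.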
How to use the checklist
In a design review: walk the twenty items, mark green/yellow/red, focus on reds first. In a launch review: same walk, plus prove items 14–17 with the actual artefacts. In a postmortem: which item, if green, would have prevented or shortened the incident?
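The review walk can be captured in a few lines if the marks are kept as data. This is only a sketch: the status names follow the green/yellow/red convention above, while the function names and the notion of an artefact being a non-empty reference string are assumptions.

```python
from typing import Dict, List

# Reds sort first, then yellows, then greens.
SEVERITY = {"red": 0, "yellow": 1, "green": 2}


def triage(marks: Dict[int, str]) -> List[int]:
    """Order checklist items for discussion: reds first, then by item number."""
    return sorted(marks, key=lambda item: (SEVERITY[marks[item]], item))


def launch_gaps(marks: Dict[int, str], artefacts: Dict[int, str]) -> List[int]:
    """Items 14-17 pass a launch review only if green AND backed by an artefact."""
    return [
        item for item in range(14, 18)
        if marks.get(item) != "green" or not artefacts.get(item)
    ]


# Hypothetical review state for a handful of items.
marks = {1: "green", 4: "red", 9: "yellow", 13: "red",
         14: "green", 15: "green", 16: "yellow", 17: "green"}
artefacts = {14: "slo-doc", 15: "", 17: "standing-query-log"}
```

With this state, `triage(marks)` puts items 4 and 13 at the front of the agenda, and `launch_gaps(marks, artefacts)` flags item 15 (green but unproven) and item 16 (not yet green).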
Common postmortem-to-checklist mappings:
- "The provider went down and the chatbot died" → item 4 (fallback chains)
- "We didn't notice the cost spike until the bill" → items 10, 13 (cost attribution and dashboards)
- "A leaked key produced a six-figure charge" → item 9 (credential isolation)
- "We promised in-region processing but cannot prove it" → items 5, 17 (privacy zones, compliance integrity)
- "A model retirement caught us by surprise" → item 19 (deprecation calendar)
- "A new provider rate limit caused 429s" → items 8, 18 (quota plane, drift review)
- "The platform was up but quality dropped silently" → item 18 (drift review including eval scores)
- "An incident took an hour to diagnose" → item 13 (per-provider dashboards)
- "Each product team had to migrate separately" → items 1, 2 (boundary, aliases)
When the checklist is overkill
Two cases:
- Single-product pre-launch startup. A founding team with one product and one provider can launch with items 1, 2, 3, 7, 9, 10, 12 green and items 4, 5, 6, 8, 11, 13–20 yellow with a documented backlog. The discipline is that callers go through something the team can extend, not that every surface is built.
- Pure-research environment. Internal-only, no tenants, no compliance, no cost concerns — items 5, 6, 10, 17, 20 may be N/A. Document the N/A explicitly so they are not silently skipped.
The exceptions are explicit. Quietly dropping items without rationale is the failure mode.
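One way to make "explicit, never silent" checkable is to validate the marks themselves: every one of the twenty items must carry a status, and every N/A must carry a rationale. The sketch below assumes marks are stored as (status, rationale) pairs; that shape and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

VALID = {"green", "yellow", "red", "n/a"}


def unexplained_gaps(marks: Dict[int, Tuple[str, str]], total: int = 20) -> List[str]:
    """Report items that were skipped, mis-marked, or waved off without a reason."""
    problems = []
    for item in range(1, total + 1):
        if item not in marks:
            problems.append(f"item {item}: not marked")
            continue
        status, rationale = marks[item]
        if status not in VALID:
            problems.append(f"item {item}: unknown status {status!r}")
        elif status == "n/a" and not rationale.strip():
            problems.append(f"item {item}: N/A without rationale")
    return problems


# Example: the pure-research profile, with every N/A carrying its reason.
research = {i: ("green", "") for i in range(1, 21)}
for i in (5, 6, 10, 17, 20):
    research[i] = ("n/a", "internal-only research; no tenants, compliance, or billing")
```

Here `unexplained_gaps(research)` comes back empty. Deleting an item from the dictionary, or blanking the rationale on an N/A, produces a named problem instead of a quiet omission.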
Interview Q&A
Q1. Which three items on this checklist are most often missing in early gateways?
Item 9 (credential isolation: products often hold provider keys directly), item 10 (cost attribution: teams rely implicitly on provider invoices), and item 13 (observability dashboards: ad hoc debugging is the default early on). These three are the highest-leverage early-stage fixes.
Wrong-answer notes: answers that do not name specific items miss the calibration the question is testing.
Q2. The team argues the checklist is overkill for a small platform with one tenant and one provider. What is your response?
A subset is appropriate. The design items (1–6) apply regardless: they cost little to do right and a lot to retrofit. Items 14–17 (launch) may be reduced; items 18–20 (operate) become relevant as soon as the platform serves real users. The checklist is a triage tool, not a uniform demand. A small platform with items 1, 2, 3, 7, 9, 10, 12 green is in better shape than a large platform with all twenty items "in progress."
Wrong-answer notes: abandoning the checklist entirely produces the inherited mess of item-2 audits.
Q3. What item on this checklist is most under-appreciated?
Item 18 (drift review). Silent provider behaviour shifts are the most insidious failure mode because they produce slow quality degradation rather than visible outages. A platform that has all other items green but does not review drift weekly will accumulate quality regressions until something catastrophic forces investigation. The weekly review is cheap and high-value.
Wrong-answer notes: any specific item is defensible; what distinguishes a strong answer is the silent-compounding reasoning.
Q4. You inherit a gateway that has been running for 18 months. Where do you start the audit?
Walk the twenty items, but in priority order rather than numeric order. Items 1 and 2 first: boundary and aliases are typically the most drifted in legacy gateways. Items 9 and 10 next: credentials and cost attribution are the second tier. Items 14–17 (launch discipline) follow: do the regional deployment, SLO, and pact tests actually exist? Items 18–20 last: operational discipline is the hardest to inherit. The audit produces a triaged list of reds and yellows, which becomes the work.
Wrong-answer notes: "rewrite the whole gateway" is not an audit; the audit is what tells you what to rewrite.
Bridge. The checklist is the engineer's defence. The last chapter is the honest opposite — what model gateways still cannot solve, where the discipline is young, and where a thoughtful lead should be transparent about the limits. → 13-honest-admission.md