Skip to content

08. Model and version stabilisation

The strangler pattern replaces components. One specific kind of component that almost always needs early attention in an inherited AI system is the model selection itself — hardcoded model strings, no versioning discipline, exposure to vendor drift. This chapter is the migration that turns that mess into a managed, calendar-driven discipline.


An engineer at a Bengaluru fintech opens the inherited code and runs grep -r 'claude-' src/. Eleven hits. Three different concrete model versions, two of which are within ninety days of retirement. The model strings are embedded in model= parameters across nine files. There is no central place to change them. The audit (chapter 02) had recorded this; the lead's plan (chapter 04) had it scheduled for week five. Today is week five.

The work is straightforward. Replace each hardcoded string with a call into the gateway with a model alias (smart-reasoner, fast-summariser). Pin each alias to a concrete model version in the gateway's routing policy. Run the eval before and after to confirm behaviour holds. Track the alias-to-version mapping in the deprecation calendar.

The whole thing takes ten days. It transforms the system from "every vendor change is a fire" to "every vendor change is a calendar item."


What stabilisation means here

Three concrete changes to the inherited system.

1. Remove hardcoded model identifiers from product code. Replace every model="claude-..." and equivalent with a call into the model gateway (module 02_ai_infrastructure/01) using an intent-named alias.

2. Pin every alias to a concrete model version in the gateway. No "latest" aliases. The alias-to-version mapping is owned by the platform team.

3. Track retirement dates centrally. Every model version in production is on the deprecation calendar. Migrations are scheduled before the retirement date with the eval backstop and canary discipline of chapter 09 of 02_ai_infrastructure/01.

Three concrete capabilities the system gains as a result.

  • Migrations are calendar-driven, not emergency-driven.
  • A model version change does not require a code deploy of every product that uses it.
  • Cross-product visibility — the platform team can answer "which products are still on the retiring model" in seconds, not weeks.

Where stabilisation sits in the modernisation sequence

This chapter assumes:

  • The eval backstop (chapter 03) exists, so each model migration can be verified.
  • Observability (chapter 06) is live, so the audit log records which model was actually used per call — necessary to confirm the migration has rolled out.
  • The gateway (module 02_ai_infrastructure/01) exists in some form. If it does not, this chapter doubles as the case to build the minimum gateway as part of the modernisation. A thin gateway with just routing and audit is enough to start.

If the gateway does not yet exist, the lead has a choice: build a thin gateway as part of this work, or do this work as a small wrapper layer that introduces the alias indirection in-process. The thin gateway is the better trajectory because it scales to the rest of the platform; the in-process wrapper is a faster start that you may need to retrofit later.


The migration steps

For one alias (say, smart-reasoner), six steps.

Step 1 — Inventory model call sites

The audit produced a partial list. Confirm it with another grep and a code-walking pass:

src/agents/reviewer.py:88           model="claude-3-opus-20240229"
src/agents/reviewer.py:142          model="claude-3-opus-20240229"
src/agents/summariser.py:34         model="claude-3-haiku-20240307"
src/handlers/explain.py:60          model="claude-3-opus-20240229"
src/handlers/extract.py:21          model="claude-3-sonnet-20240229"

Five call sites; three different model versions. Group by intent: the reviewer, explain, and (likely) extract handlers all need the smart-reasoner alias; the summariser needs the fast-summariser alias.

Step 2 — Choose aliases and pin versions

Declare the aliases. For this inherited system:

aliases:
  smart-reasoner:
    candidates:
      - id: "anthropic:claude-sonnet-4-6:ap-south-1"
        weight: 100
  fast-summariser:
    candidates:
      - id: "anthropic:claude-haiku-4-5:ap-south-1"
        weight: 100

The choice of which concrete version to pin can be:

  • The version production already runs, if it is current — straight migration, no behaviour change.
  • A newer version, if production is on a retiring model — combined migration that does behaviour-change work too.

The first is safer; the second is sometimes forced by a tight retirement deadline. If both, do the alias migration first (no behaviour change), then the version migration as a separate step (chapter 09 of 02_ai_infrastructure/01 provides the canary pattern).

Step 3 — Wire the gateway call

Change each call site to call the gateway:

# before:
response = anthropic.Client().messages.create(
    model="claude-3-opus-20240229",
    messages=[...],
    max_tokens=1000,
    temperature=0.2,
)

# after:
response = gateway.call(
    model_alias="smart-reasoner",
    messages=[...],
    max_output_tokens=1000,
    temperature=0.2,
    tenant_id=request.tenant_id,
    feature_id="reviewer",
)

The gateway resolves the alias to the pinned concrete model, calls the provider, records the audit, and returns the response in the unified shape (module 02_ai_infrastructure/01 chapter 02).

Step 4 — Feature-flag the rollout

For each call site, the migration is behind a flag at 1% → 10% → 50% → 100%. The eval (chapter 03) runs at each stage; the audit (chapter 06) shows which model handled each call. A regression is rolled back by flipping the flag.

The flag is per-call-site so a problem in one site does not require rolling back the others.

Step 5 — Verify with the eval and the audit

After 100% rollout, two checks:

  • Eval: scores match the pre-migration baseline within tolerance.
  • Audit: every call to this surface now records the gateway as the call site; no calls bypass the gateway.

The audit check is the proof that the migration is complete. A persistent bypass means the migration is partial; investigate which path is still using the old code.

Step 6 — Delete the old code

The direct SDK calls and the hardcoded model strings are removed. The provider SDK import may be removable from product code entirely; the gateway handles it.

This step matters. A team that leaves the old code in place "for fallback" creates two enforcement paths that will diverge. The fallback is a gateway feature (fallback chain in module 02_ai_infrastructure/01 chapter 4), not a code feature.


What about parameter drift

A frequent surprise: the inherited code calls the model with parameters that are not what the team thought. temperature=0.7 when everyone believed it was 0.2. max_tokens=4096 when the prompt only ever produces 600. A stop_sequences value that nobody can explain.

Capture the parameters as they are during step 3. Migrate first; rationalise later. The parameters are part of the prompt's effective behaviour; changing them during the migration introduces a behaviour change disguised as a refactor.

Once migrated, the registry (chapter 05) holds the parameters. Adjustments to them go through the prompt-versioning discipline (module 13) with eval gating.


The deprecation calendar

The single artefact that turns model retirement from surprise to scheduled work.

deprecations:
  - provider: anthropic
    model: claude-3-opus-20240229
    retirement_date: 2026-07-15
    successor: claude-sonnet-4-6
    migration_owner: platform-team
    affected_aliases: [smart-reasoner]
    status: in-progress
    notes: |
      Canary at 25%; eval within tolerance; promote to 100% by 2026-07-01.

  - provider: anthropic
    model: claude-3-haiku-20240307
    retirement_date: 2026-09-01
    successor: claude-haiku-4-5
    migration_owner: platform-team
    affected_aliases: [fast-summariser]
    status: planned

A weekly review of the calendar surfaces approaching deadlines. The lead's job is to keep the migrations ahead of retirement, not behind. Module 02_ai_infrastructure/01 chapter 9 builds the canary discipline for each migration.


How this interacts with the rest of the modernisation

  • Eval backstop (chapter 03) — each model change is gated by the eval.
  • Prompts in registry (chapter 05) — the prompt and the model are now both managed artefacts; changes to either go through their respective workflows.
  • Observability (chapter 06) — the audit confirms migration completeness and surfaces drift after migration.
  • Strangler (chapter 07) — model calls can be a strangler boundary, with shadow traffic comparing legacy direct calls against gateway-mediated calls.
  • 30-60-90 plan (chapter 11) — model migrations get explicit milestones with deadlines.

Common mistakes

Migrating the alias and the model version in one step. Two changes at once is two sources of regression. Do the alias migration first (no behaviour change); then the version migration as a separate, evaluated step.

Skipping the audit verification. "We changed the code, so it must be using the gateway now." Without the audit, you cannot prove every call is going through the gateway; bypass paths quietly accumulate.

Leaving the old code as fallback. Two paths means two behaviours. The fallback is a gateway feature.

Treating "latest" as good enough. A provider's "latest" alias is the chapter-9-failure of 02_ai_infrastructure/01. Pin concretely.

No calendar. Without the deprecation calendar, retirements catch you. The calendar is the single artefact that makes the discipline operational.


Interview Q&A

Q1. The inherited code has eleven hardcoded model strings. How do you stabilise? Inventory the call sites by grep. Group by intent into aliases (smart-reasoner, fast-summariser, etc.). Set up the gateway (or thin gateway) with the alias-to-version mapping. Migrate each call site to a gateway call, behind a feature flag at 1% → 100%. Verify with eval and audit at each stage. Delete the hardcoded strings. Add every model version to the deprecation calendar. The whole effort is one to two weeks for a moderate system. Wrong-answer notes: "find-and-replace" without flags and verification is the path to a regression.

Q2. The team has not built the model gateway yet. The model retirement is in 60 days. What do you do? Build a thin gateway in the first three weeks as part of this modernisation work. The gateway can be minimal: routing by alias, audit emission, one provider. That is enough to do the model migration and to gain the cross-product visibility for future work. Building it specifically for this modernisation has a side benefit: when the platform later needs a full gateway, the foundation is laid. Alternatively, an in-process wrapper that introduces the alias indirection is faster but does not scale to other products; the gateway is the better trajectory if you have three weeks. Wrong-answer notes: "we'll migrate first then build the gateway" — without the gateway, every migration is direct code change in every call site, and the next retirement is a fire too.

Q3. The current production model is claude-3-opus-20240229 and is retiring in 60 days. The team wants to migrate directly to claude-sonnet-4-6. Is that wise? The migration is two changes: from direct call to gateway call, and from one model to another. Two changes can produce regressions whose source is hard to attribute. Better: first migrate to the gateway with the alias pinned to the same current model. That is no behaviour change; the eval verifies. Then in a separate step, change the alias's pinned model to claude-sonnet-4-6 with canary, evals, and proper migration discipline. Two steps; two narrow risks. Wrong-answer notes: "ship both at once to save time" produces a debugging tangle when something regresses.

Q4. How do you confirm a model migration is complete? The audit. After flipping the flag to 100%, query the audit for the affected feature/handler: every call should record the gateway as the path, and the resolved model should be the new alias's pin. Any calls still showing the direct SDK path or the old model are the migration's tail. Investigate them; they are usually a code path you missed in the grep or a deploy not yet rolled out. The audit is the truth; the code review is the proxy. Wrong-answer notes: "we deployed the change" is the proxy; "the audit confirms 100% of calls use the gateway" is the truth.


What to do differently after reading this

  • Inventory every hardcoded model string in the inherited system. The audit produced a partial list; complete it.
  • Group call sites by intent into aliases. Pin each alias to the current production model first; migrate the version separately.
  • Migrate behind feature flags with eval verification at each stage.
  • Use the audit to confirm migration completeness, not the deploy log.
  • Stand up the deprecation calendar. Review weekly.

Bridge. Models are stabilised; the system is on a calendar. The next part of the inherited mess is the data side — retrieval contexts, training data references, context-building heuristics. Many inherited AI systems carry as much data debt as code debt, and the modernisation has to address it. The next chapter is the data-pipeline and context modernisation. → 09-data-pipeline-and-context-debt.md