00. Legacy AI modernization — First-principles overview

Every module before this one taught you to build AI systems well. This module is the discipline of inheriting one that was not built well — and fixing it without taking it down.


A new tech lead joins a mid-sized fintech in Bangalore. On day three, her manager hands her an AI-powered claims-triage system the previous team shipped two years ago. The previous team has dispersed. The Confluence pages are eighteen months old. The system handles three thousand claims per day with a 7% customer-complaint rate that has been climbing slowly. There are no evals. The prompts are buried inside a 2,400-line Python file. The model is hardcoded as a string in three different places. There are no traces. The CI suite is forty unit tests that mock the model. Operations have an unspoken rule: do not touch the AI module — it works, and nobody knows how. Her instructions from her manager are: "reduce complaints, ship improvements monthly, do not break what works."

This is the most common applied-lead scenario in 2026. Not "design a new AI feature" — "the previous team shipped this; fix it." The interview question and the production reality are the same question. This module is the playbook.


Why legacy AI is its own discipline

Legacy code is a well-studied problem. The patterns — characterisation tests, strangler fig, branch by abstraction, parallel run — are decades old. They mostly apply here. But four things make legacy AI different in kind, not just degree.

| Legacy AI is different because | Implication |
| --- | --- |
| The system is non-deterministic. | The same input does not produce the same output. Behavioural changes are hard to detect. "Did my change make it better?" cannot be answered without an eval. |
| The behaviour is driven by prompts and model versions, not by code paths. | The most impactful changes are in artefacts (prompts, configs, models) that traditional refactoring tools do not see. |
| The vendor changes the model under you. | Even a "do nothing" stance is unstable. The system drifts even when you do not touch it. |
| The cost of wrong behaviour is often higher than wrong code. | Production-shaped wrong outputs to users can damage trust permanently. Conservative discipline matters more. |

The corollary: standard legacy-code playbooks need an AI-specific safety layer added on top. That layer is the eval-shaped backstop (chapter 03) and the structured observability (chapter 06). Without them, you are refactoring a system you cannot prove you have not broken.
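
To make the idea of an eval-shaped backstop concrete, here is a minimal sketch in Python. It assumes the triage step can be wrapped as a function from claim text to a label; the names `GOLDEN_CASES`, `pass_rate`, `is_regression`, and `stub_triage`, the labels, and the 2% tolerance are all hypothetical choices for illustration, not something the inherited system provides.

```python
from typing import Callable, Sequence, Tuple

Case = Tuple[str, str]  # (claim text, expected triage label)

# Hypothetical golden cases; a real backstop would use reviewed production examples.
GOLDEN_CASES: Sequence[Case] = (
    ("Windscreen cracked by road debris", "auto"),
    ("Water leak damaged the kitchen ceiling", "property"),
)


def pass_rate(triage: Callable[[str], str], cases: Sequence[Case], runs: int = 5) -> float:
    """Share of (case, run) pairs whose output matches the expected label.

    Each case is run several times because a single run of a
    non-deterministic system says little about its typical behaviour.
    """
    attempts = 0
    passes = 0
    for text, expected in cases:
        for _ in range(runs):
            attempts += 1
            passes += triage(text) == expected
    return passes / attempts


def is_regression(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate pass rate falls below baseline by more than tolerance."""
    return candidate < baseline - tolerance


def stub_triage(text: str) -> str:
    """Stand-in for the real model call (keyword rules, for illustration only)."""
    return "auto" if "windscreen" in text.lower() else "property"
```

In use, you would record `pass_rate` for the system as inherited, then re-run it for each proposed change and block the change when `is_regression` returns true. Exact label matching is the simplest possible scorer; free-text outputs would need a different comparison.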


What this module teaches, in one sentence

You audit, stabilise, instrument, and replace an inherited AI system one piece at a time, with an eval backstop that catches regression, on a rhythm that ships improvement while delivering the existing service uninterrupted.

Read it left to right.

  • Audit — chapter 02. Day-one mapping of what you actually have.
  • Stabilise — chapter 04. The "stop the bleeding" decisions.
  • Instrument — chapter 06. Observability retrofitted.
  • Replace one piece at a time — chapter 07. Strangler migration.
  • Eval backstop — chapter 03. The safety net before any change.
  • Ship improvement while delivering — chapter 11. The 30-60-90 plan.

Every chapter covers one of these phases or a practical artefact you produce along the way.


The four states of an inherited AI system

Memorise this once. Every inherited system is in one of four states. Your first job — chapter 02 — is to put a label on the one you have.

| State | What you can do | What you cannot do |
| --- | --- | --- |
| Frozen | Operate the system safely; trust its behaviour to the extent it has been validated historically | Ship changes: you have no signal whether they help or hurt |
| Observable | Operate; investigate incidents; characterise behaviour from traces | Ship changes: observability alone does not catch regressions |
| Eval-backed | Operate; investigate; ship changes against a backstop | Aggressively rewrite: the eval may not cover what you are about to change |
| Modular | Replace components without touching others | (this is the destination, not a state to inherit) |

The progression is the work: frozen → observable → eval-backed → modular. Most inherited systems are frozen; some are observable; almost none are eval-backed; very few are modular. The chapters of this module are the path along that progression.
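
The labelling step can be sketched as a small function. This is one possible reading of the table, assuming the progression is strictly ordered (each state presupposes the one before it); the `AuditFindings` fields and the `classify` function are hypothetical names, not part of any chapter's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class SystemState(Enum):
    FROZEN = "frozen"
    OBSERVABLE = "observable"
    EVAL_BACKED = "eval-backed"
    MODULAR = "modular"


@dataclass
class AuditFindings:
    """Hypothetical yes/no answers collected during the day-one audit."""
    has_traces: bool
    has_regression_evals: bool
    has_swappable_components: bool


def classify(findings: AuditFindings) -> SystemState:
    """Return the furthest state the system has reached.

    Assumption: the states are treated as a strict ladder, so a missing
    earlier capability caps the label even if a later one exists.
    """
    if not findings.has_traces:
        return SystemState.FROZEN
    if not findings.has_regression_evals:
        return SystemState.OBSERVABLE
    if not findings.has_swappable_components:
        return SystemState.EVAL_BACKED
    return SystemState.MODULAR
```

Applied to the claims-triage system from the opening scenario (no traces, no evals, prompts buried in one large file), `classify` returns `SystemState.FROZEN`.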


The recurring vocabulary

These terms appear in every chapter.

| Name | What it is |
| --- | --- |
| the inherited system | The AI system you did not build that you are now responsible for |
| the day-one audit | The first systematic mapping of what is in the system, its inputs, outputs, and operational shape |
| the eval backstop | The minimum set of evals that lets you ship changes without breaking what works |
| the stop-the-bleeding fix | The smallest change that addresses the worst symptom, taken before the larger remediation |
| the strangler boundary | The interface inside the system across which old and new implementations coexist |
| the prompt registry | The owned, versioned location for prompts, separate from product code |
| the parallel run | Old and new implementations both serving traffic, with outputs compared |
| the regression eval | The eval that captures behaviour the team has decided must not change |
| the 30/60/90 plan | The sequenced commitments the new lead makes for the first three months |
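
Of these terms, the parallel run is the easiest to show in code. The sketch below is a hedged illustration in the spirit of GitHub's Scientist, not its API: the `ParallelRun` class is hypothetical, and it assumes the old implementation stays authoritative while the new one runs in its shadow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ParallelRun:
    """Call old and new implementations on the same input and log disagreements."""
    old: Callable[[str], str]
    new: Callable[[str], str]
    mismatches: List[Tuple[str, str, str]] = field(default_factory=list)

    def __call__(self, claim: str) -> str:
        old_out = self.old(claim)
        try:
            new_out = self.new(claim)
        except Exception as exc:
            # A failure in the new path is recorded but must never reach the user.
            self.mismatches.append((claim, old_out, f"error: {exc!r}"))
            return old_out
        if new_out != old_out:
            self.mismatches.append((claim, old_out, new_out))
        # Assumption: users always receive the old implementation's answer.
        return old_out
```

A short usage example, with lambdas standing in for the two implementations:

```python
run = ParallelRun(
    old=lambda claim: "auto",
    new=lambda claim: "auto" if "car" in claim else "property",
)
run("car dented in parking lot")   # both agree, nothing recorded
run("roof leak after storm")       # disagreement is recorded, old answer returned
print(run.mismatches)
```

Exact equality is a deliberately naive comparison. For free-text outputs from a non-deterministic model, the comparison step would need a scorer of the kind an eval backstop provides.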

The journey

This module has three acts.

Act 1 — Understand what you have (files 01–03). Read the system through three lenses: what makes it different from legacy code, what to audit on day one, and what eval backstop you need before doing anything else.

Act 2 — Stabilise and instrument (files 04–06). Decide what to fix first, separate prompts from code, retrofit observability. By the end of act 2 the system is operable; you can run it safely while you plan.

Act 3 — Migrate and operate (files 07–11). Strangler migration to a new architecture, model and version stabilisation, data-pipeline modernisation, stakeholder management, and the 30/60/90 rhythm.

Synthesis (files 12–13). Architect checklist and honest admission of what migration cannot solve.


Memory map

| # | File | What it adds |
| --- | --- | --- |
| 01 | the-inherited-system | What makes legacy AI different from legacy code |
| 02 | day-one-audit | The systematic first inspection: traces, prompts, models, evals, errors |
| 03 | the-eval-backstop | The safety net before any change |
| | Milestone: you understand the system | |
| 04 | stop-bleeding-vs-do-right | Sequencing the first fixes |
| 05 | prompts-out-of-code | Carving prompts into a registry without breaking production |
| 06 | observability-retrofitted | Adding traces, audit, and dashboards to a running system |
| | Milestone: the system is operable | |
| 07 | strangler-migration | Replacing the system one component at a time, with old and new in parallel |
| 08 | model-and-version-stabilization | Taming model selection and the vendor-deprecation calendar |
| 09 | data-pipeline-and-context-debt | The data side: retrieval, context, training data |
| 10 | stakeholder-management | What to tell the PM, the customer, the on-call, the executive |
| 11 | the-30-60-90-plan | The new lead's first three months, sequenced |
| | Milestone: the system is modernising | |
| 12 | architect-checklist | 20 items: audit, stabilise, instrument, migrate, operate |
| 13 | honest-admission | What modernisation cannot fix |

How this module relates to its neighbours


Top resources

  • Michael Feathers, Working Effectively with Legacy Code — the classic; most of its patterns translate to legacy AI with the eval-shaped safety layer added.
  • Martin Fowler, StranglerFigApplication — https://martinfowler.com/bliki/StranglerFigApplication.html
  • Martin Fowler, "Branch by Abstraction": https://martinfowler.com/bliki/BranchByAbstraction.html
  • Anthropic — model deprecation guide — https://docs.anthropic.com/en/docs/about-claude/model-deprecations
  • OpenAI — production best practices — https://platform.openai.com/docs/guides/production-best-practices
  • GitHub Engineering — Scientist (parallel-run pattern in production): https://github.com/github/scientist

What's coming

  1. 01-the-inherited-system.md — Why legacy AI requires a different playbook than legacy code, and the four states an inherited system can be in.
  2. 02-day-one-audit.md — What to inspect, in what order, in your first two weeks.
  3. 03-the-eval-backstop.md — The minimum eval coverage required before you change anything.
  4. 04-stop-bleeding-vs-do-right.md — Sequencing the first fixes; what to take on first and why.
  5. 05-prompts-out-of-code.md — Migrating prompts from source files into a versioned registry without breaking the running system.
  6. 06-observability-retrofitted.md — Adding traces, audit, and dashboards to a system that has none.
  7. 07-strangler-migration.md — Replacing the system one boundary at a time, with old and new in parallel.
  8. 08-model-and-version-stabilization.md — Removing hardcoded model IDs, handling vendor deprecations, taming the version sprawl.
  9. 09-data-pipeline-and-context-debt.md — The retrieval, context, and training-data side of the inherited system.
  10. 10-stakeholder-management.md — What to tell the PM, the customer, the on-call, the executive.
  11. 11-the-30-60-90-plan.md — The new lead's first three months, sequenced explicitly.
  12. 12-architect-checklist.md — Twenty items: audit, stabilise, instrument, migrate, operate.
  13. 13-honest-admission.md — Where modernisation has no defensible answer.

Bridge. Before we plan the audit, we need to know what we are auditing. Legacy AI is not just legacy code with a model attached; it has its own pathology. The first chapter is the diagnosis — what makes it different, and why the standard refactoring tools alone will not save you. → 01-the-inherited-system.md