00. Legacy AI modernization — First-principles overview
Every module before this one taught you to build AI systems well. This module is the discipline of inheriting one that was not built well — and fixing it without taking it down.
A new tech lead joins a mid-sized fintech in Bangalore. On day three, her manager hands her an AI-powered claims-triage system that the previous team shipped two years ago. That team has since dispersed. The Confluence pages are eighteen months old. The system handles three thousand claims per day, with a 7% customer-complaint rate that has been climbing slowly. There are no evals. The prompts are buried inside a 2,400-line Python file. The model is hardcoded as a string in three different places. There are no traces. The CI suite is forty unit tests that mock the model. Operations has an unspoken rule: do not touch the AI module, because it works and nobody knows how. Her manager's instructions are: "reduce complaints, ship improvements monthly, do not break what works."
This is the most common applied-lead scenario in 2026. Not "design a new AI feature" — "the previous team shipped this; fix it." The interview question and the production reality are the same question. This module is the playbook.
Why legacy AI is its own discipline
Legacy code is a well-studied problem. The patterns — characterisation tests, strangler fig, branch by abstraction, parallel run — are decades old. They mostly apply here. But four things make legacy AI different in kind, not just degree.
| Legacy AI is different because | Implication |
|---|---|
| The system is non-deterministic: the same input does not always produce the same output. | Behavioural changes are hard to detect. "Did my change make it better?" cannot be answered without an eval. |
| Behaviour is driven by prompts and model versions, not by code paths. | The most impactful changes are in artefacts (prompts, configs, models) that traditional refactoring tools do not see. |
| The vendor changes the model under you. | Even a "do nothing" stance is unstable. The system drifts even when you do not touch it. |
| The cost of wrong behaviour is often higher than the cost of wrong code. | Wrong outputs that reach users in production can damage trust permanently. Conservative discipline matters more. |
The corollary: standard legacy-code playbooks need an AI-specific safety layer added on top. That layer is the eval-shaped backstop (chapter 03) and the structured observability (chapter 06). Without them, you are refactoring a system you cannot prove you have not broken.
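To make the first row of the table concrete, here is a minimal sketch of why a single spot check is not enough when outputs vary: each case is run several times and the pass rates of a baseline and a candidate are compared. Everything named here (`make_stub_triage`, `pass_rate`, `regressed`, the 0.02 tolerance, the sample claims) is a hypothetical illustration, and the stub stands in for a real model call; chapter 03 covers the actual backstop.

```python
import random
from typing import Callable

# A hypothetical eval case: (claim text, label the team expects).
Case = tuple[str, str]


def make_stub_triage(error_rate: float, seed: int) -> Callable[[str], str]:
    """Stand-in for a model call: returns the wrong label with probability error_rate."""
    rng = random.Random(seed)

    def triage(text: str) -> str:
        correct = "urgent" if "injury" in text else "routine"
        wrong = "routine" if correct == "urgent" else "urgent"
        return wrong if rng.random() < error_rate else correct

    return triage


def pass_rate(system: Callable[[str], str], cases: list[Case], runs: int = 5) -> float:
    """Fraction of (case, run) pairs where the system returns the expected label."""
    if not cases or runs < 1:
        raise ValueError("need at least one case and one run")
    hits = 0
    for text, expected in cases:
        for _ in range(runs):  # repeat: one call proves little when outputs vary
            if system(text) == expected:
                hits += 1
    return hits / (len(cases) * runs)


def regressed(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate is worse than the baseline by more than the tolerance."""
    return candidate < baseline - tolerance


CASES: list[Case] = [("injury at work", "urgent"), ("cracked phone screen", "routine")]
baseline = pass_rate(make_stub_triage(0.05, seed=1), CASES, runs=50)
candidate = pass_rate(make_stub_triage(0.30, seed=2), CASES, runs=50)
print(f"baseline={baseline:.2f} candidate={candidate:.2f} regressed={regressed(baseline, candidate)}")
```

The tolerance exists because two runs of the same system will not produce identical pass rates; how wide it should be depends on the number of cases and runs, which this sketch does not attempt to derive.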
What this module teaches, in one sentence
You audit, stabilise, instrument, and replace an inherited AI system one piece at a time, with an eval backstop that catches regression, on a rhythm that ships improvement while delivering the existing service uninterrupted.
Read it left to right.
- Audit — chapter 02. Day-one mapping of what you actually have.
- Stabilise — chapter 04. The "stop the bleeding" decisions.
- Instrument — chapter 06. Observability retrofitted.
- Replace one piece at a time — chapter 07. Strangler migration.
- Eval backstop — chapter 03. The safety net before any change.
- Ship improvement while delivering — chapter 11. The 30-60-90 plan.
Every chapter covers one of these phases or a practical artefact you produce along the way.
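As a taste of the audit phase, the sketch below scans Python source for model identifiers hardcoded as string literals, the situation described in the opening scenario. It is an illustration only: the prefix list in `MODEL_ID`, the function name `find_hardcoded_models`, and the sample model IDs are assumptions, and a real audit would tune the pattern to whatever the inherited codebase actually contains.

```python
import re
from collections import defaultdict

# Assumed pattern: a quoted string that starts with a vendor-style prefix.
# Adjust the prefixes to match the identifiers your codebase really uses.
MODEL_ID = re.compile(r"[\"']((?:gpt|claude|gemini)-[\w.\-]+)[\"']")


def find_hardcoded_models(source: str) -> dict[str, list[int]]:
    """Map each model-like string literal to the 1-based line numbers where it appears."""
    found: dict[str, list[int]] = defaultdict(list)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in MODEL_ID.finditer(line):
            found[match.group(1)].append(lineno)
    return dict(found)


# Hypothetical source text with made-up model IDs.
SAMPLE = (
    'MODEL = "claude-example-1"\n'
    "retries = 3\n"
    'client.call(model="claude-example-1")\n'
    "fallback = 'gpt-example-2'\n"
)

for model, lines in find_hardcoded_models(SAMPLE).items():
    print(f"{model}: lines {lines}")
```

A line-based regex misses identifiers that are built by concatenation or read from the environment, so treat the result as a starting list for the audit and not as proof of completeness.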
The four states of an inherited AI system
Memorise this once. Every inherited system is in one of four states. Your first job — chapter 02 — is to put a label on the one you have.
| State | What you can do | What you cannot do |
|---|---|---|
| Frozen | Operate the system safely; trust its behaviour to the extent it has been validated historically | Ship changes — you have no signal whether they help or hurt |
| Observable | Operate; investigate incidents; characterise behaviour from traces | Ship changes — observability alone does not catch regressions |
| Eval-backed | Operate; investigate; ship changes against a backstop | Aggressively rewrite — the eval may not cover what you are about to change |
| Modular | Replace components without touching others | (this is the destination, not a state to inherit) |
The progression is the work: frozen → observable → eval-backed → modular. Most inherited systems are frozen; some are observable; almost none are eval-backed; very few are modular. The chapters of this module are the path along that progression.
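The labelling step can be sketched as a small classifier. This is one possible reading of the table, not a definition from the module: the three boolean findings in `AuditFindings` and the assumption that each state strictly builds on the previous one are simplifications introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    FROZEN = "frozen"
    OBSERVABLE = "observable"
    EVAL_BACKED = "eval-backed"
    MODULAR = "modular"


@dataclass(frozen=True)
class AuditFindings:
    """Hypothetical summary of a day-one audit, reduced to three yes/no answers."""
    has_traces: bool
    has_regression_evals: bool
    has_replaceable_components: bool


def classify(findings: AuditFindings) -> State:
    """Assign a state, assuming each one requires everything the previous one has."""
    if not findings.has_traces:
        return State.FROZEN
    if not findings.has_regression_evals:
        return State.OBSERVABLE
    if not findings.has_replaceable_components:
        return State.EVAL_BACKED
    return State.MODULAR


def may_ship_changes(state: State) -> bool:
    """Per the table: only eval-backed and modular systems can ship changes with a signal."""
    return state in (State.EVAL_BACKED, State.MODULAR)


# The system from the opening scenario: no traces, no evals, one 2,400-line file.
inherited = AuditFindings(has_traces=False, has_regression_evals=False,
                          has_replaceable_components=False)
state = classify(inherited)
print(state.value, may_ship_changes(state))
```

Real systems are messier than three booleans; a system with partial tracing or evals that cover one flow only would need a judgement call that this sketch does not capture.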
The recurring vocabulary
These terms appear in every chapter.
| Name | What it is |
|---|---|
| the inherited system | The AI system you did not build that you are now responsible for |
| the day-one audit | The first systematic mapping of what is in the system, its inputs, outputs, and operational shape |
| the eval backstop | The minimum set of evals that lets you ship changes without breaking what works |
| the stop-the-bleeding fix | The smallest change that addresses the worst symptom, taken before the larger remediation |
| the strangler boundary | The interface inside the system across which old and new implementations coexist |
| the prompt registry | The owned, versioned location for prompts, separate from product code |
| the parallel run | Old and new implementations both serving traffic, with outputs compared |
| the regression eval | The eval that captures behaviour the team has decided must not change |
| the 30/60/90 plan | The sequenced commitments the new lead makes for the first three months |
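
Two of these terms, the strangler boundary and the parallel run, are easier to picture in code. The sketch below is a hypothetical wrapper, loosely in the spirit of the Scientist library listed under Top resources but not its API: `ParallelRun` sits at the boundary, calls both implementations, records disagreements, and always returns the old output so callers see no change. The exact-match comparison is a deliberate simplification; non-deterministic outputs would need a more tolerant comparator.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParallelRun:
    """Hypothetical strangler boundary: same call signature as the old implementation."""
    old: Callable[[str], str]
    new: Callable[[str], str]
    mismatches: list[tuple[str, str, str]] = field(default_factory=list)
    calls: int = 0

    def __call__(self, claim: str) -> str:
        self.calls += 1
        old_out = self.old(claim)
        try:
            new_out = self.new(claim)
        except Exception as exc:  # the candidate must never take down the caller
            new_out = f"<error: {exc!r}>"
        if new_out != old_out:
            # Record (input, old output, new output) for later review.
            self.mismatches.append((claim, old_out, new_out))
        return old_out  # callers keep seeing the old behaviour


def old_triage(claim: str) -> str:
    return "routine"


def new_triage(claim: str) -> str:
    return "urgent" if "injury" in claim else "routine"


triage = ParallelRun(old=old_triage, new=new_triage)
for claim in ["cracked phone screen", "injury at work"]:
    triage(claim)
print(f"{len(triage.mismatches)} mismatch(es) in {triage.calls} calls")
```

Returning the old output is one design choice among several; a later stage of a migration might route a fraction of traffic to the new implementation instead, which this sketch does not cover.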
The journey
This module has three acts.
Act 1 — Understand what you have (files 01–03). Read the system through three lenses: what makes it different from legacy code, what to audit on day one, and what eval backstop you need before doing anything else.
Act 2 — Stabilise and instrument (files 04–06). Decide what to fix first, separate prompts from code, retrofit observability. By the end of act 2 the system is operable; you can run it safely while you plan.
Act 3 — Migrate and operate (files 07–11). Strangler migration to a new architecture, model and version stabilisation, data-pipeline modernisation, stakeholder management, and the 30/60/90 rhythm.
Synthesis (files 12–13). Architect checklist and honest admission of what migration cannot solve.
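Act 2's step of separating prompts from code, and Act 3's model stabilisation, can be pictured with a small in-memory registry. This is a sketch under stated assumptions, not the design chapter 05 prescribes: `PromptRegistry`, `PromptVersion`, the choice to store the model ID alongside the prompt, and the model name `example-model-v1` are all hypothetical. A real registry would persist to a versioned store.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str
    model: str  # assumed design: the model ID lives next to the prompt, not in call sites

    @property
    def digest(self) -> str:
        """Short content hash, useful for tagging traces with the exact prompt used."""
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


class PromptRegistry:
    def __init__(self) -> None:
        self._versions: dict[str, list[PromptVersion]] = {}

    def register(self, name: str, template: str, model: str) -> PromptVersion:
        """Append a new immutable version; earlier versions stay retrievable."""
        history = self._versions.setdefault(name, [])
        entry = PromptVersion(name, len(history) + 1, template, model)
        history.append(entry)
        return entry

    def get(self, name: str, version: Optional[int] = None) -> PromptVersion:
        """Return the latest version, or a specific one when pinned."""
        history = self._versions[name]  # KeyError for an unknown prompt name
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]


registry = PromptRegistry()
registry.register("claims-triage", "Classify this claim: {claim}", model="example-model-v1")
current = registry.get("claims-triage")
print(current.version, current.model, current.digest)
```

Keeping old versions addressable is what makes rollback and side-by-side comparison possible once call sites look prompts up by name instead of embedding them.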
Memory map
| # | File | What it adds |
|---|---|---|
| 01 | the-inherited-system | What makes legacy AI different from legacy code |
| 02 | day-one-audit | The systematic first inspection: traces, prompts, models, evals, errors |
| 03 | the-eval-backstop | The safety net before any change |
| — milestone: you understand the system — | ||
| 04 | stop-bleeding-vs-do-right | Sequencing the first fixes |
| 05 | prompts-out-of-code | Carving prompts into a registry without breaking production |
| 06 | observability-retrofitted | Adding traces, audit, and dashboards to a running system |
| — milestone: the system is operable — | ||
| 07 | strangler-migration | Replacing the system one component at a time, with old and new in parallel |
| 08 | model-and-version-stabilization | Taming model selection and the vendor-deprecation calendar |
| 09 | data-pipeline-and-context-debt | The data side: retrieval, context, training data |
| 10 | stakeholder-management | What to tell the PM, the customer, the on-call, the executive |
| 11 | the-30-60-90-plan | The new lead's first three months, sequenced |
| — milestone: the system is modernising — | ||
| 12 | architect-checklist | 20 items: audit, stabilise, instrument, migrate, operate |
| 13 | honest-admission | What modernisation cannot fix |
How this module relates to its neighbours
- 05_ai_incident_operations — the inherited system will produce incidents. That module is how to handle them; this one is how to stop reproducing them.
- 04_ai_product_evals — the eval backstop in chapter 03 leans on the eval discipline taught there.
- 13_prompt_lifecycle_operations — chapter 05's prompt registry is the implementation of that module's discipline.
- 03_agent_observability_debugging — chapter 06 retrofits the patterns from that module onto an existing system.
- 20_engineering_leadership_judgment — chapter 10 (stakeholder management) is the leadership face of this module's technical decisions.
Top resources
- Michael Feathers, Working Effectively with Legacy Code — the classic; most of its patterns translate to legacy AI with the eval-shaped safety layer added.
- Martin Fowler, StranglerFigApplication — https://martinfowler.com/bliki/StranglerFigApplication.html
- Martin Fowler, BranchByAbstraction — https://martinfowler.com/bliki/BranchByAbstraction.html
- Anthropic — model deprecation guide — https://docs.anthropic.com/en/docs/about-claude/model-deprecations
- OpenAI — production best practices — https://platform.openai.com/docs/guides/production-best-practices
- GitHub Engineering — Scientist (parallel-run pattern in production) — https://github.com/github/scientist
What's coming
- 01-the-inherited-system.md — Why legacy AI requires a different playbook than legacy code, and the four states an inherited system can be in.
- 02-day-one-audit.md — What to inspect, in what order, in your first two weeks.
- 03-the-eval-backstop.md — The minimum eval coverage required before you change anything.
- 04-stop-bleeding-vs-do-right.md — Sequencing the first fixes; what to take on first and why.
- 05-prompts-out-of-code.md — Migrating prompts from source files into a versioned registry without breaking the running system.
- 06-observability-retrofitted.md — Adding traces, audit, and dashboards to a system that has none.
- 07-strangler-migration.md — Replacing the system one boundary at a time, with old and new in parallel.
- 08-model-and-version-stabilization.md — Removing hardcoded model IDs, handling vendor deprecations, taming the version sprawl.
- 09-data-pipeline-and-context-debt.md — The retrieval, context, and training-data side of the inherited system.
- 10-stakeholder-management.md — What to tell the PM, the customer, the on-call, the executive.
- 11-the-30-60-90-plan.md — The new lead's first three months, sequenced explicitly.
- 12-architect-checklist.md — Twenty items: audit, stabilise, instrument, migrate, operate.
- 13-honest-admission.md — Where modernisation has no defensible answer.
Bridge. Before we plan the audit, we need to know what we are auditing. Legacy AI is not just legacy code with a model attached; it has its own pathology. The first chapter is the diagnosis — what makes it different, and why the standard refactoring tools alone will not save you. → 01-the-inherited-system.md