00. Legacy AI modernization: First-principles overview

Every module before this one taught you to build AI systems well. This module is the discipline of inheriting one that was not built well — and fixing it without taking it down.

A new tech lead joins a mid-sized fintech in Bangalore. On day three, her manager hands her an AI-powered claims-triage system the previous team shipped two years ago. The previous team has dispersed. The Confluence pages are eighteen months old. The system handles three thousand claims per day with a 7% customer-complaint rate that has been climbing slowly. There are no evals. The prompts are buried inside a 2,400-line Python file. The model is hardcoded as a string in three different places. There are no traces. The CI suite is forty unit tests that mock the model. Operations have an unspoken rule: do not touch the AI module — it works, and nobody knows how. Her instructions from her manager are: "reduce complaints, ship improvements monthly, do not break what works."

This is the most common applied-lead scenario in 2026. Not "design a new AI feature" — "the previous team shipped this; fix it." The interview question and the production reality are the same question. This module is the playbook.

Why legacy AI is its own discipline

Legacy code is a well-studied problem. The patterns — characterisation tests, strangler fig, branch by abstraction, parallel run — are decades old. They mostly apply here. But four things make legacy AI different in kind, not just degree.

Legacy AI is different because	Implication
The system is non-deterministic: the same input is not guaranteed to produce the same output.	Behavioural changes are hard to detect. "Did my change make it better?" cannot be answered without an eval.
Behaviour is driven by prompts and model versions, not by code paths.	The most impactful changes are in artefacts (prompts, configs, models) that traditional refactoring tools do not see.
The vendor changes the model under you.	Even a "do nothing" stance is unstable. The system drifts even when you do not touch it.
The cost of wrong behaviour is often higher than the cost of wrong code.	Wrong outputs that reach users in production can damage trust permanently. Conservative discipline matters more.

The corollary: standard legacy-code playbooks need an AI-specific safety layer added on top. That layer is the eval backstop (chapter 03) and the structured observability (chapter 06). Without them, you are refactoring a system you cannot prove you have not broken.
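The first row of the table is the one that changes day-to-day practice: because the same input can yield different outputs, a single passing run proves little. The following is a minimal sketch of a repeated-trial check, not the backstop chapter 03 builds. The golden cases, the `make_flaky_triage` stand-in for the inherited system, and the 2% tolerance are all hypothetical choices made for illustration.

```python
import random
from typing import Callable

# Hypothetical golden cases: (claim text, expected triage label).
GOLDEN_CASES = [
    ("windscreen cracked by gravel", "fast_track"),
    ("total loss after fire, injuries reported", "manual_review"),
]


def pass_rate(system: Callable[[str], str], cases, trials: int = 5) -> float:
    """Fraction of (case, trial) pairs where the output matches the label.

    Each case is run several times because a non-deterministic system
    can pass once and fail the next time on the same input.
    """
    hits = 0
    for text, expected in cases:
        for _ in range(trials):
            if system(text) == expected:
                hits += 1
    return hits / (len(cases) * trials)


def backstop_ok(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """A change passes only if it does not regress beyond the tolerance."""
    return candidate >= baseline - tolerance


def make_flaky_triage(seed: int, error_rate: float) -> Callable[[str], str]:
    """Stand-in for the inherited system: wrong at a fixed rate, seeded."""
    rng = random.Random(seed)

    def triage(text: str) -> str:
        correct = "manual_review" if "injuries" in text else "fast_track"
        wrong = "fast_track" if correct == "manual_review" else "manual_review"
        return wrong if rng.random() < error_rate else correct

    return triage
```

In use, you would measure `pass_rate` on the current system to fix a baseline, measure it again on the changed system, and gate the release on `backstop_ok(baseline, candidate)`. Exact-match scoring only works for label-like outputs; free-text outputs would need a different comparison.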

What this module teaches, in one sentence

You audit, stabilise, instrument, and replace an inherited AI system one piece at a time, with an eval backstop that catches regression, on a rhythm that ships improvement while delivering the existing service uninterrupted.

Read it left to right.

Audit — chapter 02. Day-one mapping of what you actually have.
Stabilise — chapter 04. The "stop the bleeding" decisions.
Instrument — chapter 06. Observability retrofitted.
Replace one piece at a time — chapter 07. Strangler migration.
Eval backstop — chapter 03. The safety net before any change.
Ship improvement while delivering — chapter 11. The 30-60-90 plan.

Every chapter covers either one of these phases or a practical artefact you produce along the way.

The four states of an inherited AI system

Memorise this once. Every inherited system is in one of four states. Your first job — chapter 02 — is to put a label on the one you have.

State	What you can do	What you cannot do
Frozen	Operate the system safely; trust its behaviour to the extent it has been validated historically	Ship changes — you have no signal whether they help or hurt
Observable	Operate; investigate incidents; characterise behaviour from traces	Ship changes — observability alone does not catch regressions
Eval-backed	Operate; investigate; ship changes against a backstop	Aggressively rewrite — the eval may not cover what you are about to change
Modular	Replace components without touching others	(this is the destination; you will rarely inherit a system that is already here)

The progression is the work: frozen → observable → eval-backed → modular. Most inherited systems are frozen; some are observable; almost none are eval-backed; very few are modular. The chapters of this module are the path along that progression.
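The table can be read as a small decision procedure. The sketch below encodes it under two simplifying assumptions that are mine, not the module's: that three yes/no audit findings are enough to place a system, and that the progression is strictly ordered (evals without traces still count as frozen). The `AuditFindings` fields are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class SystemState(Enum):
    FROZEN = "frozen"
    OBSERVABLE = "observable"
    EVAL_BACKED = "eval-backed"
    MODULAR = "modular"


@dataclass(frozen=True)
class AuditFindings:
    # Hypothetical audit outputs; a real audit records far more than this.
    has_traces: bool
    has_regression_evals: bool
    has_swappable_components: bool


def classify(findings: AuditFindings) -> SystemState:
    """Label the system by the furthest state whose prerequisites all hold."""
    if not findings.has_traces:
        return SystemState.FROZEN
    if not findings.has_regression_evals:
        return SystemState.OBSERVABLE
    if not findings.has_swappable_components:
        return SystemState.EVAL_BACKED
    return SystemState.MODULAR


def can_ship_changes(state: SystemState) -> bool:
    # Per the table, frozen and observable systems have no regression signal.
    # Modular is assumed to include the eval backstop because it follows
    # eval-backed in the progression.
    return state in (SystemState.EVAL_BACKED, SystemState.MODULAR)
```

The claims-triage system from the opening scenario has no traces and no evals, so `classify(AuditFindings(False, False, False))` labels it frozen and `can_ship_changes` returns `False` for it.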

The recurring vocabulary

These terms appear in every chapter.

Name	What it is
the inherited system	The AI system you did not build that you are now responsible for
the day-one audit	The first systematic mapping of what is in the system, its inputs, outputs, and operational shape
the eval backstop	The minimum set of evals that lets you ship changes without breaking what works
the stop-the-bleeding fix	The smallest change that addresses the worst symptom, taken before the larger remediation
the strangler boundary	The interface inside the system across which old and new implementations coexist
the prompt registry	The owned, versioned location for prompts, separate from product code
the parallel run	Old and new implementations both serving traffic, with outputs compared
the regression eval	The eval that captures behaviour the team has decided must not change
the 30/60/90 plan	The sequenced commitments the new lead makes for the first three months
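Two of these terms, the strangler boundary and the parallel run, are easier to hold onto with a concrete shape in mind. The sketch below is one possible variant, assuming the boundary is a single callable that takes a request string and returns a string: the old implementation stays authoritative and the new one runs alongside it for comparison only. The `ParallelRun` class and its fields are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParallelRun:
    """Serve the old implementation; run the new one alongside and compare."""

    old: Callable[[str], str]
    new: Callable[[str], str]
    mismatches: list = field(default_factory=list)
    total: int = 0

    def __call__(self, request: str) -> str:
        old_out = self.old(request)
        self.total += 1
        try:
            new_out = self.new(request)
        except Exception as exc:
            # A failing candidate is recorded, never allowed to break serving.
            self.mismatches.append((request, old_out, f"error: {exc!r}"))
            return old_out
        if new_out != old_out:
            self.mismatches.append((request, old_out, new_out))
        # Callers always receive the old, trusted answer in this variant.
        return old_out

    def mismatch_rate(self) -> float:
        return len(self.mismatches) / self.total if self.total else 0.0
```

Strict equality is a deliberate simplification here. Because a non-deterministic system can disagree with itself, a real comparison would more likely score both outputs with the regression eval than compare them character by character.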

The journey

This module has three acts.

Act 1 — Understand what you have (files 01–03). Read the system through three lenses: what makes it different from legacy code, what to audit on day one, and what eval backstop you need before doing anything else.

Act 2 — Stabilise and instrument (files 04–06). Decide what to fix first, separate prompts from code, retrofit observability. By the end of act 2 the system is operable; you can run it safely while you plan.

Act 3 — Migrate and operate (files 07–11). Strangler migration to a new architecture, model and version stabilisation, data-pipeline modernisation, stakeholder management, and the 30/60/90 rhythm.

Synthesis (files 12–13). Architect checklist and honest admission of what migration cannot solve.

Memory map

#	File	What it adds
01	the-inherited-system	What makes legacy AI different from legacy code
02	day-one-audit	The systematic first inspection: traces, prompts, models, evals, errors
03	the-eval-backstop	The safety net before any change
	— milestone: you understand the system —
04	stop-bleeding-vs-do-right	Sequencing the first fixes
05	prompts-out-of-code	Carving prompts into a registry without breaking production
06	observability-retrofitted	Adding traces, audit, and dashboards to a running system
	— milestone: the system is operable —
07	strangler-migration	Replacing the system one component at a time, with old and new in parallel
08	model-and-version-stabilization	Taming model selection and the vendor-deprecation calendar
09	data-pipeline-and-context-debt	The data side: retrieval, context, training data
10	stakeholder-management	What to tell the PM, the customer, the on-call, the executive
11	the-30-60-90-plan	The new lead's first three months, sequenced
	— milestone: the system is modernising —
12	architect-checklist	20 items: audit, stabilise, instrument, migrate, operate
13	honest-admission	What modernisation cannot fix

How this module relates to its neighbours

05_ai_incident_operations — the inherited system will produce incidents. That module is how to handle them; this one is how to stop reproducing them.
04_ai_product_evals — the eval backstop in chapter 03 leans on the eval discipline taught there.
13_prompt_lifecycle_operations — chapter 05's prompt registry is the implementation of that module's discipline.
03_agent_observability_debugging — chapter 06 retrofits the patterns from that module onto an existing system.
20_engineering_leadership_judgment — chapter 10 (stakeholder management) is the leadership face of this module's technical decisions.

Top resources

Michael Feathers, Working Effectively with Legacy Code — the classic; most of its patterns translate to legacy AI with the eval-shaped safety layer added.
Martin Fowler, StranglerFigApplication — https://martinfowler.com/bliki/StranglerFigApplication.html
Martin Fowler, BranchByAbstraction: https://martinfowler.com/bliki/BranchByAbstraction.html
Anthropic — model deprecation guide — https://docs.anthropic.com/en/docs/about-claude/model-deprecations
OpenAI — production best practices — https://platform.openai.com/docs/guides/production-best-practices
GitHub Engineering — Scientist (parallel-run pattern in production): https://github.com/github/scientist

What's coming

01-the-inherited-system.md — Why legacy AI requires a different playbook than legacy code, and the four states an inherited system can be in.
02-day-one-audit.md — What to inspect, in what order, in your first two weeks.
03-the-eval-backstop.md — The minimum eval coverage required before you change anything.
04-stop-bleeding-vs-do-right.md — Sequencing the first fixes; what to take on first and why.
05-prompts-out-of-code.md — Migrating prompts from source files into a versioned registry without breaking the running system.
06-observability-retrofitted.md — Adding traces, audit, and dashboards to a system that has none.
07-strangler-migration.md — Replacing the system one boundary at a time, with old and new in parallel.
08-model-and-version-stabilization.md — Removing hardcoded model IDs, handling vendor deprecations, taming the version sprawl.
09-data-pipeline-and-context-debt.md — The retrieval, context, and training-data side of the inherited system.
10-stakeholder-management.md — What to tell the PM, the customer, the on-call, the executive.
11-the-30-60-90-plan.md — The new lead's first three months, sequenced explicitly.
12-architect-checklist.md — Twenty items: audit, stabilise, instrument, migrate, operate.
13-honest-admission.md — Where modernisation has no defensible answer.

Bridge. Before we plan the audit, we need to know what we are auditing. Legacy AI is not just legacy code with a model attached; it has its own pathology. The first chapter is the diagnosis — what makes it different, and why the standard refactoring tools alone will not save you. → 01-the-inherited-system.md