
07. Strangler migration

With the system operable — eval-backed, prompts in a registry, observability live — the structural work scales up. Strangler migration is how you replace the system one boundary at a time, with old and new running in parallel, without a big-bang rewrite that nobody can defend.


A platform lead at a Pune fintech inherits a credit-decisioning agent. The system is a 6,000-line Python monolith with three responsibilities tangled together: data fetching from internal services, prompt construction and model calling, and decision serialisation back to downstream systems. The previous team's instinct would have been "rewrite it." The lead's instinct is different. She picks one responsibility — the data-fetching layer — extracts an interface, builds a new implementation behind that interface, runs the new implementation in parallel for some weeks, compares outputs against the old, and once they match within tolerance, switches traffic. Three weeks. The data-fetching responsibility is now a separate component the team can evolve. The prompt-construction layer is next. Then the decision-serialisation. Six months later, the monolith is gone — replaced by three well-bounded components — and the system did not go through a single big-bang outage.

This chapter is the discipline behind that quiet replacement. Identify a boundary, build new behind it, run in parallel, compare outputs, switch traffic. The pattern is decades old; what is new is the eval and observability that make it safe for AI systems.


The pattern, in one sentence

The strangler migration replaces a legacy system by gradually extracting components, building new implementations behind explicit boundaries, running them in parallel against the old, and shifting traffic when they match.

The name comes from the Strangler Fig, a tree that grows around its host and eventually replaces it without the host ever falling over.

Four properties of the pattern that matter here:

  • Incremental. One boundary at a time. The system continues to run throughout.
  • Reversible. Each migration step has a rollback that does not affect the others.
  • Verifiable. The parallel run produces evidence that the new matches the old.
  • Independent. Each component, once migrated, can evolve at its own pace.

The eval backstop (chapter 03) and the observability (chapter 06) make the verification possible. Without those, parallel running is "we hope they match."


Picking the first boundary

Three criteria for choosing the first extraction:

  • Smallest contract. The boundary with the simplest interface — few methods, clear inputs/outputs — is the easiest to extract. A boundary where the new component takes the same inputs and produces the same outputs as the old is a clean comparison.
  • Lowest risk if wrong. A boundary whose component, if broken, produces a recoverable error rather than a silent wrong answer. Recoverable errors are detected; silent wrong answers are not.
  • Highest decoupling value. Extracting a boundary that gives the team the most freedom to change downstream. Data fetching is often a good first extraction because it lets the team change the upstream sources without touching the prompt code.

The intersection of "smallest contract + lowest risk + highest value" is usually obvious from the audit. Pick it.
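When the intersection is not obvious, the three criteria can be turned into a rough ranking. The sketch below is only an illustration: the 1-to-5 scores, the equal weighting, and the candidate names are assumptions a team would replace with its own audit findings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundaryCandidate:
    name: str
    contract_size: int      # 1 = tiny interface, 5 = sprawling interface
    risk_if_wrong: int      # 1 = loud recoverable error, 5 = silent wrong answer
    decoupling_value: int   # 1 = little freedom gained, 5 = a lot

def extraction_score(candidate: BoundaryCandidate) -> int:
    # Higher is better: reward decoupling, penalise contract size and risk.
    return (candidate.decoupling_value
            - candidate.contract_size
            - candidate.risk_if_wrong)

def pick_first(candidates):
    # The best-scoring candidate is the suggested first extraction.
    return max(candidates, key=extraction_score)

# Hypothetical scores for the three responsibilities in the opening example.
candidates = [
    BoundaryCandidate("data_fetching", contract_size=1, risk_if_wrong=2, decoupling_value=4),
    BoundaryCandidate("prompt_construction", contract_size=3, risk_if_wrong=4, decoupling_value=4),
    BoundaryCandidate("decision_serialisation", contract_size=2, risk_if_wrong=4, decoupling_value=2),
]
first = pick_first(candidates)
```

The score is a conversation aid, not a decision procedure; if the ranking surprises the team, the scores are probably wrong.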


The migration steps

For one boundary, six steps.

Step 1 — Define the interface

Read the old code carefully. Identify the inputs the boundary accepts and the outputs it produces. Write the interface as code or as a schema:

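from __future__ import annotations  # lets the Protocol name CreditDataBundle before it is defined

from dataclasses import dataclass
from typing import Protocol

class CreditDataFetcher(Protocol):
    def fetch_credit_data(self, customer_id: str, lookback_days: int) -> CreditDataBundle: ...

# PaymentEvent and InquiryEvent are the existing domain types used by the legacy code.
@dataclass
class CreditDataBundle:
    payment_history: list[PaymentEvent]
    current_balances: dict[str, int]
    credit_inquiries: list[InquiryEvent]
    bureau_score: int | None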

The interface is the contract the new and old implementations both honour. The shape of the inputs and outputs is taken from the old implementation; the new implementation will not be allowed to vary the shape.

Step 2 — Wrap the old behind the interface

The old code becomes one implementation of the interface. Often this is a thin adapter — the existing function is renamed, the interface methods call into it:

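class LegacyCreditDataFetcher(CreditDataFetcher):
    """Thin adapter: exposes the existing code through the new interface."""

    def fetch_credit_data(self, customer_id: str, lookback_days: int) -> CreditDataBundle:
        # _legacy_fetch is the existing function, renamed and otherwise unchanged.
        return self._legacy_fetch(customer_id, lookback_days)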

Throughout this step, the running system uses LegacyCreditDataFetcher and behaves exactly as before. Introducing the interface is a refactor, not a behaviour change, and the eval (chapter 03) is what verifies that.

Step 3 — Build the new implementation

Now build ModernCreditDataFetcher against the same interface. The internals can be different — different upstream sources, different caching, different error handling — but the input/output contract is the same.

The new implementation has its own unit tests that exercise the interface methods.
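One useful form for those tests is a contract check that can be pointed at either implementation. The sketch below is a minimal version under assumptions: the bundle is a simplified stand-in with untyped lists, and contract_violations and StubModernFetcher are hypothetical names, not part of the system described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CreditDataBundle:
    # Simplified stand-in for the chapter's bundle; event types are left untyped.
    payment_history: list
    current_balances: dict
    credit_inquiries: list
    bureau_score: Optional[int]

def contract_violations(fetcher, customer_id="cust-001", lookback_days=90):
    """Call the fetcher once and list every way the result breaks the contract."""
    bundle = fetcher.fetch_credit_data(customer_id, lookback_days)
    if not isinstance(bundle, CreditDataBundle):
        return [f"expected CreditDataBundle, got {type(bundle).__name__}"]
    problems = []
    if not isinstance(bundle.payment_history, list):
        problems.append("payment_history must be a list")
    if not isinstance(bundle.current_balances, dict):
        problems.append("current_balances must be a dict")
    if not isinstance(bundle.credit_inquiries, list):
        problems.append("credit_inquiries must be a list")
    if bundle.bureau_score is not None and not isinstance(bundle.bureau_score, int):
        problems.append("bureau_score must be an int or None")
    return problems

class StubModernFetcher:
    """Hypothetical stand-in for ModernCreditDataFetcher."""

    def fetch_credit_data(self, customer_id, lookback_days):
        return CreditDataBundle([], {"card": 12500}, [], None)

violations = contract_violations(StubModernFetcher())
```

Running the same check against the legacy adapter and the new implementation catches shape drift before the shadow phase, where it would otherwise show up as comparator noise.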

Step 4 — Run in parallel

Both implementations are wired up. For a fraction of traffic, both are called; the output of the legacy implementation is the one used; the new implementation's output is logged for comparison:

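from random import random

SHADOW_TRAFFIC_FRACTION = 0.01  # start at 1%; raise as confidence grows

def fetch_with_comparison(customer_id, lookback_days):
    legacy_result = legacy.fetch_credit_data(customer_id, lookback_days)
    if random() < SHADOW_TRAFFIC_FRACTION:
        try:
            # Shadow call: the result is compared and logged, never returned.
            modern_result = modern.fetch_credit_data(customer_id, lookback_days)
            comparator.log_diff(legacy_result, modern_result, customer_id)
        except Exception as e:
            # A failure on the shadow path must not affect the live response.
            comparator.log_failure(e, customer_id)
    return legacy_result   # legacy is still the source of truth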

This is the shadow phase. The new implementation is called but its result is not used; the comparator records discrepancies.

Run shadow at 1% → 10% → 50% → 100% over a week or two, depending on how rare interesting cases are. The longer the shadow runs, the more confidence the comparison provides.
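Random sampling picks a different set of customers on every call. An alternative, sketched here as one option rather than the method used above, is to hash the customer ID into a stable bucket so the cohort at 1% is a subset of the cohort at 10%, and so on up the ramp. The function name and the SHADOW_RAMP constant are hypothetical.

```python
import hashlib

# The ramp from the text, expressed as fractions of traffic.
SHADOW_RAMP = [0.01, 0.10, 0.50, 1.00]

def in_shadow_cohort(customer_id: str, fraction: float) -> bool:
    """Deterministically decide whether this customer is shadowed at this fraction."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # stable integer in [0, 2**64)
    threshold = int(fraction * 2**64)            # 0.0 -> nobody, 1.0 -> everybody
    return bucket < threshold
```

Because the bucket depends only on the customer ID, a discrepancy seen for a customer at 1% can be re-checked for the same customer at every later stage of the ramp.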

Step 5 — Compare and reconcile

The comparator output is the deliverable of this step. For each shadowed call:

  • Did the new implementation succeed?
  • If both succeeded, do the outputs match?
  • If they differ, how do they differ? (Bureau-score field different by one? Different payment-history ordering? Missing field?)

A target of 100% match is unrealistic; the goal is to understand every category of difference and decide whether each is acceptable.

Common categories:

  • Acceptable differences. Output ordering, slight rounding, equivalent representations. Document and accept.
  • Bug differences. The new implementation has a bug; fix it.
  • Old-bug differences. The legacy implementation had a bug the new fixes. This is delicate. The new behaviour is correct; the legacy behaviour was wrong. Coordinate the change with consumers — they may depend on the bug.

Iterate steps 3–5 until the comparator shows no remaining unacceptable differences.
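A comparator can pre-sort differences so that reviewers only read the ones that need a human. The sketch below assumes a simplified bundle and hypothetical category labels; it auto-accepts ordering differences and score differences within a stated tolerance, and marks everything else for review, because telling a new bug from an old bug requires someone who knows the domain.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Bundle:
    # Simplified stand-in for CreditDataBundle.
    payment_history: list
    current_balances: dict
    credit_inquiries: list
    bureau_score: Optional[int]

def categorise_diff(legacy: Bundle, modern: Bundle, score_tolerance: int = 0):
    """Return (field, category) pairs; an empty list means the outputs match."""
    findings = []
    for field in ("payment_history", "credit_inquiries"):
        old, new = getattr(legacy, field), getattr(modern, field)
        if old == new:
            continue
        # Same items in a different order counts as an acceptable difference.
        if sorted(old, key=repr) == sorted(new, key=repr):
            findings.append((field, "acceptable:ordering"))
        else:
            findings.append((field, "needs_review"))
    if legacy.current_balances != modern.current_balances:
        findings.append(("current_balances", "needs_review"))
    old_score, new_score = legacy.bureau_score, modern.bureau_score
    if old_score != new_score:
        both_present = old_score is not None and new_score is not None
        if both_present and abs(old_score - new_score) <= score_tolerance:
            findings.append(("bureau_score", "acceptable:tolerance"))
        else:
            findings.append(("bureau_score", "needs_review"))
    return findings
```

Counting findings by category over the whole shadow run gives the list of difference types that step 5 asks the team to understand one by one.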

Step 6 — Cutover

Switch the source of truth from legacy to modern. The traffic shift mirrors the shadow phase: 1% → 10% → 50% → 100% over a few days, with the comparator still running but the modern result now used.

Keep the legacy implementation alive (in shadow, or available behind a flag) for a stability window — a week, maybe two — so a regression can be rolled back without redeploying.

After the stability window, remove the legacy implementation. The boundary is migrated.
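The cutover and its rollback can be sketched as a small router whose fraction is changed at runtime. This is an illustration under assumptions: CutoverRouter is a hypothetical name, and in a real system modern_fraction would be read from whatever flag or configuration service the team already runs.

```python
import hashlib

class CutoverRouter:
    """Routes each call to legacy or modern based on a runtime-adjustable fraction."""

    def __init__(self, legacy, modern, modern_fraction: float = 0.0):
        self.legacy = legacy
        self.modern = modern
        self.modern_fraction = modern_fraction  # 0.0 = all legacy, 1.0 = all modern

    def _use_modern(self, customer_id: str) -> bool:
        # Stable per-customer bucket, so a customer does not flip between
        # implementations from one call to the next at a fixed fraction.
        digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big")
        return bucket < int(self.modern_fraction * 2**64)

    def fetch_credit_data(self, customer_id: str, lookback_days: int):
        chosen = self.modern if self._use_modern(customer_id) else self.legacy
        return chosen.fetch_credit_data(customer_id, lookback_days)

    def rollback(self) -> None:
        # Rollback is a value change, not a deploy.
        self.modern_fraction = 0.0
```

Removing the legacy implementation then amounts to deleting the router and the legacy class once the stability window has passed.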


What "match" means for AI components

The data-fetching example has deterministic outputs that compare cleanly. AI-shaped components — a prompt-construction layer, a model call layer — have non-deterministic outputs that do not.

For AI components, "match" is graded by the eval:

  • The legacy implementation's outputs score X on the eval set.
  • The modern implementation's outputs must score ≥ X on the same eval set.
  • Per-example comparison may show variation (different wording, different ordering); the aggregate must hold.

This is the eval backstop doing its real work. Without it, parallel running for AI components is not actionable.

For pure-AI boundaries (the model call layer itself), the comparison is even fuzzier — the new implementation may be a different model. The eval is the source of truth.
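The aggregate rule can be written down as a small gate. The sketch assumes each implementation has already been scored per example on the same eval set, with higher scores being better; eval_gate and its tolerance parameter are hypothetical, and the choice of the mean as the aggregate is an assumption the team's eval may override.

```python
from statistics import mean

def eval_gate(legacy_scores, modern_scores, tolerance: float = 0.0):
    """Compare per-example eval scores for the two implementations."""
    if not legacy_scores or len(legacy_scores) != len(modern_scores):
        raise ValueError("both implementations must be scored on the same non-empty eval set")
    legacy_mean = mean(legacy_scores)
    modern_mean = mean(modern_scores)
    # Per-example regressions are reported for review but do not fail the gate alone.
    regressed = [i for i, (old, new) in enumerate(zip(legacy_scores, modern_scores))
                 if new < old]
    return {
        "legacy_mean": legacy_mean,
        "modern_mean": modern_mean,
        "passes": modern_mean >= legacy_mean - tolerance,
        "regressed_examples": regressed,
    }
```

The regressed_examples list is worth reading even when the gate passes: an aggregate that holds can still hide a category of input that got worse.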


When to extract multiple boundaries in parallel

Often you can. Two extractions on non-overlapping surfaces can run their shadow phases simultaneously. The constraint is the team's reviewer bandwidth — each shadow phase produces comparator output that someone has to read.

Avoid extracting overlapping boundaries simultaneously. If boundary A produces inputs to boundary B, extract A first; otherwise the comparison on B is muddied by changes in A.
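The "extract A before B" rule is a topological ordering over the boundaries. A minimal sketch using the standard library's graphlib follows; the dependency map is a hypothetical reading of the three responsibilities in the opening example.

```python
from graphlib import TopologicalSorter

# Maps each boundary to the boundaries whose output it consumes (hypothetical).
feeds_from = {
    "data_fetching": set(),
    "prompt_construction": {"data_fetching"},
    "decision_serialisation": {"prompt_construction"},
}

# static_order yields every boundary after the boundaries it depends on.
extraction_order = list(TopologicalSorter(feeds_from).static_order())
```

Boundaries with no path between them in this graph are the ones that can shadow at the same time, reviewer bandwidth permitting.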


The cost of strangler

The pattern is not free. The shadow phase doubles the cost of the calls it shadows (both implementations run). For expensive AI calls, this matters. Mitigations:

  • Shadow only a fraction of traffic, not all of it
  • Shadow only diagnostic calls (calls with known-good outcomes for comparison)
  • Run shadow off-peak when capacity is cheaper
  • Cap the shadow phase to a finite duration; do not run it indefinitely

The cost is the tax for safety. It is almost always worth paying.
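The size of the tax is easy to estimate before the shadow phase starts. The sketch below uses made-up traffic and pricing figures, and it counts only the extra calls to the modern implementation, not comparator storage or reviewer time.

```python
def shadow_cost(calls_per_day: int, cost_per_modern_call: float, phases) -> float:
    """Extra spend for a shadow ramp; phases is a list of (fraction, days) pairs."""
    return sum(calls_per_day * fraction * days * cost_per_modern_call
               for fraction, days in phases)

# Hypothetical ramp: 1% for 2 days, 10% for 3, 50% for 4, 100% for 5.
ramp = [(0.01, 2), (0.10, 3), (0.50, 4), (1.00, 5)]
estimate = shadow_cost(calls_per_day=100_000, cost_per_modern_call=0.002, phases=ramp)
```

With these illustrative numbers the ramp adds up to 7.32 full days of modern-implementation traffic, so the estimate is 7.32 times one day's call cost; most of it comes from the 100% stage, which is why capping that stage matters.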


What strangler does not solve

  • A boundary that does not exist. Some legacy systems have no clean interface anywhere — everything calls everything. The first move is to introduce a boundary through refactoring (step 1 is more work) before strangler is possible.
  • A boundary where the legacy and modern fundamentally cannot produce equivalent outputs. Some migrations are not "preserve behaviour"; they are "change behaviour." Strangler is for behaviour-preservation; behaviour changes need separate evaluation against new criteria.
  • A monolith with no test coverage at all. Strangler still works, but the eval backstop (chapter 03) must be built first; the comparison is empty without it.

Common mistakes

Skipping the shadow phase. A team confident in the new implementation cuts over without shadowing. Six hours later they discover a 3% mismatch in a niche case. Shadow is cheap; skipping it is rarely the right tradeoff.

Treating "match" as binary. AI outputs do not byte-match. Use the eval; accept variation within tolerance.

Running shadow indefinitely. Shadow has a cost. Cap the duration; commit to a cutover date.

Cutting over and removing the legacy immediately. Keep the legacy alive for a stability window. Rollback should not require a deploy.

Strangling around the wrong boundary. Picking a boundary that does not give you the decoupling you want. Re-read the audit; pick the highest-leverage extraction.


Interview Q&A

Q1. The team wants to "rewrite the AI module." What is the strangler alternative?

Don't rewrite; replace. Identify the system's boundaries: data fetching, prompt construction, model calling, response handling, decision serialisation. Extract one boundary at a time behind an explicit interface. Build the new implementation against the interface; run it in shadow alongside the old; compare outputs; cut over when matched. Continue with the next boundary. The system continues to operate throughout. The rewrite path is months of work with a big-bang risk; the strangler path is months of work with continuous risk control.

Wrong-answer notes: "rewrite is faster" misses the risk profile; strangler is often not slower because the parallel work continues delivering improvements.

Q2. Walk through a shadow run for a data-fetching component.

Both legacy and modern implementations are wired up. For 1% (or 10%, or 50%) of traffic, the modern implementation is called alongside the legacy. The legacy result is used by the system. The modern result is logged. A comparator analyses pairs of results, categorising differences (acceptable, bug, old-bug-fix). Iterate the modern implementation until the comparator shows no unacceptable differences. Cut over traffic from 1% to 100% over a few days, with the modern result now the one used. Keep legacy alive for a stability window of a week or two. Remove legacy.

Wrong-answer notes: missing the comparator's role; the shadow alone produces data, the comparison produces decisions.

Q3. The shadow comparator finds that the modern implementation handles a case correctly that the legacy did not. The downstream consumers may depend on the legacy's wrong behaviour. What do you do?

Identify the consumers; communicate the change; let them coordinate on whether they need a fix. The new behaviour is correct; cutting over silently may break downstream code that relies on the bug. Two patterns: ship the new behaviour with a flag and coordinate consumer updates, or maintain the legacy's behaviour in the new implementation as a "compatibility mode" until consumers migrate. Either way, the change becomes a coordinated cross-team migration rather than an unannounced switch.

Wrong-answer notes: "ship the fix, it's correct" without coordination produces collateral damage.

Q4. The boundary you want to migrate has no clean interface; the code in question is intertwined with three other concerns. What do you do?

Introduce the boundary through refactoring before strangler. Identify the cleanest interface possible given the code's current shape; extract it as a step that does not change behaviour (the eval verifies). Once the interface exists and the legacy implementation is behind it, strangler can proceed normally. Sometimes this preliminary refactor is the largest part of the work; the strangler steps then go quickly.

Wrong-answer notes: "rewrite the whole thing" is the alternative the chapter is arguing against; the interface-introduction refactor is the gateway move.


What to do differently after reading this

  • Identify boundaries in the inherited system. Pick the one with smallest contract, lowest risk, highest decoupling value.
  • Define the interface first; wrap the legacy behind it; build the modern against it.
  • Always shadow before cutting over. The comparator output is the deliverable.
  • For AI components, use the eval as the comparison metric.
  • Keep the legacy alive briefly after cutover. Remove only after a stability window.

Bridge. The strangler pattern replaces components. One specific kind of component that almost always needs migration is the model selection itself — hardcoded model strings, no versioning, drift exposure. The next chapter is the model-and-version stabilisation that often runs alongside the first strangler extraction. → 08-model-and-version-stabilization.md