
10. Dynamic replanning — revising the route without losing completed work

~19 min read. File 07 introduced the plan-execution manager's deviation response. File 09 showed how checkpoints preserve progress across crashes. This file combines both: when the plan itself must change mid-execution, how does the system revise the route while preserving trusted state, respecting budget constraints, and maintaining audit continuity?

Built on the first-principles overview in 00-first-principles.md. Plan freshness — the pressure that a plan decays as execution reveals new information — reaches its climax here. The replan trigger is the mechanism: an explicit, governed signal that the current plan no longer fits reality and the control plane must adapt.


What file 09 established and what remains

File 09 solved the problem of execution failing: the workflow crashes and resumes from a checkpoint. The plan itself was still valid — only the execution was interrupted. This file addresses a harder problem: the plan is wrong. Not because execution stopped, but because new evidence, changed scope, or broken assumptions mean the current route will not achieve the goal even if execution continues perfectly. The plan needs revision, not just resumption.


The research agent that kept searching for a source that was deleted

A deep-research workflow investigates "What is Company X's current SOC 2 compliance status?" The plan:

Plan v1:
  step 1: search Company X's trust portal
  step 2: extract latest SOC 2 report date
  step 3: verify with third-party compliance DB
  step 4: synthesize finding

Step 1 returns 404 — the trust portal was taken down last week. The automation retries (file 07's transient handling). Still 404. Retries again. Still 404. After three retries, the plan-execution manager classifies this as "deterministic failure — resource does not exist."

A system without dynamic replanning fails the workflow. The user gets nothing.

A system with dynamic replanning detects: "The plan assumed Company X has a public trust portal. This assumption is falsified." It fires a replan trigger and produces:

Plan v2 (scoped replan):
  step 1: ╳ (original trust portal unavailable — assumption broken)
  step 1b: search for SOC 2 attestation on third-party sites (Vanta, Drata, SecurityScorecard)
  step 1c: search press releases mentioning compliance certification
  step 2: extract latest SOC 2 evidence from alternative sources
  step 3: (unchanged — verify with compliance DB)
  step 4: synthesize finding + note source limitation

  preserved state: none (step 1 produced no usable output)
  new constraint: mark final answer with uncertainty flag re: primary source

The workflow adapts. The final synthesis includes an uncertainty note about primary source unavailability. The user gets a qualified answer instead of a failure.

Without replanning:          With replanning:
404 → retry → retry → fail   404 → retry exhausted → classify
                              → assumption broken → replan step 1
                              → alternative search → qualified answer

Teacher voice. Replanning is not intelligence — it's governance. The replan trigger says "this plan can't work." The replan scope says "change only what's broken." The preserved state says "keep what's trusted." Unstructured replanning (throw away everything, start over) is wasteful and unauditable.


The invariant: replan scope is the minimum change that addresses the broken assumption

The temptation when a plan breaks is to regenerate everything from scratch. Full regeneration feels clean. It is expensive, discards trusted work, invalidates checkpoints, and makes the audit trail incoherent. The correct response is minimum viable replan: identify what assumption broke, identify which steps depend on that assumption, replace only those steps, and preserve everything else.
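The "which steps depend on that assumption" part can be sketched as a walk over the plan's dependency structure. This is a minimal illustration, not a prescribed implementation: the `Step` type, the `assumes` and `depends_on` fields, and the assumption name `public_trust_portal_exists` are all hypothetical, and the plan is assumed to be listed in execution order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    assumes: frozenset = frozenset()     # assumptions this step relies on directly
    depends_on: frozenset = frozenset()  # upstream steps whose output it consumes


def affected_steps(plan, broken_assumption):
    """Return the names of steps that must be replaced, in plan order.

    A step is affected if it relies on the broken assumption directly,
    or if it consumes the output of a step that is itself affected.
    """
    affected = set()
    for step in plan:  # assumes plan is in execution (topological) order
        if broken_assumption in step.assumes or step.depends_on & affected:
            affected.add(step.name)
    return [s.name for s in plan if s.name in affected]


# Plan v1 from the research example, with hypothetical step names.
plan_v1 = [
    Step("search_trust_portal", assumes=frozenset({"public_trust_portal_exists"})),
    Step("extract_report_date", depends_on=frozenset({"search_trust_portal"})),
    Step("verify_with_compliance_db"),
    Step("synthesize",
         depends_on=frozenset({"extract_report_date", "verify_with_compliance_db"})),
]

print(affected_steps(plan_v1, "public_trust_portal_exists"))
# ['search_trust_portal', 'extract_report_date', 'synthesize']
```

The compliance-DB step is untouched, which matches Plan v2 above: step 3 is unchanged while steps 1, 2, and 4 are revised.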


Retry vs fallback vs replan vs escalate — the decision ladder

File 07's failure classification determines what kind of problem occurred. This file determines what scope of response is needed:

Response ladder (least disruptive → most disruptive):

1. RETRY        same step, same plan                    cost: minimal
   ↓            "transient failure — try again"
2. FALLBACK     different execution path, same plan     cost: low
   ↓            "primary method failed, use backup"
3. LOCAL REPLAN changed steps within one branch         cost: moderate
   ↓            "assumption broke, replace affected steps"
4. GLOBAL REPLAN new plan from current state            cost: high
   ↓            "goal or constraints fundamentally changed"
5. ESCALATE     human decides next action               cost: highest
                "automation can't determine correct path"

Response      | When                                        | Preserves                       | Changes
--------------+---------------------------------------------+---------------------------------+----------------------
Retry         | Transient failure (503, timeout)            | Everything                      | Nothing (repeat step)
Fallback      | Primary path failed permanently             | Plan structure                  | One step's execution
Local replan  | Assumption broken in one branch             | Other branches + completed work | Affected branch steps
Global replan | Goal changed, or multiple assumptions broke | Completed + trusted state       | All pending steps
Escalate      | Automation lacks information to choose      | Everything (paused)             | Human decides

The ladder is a cost optimisation: use the least disruptive response that addresses the actual problem. Overreacting (global replan for a transient 503) wastes money and state. Underreacting (retrying a broken assumption) wastes time in a dead-end loop.
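One way to read the ladder is as a selection function that returns the least disruptive rung matching the classified failure. The sketch below is an assumption about how such a selector might look. The parameter names (`transient`, `retries_left`, `has_fallback`, `broken_assumptions`, `goal_changed`) are hypothetical inputs that file 07's classifier would have to supply.

```python
from enum import Enum


class Response(Enum):
    RETRY = 1
    FALLBACK = 2
    LOCAL_REPLAN = 3
    GLOBAL_REPLAN = 4
    ESCALATE = 5


def choose_response(*, transient, retries_left, has_fallback,
                    broken_assumptions, goal_changed):
    """Pick the least disruptive response that addresses the classified problem."""
    # Rung 1: a transient failure with retry budget left is simply repeated.
    if transient and retries_left > 0:
        return Response.RETRY
    # No structural problem named: use a backup path if one exists;
    # otherwise automation cannot choose and a human decides.
    if not broken_assumptions and not goal_changed:
        return Response.FALLBACK if has_fallback else Response.ESCALATE
    # Exactly one broken assumption and an unchanged goal: scoped replan.
    if len(broken_assumptions) == 1 and not goal_changed:
        return Response.LOCAL_REPLAN
    # Goal changed, or several assumptions broke at once.
    return Response.GLOBAL_REPLAN


# The 404 case from the research example: retries exhausted, one named assumption.
print(choose_response(transient=False, retries_left=0, has_fallback=False,
                      broken_assumptions=["public_trust_portal_exists"],
                      goal_changed=False))
# Response.LOCAL_REPLAN
```

Note that a replan rung is only reachable when at least one assumption is named or the goal changed, which keeps "retrying a broken assumption" and "replanning a transient 503" out of reach by construction.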


Replan triggers: when the control plane knows the plan must change

A replan trigger is not "something went wrong." It's "the plan's structural assumptions are contradicted." Explicit triggers from file 07's table, expanded:

Replan triggers:

ASSUMPTION FALSIFIED
├── required resource doesn't exist (404 on expected endpoint)
├── precondition permanently false (applicant ineligible)
├── evidence contradicts prior conclusion (wrong root cause)
└── tool capability changed (API deprecated, model version shifted)

SCOPE CHANGED
├── user added requirements mid-run
├── user narrowed/broadened the goal
├── priority shifted (urgent → routine, or vice versa)
└── new stakeholder added constraints

BUDGET/SLA VIOLATED
├── remaining budget insufficient for planned steps
├── SLA deadline will be missed on current path
├── rate limits make planned parallelism impossible
└── human-time window closing (reviewer about to leave)

NEW EVIDENCE
├── later step reveals earlier step was wrong (backtracking)
├── external event changes the landscape (competitor announcement, policy update)
├── fraud/safety signal detected mid-workflow
└── quality check reveals prior output was below threshold

The key discipline: every replan trigger maps to a specific assumption in the plan that is now false. If you can't name the broken assumption, you shouldn't be replanning — you should be retrying or escalating.
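That discipline can be enforced at construction time: a trigger object that cannot exist without a named assumption. This is a sketch under assumptions. The `ReplanTrigger` type and the family identifiers are hypothetical names that mirror the four groups listed above.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four trigger families listed above.
TRIGGER_FAMILIES = {
    "assumption_falsified",
    "scope_changed",
    "budget_sla_violated",
    "new_evidence",
}


@dataclass(frozen=True)
class ReplanTrigger:
    family: str             # which of the four families fired
    signal: str             # the observation that raised the trigger
    broken_assumption: str  # the plan assumption that is now false

    def __post_init__(self):
        if self.family not in TRIGGER_FAMILIES:
            raise ValueError(f"unknown trigger family: {self.family!r}")
        if not self.broken_assumption.strip():
            # No nameable assumption means this is not a replan:
            # the caller should retry or escalate instead.
            raise ValueError("a replan trigger must name the broken assumption")


trigger = ReplanTrigger(
    family="assumption_falsified",
    signal="404 on trust portal after retry exhaustion",
    broken_assumption="Company X has a public trust portal",
)
print(trigger.broken_assumption)
```

Constructing `ReplanTrigger("new_evidence", "step failed", "")` raises `ValueError`, so "something went wrong" can never be recorded as a replan.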


Threaded example: loan-approval discovers employment fraud mid-workflow

The loan-approval workflow. Steps 1-3 completed successfully: identity verified, credit score 720, compliance flag "pass." The plan says: proceed to issue_decision (approve the loan). But step 3's compliance check included a secondary signal that wasn't surfaced until a post-check validation ran: the employment verification returned inconsistent dates. Not a hard failure — but a new evidence signal.

Timeline:
  step 1: verify_identity → ✓ (John Smith, ID confirmed)
  step 2: pull_credit → ✓ (score: 720)
  step 3: compliance_check → flag: "pass" 
           BUT secondary signal: employment dates inconsistent
           (claimed: employed since 2020. Verification: company founded 2023)

  Original plan step 4: issue_decision (approve)

  Problem: step 3 "passed" but produced evidence that contradicts
           the employment claim. Proceeding to approve is risky.

Without replanning: the workflow approves the loan. The employment inconsistency is buried in a log. Six months later, the loan defaults. Forensics finds the missed signal.

With replanning: the plan-execution manager detects the secondary signal (employment date mismatch > threshold). It fires a replan trigger: "new evidence contradicts employment claim validity."

Replan record:
├── trigger: "employment_dates_inconsistent (gap > 2 years)"
├── broken_assumption: "employment claim is valid (implicit in compliance pass)"
├── scope: local (add verification branch before decision)
├── preserved_state: {identity_verified, credit_score}
├── suspect_state: {compliance_flag} (re-validate before use)
├── invalidated_steps: [issue_decision (premature)]
├── new_steps:
│   ├── step 3b: deep employment verification (contact employer directly)
│   ├── step 3c: if unresolved → human_review (escalation, not approval)
│   └── step 4: issue_decision (moved after resolution)
└── plan_version: 1 → 2

The workflow branches into deeper verification. If the employment claim is legitimate (company rebranded, for example), the original path resumes. If fraudulent, the workflow routes to denial + investigation. Either way, the decision is informed rather than blind.
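A local replan like the one in the record can be sketched as a pure function from plan v1 to plan v2, plus a diff for the audit trail. The `Plan` type, `apply_local_replan`, `plan_diff`, and the step names are illustrative assumptions. Only `difflib.unified_diff` is a real standard-library call.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    version: int
    steps: tuple  # ordered step names


def apply_local_replan(plan, completed, invalidated, new_steps):
    """Return plan v(N+1): invalidated pending steps are dropped and
    new_steps are spliced in where the first invalidated step was."""
    if not invalidated:
        raise ValueError("a replan must invalidate at least one step")
    if not set(invalidated) <= set(plan.steps):
        raise ValueError("invalidated steps must exist in the current plan")
    if set(invalidated) & set(completed):
        raise ValueError("completed steps are revisited by backtracking, not replanning")
    first = min(plan.steps.index(s) for s in invalidated)
    after = tuple(s for s in plan.steps[first:] if s not in invalidated)
    return Plan(plan.version + 1, plan.steps[:first] + tuple(new_steps) + after)


def plan_diff(old, new):
    """Unified diff of step lists, suitable for a replan audit record."""
    return list(difflib.unified_diff(
        list(old.steps), list(new.steps),
        fromfile=f"plan v{old.version}", tofile=f"plan v{new.version}",
        lineterm="",
    ))


v1 = Plan(1, ("verify_identity", "pull_credit", "compliance_check", "issue_decision"))
v2 = apply_local_replan(
    v1,
    completed=["verify_identity", "pull_credit", "compliance_check"],
    invalidated=["issue_decision"],  # premature in v1
    new_steps=["deep_employment_verification", "human_review_if_unresolved",
               "issue_decision"],    # decision moves after resolution
)
print("\n".join(plan_diff(v1, v2)))
```

The completed steps appear as unchanged context lines in the diff, which is what lets an operator see at a glance that only the decision branch was touched.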


Preserving trusted state across replans

The most dangerous replan failure: state from the old plan leaks into the new plan without validation. Each replan must explicitly classify every piece of existing state:

State trust classification:

TRUSTED (safe to carry into new plan):
├── identity_verified: true (source: government ID, immutable)
├── credit_score: 720 (source: bureau, fresh, independent of employment)
└── workflow_id, applicant_id, timestamps

SUSPECT (needs re-validation in new plan):
├── compliance_flag: "pass" (based on employment data now in question)
└── employment_status: "employed" (directly contradicted)

INVALIDATED (do not use in new plan):
├── implied approval recommendation (based on suspect compliance)
└── any derived score using employment stability

The new plan steps operate only on TRUSTED state plus fresh data they produce themselves. SUSPECT state may be re-validated (compliance_check re-runs with updated evidence) or discarded. INVALIDATED state is excluded from downstream step inputs.

Teacher voice. "Preserve completed work" doesn't mean "trust everything completed." It means "carry forward the outputs that are independent of the broken assumption." Identity verification is still valid even if employment is suspect. Credit score from the bureau is still valid. But the compliance pass that assumed valid employment? That's suspect and needs re-evaluation.
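The three-way classification can be sketched as a rule over each state item's dependencies. The rule used here is an assumption chosen to reproduce the loan example: items independent of the broken assumption are TRUSTED, directly observed items that depend on it are SUSPECT, and values derived from such items are INVALIDATED. `StateItem` and its fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    SUSPECT = "suspect"
    INVALIDATED = "invalidated"


@dataclass(frozen=True)
class StateItem:
    key: str
    value: object
    assumptions: frozenset = frozenset()  # assumptions this value rests on
    derived: bool = False                 # computed from other state, not observed


def classify(item, broken_assumption):
    if broken_assumption not in item.assumptions:
        return Trust.TRUSTED
    # Observed values can be re-validated; derived values are discarded.
    return Trust.INVALIDATED if item.derived else Trust.SUSPECT


def inputs_for_new_plan(items, broken_assumption):
    """Only TRUSTED state is handed to the steps of the new plan."""
    return {i.key: i.value for i in items
            if classify(i, broken_assumption) is Trust.TRUSTED}


EMPLOYMENT = "employment claim is valid"
state = [
    StateItem("identity_verified", True),
    StateItem("credit_score", 720),
    StateItem("compliance_flag", "pass", frozenset({EMPLOYMENT})),
    StateItem("approval_recommendation", "approve", frozenset({EMPLOYMENT}), derived=True),
]
print(inputs_for_new_plan(state, EMPLOYMENT))
# {'identity_verified': True, 'credit_score': 720}
```

Making the classification explicit per item is the point: nothing reaches the new plan by default, so stale state cannot leak in unnoticed.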


Guardrails: preventing replan chaos

Dynamic replanning can create instability: each replan changes the plan, new steps produce new signals, new signals trigger more replans. Without guardrails, the workflow oscillates.

Guardrail mechanisms:

MAX REPLANS PER RUN:
  After N replans (e.g., 3), force escalation to human
  Rationale: frequent replanning signals environment instability
             or fundamentally unsuitable automation

MAX BACKTRACK DEPTH:
  Don't replan steps more than M levels backward from current position
  Rationale: deep backtracking approaches full restart cost
             and destroys audit continuity

REPLAN BUDGET:
  Each replan consumes plan-generation cost
  Total replan budget capped independently from execution budget
  Rationale: prevents replanning from consuming the budget
             meant for execution

COOLDOWN PERIOD:
  After a replan, execute at least K steps (or T time) before
  allowing another replan trigger
  Rationale: prevents immediate re-triggering from
             the same signal family

REPLAN DIFF LOGGING:
  Every replan produces a diff: old steps vs new steps,
  broken assumption named, trusted state listed
  Rationale: audit trail, debugging, operator understanding

These aren't anti-intelligence. They're anti-chaos. A system that replans indefinitely is indistinguishable from a system that improvises randomly.
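Three of the guardrails (max replans, replan budget, cooldown) fit in a small gate that every trigger must pass before a replan is generated. This is a sketch with assumed semantics: the class name, the default limits, the cost units, and the choice to escalate on budget exhaustion but defer during cooldown are illustrative, not taken from a specific system.

```python
from dataclasses import dataclass


@dataclass
class ReplanGuard:
    max_replans: int = 3        # N: force escalation after this many replans
    cooldown_steps: int = 2     # K: steps to execute between replans
    replan_budget: float = 1.0  # hypothetical cost units, separate from execution budget
    replans_done: int = 0
    steps_since_replan: int = 0
    spent: float = 0.0

    def record_step(self):
        """Call after each executed step so the cooldown can elapse."""
        self.steps_since_replan += 1

    def check(self, cost):
        """Return 'allow', 'defer' (cooldown active), or 'escalate'."""
        if self.replans_done >= self.max_replans:
            return "escalate"
        if self.spent + cost > self.replan_budget:
            return "escalate"
        if self.replans_done > 0 and self.steps_since_replan < self.cooldown_steps:
            return "defer"
        return "allow"

    def commit(self, cost):
        """Record that a replan was actually generated."""
        self.replans_done += 1
        self.spent += cost
        self.steps_since_replan = 0


guard = ReplanGuard(max_replans=2, cooldown_steps=2, replan_budget=1.0)
print(guard.check(0.25))  # allow
guard.commit(0.25)
print(guard.check(0.25))  # defer
```

Separating `check` from `commit` keeps the gate side-effect free, so a trigger that is deferred or escalated does not consume replan budget.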


Backtracking as a special case of replanning

Sometimes the plan doesn't need new steps — it needs to re-execute earlier steps with new information. This is backtracking: the workflow revisits a prior node because later evidence invalidated its output.

Forward replanning:
  step 1 → step 2 → step 3 → [assumption breaks] → add step 3b → continue

Backtracking:
  step 1 → step 2 → step 3 → [evidence shows step 2 was wrong]
                    → re-execute step 2 with new constraint → continue

The loan-approval example: if deep employment verification reveals the applicant used a different legal name, the system may need to backtrack to verify_identity with the corrected name — not because identity verification failed, but because the input was wrong.

Backtracking rules:
  • Bounded depth (max 2-3 steps back in most workflows)
  • Never re-executes side effects without idempotency protection
  • Records the loop: "backtracked from step 3 to step 1 because [reason]"
  • Capped iterations (max 2 backtracks to the same step before escalating)
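The depth bound, the per-step cap, and the loop record can be sketched together as a small limiter. The `BacktrackLimiter` class and its defaults are hypothetical, and idempotency protection for side effects is out of scope here and would need to live in the step executor.

```python
from collections import Counter


class BacktrackLimiter:
    def __init__(self, max_depth=3, max_per_step=2):
        self.max_depth = max_depth        # how many steps back is allowed
        self.max_per_step = max_per_step  # backtracks to the same step before escalating
        self.counts = Counter()
        self.log = []

    def request(self, current_step, target_step, reason):
        """Return True if the backtrack may proceed; False means escalate."""
        depth = current_step - target_step
        if depth <= 0:
            raise ValueError("backtrack target must be an earlier step")
        if depth > self.max_depth or self.counts[target_step] >= self.max_per_step:
            return False
        self.counts[target_step] += 1
        self.log.append(
            f"backtracked from step {current_step} to step {target_step} because {reason}"
        )
        return True


limiter = BacktrackLimiter(max_depth=3, max_per_step=2)
print(limiter.request(3, 1, "applicant used a different legal name"))  # True
print(limiter.log[0])
```

A third request to revisit step 1 returns `False`, which is the signal to hand the run to a human instead of looping.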


Operational signals — healthy replanning, degrading replanning, broken replanning

Healthy behaviour:
  • Replan frequency < 20% of workflow runs (most plans execute without revision)
  • Local replans outnumber global replans 4:1 (most problems are scoped)
  • Replanned workflows complete successfully > 80% of the time
  • Replan generation time < 3s
  • State preservation rate > 60% (most prior work carries forward)

First degrading signal:
  • Replan frequency approaching 50% → initial plans are consistently wrong (improve planning, not replanning)
  • Global replans dominating → assumptions are breaking at plan level, not step level
  • Replanned workflows still failing → replanning isn't fixing the actual problem
  • Multiple replans per single run → guardrails not tight enough, or triggers too sensitive

Misleading metric:
  • "Zero replans": sounds good, but may indicate that broken assumptions aren't being detected (the plan continues on a dead-end path instead of adapting)
  • "Plan stability": rewarding unchanged plans can mean rewarding conservative plans that avoid hard cases

Expert signal:
  • First-replan success rate: does the first replan fix the problem, or does it take multiple iterations?
  • Trusted state carry rate: how much of the prior state survives into the new plan? Low rates suggest replanning is really restarting in disguise.
  • Trigger accuracy: are replan triggers firing for genuine assumption breaks, or are transient failures being mis-classified as structural problems?
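Several of these signals fall out of simple counts over run records. The sketch below assumes a hypothetical record shape: each run is a dict with a `replans` list of `"local"` / `"global"` entries and a `succeeded` flag. Real telemetry will look different, and the thresholds quoted above still have to be applied by the operator.

```python
def replan_health(runs):
    """Summarise replan behaviour across a batch of workflow runs."""
    total = len(runs)
    replanned = [r for r in runs if r["replans"]]
    local = sum(r["replans"].count("local") for r in runs)
    global_ = sum(r["replans"].count("global") for r in runs)
    if global_:
        ratio = local / global_
    else:
        ratio = float("inf") if local else 0.0
    return {
        # Share of runs that needed at least one replan.
        "replan_frequency": len(replanned) / total if total else 0.0,
        # Local-to-global ratio; the healthy target above is roughly 4:1.
        "local_to_global": ratio,
        # How often a replanned run still finished successfully.
        "replanned_success_rate": (
            sum(1 for r in replanned if r["succeeded"]) / len(replanned)
            if replanned else 0.0
        ),
        # Runs with more than one replan: a hint that triggers are too sensitive.
        "multi_replan_runs": sum(1 for r in replanned if len(r["replans"]) > 1),
    }


runs = (
    [{"replans": [], "succeeded": True}] * 8
    + [{"replans": ["local"], "succeeded": True}]
    + [{"replans": ["local", "global"], "succeeded": False}]
)
print(replan_health(runs))
```

A batch with zero replans reports a frequency of 0.0, which per the "misleading metric" note should prompt a check of trigger detection, not a celebration.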


Boundary of applicability

Works unusually well:
  • Research and investigation workflows where initial hypotheses frequently need revision based on evidence
  • Coding agents where early diagnostic assumptions often prove wrong
  • Enterprise workflows where scope changes mid-run (user adds requirements, priorities shift)

Becomes pathological:
  • Strict-SLA workflows where replanning latency exceeds the tolerance window
  • Workflows where all steps are independent (no assumption chain to break)
  • High-frequency automated pipelines where stability matters more than adaptability

Scale that invalidates naive intuition:
  • At 10+ steps in a plan, local replan precision matters more than replan speed (wrong scope wastes more than slow generation)
  • At high concurrency, replan generation (often an LLM call) becomes a throughput bottleneck; pre-computed replan templates or rule-based fallback plans help


Failure-prone assumption: "smart agents don't need structured replanning — they just adapt"

The seductive wrong idea: "A sufficiently capable model can just notice when the plan is wrong and adjust on the fly — we don't need explicit replan triggers or governed scope."

The correction: Unstructured adaptation is the agent version of "retry everything." Without explicit triggers, the system can't distinguish "this assumption is broken" from "this step had a transient failure." Without governed scope, the system may discard trusted work or re-execute dangerous side effects. Without diff logging, operators can't understand why the workflow changed direction. Structured replanning isn't a limitation on intelligence — it's a requirement for production trustworthiness.


Real-world implementations

  • OpenAI Deep Research — iterative research cycles detect when a hypothesis is unsupported and redirect browsing without re-reading already-processed sources
  • Devin by Cognition — coding workflow backtracks from failed tests to re-diagnose root cause with new constraints (exclude previously tried files)
  • GitHub Copilot coding agent — revises plans when discovered codebase structure differs from initial assumptions (e.g., test framework is pytest not unittest)
  • Google Gemini Deep Research — multi-step research adapts search strategy when primary sources lack expected information
  • Cursor Agent — plan revision when first edit approach fails compilation; preserves successful edits in other files
  • Claude Code — agent workflows replan when tool outputs reveal different project structure than assumed
  • Microsoft Security Copilot — investigation plans adapt when new evidence reclassifies incident severity
  • Sierra AI — customer support workflows adapt when conversation reveals the customer's actual issue differs from the initially classified problem

Recall checkpoint

  1. How is replanning different from retry + fallback?
  2. What must a replan trigger explicitly name?
  3. Why is local replan preferred over global replan when possible?
  4. How does state trust classification prevent contamination across replans?
  5. What guardrails prevent replan chaos (oscillation)?
  6. When is backtracking appropriate and how is it bounded?
  7. What does "zero replans" signal about a system?

Interview Q&A

Q: Why must replan triggers name a specific broken assumption rather than just "step failed"? A: Because the scope of the response depends on what broke. A failed step might need retry (transient), fallback (permanent but local), or replan (structural assumption wrong). Only by naming the broken assumption can the system determine which steps are affected and what scope of change is needed. Common wrong answer to avoid: "Because it helps logging." Logging benefits, but the core reason is determining response scope correctly.

Q: Why preserve trusted state from the old plan rather than starting fresh? A: Re-executing trusted steps wastes budget, reintroduces side-effect risk, and destroys audit continuity. State that is independent of the broken assumption remains valid and should carry forward. Common wrong answer to avoid: "Because restarting is slow." Speed is one factor, but cost, side-effect safety, and audit continuity are equally important.

Q: Why is global replan a last resort rather than the default response to plan failure? A: Global replan regenerates every pending step from the current state. Completed, trusted state survives, but the response incurs the highest plan-generation cost, maximises instability, and makes the audit trail hard to follow. Local replan achieves adaptation with minimal disruption. Common wrong answer to avoid: "Because global replan is expensive." Cost is one reason, but stability and auditability are equally critical.

Q: Why cap backtracking depth rather than allowing unlimited revision? A: Unbounded backtracking can create infinite loops (step A invalidates step B, re-running step B invalidates step A). Caps force escalation before the workflow enters degenerate cycles. Common wrong answer to avoid: "Because backtracking is slow." Speed isn't the issue — the issue is preventing degenerate loops.

Q: How should a system distinguish a legitimate replan trigger from a mis-classified transient failure? A: Legitimate replan triggers persist after retry exhaustion and point to structural problems (resource doesn't exist, assumption contradicted). Transient failures resolve on retry. The failure classification tree (file 07) must run before replanning activates — replan should never be the first response. Common wrong answer to avoid: "Use confidence thresholds." Confidence helps, but the structural test is whether retry has any chance of succeeding.

Q: What does high replan frequency indicate about the system? A: Either initial planning is consistently weak (predictable information isn't being incorporated), the environment is genuinely unstable (reduce automation scope), or replan triggers are too sensitive (transient failures mis-classified as structural breaks). Common wrong answer to avoid: "That the system is adaptive." Frequent replanning is a symptom, not a virtue. Adaptive systems replan rarely because they plan well initially.


Design/debug exercise (10 min)

Modeled: The loan-approval workflow discovers employment date inconsistency after compliance check passes. Replan record: trigger = "employment_dates_inconsistent," broken assumption = "employment claim valid," scope = local (add verification branch), preserved state = {identity, credit_score}, new steps = [deep_employment_verification, conditional_human_escalation].

Your turn: A coding agent's plan is "search for auth bug → fix in auth_handler.py → test." Tests fail after the fix. The manager classifies this as "assumption broken" (wrong file). Write: (1) the replan trigger and named assumption, (2) the new plan with preserved state, (3) the state trust classification (what's trusted, suspect, invalidated), (4) the guardrail that prevents infinite backtracking.

From memory: Close this file and sketch the response ladder (5 levels from retry to escalate), the state trust classification (3 categories), and the guardrails list (5 mechanisms).


Operational memory

Dynamic replanning is the mechanism that transforms a plan from a fixed script into a living hypothesis. When evidence contradicts an assumption, the control plane adapts — not by discarding everything, but by identifying the minimum scope of change that addresses the broken assumption while preserving trusted work. The response ladder (retry → fallback → local replan → global replan → escalate) is a cost optimisation: always use the least disruptive response that actually solves the problem.

The hardest parts: correctly classifying what assumption broke (determines scope), correctly classifying which state is still trusted (prevents contamination), and preventing replan chaos (guardrails against oscillation). Systems that replan too eagerly are as broken as systems that never replan — they just fail in a more dynamic-looking way. The test of good replanning is high first-replan success rate and high state preservation rate: the system fixes the actual problem on the first revision and carries forward as much prior work as possible.

Remember:
  • Replan triggers must name a specific broken assumption; "step failed" is not enough
  • Local replan (change one branch) beats global replan (regenerate all pending steps) for cost, stability, and auditability
  • State trust classification: TRUSTED (independent of broken assumption), SUSPECT (needs re-validation), INVALIDATED (discard)
  • Guardrails prevent chaos: max replans, max backtrack depth, cooldown, replan budget, diff logging
  • "Zero replans" may signal undetected assumption breaks, not plan perfection
  • High replan frequency usually means weak initial planning: fix the planner, not the replan system
  • Backtracking is bounded: max 2-3 steps back, max 2 iterations per step before escalating

Bridge. Dynamic replanning governs how one workflow adapts under failure. But production systems run hundreds of workflows simultaneously for different users and organisations. When multiple tenants share the same control plane, isolation, fairness, and policy-aware routing become the next set of pressures. That's multi-tenant orchestration. → 11-multi-tenant-orchestration.md