05. State and context management: what crosses each step boundary

~18 min read. Steps produce intermediate state: retrieved documents, computed scores, approval decisions. Where does that state live, who can read it, and how much of it travels to the next step? Get this wrong and either every step drowns in irrelevant context or critical facts vanish between nodes.

Built on the first-principles overview in 00-first-principles.md. Handoff fidelity — the pressure that each agent-to-agent boundary is a serialisation boundary where context either survives intact or degrades — is the central tension. Durability vs latency appears again: persisting every intermediate result is safe but expensive; persisting nothing makes recovery impossible.

What file 04 established and what remains

File 04 gave the workflow graph its shape: sequential, parallel, DAG, conditional. Steps now have edges and execution order. But the edges carry data — the output of one step becomes the input of the next. How that data is structured, stored, scoped, and compressed determines whether multi-step workflows remain coherent or degrade into context noise.

The gap: agents don't share memory by default. Each agent execution starts with a fresh context window. The control plane must explicitly manage what state exists, where it lives, and how much of it each step sees.

The research workflow that drowned in its own evidence

A competitive-intelligence workflow: three parallel branches (pricing, deployment model, compliance posture) each research the same three vendors. Each branch returns 2,000–3,000 tokens of raw browsing notes per vendor. The synthesis agent receives all nine outputs concatenated: 22,000 tokens of unstructured notes dropped into its context window.

Result: the synthesis agent produces a comparison table that cites two vendors correctly and hallucinates the third's compliance status. The correct information was in the context — buried at position 14,000 in a block of notes from a different branch. The model's attention degraded across the long, unstructured input.

A second version: each branch writes a structured summary (vendor name, pricing signal, deployment model, compliance evidence, uncertainty flags) — ~200 tokens per vendor. The synthesis agent receives 600 tokens of normalised input. It produces a correct comparison on every run.

Same data. Same model. Different state management. The first version treated state as "append everything." The second version treated state as "compress to what the next step actually needs."
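The second version's per-vendor record can be sketched as a small typed structure that the synthesis step renders into one normalised line per vendor. This is a minimal Python sketch: `VendorSummary` and `synthesis_input` are hypothetical names, and the field set is taken from the summary fields listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VendorSummary:
    """Hypothetical per-vendor record a research branch emits instead of raw notes."""
    vendor: str
    pricing_signal: str
    deployment_model: str
    compliance_evidence: list[str] = field(default_factory=list)
    uncertainty_flags: list[str] = field(default_factory=list)


def synthesis_input(summaries: list[VendorSummary]) -> str:
    """Render one compact line per vendor; raw browsing notes never reach synthesis."""
    lines = []
    for s in sorted(summaries, key=lambda s: s.vendor):
        evidence = "; ".join(s.compliance_evidence) or "none found"
        flags = "; ".join(s.uncertainty_flags) or "none"
        lines.append(
            f"{s.vendor} | pricing: {s.pricing_signal} | deployment: {s.deployment_model}"
            f" | compliance: {evidence} | uncertain: {flags}"
        )
    return "\n".join(lines)
```

Making "none found" explicit matters: an absent compliance field and a missing one look identical in free text, and that ambiguity is exactly where the first version hallucinated.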

Teacher voice. State management in workflows is context engineering for multi-agent systems. The same principles that apply to single-agent prompt design (relevance, compression, structure) apply to inter-step data flow — except that the consequences of getting it wrong are multiplied by the number of steps.

The invariant: each step receives only the state it declared as input

The chapter protects one rule: a step's context is determined by its typed input contract (from file 02), not by the accumulated history of all prior execution. State that a step didn't declare as input should not appear in its context window.

This creates a pull model, not a push model. Steps don't receive "everything that happened before." They receive exactly the fields they declared. The control plane is responsible for resolving those fields from workflow state and passing them through the handoff contract.
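A minimal sketch of the pull model, assuming workflow state is a flat mapping of field names to values. `resolve_inputs` and `MissingInput` are hypothetical names, not any framework's API. The control plane pulls exactly the declared fields and fails loudly when one is absent, instead of silently passing extra state.

```python
from typing import Any, Mapping


class MissingInput(KeyError):
    """Raised when a step declares an input that workflow state cannot supply."""


def resolve_inputs(declared: tuple[str, ...], state: Mapping[str, Any]) -> dict[str, Any]:
    """Build a step's context from its declared input fields and nothing else."""
    missing = [name for name in declared if name not in state]
    if missing:
        # Fail before dispatch rather than letting the agent run on partial context.
        raise MissingInput(f"declared inputs missing from workflow state: {missing}")
    return {name: state[name] for name in declared}


# Usage: state holds more than the step needs; only the declared field is pulled.
workflow_state = {"tenant_id": "t-1", "task_goal": "score applicant", "risk_score": 72}
step_context = resolve_inputs(("risk_score",), workflow_state)
```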

If this invariant is violated — if steps receive unscoped state — three problems emerge: context noise (irrelevant information degrades model attention), cost bloat (tokens paid for unused context on every step), and privacy leakage (state from one tenant's prior steps bleeding into another's execution in multi-tenant systems).

Four layers of workflow state

Not all state has the same lifecycle, audience, or durability requirement. Conflating them produces the "dump everything into the prompt" failure.

┌────────────────────────────────────────────────────────────────┐
│  Layer 1: DURABLE FACTS                                        │
│  Live for the entire workflow. Visible to any step that        │
│  declares them as input. Survive crashes.                      │
│  Examples: user_id, tenant_id, task_goal, policy_version,      │
│  confirmed findings, approval decisions                        │
├────────────────────────────────────────────────────────────────┤
│  Layer 2: STEP OUTPUTS                                         │
│  Produced by one step, consumed by specific downstream steps.  │
│  Live until consumed or workflow completes.                    │
│  Examples: risk_score, retrieved_documents, computed_summary   │
├────────────────────────────────────────────────────────────────┤
│  Layer 3: EXECUTION METADATA                                   │
│  Produced by the control plane itself. Used for scheduling,    │
│  retry logic, and operational visibility. Not passed to agents.│
│  Examples: step_duration, retry_count, executor_used,          │
│  tokens_consumed, checkpoint_id                                │
├────────────────────────────────────────────────────────────────┤
│  Layer 4: AUDIT TRAIL                                          │
│  Immutable record of what happened. Never modified. Never      │
│  passed to agents. Read by operators, compliance, debugging.   │
│  Examples: full model responses, tool call logs, timestamps,   │
│  approval records, error traces                                │
└────────────────────────────────────────────────────────────────┘

Key design choice: only Layers 1 and 2 flow to agents. Layers 3 and 4 are control-plane internal. This separation keeps agent context focused on the work (what facts exist, what the step should produce) rather than operational noise (how many retries happened, what the previous model said verbatim).
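One way to keep the layers from being conflated is to hold them in separate containers, so the agent-facing view is a method that can only read Layers 1 and 2. This is a minimal Python sketch; `WorkflowState` and its attribute names are hypothetical. Rejecting name collisions between the two agent-visible layers is a choice made by this sketch, not a rule from the text.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WorkflowState:
    """Hypothetical container that keeps the four state layers physically separate."""
    durable_facts: dict[str, Any] = field(default_factory=dict)      # Layer 1
    step_outputs: dict[str, Any] = field(default_factory=dict)       # Layer 2
    execution_meta: dict[str, Any] = field(default_factory=dict)     # Layer 3: control plane only
    audit_trail: list[dict[str, Any]] = field(default_factory=list)  # Layer 4: append-only

    def agent_visible(self) -> dict[str, Any]:
        """The only view step inputs may be resolved from: Layers 1 and 2."""
        overlap = self.durable_facts.keys() & self.step_outputs.keys()
        if overlap:
            raise ValueError(f"field names collide across layers 1 and 2: {sorted(overlap)}")
        return {**self.durable_facts, **self.step_outputs}

    def record(self, event: dict[str, Any]) -> None:
        """Append a copy of the event; this method never modifies earlier entries."""
        self.audit_trail.append(dict(event))


state = WorkflowState(durable_facts={"tenant_id": "t-1", "policy_version": "v12.3"})
state.step_outputs["risk_score"] = 72
state.execution_meta["retry_count"] = 2
state.record({"step": "3c", "event": "completed"})
visible = state.agent_visible()  # retry_count and the audit trail are not in here
```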

Threaded example: loan workflow state at each boundary

From prior files, the loan-approval workflow. Here's the state at each checkpoint:

After step	Layer 1 (durable facts)	Layer 2 (step output)	What flows to next step
1. Parse docs	applicant_name, income, employer, loan_amount	parsed_fields:	Step 2 reads: parsed_fields
2. Eligibility	eligible: true	eligibility_reason: "income ratio ok"	Step 3 reads: applicant_id (L1)
3a. Credit bureau	—	credit_score: 720, debt_ratio: 0.31	Step 3c reads: credit_score, debt_ratio
3b. Internal hist.	—	avg_balance: ₹4.2L, missed_payments: 0	Step 3c reads: avg_balance, missed_payments
3c. Risk score	—	risk_score: 72, risk_factors: [...]	Step 3d reads: risk_score, risk_factors
3d. Policy check	policy_version: "v12.3"	approved: true, threshold_exceeded: true	Step 4 reads: risk_score, threshold_exceeded
4. Human review	approval_decision: "approved", approver: "priya@..."	—	Step 5 reads: approval_decision, loan_amount

Notice: step 5 (send offer) does not receive credit_score, internal history, or the full risk_factors array. It receives only what it declared: approval decision and loan amount. The 3,000 tokens of intermediate reasoning stay in the checkpoint store — available for audit, invisible to the offer-generation agent.
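The table's right-hand column can be read as a set of declared input contracts. The sketch below transcribes it into Python; the step keys, the `context_for` helper, and the `loan_amount` value are hypothetical (the source gives no amount), and because the source elides `risk_factors`, a placeholder stands in. It shows step 5's context shrinking to two fields while everything else stays behind in the store.

```python
from typing import Any

# Declared inputs per step, transcribed from the table above.
INPUT_CONTRACTS: dict[str, tuple[str, ...]] = {
    "3c_risk_score": ("credit_score", "debt_ratio", "avg_balance", "missed_payments"),
    "3d_policy_check": ("risk_score", "risk_factors", "policy_version"),
    "4_human_review": ("risk_score", "threshold_exceeded"),
    "5_send_offer": ("approval_decision", "loan_amount"),
}

# What the checkpoint store holds after step 4 (Layers 1 and 2 combined).
stored_after_step_4: dict[str, Any] = {
    "loan_amount": 500_000,          # hypothetical value
    "eligible": True,
    "policy_version": "v12.3",
    "credit_score": 720,
    "debt_ratio": 0.31,
    "avg_balance": 420_000,          # ₹4.2L
    "missed_payments": 0,
    "risk_score": 72,
    "risk_factors": ["<elided>"],    # placeholder: the source elides this list
    "approved": True,
    "threshold_exceeded": True,
    "approval_decision": "approved",
}


def context_for(step: str, stored: dict[str, Any]) -> dict[str, Any]:
    """A step sees only the fields its contract declares."""
    return {name: stored[name] for name in INPUT_CONTRACTS[step]}


offer_context = context_for("5_send_offer", stored_after_step_4)
withheld = sorted(set(stored_after_step_4) - set(offer_context))
```

`withheld` lists the credit, internal-history, and risk fields: still available for audit, never placed in the offer agent's context.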

Compression strategies: what to pass vs what to store

Every step boundary is a compression opportunity. The raw output of a step may be 2,000 tokens. The next step may need only 50 tokens of structured fields.

Strategy	When to use	Risk
Pass typed fields only	When the next step has a narrow typed input contract	Loses nuance if contract is too narrow
Summarise and pass	When the next step needs the gist but not verbatim evidence	Summary may lose critical details
Pass full output	When the next step genuinely needs all details (rare)	Context bloat, cost increase
Store full, pass reference	When recovery/audit needs the full output but the next step needs only a key	Adds a lookup step if the agent needs to pull details

Production default: store the full output in the checkpoint store, where it backs both recovery and the audit trail (Layer 4); pass only the typed fields declared by the next step's input contract, resolved from Layers 1 and 2. This gives you audit trail, recovery, and minimal agent context simultaneously.

step A executes
    │
    ├── full output → checkpoint store (durable, queryable)
    │
    └── typed fields → next step's input (minimal, focused)
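The split in the diagram can be sketched as a step-completion hook that writes the full output to the store and returns two things: a reference key for audit or recovery, and the typed fields the next step declared. This is Python under stated assumptions: `CheckpointStore` is a hypothetical in-memory stand-in for a durable store, and deriving the key from a content hash is a choice of this sketch.

```python
import hashlib
import json
from typing import Any


class CheckpointStore:
    """Hypothetical in-memory stand-in for a durable, queryable checkpoint store."""

    def __init__(self) -> None:
        self._blobs: dict[str, dict[str, Any]] = {}

    def put(self, step_id: str, full_output: dict[str, Any]) -> str:
        # Serialise once: this both copies the output and yields a stable content hash.
        payload = json.dumps(full_output, sort_keys=True)
        ref = f"{step_id}:{hashlib.sha256(payload.encode('utf-8')).hexdigest()[:12]}"
        self._blobs[ref] = json.loads(payload)
        return ref

    def get(self, ref: str) -> dict[str, Any]:
        return self._blobs[ref]


def complete_step(
    store: CheckpointStore,
    step_id: str,
    full_output: dict[str, Any],
    next_inputs: tuple[str, ...],
) -> tuple[str, dict[str, Any]]:
    """Store the whole output; pass on only the fields the next step declared."""
    ref = store.put(step_id, full_output)
    passed = {name: full_output[name] for name in next_inputs}
    return ref, passed


store = CheckpointStore()
ref, passed = complete_step(
    store,
    "3c_risk_score",
    {"risk_score": 72, "risk_factors": ["<elided>"], "reasoning": "long model response..."},
    next_inputs=("risk_score", "risk_factors"),
)
```

Returning the reference alongside the fields covers the "store full, pass reference" row too: a later step that genuinely needs the verbatim output can look it up instead of having it pushed into every context.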

Teacher voice. The cardinal sin of state management is conflating what you must store (everything, for audit and recovery) with what you must pass (only what the next step declared). Store aggressively. Pass selectively.

Shared state consistency: the write-conflict problem

When parallel branches write to shared state, conflicts can arise: two branches may discover different values for the same field (e.g., both find a "compliance status" but disagree).

Conflict resolution strategies:

Strategy	When to use
Last write wins	When temporal ordering is meaningful and later data is fresher
Highest confidence wins	When sources have reliability differences
Merge (union)	When both values are valid (e.g., collecting evidence from multiple sources)
Escalate to human	When conflict resolution requires judgment the system cannot provide
Fail the workflow	When inconsistency is unacceptable (regulatory requirements)

For the loan workflow: steps 3a and 3b write to different fields (credit_score vs avg_balance) — no conflict. But if two parallel branches both attempted to determine "risk_category" and disagreed, the control plane needs a declared resolution strategy before the merge step.

The safe default: parallel branches write to non-overlapping field namespaces. If overlap is possible, declare the resolution strategy at graph-definition time, not at runtime.
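A minimal Python sketch of declaring resolution strategies when the graph is defined and applying them at the merge point. The enum mirrors the strategy table above; `Write`, `resolve`, `POLICY`, and the exception names are hypothetical. In this sketch a disagreement on a field with no declared strategy is itself an error, which is one way to enforce "declare at graph-definition time, not at runtime".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Resolution(Enum):
    LAST_WRITE_WINS = "last_write_wins"
    HIGHEST_CONFIDENCE = "highest_confidence"
    MERGE_UNION = "merge_union"
    ESCALATE = "escalate"
    FAIL = "fail"


class StateConflict(Exception):
    """Branches disagree and the workflow must not proceed."""


class EscalateToHuman(Exception):
    """Branches disagree and the declared policy routes the field to a human."""


@dataclass(frozen=True)
class Write:
    branch: str
    seq: int           # commit order assigned by the control plane
    confidence: float  # source reliability reported by the branch
    value: Any


def resolve(field_name: str, writes: list[Write], policy: dict[str, Resolution]) -> Any:
    if not writes:
        raise ValueError(f"{field_name}: no writes to resolve")
    distinct: list[Any] = []
    for w in writes:
        if w.value not in distinct:
            distinct.append(w.value)
    if len(distinct) == 1:
        return distinct[0]  # branches agree, so there is nothing to resolve
    strategy = policy.get(field_name)
    if strategy is None:
        raise StateConflict(f"{field_name}: branches disagree and no strategy was declared")
    if strategy is Resolution.LAST_WRITE_WINS:
        return max(writes, key=lambda w: w.seq).value
    if strategy is Resolution.HIGHEST_CONFIDENCE:
        return max(writes, key=lambda w: w.confidence).value
    if strategy is Resolution.MERGE_UNION:
        return distinct  # every distinct value, in the order the writes were given
    if strategy is Resolution.ESCALATE:
        raise EscalateToHuman(f"{field_name}: {distinct}")
    raise StateConflict(f"{field_name}: inconsistent values {distinct}")  # Resolution.FAIL


# Declared alongside the graph definition (hypothetical field names).
POLICY = {
    "compliance_evidence": Resolution.MERGE_UNION,
    "risk_category": Resolution.FAIL,
}
```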

State scoping for multi-tenancy

In a multi-tenant system, workflow state from tenant A must never be visible to tenant B's execution — even when both workflows run on the same infrastructure.

State isolation requirements:

Namespace isolation: Every state field is prefixed with tenant_id + workflow_id. No global state namespace.
Checkpoint isolation: Checkpoints are partitioned by tenant. A resume operation cannot accidentally load another tenant's checkpoint.
Context isolation: An agent executing tenant A's step must never see tenant B's durable facts in its context window — even if both are in-flight simultaneously.
Audit isolation: Compliance queries must be scoped. "Show me all state for workflow X" must only return data for the requesting tenant.
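Checkpoint and audit isolation can be pushed into the storage key itself, so a lookup that lacks the caller's tenant_id has nothing to match. This is a minimal in-memory Python sketch; `TenantScopedCheckpoints` is hypothetical, and a real deployment would enforce the same partitioning in the database (for example with tenant-keyed partitions or row-level security) rather than in application code alone.

```python
from typing import Any


class TenantScopedCheckpoints:
    """Hypothetical checkpoint store partitioned by tenant, with no global key space."""

    def __init__(self) -> None:
        # Key: (tenant_id, workflow_id, checkpoint_id) -> snapshot.
        self._data: dict[tuple[str, str, str], dict[str, Any]] = {}

    def save(self, tenant_id: str, workflow_id: str, checkpoint_id: str,
             snapshot: dict[str, Any]) -> None:
        self._data[(tenant_id, workflow_id, checkpoint_id)] = dict(snapshot)

    def load(self, tenant_id: str, workflow_id: str, checkpoint_id: str) -> dict[str, Any]:
        # The caller's tenant_id is part of the key, so resuming with the wrong tenant
        # raises KeyError instead of returning another tenant's checkpoint.
        return dict(self._data[(tenant_id, workflow_id, checkpoint_id)])

    def audit_query(self, tenant_id: str, workflow_id: str) -> list[str]:
        """'Show me all state for workflow X', scoped to the requesting tenant."""
        return sorted(cid for (t, w, cid) in self._data if t == tenant_id and w == workflow_id)


checkpoints = TenantScopedCheckpoints()
checkpoints.save("tenant-a", "wf-1", "c1", {"risk_score": 72})
checkpoints.save("tenant-b", "wf-1", "c2", {"risk_score": 40})
```

Both tenants use the workflow id "wf-1" here on purpose: the workflow id alone is not a sufficient key.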

This connects to file 11 (multi-tenant orchestration) — but the foundation is here: state scoping is a data-architecture decision made at the state-management layer, not bolted on later.

Operational signals

Signal	Meaning
Healthy: step context sizes are stable and within declared input schemas	State passing is well-scoped
First degrading: context size grows monotonically across steps	State compression is missing — each step appends without filtering
Misleading: all steps succeed but final output quality drops	Relevant information is being compressed away or buried in noise
Expert inspects: diff between what's in the checkpoint store and what's in agent context	If they're identical, you're over-passing. If agent context is missing critical fields, you're under-passing
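The first-degrading and expert-inspects rows can be checked mechanically. This is a minimal Python sketch with hypothetical helper names: one function flags per-step context sizes that rise at every step boundary, and the other computes the store-versus-context diff. Treating strict growth over at least three steps as the trigger is an assumption; tune it to your workflows.

```python
def grows_monotonically(context_tokens: list[int], min_steps: int = 3) -> bool:
    """True when context size rises at every step boundary: a sign of missing compression."""
    if len(context_tokens) < min_steps:
        return False
    return all(later > earlier for earlier, later in zip(context_tokens, context_tokens[1:]))


def passing_diff(stored: set[str], passed: set[str], declared: set[str]) -> dict[str, set[str]]:
    """Compare what the store holds, what the agent saw, and what its contract declared."""
    return {
        "over_passed": passed - declared,   # in the agent's context but never declared
        "under_passed": declared - passed,  # declared but missing from the context
        "withheld": stored - passed,        # empty means the context holds everything stored
    }
```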

Where this lives in the wild

LangGraph TypedDict State: state is a Python TypedDict. Each node returns a partial update containing only the keys it writes. By default a node receives the full graph state; narrowing what it reads requires declaring a separate input schema for that node.
Temporal workflow state: the workflow function maintains structured state that activities read and write through typed parameters and return values.
Inngest step outputs: each step's return value is automatically persisted. Subsequent steps access prior outputs by step ID, not by prompt history.
OpenAI Agents SDK context: the run context is a local object available to tools and hooks and is not sent to the model. On a handoff, the receiving agent sees the prior conversation history by default; a handoff input filter can trim it to what that agent needs.
Prefect: task results can be persisted to configured result storage. Downstream tasks receive upstream results as ordinary function arguments.

Recall

What is the difference between state you must store and state you must pass?
What are the four layers of workflow state and who reads each?
In the loan workflow, why doesn't the offer-generation step receive the full risk_factors array?
What is the safe default for parallel branches writing to shared state?
How does state scoping prevent multi-tenant leakage?
What does monotonically growing context size across steps indicate?

Interview Q&A

Q: Why is "append everything to the prompt" a dangerous state strategy?

A: Because it violates the typed-input invariant. Steps receive irrelevant information that degrades model attention, increases cost (tokens paid for unused context on every step), and may leak sensitive data from prior steps. State management is relevance engineering — not hoarding.

Common wrong answer to avoid: "Because context windows have limits." Limits exist, but the deeper issue is relevance and attention degradation, which happens well before the window is full.

Q: When should you pass full step output rather than compressed fields?

A: When the next step genuinely needs the full detail — for example, a code-review step that needs the complete diff, or a legal review step that needs verbatim contract text. But even then, scope it: pass the full relevant artifact, not every artifact from every prior step.

Common wrong answer to avoid: "When the model is strong enough to handle it." Model capability doesn't determine state design. The next step's declared input contract does.

Q: How do you handle state conflicts when parallel branches disagree?

A: Declare a resolution strategy at graph-definition time: last-write-wins, highest-confidence, merge, escalate, or fail. The safe default is non-overlapping field namespaces — parallel branches write to different fields so conflicts cannot arise.

Common wrong answer to avoid: "Let the merge step figure it out." Undeclared conflict resolution pushes an infrastructure problem into the agent's reasoning — where it becomes non-deterministic and untestable.

Q: Why separate execution metadata from agent-visible state?

A: Because retry counts, step durations, and checkpoint IDs are control-plane concerns. Passing them to agents adds noise without helping the agent do its job. They should be visible to operators and the dispatch loop, not to step executors.

Common wrong answer to avoid: "To save tokens." Token savings are a side benefit. The primary reason is separation of concerns: agents reason about the work, the control plane reasons about execution.

Design/debug exercise (10 min)

Modeled example. The loan workflow's state at checkpoint c3 (after step 3c): Layer 1 holds applicant_name, loan_amount, eligible, policy_version. Layer 2 holds credit_score, debt_ratio, avg_balance, missed_payments, risk_score, risk_factors. Layer 3 holds step_durations, retry_counts. Layer 4 holds full model responses from all prior steps. Step 3d's input contract requests only: risk_score, risk_factors, policy_version. Everything else stays in the store.

Your turn. Take a workflow with 4+ steps. For each step boundary, list: what's in Layer 1, what's in Layer 2, what the next step's input contract declares, and what gets compressed away.

From memory. Draw the four-layer state model. Sketch the loan workflow's state at checkpoint c3 and label what step 3d actually receives vs what's stored.

Operational memory

This chapter established that state management in workflows is context engineering at the inter-step level. The invariant is that each step receives only the state it declared as input — not the accumulated history of all prior execution. State is separated into four layers: durable facts (workflow-wide), step outputs (produced/consumed between specific nodes), execution metadata (control-plane internal), and audit trail (immutable record).

The compression principle: store full outputs in the checkpoint store for audit and recovery; pass only typed fields to the next step's input contract. This gives you durability without context bloat. Parallel branches must write to non-overlapping field namespaces or declare an explicit conflict resolution strategy.

Remember:

Store aggressively (everything, for audit and recovery). Pass selectively (only declared inputs).
Each step's context is its typed input contract — not "everything before."
Four layers: durable facts, step outputs, execution metadata, audit trail. Only layers 1–2 flow to agents.
Monotonically growing context across steps = missing compression. Fix by enforcing typed input contracts.
Parallel branch conflicts require a declared resolution strategy at graph-definition time.
State namespace isolation is the foundation of multi-tenant safety.
The loan workflow's step 5 receives approval_decision and loan_amount — not 3,000 tokens of intermediate risk reasoning.

Bridge. We now have state architecture: layers, compression, scoping. But how does this translate into code? LangGraph makes workflow graphs, state schemas, reducers, and checkpointing concrete — turning the abstractions of files 01–05 into executable machinery. Next: LangGraph deep dive. → 06-langgraph-deep-dive.md