02. Task decomposition — turning intent into a workflow graph¶

~18 min read. Users speak in wishes. The control plane needs executable steps with typed boundaries. Decomposition is the bridge between "resolve this complaint" and a graph the dispatch loop can actually run.

Built on the first-principles overview in 00-first-principles.md. The workflow graph — the declared structure of steps and edges — begins here. Handoff fidelity is the pressure: each boundary between steps is a serialisation boundary where context either survives intact or degrades silently.

What file 01 established and what remains¶

File 01 established that coordination logic belongs in explicit infrastructure above the agent. The control plane exists. It has a dispatch loop. But the dispatch loop needs something to dispatch — a structured sequence of steps with typed inputs, typed outputs, and explicit dependencies. Without that structure, the control plane is an empty scheduler with no workload.

The gap: user intent arrives as natural language ("process this refund", "investigate this alert", "onboard this customer"). The dispatch loop needs a directed graph of nodes with contracts between them. This file builds the translation layer.

The refund request that hid seven decisions¶

A customer writes: "I was charged twice for order #4481. Please refund the duplicate and confirm by email."

One sentence. Seven implicit steps: (1) identify the customer, (2) retrieve order #4481, (3) find both charges, (4) verify duplication, (5) check refund policy and thresholds, (6) process the refund, (7) send confirmation email. A monolithic agent might execute all seven in one reasoning chain. The result works — until step 6 fails and the system must decide: which steps can be skipped on retry? Which have side effects? Which need the output of which?

Without decomposition, those questions have no answer. The entire chain re-runs. With decomposition, the workflow graph knows that steps 1–5 are read-only (safe to replay), step 6 is a side-effect with an idempotency requirement, and step 7 depends on step 6's output. Recovery becomes surgical instead of total.
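A minimal sketch of that recovery decision, assuming a hypothetical `Step` record and a set of step numbers whose outputs are already checkpointed. The dependency tuples are illustrative guesses, not something the request itself dictates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    side_effect: bool            # True if the step changes external state
    depends_on: tuple[int, ...]  # step numbers whose output this step reads


# The seven implicit steps of the refund request (dependencies are assumed).
REFUND_STEPS = [
    Step(1, "identify customer", False, ()),
    Step(2, "retrieve order #4481", False, ()),
    Step(3, "find both charges", False, (2,)),
    Step(4, "verify duplication", False, (3,)),
    Step(5, "check refund policy", False, (1, 4)),
    Step(6, "process refund", True, (5,)),
    Step(7, "send confirmation email", True, (6,)),
]


def retry_plan(steps, checkpointed):
    """Return the steps that still have to run, in their original order.

    A step is skipped when its output is already checkpointed.
    """
    return [s for s in steps if s.number not in checkpointed]


# Step 6 failed: outputs of steps 1-5 are checkpointed, so only 6 and 7 remain.
plan = retry_plan(REFUND_STEPS, checkpointed={1, 2, 3, 4, 5})
print([s.name for s in plan])
```

Both remaining steps carry side effects, which is why the retry of step 6 needs an idempotency key rather than a blind replay.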

Teacher voice. Decomposition is not planning in the LLM sense (reasoning about what to do). It is graph construction in the engineering sense: defining nodes, edges, contracts, and checkpoint boundaries before any agent runs.

The invariant: every step boundary is a recovery boundary¶

The chapter protects one rule: a step boundary must be a point where the workflow can checkpoint, validate, and potentially resume without re-running prior steps.

If you cannot checkpoint at a boundary, it is not a real boundary — it is decoration. If you cannot validate the output at a boundary, downstream steps inherit undetected corruption. If you cannot resume from a boundary, the boundary adds overhead without adding durability.

This means decomposition is driven by durability and recoverability requirements, not by conceptual elegance. A "retrieve all data" step that fetches from three APIs might look like one logical action, but if the third API is flaky and the first two are expensive, splitting it into three nodes with checkpoints between them is the correct decomposition.

The four properties of a well-formed step¶

A step that the dispatch loop can safely execute has four properties:

┌─────────────────────────────────────────────────────┐
│  STEP CONTRACT                                      │
├─────────────────────────────────────────────────────┤
│  1. Typed input   — what state it reads             │
│  2. Typed output  — what state it writes            │
│  3. Risk class    — read / reversible-write /       │
│                     irreversible-write / human-gate │
│  4. Recovery mode — replay / idempotent-retry /     │
│                     compensate / restart            │
│                                                     │
│  If any property is undefined, the step is not      │
│  ready for the dispatch loop.                       │
└─────────────────────────────────────────────────────┘

Typed input means the step declares exactly which fields from shared workflow state it reads. No implicit access to "everything that happened before." This is what makes steps testable in isolation — you can construct the input without running the full prefix.

Typed output means the step declares the schema of what it produces. Downstream steps depend on this contract. If the output schema drifts, the handoff contract breaks.

Risk class determines what gates surround the step. Read-only steps need no approval. Irreversible-write steps need a checkpoint immediately before they run and, depending on policy, human approval.

Recovery mode determines what happens when the step fails or the workflow crashes mid-step. Can it simply replay? Does it need an idempotency key? Does it require a compensating transaction on failure?
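The contract can be sketched as a small data structure. This is one possible shape, not a prescribed schema: the class and field names are hypothetical, and "typed" input and output are reduced here to sets of state field names rather than full schemas.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskClass(Enum):
    READ = "read"
    REVERSIBLE_WRITE = "reversible-write"
    IRREVERSIBLE_WRITE = "irreversible-write"
    HUMAN_GATE = "human-gate"


class RecoveryMode(Enum):
    REPLAY = "replay"
    IDEMPOTENT_RETRY = "idempotent-retry"
    COMPENSATE = "compensate"
    RESTART = "restart"


@dataclass(frozen=True)
class StepContract:
    name: str
    reads: Optional[frozenset] = None    # typed input: state fields read
    writes: Optional[frozenset] = None   # typed output: state fields written
    risk: Optional[RiskClass] = None
    recovery: Optional[RecoveryMode] = None

    def missing_properties(self):
        """Names of the contract properties that are still undefined."""
        props = {"reads": self.reads, "writes": self.writes,
                 "risk": self.risk, "recovery": self.recovery}
        return [key for key, value in props.items() if value is None]

    def is_dispatchable(self):
        """A step is ready for the dispatch loop only if nothing is undefined."""
        return not self.missing_properties()


refund = StepContract(
    name="process_refund",
    reads=frozenset({"amount", "account"}),
    writes=frozenset({"refund_id"}),
    risk=RiskClass.IRREVERSIBLE_WRITE,
    recovery=RecoveryMode.IDEMPOTENT_RETRY,
)
draft = StepContract(name="do_all_the_risk_stuff", reads=frozenset({"application"}))
print(refund.is_dispatchable(), draft.missing_properties())
```

Defaulting every property to `None` makes an incomplete contract easy to detect before the step reaches the dispatch loop.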

Threaded example — loan approval decomposition¶

From file 01, the loan-approval workflow has five high-level steps. Now decompose step 3 (risk scoring) further:

step 3: score risk
├── 3a: retrieve credit bureau data            [external-read, idempotent]
├── 3b: retrieve internal transaction history  [internal-read, replay-safe]
├── 3c: compute risk score                     [pure-reasoning, replay-safe]
└── 3d: validate score against policy          [pure-reasoning, replay-safe]

Why split? Because 3a calls an external API that charges per hit. If 3c crashes, replaying from 3a means paying the bureau again. With a checkpoint after 3a, a crash at 3c replays 3b–3d at most, all of which are cheap. With a checkpoint at every boundary, it replays only 3c. Either way, the bureau call is not repeated.
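The effect can be demonstrated with a toy runner. Everything here is a simulation under stated assumptions: the step functions return made-up data, the `checkpoints` dict stands in for a durable store, and the crash in 3c is forced once on purpose.

```python
bureau_calls = 0  # counts paid calls to the simulated credit bureau


def retrieve_bureau_data(state):
    global bureau_calls
    bureau_calls += 1
    return {"credit_report": {"open_accounts": 2}}


def retrieve_history(state):
    return {"history": [120, -40, 300]}


crash_once = {"armed": True}


def compute_risk(state):
    if crash_once["armed"]:            # simulate a crash on the first attempt
        crash_once["armed"] = False
        raise RuntimeError("worker died in 3c")
    return {"risk_score": 0.12}


def validate_score(state):
    return {"score_ok": state["risk_score"] < 0.5}


STEPS = [("3a", retrieve_bureau_data), ("3b", retrieve_history),
         ("3c", compute_risk), ("3d", validate_score)]


def run(steps, checkpoints):
    """Run steps in order, skipping any whose output is already checkpointed."""
    state = {}
    for name, fn in steps:
        if name not in checkpoints:
            checkpoints[name] = fn(state)  # persist output at the boundary
        state.update(checkpoints[name])
    return state


checkpoints = {}                           # stands in for a durable store
try:
    run(STEPS, checkpoints)
except RuntimeError:
    pass                                   # first attempt crashed in 3c
final_state = run(STEPS, checkpoints)      # resume: 3a and 3b are skipped
print(bureau_calls, final_state["score_ok"])
```

Because this sketch checkpoints at every boundary, the resumed run skips 3b as well as 3a. The paid call happens exactly once across both attempts.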

Now contrast with bad decomposition:

step 3: "do all the risk stuff"    [mixed: external-read + reasoning + validation]

If this crashes after the bureau call, the recovery strategy is unclear. Was the bureau data saved? Is it safe to call again? Nobody knows — the boundary didn't exist.

Decomposition strategies¶

Three patterns, used alone or combined:

Dependency-driven decomposition. Start from the final output. Ask: what must I know to produce this? Trace backward. Each "must know" becomes a step, ordered by dependency. This is the default for most workflows.

final output: loan offer letter
  ← needs: approved risk score
    ← needs: credit data + internal history
      ← needs: parsed application documents
        ← needs: raw application upload
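The backward trace above maps directly onto a topological sort. A small sketch using Python's standard-library `graphlib`, with node names invented for illustration and credit data and internal history modelled as two separate nodes (an assumption; the trace lists them on one line):

```python
from graphlib import TopologicalSorter

# Each key needs the outputs listed in its value (the arrows read backward).
NEEDS = {
    "loan_offer_letter": {"approved_risk_score"},
    "approved_risk_score": {"credit_data", "internal_history"},
    "credit_data": {"parsed_application"},
    "internal_history": {"parsed_application"},
    "parsed_application": {"raw_application_upload"},
    "raw_application_upload": set(),
}

# static_order() yields every node after all of the nodes it depends on.
order = list(TopologicalSorter(NEEDS).static_order())
print(order)
```

The relative order of `credit_data` and `internal_history` is not fixed by the graph, which is the signal that they could run in parallel.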

Risk-driven decomposition. Start from side effects. Every irreversible action becomes its own step. Everything before it becomes a read-only prefix. Everything after it becomes a separate graph that only executes if the side effect succeeded.

irreversible: process_refund(amount, account)
  ← prefix: verify eligibility (read-only, safe to replay)
  → suffix: send confirmation (depends on refund success)
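A rough sketch of the same split in code. The function names and the `execute` callback are hypothetical, the split is made around the first irreversible step only, and the input is assumed to contain at least one irreversible step.

```python
def split_at_side_effect(steps):
    """Split an ordered list of (name, is_irreversible) pairs.

    Returns (prefix, side_effect, suffix) around the first irreversible step.
    The suffix may itself contain further side effects.
    """
    for i, (name, irreversible) in enumerate(steps):
        if irreversible:
            prefix = [n for n, _ in steps[:i]]
            suffix = [n for n, _ in steps[i + 1:]]
            return prefix, name, suffix
    return [n for n, _ in steps], None, []


def run_guarded(prefix, side_effect, suffix, execute):
    """Run the prefix, then the side effect; run the suffix only on success.

    `execute(name)` is supplied by the caller and returns True on success.
    """
    ran = []
    for name in prefix:
        execute(name)
        ran.append(name)
    succeeded = execute(side_effect)
    ran.append(side_effect)
    if succeeded:
        for name in suffix:
            execute(name)
            ran.append(name)
    return ran


steps = [("verify_eligibility", False),
         ("process_refund", True),
         ("send_confirmation", True)]
prefix, effect, suffix = split_at_side_effect(steps)

# Simulate a failed refund, then a successful one.
ran_on_failure = run_guarded(prefix, effect, suffix,
                             lambda name: name != "process_refund")
ran_on_success = run_guarded(prefix, effect, suffix, lambda name: True)
print(ran_on_failure, ran_on_success)
```

When the refund fails, the confirmation is never attempted, so the customer is not told about a refund that did not happen.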

Cost-driven decomposition. Start from expensive operations. Each expensive call (API fees, large token counts, long latency) becomes its own step with a checkpoint after it. Cheap operations can be grouped.

expensive: credit bureau API ($0.50/call, 3s latency)
  → checkpoint immediately after
  → subsequent cheap reasoning steps grouped together
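One way to sketch this grouping rule. The $0.10 threshold and every cost other than the $0.50 bureau call are assumptions chosen for illustration.

```python
def group_by_cost(operations, expensive_threshold_usd=0.10):
    """Group ordered (name, cost_usd) operations into steps.

    Each expensive operation becomes its own step; consecutive cheap
    operations share a step. A checkpoint is implied after every step.
    """
    steps, cheap_run = [], []
    for name, cost in operations:
        if cost >= expensive_threshold_usd:
            if cheap_run:                 # close the cheap group first
                steps.append(cheap_run)
                cheap_run = []
            steps.append([name])          # expensive op stands alone
        else:
            cheap_run.append(name)
    if cheap_run:
        steps.append(cheap_run)
    return steps


operations = [
    ("parse_request", 0.0),
    ("credit_bureau_api", 0.50),
    ("compute_score", 0.001),
    ("validate_policy", 0.001),
]
grouped = group_by_cost(operations)
print(grouped)
```

In a real system the threshold would more likely weigh fees, token counts, and latency together rather than a single dollar figure.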

Production decomposition usually combines all three: trace dependencies, isolate side effects, and checkpoint after expensive operations.

Granularity — the Goldilocks problem¶

Too coarse: large steps with mixed concerns. Failures are expensive (must restart the whole step). Testing requires the entire prefix. Observability is low (one span for 30 seconds of mixed work).

Too fine: dozens of tiny steps. Orchestration overhead dominates. State serialisation at every boundary adds latency. The workflow graph becomes unreadable.

Three tests for correct granularity:

Test	If yes → split	If no → keep grouped
Would retrying this step repeat a side effect from other work inside it?	Split at the side-effect boundary	Group the pure operations
Would a crash mid-step lose expensive work that preceded the crash point?	Split immediately after the expensive operation	Keep together if replay is cheap
Does a different agent, model, or policy own part of this step?	Split at the ownership boundary	Keep together if same executor

For the loan workflow: parsing and eligibility are handled by different executors (split). Credit data retrieval and internal history retrieval are independent reads (split into parallel branches). Risk computation and policy validation are both pure reasoning by the same model, so they can be grouped when replaying both is cheap.

Failure modes in decomposition¶

Missing dependency edges. Step 5 implicitly needs data from step 2, but the workflow graph has no edge between them. The dispatch loop runs step 5 before step 2 completes. Result: step 5 hallucinates or fails on missing input.
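A missing edge can be caught statically if every step declares its reads and writes. The check below is a sketch under that assumption. The step names, field names, and the `initial` parameter are hypothetical.

```python
def missing_edges(steps, edges, initial=frozenset()):
    """Find reads that no upstream step produces.

    steps:   {name: {"reads": set, "writes": set}}
    edges:   set of (upstream, downstream) pairs
    initial: fields supplied by the workflow input itself
    Returns a list of (step, field) pairs with no upstream producer.
    """
    def ancestors(node):
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for up, down in edges:
                if down == current and up not in seen:
                    seen.add(up)
                    stack.append(up)
        return seen

    problems = []
    for name, spec in steps.items():
        available = set(initial)
        for up in ancestors(name):
            available |= steps[up]["writes"]
        for field in sorted(spec["reads"] - available):
            problems.append((name, field))
    return problems


steps = {
    "step2_retrieve_order": {"reads": {"order_id"}, "writes": {"order"}},
    "step4_verify_duplicate": {"reads": {"order"}, "writes": {"is_duplicate"}},
    "step5_check_policy": {"reads": {"order", "is_duplicate"}, "writes": {"eligible"}},
}
# The edge into step 5 was forgotten.
edges = {("step2_retrieve_order", "step4_verify_duplicate")}
problems = missing_edges(steps, edges, initial={"order_id"})
print(problems)
```

Adding the edge from step 4 to step 5 clears both findings, because step 2 then becomes a transitive ancestor of step 5.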

Phantom coupling. Two steps write to the same shared state field. The graph says they're independent (parallel-safe). In practice, one overwrites the other's output. Result: silent data corruption downstream.
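Phantom coupling is also detectable from declared writes: look for two steps that write the same field with no ordering path between them. A sketch with hypothetical step and field names:

```python
from itertools import combinations


def is_ordered(a, b, edges):
    """True if a path of declared edges leads from step a to step b."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(down for up, down in edges if up == node)
    return False


def phantom_couplings(writes, edges):
    """Return (step_a, step_b, shared_fields) for unordered co-writers."""
    found = []
    for a, b in combinations(sorted(writes), 2):
        shared = writes[a] & writes[b]
        unordered = not is_ordered(a, b, edges) and not is_ordered(b, a, edges)
        if shared and unordered:
            found.append((a, b, sorted(shared)))
    return found


writes = {
    "fetch_credit": {"applicant_profile"},
    "fetch_history": {"applicant_profile"},
    "compute_score": {"risk_score"},
}
edges = {("fetch_credit", "compute_score"), ("fetch_history", "compute_score")}
couplings = phantom_couplings(writes, edges)
print(couplings)
```

The usual fix is to give each step its own output field rather than to add an ordering edge, since an edge would serialise two reads that are otherwise independent.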

Risk-class confusion. A step labelled "read-only" actually modifies external state (e.g., a "read" API that triggers a webhook on the target system). The control plane treats it as safe to replay. Result: duplicate external effects on retry.

Over-rigid sequencing. Steps that could run in parallel are sequenced because the team decomposed temporally ("first we do X, then Y") instead of by dependency ("Y needs X's output"). Result: unnecessary latency.
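Deriving the schedule from dependencies rather than from narrative order can be sketched with the standard-library `graphlib`. The step names are illustrative; each "wave" holds the steps whose dependencies are all satisfied at that point.

```python
from graphlib import TopologicalSorter


def parallel_waves(needs):
    """Group steps into waves; steps within a wave have no mutual dependency."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as finished
    return waves


needs = {
    "parse": set(),
    "credit": {"parse"},
    "history": {"parse"},
    "score": {"credit", "history"},
}
waves = parallel_waves(needs)
print(waves)
```

A team that wrote these four steps as a strict sequence would run four waves. The dependency view needs only three.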

Operational signals¶

Signal	Meaning
Healthy: steps complete in expected order with minimal retries	Decomposition matches actual dependencies
First degrading: retry rate spikes on one step while others are fine	That step may be too coarse (mixing a flaky operation with stable ones)
Misleading: all steps show green individually but end-to-end quality drops	Handoff contracts may be too loose: downstream steps receive valid-shaped but semantically wrong input
Expert inspects: checkpoint-to-checkpoint duration variance	High variance within a step suggests internal operations that should be separate nodes
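The variance signal in the last row can be approximated with a coefficient of variation per step. This is a sketch only: the durations are invented, and the 0.5 threshold is an assumption that would need tuning against real traces.

```python
from statistics import mean, pstdev


def duration_cv(durations):
    """Coefficient of variation: population std deviation over the mean."""
    return pstdev(durations) / mean(durations)


def flag_high_variance(durations_by_step, threshold=0.5):
    """Names of steps whose duration spread suggests mixed internal work."""
    return sorted(name for name, durations in durations_by_step.items()
                  if duration_cv(durations) > threshold)


# Checkpoint-to-checkpoint durations in seconds (made-up sample data).
durations_by_step = {
    "retrieve_history": [1.0, 1.1, 0.9, 1.0],
    "score_risk": [2.0, 30.0, 2.5, 28.0],
}
flagged = flag_high_variance(durations_by_step)
print(flagged)
```

A flagged step is a candidate for inspection, not proof of bad decomposition: the spread could also come from input size or an upstream dependency.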

Where this lives in the wild¶

Temporal workflows — developers decompose work into "activities" (steps) with explicit input/output types, retry policies, and timeouts. The workflow function defines the graph; Temporal provides durability.
LangGraph — nodes are Python functions with typed State input/output. Edges define dependencies. The graph IS the decomposition made executable.
OpenAI Deep Research — the planning phase breaks a broad question into search queries, page reads, and synthesis steps before any execution begins.
Devin (Cognition) — a planning agent decomposes "build this feature" into inspect → design → implement → test → fix steps, each with clear success criteria.
GitHub Copilot coding agent — work is decomposed into search → edit → validate cycles, where each cycle has a checkpoint and can restart independently.
Inngest — each "step" is a durable function boundary. The SDK forces developers to decompose at the function-call level, making checkpoints automatic.

Recall¶

What are the four properties of a well-formed step contract?
Why is a step boundary defined by recoverability, not by conceptual elegance?
In the loan workflow, why is the credit bureau call its own step rather than grouped with risk computation?
What three tests determine correct decomposition granularity?
What failure mode does "phantom coupling" describe?
How does risk-driven decomposition differ from dependency-driven decomposition?

Interview Q&A¶

Q: Why decompose around recovery boundaries rather than around logical concepts?

A: Because the purpose of a step boundary in a durable workflow is to enable checkpointing and selective retry. A conceptually clean grouping that mixes cheap reads with expensive API calls forces full replay on any failure within the group. Recovery-driven boundaries make crash recovery surgical.

Common wrong answer to avoid: "For cleaner architecture." Cleanliness is a side effect. The primary driver is operational: what can you checkpoint, and what can you skip on retry?

Q: When is it correct to group multiple operations into a single step?

A: When all operations in the group share the same risk class, recovery mode, and executor, and the combined cost of replaying them is acceptably low. Grouping three sub-millisecond in-memory computations into one step avoids serialisation overhead without sacrificing recoverability.

Common wrong answer to avoid: "When they're conceptually related." Conceptual relatedness is not the criterion — operational recoverability is.

Q: Why should the decomposition happen before agent selection?

A: Because decomposition defines the shape of work (what types of steps exist, what their inputs and outputs are, what risk classes apply). Agent selection then matches executors to step requirements. If you select agents first, you optimise for what agents can do rather than what the workflow needs.

Common wrong answer to avoid: "Because planning agents are smarter." This is not about intelligence ranking — it is about information ordering. You must know the work before you assign workers.

Q: How do you detect that a step is too coarse in production?

A: Watch retry cost and checkpoint-to-checkpoint duration variance. A step that retries frequently, and whose every retry is expensive because it repeats both cheap and expensive sub-operations, is too coarse. Split at the boundary between the cheap and expensive sub-operations so that a failure on one side does not force a replay of the other.

Common wrong answer to avoid: "When it takes too long." Duration alone is not diagnostic — a step can be long because the underlying work is genuinely slow, not because it's poorly decomposed.

Design/debug exercise (10 min)¶

Modeled example. The refund request ("charged twice, please refund the duplicate and confirm by email") decomposes into 7 steps. For each step, assign a risk class, a recovery mode, and whether it can run in parallel with any other step. Steps 1–5 are read-only and replay-safe. Step 6 is an irreversible write recovered by idempotent retry. Step 7 is also a side effect (the email cannot be unsent) and depends on step 6. Steps 1 and 2 can run in parallel because they are independent reads.

Your turn. Take a real workflow from your system. Decompose it into 5–8 steps. For each step, fill in: typed input, typed output, risk class, recovery mode. Identify which steps can parallelise and which must be sequential.

From memory. Draw the four-property step contract box. Then sketch the loan workflow's step 3 sub-decomposition (3a–3d) and label why the checkpoint after 3a saves money on crash.

Operational memory¶

This chapter established that decomposition is not conceptual organisation — it is durability engineering. Every step boundary must be a point where the workflow can checkpoint, validate output, and potentially resume without re-running prior steps. The four properties of a well-formed step (typed input, typed output, risk class, recovery mode) ensure the dispatch loop can execute, checkpoint, and recover each step independently.

The three decomposition strategies — dependency-driven, risk-driven, and cost-driven — are usually combined in production. Dependency ordering gives the graph shape. Risk isolation gives each side-effect its own node. Cost awareness places checkpoints after expensive operations. The Goldilocks test (retry independence, crash-loss cost, ownership boundary) prevents both over-decomposition (orchestration overhead) and under-decomposition (expensive failures).

Remember:

A step boundary is a recovery boundary. If you can't checkpoint there, it's not a real boundary.
Typed inputs and outputs make steps testable in isolation and handoffs verifiable.
Split at risk-class transitions: read-only work before side-effect work.
Split after expensive operations: never pay twice for a bureau call because a cheap step downstream crashed.
Group only when all grouped operations share risk class, recovery mode, and executor.
Over-decomposition costs latency (serialisation at every boundary). Under-decomposition costs money (full replay on any failure within the block).
The loan workflow's credit bureau call is its own step because it costs $0.50 — a crash after it should never force a re-call.

Bridge. The workflow graph now has steps with typed contracts. But each step needs an executor — which agent, which model, which tool path handles this particular node? The graph has shape; it needs assignment. Next: routing steps to the right capability. → 03-agent-selection-routing.md