06. Task Decomposition: splitting work along failure boundaries

~10 min read. Not "split randomly." Split where failures cluster. Each piece must have a testable done condition.

Built on the ELI5 in 00-eli5.md. The CEO — the orchestrator — must decide how to divide work among departments. Bad splits create chaos. Good splits create clarity.

1) The decomposition recipe

Picture first. Imagine a school worksheet touched by three confused children. One gathers facts, one checks facts, one formats facts. When it is wrong, nobody owns the fix. See.

Decomposition is choosing clean control points. The CEO should split where mistakes cluster, not where labels look pretty. Good splits make each department narrow and accountable. Simple, no?

Use this five-step recipe.

1. Define the final output clearly: format, scope, audience, quality bar. If the finish line is vague, every subtask drifts.
2. Find the major failure modes: wrong facts, weak coverage, missed edge cases, bad formatting, slow review. Write these down before creating agents.
3. Split along those failure boundaries. Choose places where different checks, tools, or expertise are required.
4. Give each subtask a crisp done condition. A subtask is done only when an explicit acceptance test passes.
5. Keep the handoff payload minimal but sufficient. Pass only what the next agent needs, not the whole chat or hidden assumptions. That is the handoff.
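
Steps 3 to 5 of the recipe can be sketched as a small data structure. This is a minimal illustration, not a prescribed implementation: the names `Subtask`, `accept`, `is_done`, and `handoff_fields` are hypothetical, and the claim shape (`id`, `verified`) is assumed for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Subtask:
    name: str
    failure_mode: str                          # the one failure this split exists to catch
    is_done: Callable[[dict[str, Any]], bool]  # explicit acceptance test (step 4)
    handoff_fields: tuple[str, ...]            # only what the next agent needs (step 5)


def accept(task: Subtask, output: dict[str, Any]) -> dict[str, Any]:
    """Reject output that fails the done condition, then trim it to the handoff fields."""
    if not task.is_done(output):
        raise ValueError(f"{task.name} failed its done condition ({task.failure_mode})")
    return {field: output[field] for field in task.handoff_fields}


# Hypothetical verifier subtask: done only when at least one claim exists
# and every claim is marked verified.
verifier = Subtask(
    name="verifier",
    failure_mode="unsupported claims",
    is_done=lambda out: bool(out["claims"]) and all(c["verified"] for c in out["claims"]),
    handoff_fields=("claims",),
)

raw_output = {
    "claims": [{"id": "C1", "verified": True}],
    "scratch_notes": "internal reasoning the next agent does not need",
}
payload = accept(verifier, raw_output)
print(payload)  # {'claims': [{'id': 'C1', 'verified': True}]}
```

The point of the sketch is that the acceptance test and the payload trimming live next to the subtask definition, so the CEO never has to guess what finished means or what gets passed on.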

Bad decomposition creates chatty agents with vague jobs. Good decomposition creates specialists with testable outputs.

BAD SPLIT
researcher ─┬─ finds facts
            ├─ verifies some facts
writer ─────┼─ rewrites facts
reviewer ───┴─ rechecks facts and formatting

GOOD SPLIT
researcher  -> source pack
verifier    -> verified claims
writer      -> source-linked draft
formatter   -> final markdown

In the bad split, ownership overlaps. In the good split, failure surfaces separate cleanly.
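
Overlapping ownership can be checked mechanically. The sketch below assumes each agent's charter is written as a set of responsibility labels; the labels (`find_facts`, `verify_facts`, and so on) are one hypothetical reading of the two diagrams above, not part of any framework.

```python
from collections import defaultdict


def overlapping_owners(charters: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each responsibility claimed by more than one agent to its owners."""
    owners = defaultdict(list)
    for agent, responsibilities in charters.items():
        for responsibility in responsibilities:
            owners[responsibility].append(agent)
    return {r: sorted(agents) for r, agents in owners.items() if len(agents) > 1}


# Assumed encoding of the bad split: two agents both check facts.
bad_split = {
    "researcher": {"find_facts", "verify_facts"},
    "writer": {"rewrite_facts"},
    "reviewer": {"verify_facts", "check_formatting"},
}

# Assumed encoding of the good split: one owner per responsibility.
good_split = {
    "researcher": {"find_facts"},
    "verifier": {"verify_facts"},
    "writer": {"write_draft"},
    "formatter": {"format_output"},
}

print(overlapping_owners(bad_split))   # {'verify_facts': ['researcher', 'reviewer']}
print(overlapping_owners(good_split))  # {}
```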

2) Split by failure mode, not by function

Now what is the common mistake? People split by job title: research, writing, review, publishing. Nice boxes. Wrong basis.

Better question: where do different kinds of failures happen? That is the boundary that matters. See.

Take a content workflow. Many teams say, "One agent researches, one agent writes." Looks sensible. But research may contain two very different failure modes: finding enough facts and verifying whether those facts are true. Both are called research, but they fail differently. So they often deserve separate subtasks.

Example one: a market brief agent finds ten blog posts quickly. Search succeeded, but none are primary sources. Verification still fails. So the real boundary is finding facts versus verifying facts.

Example two: a coding workflow finds the right files but edits the wrong function. File discovery succeeded; change selection failed. So split file discovery from change selection.

Example three: a support workflow classifies a ticket correctly but the resolver receives no account history. Classification is fine; handoff quality is broken. So split routing from payload preparation.

Function labels feel neat. Failure labels feel operational. Choose operational. That is how the CEO protects quality.

3) Done conditions: the overlooked requirement

Most weak multi-agent systems fail here. They create agents and assign roles, but never define what finished means. Then the agents keep chatting, stop too early, or return fluent garbage. See.

Every subtask needs a testable done condition. Not, "research is done when the agent stops." Better: research is done when at least 5 sources are found, each has confidence at or above 0.7, and all 3 required subtopics are covered. Now the output can be checked. Simple, no?
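
The research done condition above translates almost directly into a check. A minimal sketch follows; the `Source` fields and the placeholder subtopic names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    confidence: float
    subtopic: str


def research_done(
    sources: list[Source],
    required_subtopics: set[str],
    min_sources: int = 5,
    min_confidence: float = 0.7,
) -> bool:
    """True only when count, per-source confidence, and subtopic coverage all pass."""
    enough = len(sources) >= min_sources
    confident = all(s.confidence >= min_confidence for s in sources)
    covered = required_subtopics <= {s.subtopic for s in sources}
    return enough and confident and covered


# Placeholder data: five sources spread over three required subtopics.
required = {"topic_a", "topic_b", "topic_c"}
sources = [
    Source("source 1", 0.90, "topic_a"),
    Source("source 2", 0.80, "topic_a"),
    Source("source 3", 0.75, "topic_b"),
    Source("source 4", 0.70, "topic_b"),
    Source("source 5", 0.85, "topic_c"),
]

print(research_done(sources, required))      # True
print(research_done(sources[:4], required))  # False: only 4 sources, topic_c missing
```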

Here are concrete bad versus good pairs.

Bad: "Write the draft." Good: "Return 3 paragraphs, each citing at least one source, total 150-200 words."
Bad: "Review the draft." Good: "Return pass/fail with a list of unsupported claims and suggested fixes."
Bad: "Find sources." Good: "Return 6 sources, at least 2 primary, with title, URL, date, and relevance score."
Bad: "Check compliance." Good: "Return pass/fail for all 8 policy rules, with violated rule IDs and notes."

Notice what improved. The good version names count, format, quality threshold, and failure output. That means the department knows when to stop, and the CEO knows when to reject.

A strong done condition usually answers five questions. What must be present? How much is enough? What quality bar must be met? What output shape is required? What should happen if the bar is missed? If one answer is missing, the task may drift.
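
One way to answer the fifth question (what happens when the bar is missed) is to have the check return reasons rather than a bare pass/fail. The sketch below tests the draft condition "3 paragraphs, each citing at least one source, total 150-200 words". It assumes paragraphs are separated by blank lines and citations look like `[S1]`; both conventions are hypothetical.

```python
import re

CITATION = re.compile(r"\[S\d+\]")  # assumed citation marker, e.g. [S1]


def check_draft(
    draft: str, paragraphs: int = 3, min_words: int = 150, max_words: int = 200
) -> list[str]:
    """Return a list of problems; an empty list means the done condition passed."""
    problems = []
    paras = [p for p in draft.split("\n\n") if p.strip()]
    if len(paras) != paragraphs:
        problems.append(f"expected {paragraphs} paragraphs, got {len(paras)}")
    for number, para in enumerate(paras, start=1):
        if not CITATION.search(para):
            problems.append(f"paragraph {number} cites no source")
    words = len(draft.split())  # citation markers count as words in this sketch
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} outside {min_words}-{max_words}")
    return problems


# Three synthetic paragraphs of 50 words each, every one ending in a citation.
good_draft = "\n\n".join(" ".join(["word"] * 49 + ["[S1]"]) for _ in range(3))
print(check_draft(good_draft))  # []
```

A non-empty list gives the CEO something concrete to send back to the department instead of a vague "try again".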

4) Worked example: decomposing a market brief workflow

Task: "Generate a market analysis brief on Indian fintech."

Step 1: define the final output. Final deliverable = a 500-word brief with 5 or more citations, an executive summary, key trends, and a risk section. It should be readable by a product manager. Now the finish line is visible.

Step 2: list the failure modes. Shallow research, unsupported claims, missed trends, formatting errors. These are different failures, so they deserve different checks.

Step 3: choose the split. Research agent, Writer agent, Fact-checker agent, Formatter agent. Notice the logic. This is not split by beauty. It is split by where quality can break.

Step 4: define done conditions.

Research done condition: return 8 or more sources, each with relevance score at or above 0.6, plus a note on why it matters. Coverage must include payments, lending, and regulation.
Writer done condition: return 500 words, plus or minus 50, with executive summary, key trends, and risk section. Every claim must link to a source ID like S1 or S4, and no unsupported claim should enter the draft.
Fact-checker done condition: check every claim against its cited source, label it verified, weakly supported, or unsupported, add confidence, and flag anything below 0.75.
Formatter done condition: return markdown that includes all required sections, clean citations, and the target word count.
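
The fact-checker's output shape can be sketched as a verdict list. This is an illustration under assumptions: the claim IDs (`C1`, `C2`, `C3`), the class names, and the example confidences are hypothetical. Only the three labels and the 0.75 flag threshold come from the done condition above.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    VERIFIED = "verified"
    WEAKLY_SUPPORTED = "weakly supported"
    UNSUPPORTED = "unsupported"


@dataclass
class ClaimCheck:
    claim_id: str
    source_id: str
    verdict: Verdict
    confidence: float


def fact_check_done(checks: list[ClaimCheck], claim_ids: list[str]) -> bool:
    """Done only when every claim in the draft has exactly one verdict."""
    return sorted(c.claim_id for c in checks) == sorted(claim_ids)


def flagged(checks: list[ClaimCheck], threshold: float = 0.75) -> list[str]:
    """Claim IDs whose confidence falls below the flag threshold."""
    return [c.claim_id for c in checks if c.confidence < threshold]


checks = [
    ClaimCheck("C1", "S1", Verdict.VERIFIED, 0.91),
    ClaimCheck("C2", "S5", Verdict.WEAKLY_SUPPORTED, 0.62),
    ClaimCheck("C3", "S4", Verdict.UNSUPPORTED, 0.40),
]

print(fact_check_done(checks, ["C1", "C2", "C3"]))  # True
print(flagged(checks))                              # ['C2', 'C3']
```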

Make it concrete. Suppose research returns the following source pack: S1 RBI annual report 0.92, S2 NPCI UPI data release 0.89, S3 BCG fintech note 0.72, S4 World Bank inclusion report 0.68, S5 Economic Times market article 0.61, S6 Bain digital payments report 0.74, S7 SEBI circular summary 0.66, S8 industry funding database extract 0.63. That is enough coverage to move forward. See.
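
The source pack above can be run through the count and score parts of the research done condition. The sketch below is hypothetical in its function name and data layout. It leaves out the coverage check because the pack as listed does not tag each source with a subtopic.

```python
# Source pack exactly as listed above: ID -> (title, relevance score).
source_pack = {
    "S1": ("RBI annual report", 0.92),
    "S2": ("NPCI UPI data release", 0.89),
    "S3": ("BCG fintech note", 0.72),
    "S4": ("World Bank inclusion report", 0.68),
    "S5": ("Economic Times market article", 0.61),
    "S6": ("Bain digital payments report", 0.74),
    "S7": ("SEBI circular summary", 0.66),
    "S8": ("industry funding database extract", 0.63),
}


def research_gate(
    pack: dict[str, tuple[str, float]], min_sources: int = 8, min_score: float = 0.6
) -> tuple[bool, list[str]]:
    """Return (passed, IDs below the score bar) for the count and score checks."""
    below = [sid for sid, (_, score) in pack.items() if score < min_score]
    return len(pack) >= min_sources and not below, below


print(research_gate(source_pack))  # (True, [])
```

Raising the bar to 0.65 would reject the pack and name S5 and S8 as the sources to replace, which is the kind of failure output a department can act on.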

Step 5: prepare handoff payloads. Research should pass source IDs, titles, URLs, dates, scores, and notes, not raw browsing history. Writer should pass claim-to-source links, not hidden assumptions. Fact-checker should pass a claim verdict list. Formatter should receive only the approved draft and citation set. This is just a preview of the handoff. The next file goes deeper.
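
As a small preview of the writer's handoff, the claim-to-source links can be validated before they reach the fact-checker. The dictionary layout and the claim IDs are assumptions; the only thing taken from the text is that every claim must link to a known source ID.

```python
def unlinked_claims(claim_links: dict[str, list[str]], source_ids: set[str]) -> list[str]:
    """Return claim IDs whose citations are missing or point at unknown sources."""
    return [
        claim_id
        for claim_id, cited in claim_links.items()
        if not cited or not set(cited) <= source_ids
    ]


source_ids = {f"S{n}" for n in range(1, 9)}  # S1 through S8 from the source pack

# Hypothetical writer handoff: C3 cites nothing, C4 cites a source that is not in the pack.
claim_links = {"C1": ["S1"], "C2": ["S2", "S6"], "C3": [], "C4": ["S9"]}

print(unlinked_claims(claim_links, source_ids))  # ['C3', 'C4']
```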

5) Common decomposition anti-patterns

Too many agents: coordination overhead becomes bigger than the quality gain.
Too few agents: one worker carries too much context again.
Overlapping charters: two agents both edit facts or rewrite style, so blame is shared and duplication grows.
Missing done conditions: agents never truly finish and keep hedging or redoing old work.
Splitting for architecture beauty: a diagram can look elegant and still reduce nothing. If the extra agent does not lower a distinct failure rate, do not keep it.

The rule is simple. Add a split only when it reduces a distinct failure. That is mature decomposition.
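
The rule can be turned into a rough keep-or-drop test if failures are counted over the same set of evaluation runs with and without the extra agent. Everything in this sketch is an assumption for illustration: the function name, the 5-point minimum drop, and the example counts are not measured results.

```python
def keep_split(
    failures_without: int, failures_with: int, runs: int, min_drop: float = 0.05
) -> bool:
    """Keep the extra agent only if the target failure rate drops by at least min_drop."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    drop = (failures_without - failures_with) / runs
    return drop >= min_drop


# Hypothetical counts of one specific failure (say, unsupported claims) over 100 runs.
print(keep_split(failures_without=12, failures_with=4, runs=100))  # True: rate fell 8 points
print(keep_split(failures_without=10, failures_with=9, runs=100))  # False: rate fell 1 point
```

The counts should track one distinct failure per split. An overall quality score would hide whether the new agent is reducing the failure it was added for.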

Where this lives in the wild

GitHub Copilot coding tasks — decomposed into file search, code generation, and test running, each with clear success criteria for the engineer.
Salesforce Einstein case workflows — customer case routing split by failure mode: classification errors, resolution errors, and handoff errors for support teams.
Legal AI platforms like Harvey — contract review decomposed into clause extraction, risk scoring, and recommendation, each independently testable by legal ops.
Healthcare AI diagnosis support — workflows split into symptom collection, differential diagnosis, and test recommendation, each with explicit confidence thresholds for clinicians.
E-commerce listing systems like Amazon seller tools — description generation, SEO optimization, compliance check, and image selection, each behind measurable quality gates for catalog teams.

Pause and recall

Why is failure mode a better split signal than job title?
What are the five steps in the decomposition recipe?
Why does every subtask need a testable done condition?
In the fintech example, why was fact-checking separated from writing?

Interview Q&A

Q: Why should a senior engineer split by failure mode instead of by function chart? A: Because functions can hide multiple failure types inside one box. Failure-mode splits create cleaner controls, clearer ownership, and better evaluation. Common wrong answer to avoid: "Functions are enough because teams already understand those labels" — familiar labels do not guarantee operational clarity.

Q: Why not let one strong agent research and verify together? A: Because finding facts and verifying facts often fail differently. Combining them can hide weak sourcing behind fluent output. Common wrong answer to avoid: "A stronger model makes decomposition unnecessary" — model strength does not remove the need for explicit checks.

Q: Why are done conditions more important than role names? A: Because role names describe intention, but done conditions describe acceptance. Systems improve when outputs are testable, not merely well named. Common wrong answer to avoid: "If the prompt is detailed, the agent will know when it is done" — verbosity is not the same as a measurable finish line.

Q: Why can extra agents reduce quality instead of improving it? A: Because each extra split adds coordination cost, handoff risk, and latency. If the split does not isolate a real failure, it adds noise. Common wrong answer to avoid: "More specialists always mean better quality" — specialization helps only when boundaries are clean and useful.

Apply now (5 min)

Exercise: Take one workflow you use often. Maybe bug triage, research writing, or customer support. Write the final output in one sentence. List four major failure modes. Then split the workflow only where those failures differ. For each subtask, write one done condition with a number or threshold.

Sketch from memory: Draw one bad split with overlapping agents. Draw one good split with clean boundaries. Then label what the CEO checks and what the handoff contains.

Bridge. The split is clean. Each department knows its job. But what exactly passes between them? A vague message creates confusion. A structured handoff creates clarity. Next: how to design what one agent gives to the next. → 07-handoff-design.md