01. Single-Agent Overload — when one employee does everything¶

~12 min read. The failure pattern that pushes teams from one agent to many.

Built on the ELI5 in 00-eli5.md. The department — a specialist with a narrow job — exists because one employee doing everything eventually breaks.

1) The picture — one overloaded employee¶

Look. First build the picture before the formula. One employee is acting like the department, the memo format, and the CEO together. That sounds efficient for one minute. Then the work starts colliding. Simple, no?

user request: "Make the article go live"

                 ┌───────────────────────────────┐
                 │      one general agent        │
                 │                               │
                 │  research   writing           │
                 │  review     publishing        │
                 │                               │
                 │  context window               │
                 │  [#####.....] start           │
                 │  [########..] after research  │
                 │  [##########] while writing   │
                 └───────────────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            ▼               ▼               ▼
      raw notes        style rules     publish metadata
            \               |               /
             \              |              /
              └──── all squeezed together ─┘

See. Research wants breadth. Writing wants focus. Review wants criticism. Publishing wants exact fields. These are not the same job. Still, one agent is pretending to be the department for all of them. The same prompt now carries search goals, voice goals, compliance goals, and release goals. Soon the memo format is buried under scraps. Soon the CEO cannot tell what matters first. Now what is the problem? The agent does not just do many steps. It does many incompatible kinds of thinking in one buffer. That is the overload pattern.

2) Where exactly it breaks¶

Context overload¶

Research drags too much raw material into the writing stage. Imagine market research pulling twenty reports and forty quotes. The writing step now receives piles of notes, not a crisp brief. Important facts hide beside irrelevant facts, so the worker rereads instead of deciding. The handoff should have been short. Without the handoff, the next step inherits the whole mess.
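A minimal sketch of that short handoff, under stated assumptions: the `Note` structure, the `relevance` score, and the `build_brief` cutoffs are all hypothetical, chosen only to show raw notes shrinking into a brief before the writing step sees them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """One raw research finding (hypothetical structure)."""
    text: str
    relevance: float  # 0.0 to 1.0, assumed to be scored by the research step


def build_brief(notes: list[Note], max_items: int = 5, min_relevance: float = 0.7) -> list[str]:
    """Keep only the most relevant notes, capped at max_items."""
    keep = [n for n in notes if n.relevance >= min_relevance]
    keep.sort(key=lambda n: n.relevance, reverse=True)
    return [n.text for n in keep[:max_items]]


# Forty raw notes go in; the writer receives at most five.
raw_notes = [Note(f"finding {i}", relevance=i / 40) for i in range(40)]
brief = build_brief(raw_notes)
print(len(raw_notes), "raw notes ->", len(brief), "brief items")
```

The cap matters more than the scoring rule. Whatever ranks the notes, the writing step should receive a bounded brief, not the whole pile.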

Attention dilution¶

One prompt often says, "be creative, be precise, be compliant, be brief." Look. Those instructions compete for the same attention budget. A content agent may try to sound lively while also preserving legal wording. A support agent may try empathy, policy, escalation, and upsell together. The result becomes average everywhere because no instruction is truly primary. Even the CEO sounds confused when everything is urgent.

Tool conflict¶

Different stages want different tool behavior. Research tools reward broad search and loose recall. Publishing tools punish small metadata mistakes. Suppose the same agent has search, CMS publish, and analytics tools. During research, broad retrieval is useful. During publishing, one wrong tag breaks discoverability. So what to do? You do not want the search mindset leaking into release steps. That is why the department idea matters.
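One way to keep the search mindset out of release steps is a per-stage tool allowlist. The sketch below is illustrative only: the stage names, tool names, and `check_tool_call` helper are assumptions, not any real agent framework's API.

```python
# Hypothetical allowlist: each stage may call only the tools listed for it.
STAGE_TOOLS: dict[str, frozenset[str]] = {
    "research": frozenset({"web_search", "analytics_read"}),
    "publish": frozenset({"cms_publish"}),
}


def check_tool_call(stage: str, tool: str) -> None:
    """Raise if a stage tries to call a tool outside its allowlist."""
    allowed = STAGE_TOOLS.get(stage, frozenset())
    if tool not in allowed:
        raise PermissionError(f"stage {stage!r} may not call tool {tool!r}")


check_tool_call("research", "web_search")  # allowed, returns quietly

try:
    check_tool_call("publish", "web_search")  # search is not a release tool
except PermissionError as err:
    print(err)
```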

Horizon failure¶

Long tasks need stable intermediate state. A single agent often re-derives that state repeatedly. Suppose it researches in turn one, drafts in turn two, and revises in turn three. If the draft summary was weak, turn three rebuilds the plan from scratch. Hours of work become fuzzy memory, and the agent pays again to rediscover structure. The org chart may exist in theory. But without a durable handoff, nobody knows who decided what.
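A durable handoff can be as small as a record written to disk at the end of each turn. The sketch below assumes a hypothetical `Handoff` shape (stage, decisions, open questions) and plain JSON files. A real system might use a database or a task store instead.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Handoff:
    """What one turn decided, so the next turn does not re-derive it."""
    stage: str
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def save_handoff(handoff: Handoff, directory: Path) -> Path:
    """Write the handoff as JSON, one file per stage."""
    path = directory / f"{handoff.stage}.json"
    path.write_text(json.dumps(asdict(handoff), indent=2), encoding="utf-8")
    return path


def load_handoff(stage: str, directory: Path) -> Handoff:
    """Read a stage's handoff back from disk."""
    data = json.loads((directory / f"{stage}.json").read_text(encoding="utf-8"))
    return Handoff(**data)


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    draft = Handoff(
        stage="draft",
        decisions=["lead with the pricing change"],
        open_questions=["confirm launch date"],
    )
    save_handoff(draft, workdir)
    restored = load_handoff("draft", workdir)  # what turn three would read

print(restored.decisions)
```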

Evaluation fog¶

When the final output is weak, diagnosis becomes blurry. Was retrieval poor? Was drafting weak? Was review shallow? Was publishing metadata incomplete? In one big loop, you only see the bad ending. You cannot isolate the broken stage. That is evaluation fog. With a narrow memo format, each step leaves inspectable evidence. With one overloaded worker, failure becomes one large mystery.
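To see what inspectable evidence could look like, here is a toy sketch. The stage functions and checks are made-up stand-ins, and `run_with_checks` is a hypothetical helper. The point is only that each stage gets its own pass/fail record.

```python
from typing import Callable

# (stage name, function that does the work, check on that stage's output)
Stage = tuple[str, Callable[[str], str], Callable[[str], bool]]


def run_with_checks(stages: list[Stage], request: str) -> tuple[str, list[tuple[str, bool]]]:
    """Run stages in order, record pass/fail per stage, stop at the first failure."""
    report: list[tuple[str, bool]] = []
    current = request
    for name, run, check in stages:
        current = run(current)
        ok = check(current)
        report.append((name, ok))
        if not ok:
            break
    return current, report


# Toy stages: the publish step forgets to add tags, so its check fails.
stages: list[Stage] = [
    ("research", lambda x: x + " | sources: 3", lambda out: "sources" in out),
    ("write", lambda x: "DRAFT: " + x, lambda out: out.startswith("DRAFT")),
    ("publish", lambda x: x, lambda out: "tags=" in out),
]

_, report = run_with_checks(stages, "Make the article go live")
first_failure = next((name for name, ok in report if not ok), None)
print(report)
print("first failing stage:", first_failure)
```

With a report like this, a weak ending points at one stage instead of at the whole loop.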

3) Worked example — content workflow at 60% reliability¶

Now take a concrete pipeline. One agent does research → write → review → publish. Each stage looks decent alone. End-to-end, the picture changes.

Assume the single agent has these success rates.

Research success = 0.92
Writing success = 0.88
Review success = 0.85
Publishing success = 0.90

Now multiply them. This treats each stage as an independent pass or fail, where any failed stage fails the whole run.

After research and writing: 0.92 × 0.88 = 0.8096
After review: 0.8096 × 0.85 = 0.68816
After publish: 0.68816 × 0.90 = 0.619344

So end-to-end success is 0.619344. That is about 61.9%. Call it roughly 62%. See the danger. Each stage looked "pretty good," but the workflow is still unreliable.
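The same arithmetic as a small script. It assumes the stages succeed or fail independently and that one failed stage fails the whole run. The `end_to_end` helper name is just for illustration.

```python
import math

# Per-stage success rates from the worked example.
single_agent = {
    "research": 0.92,
    "writing": 0.88,
    "review": 0.85,
    "publishing": 0.90,
}


def end_to_end(stage_rates: dict[str, float]) -> float:
    """Multiply per-stage success rates (assumes independent stages)."""
    return math.prod(stage_rates.values())


overall = end_to_end(single_agent)
print(f"end-to-end success: {overall:.1%}")  # prints "end-to-end success: 61.9%"
```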

Now compare specialists. Suppose the department uses four focused workers. Researcher success = 0.95. Writer success = 0.95. Reviewer success = 0.95. Publisher success = 0.95.

Multiply again.

After research and writing: 0.95 × 0.95 = 0.9025
After review: 0.9025 × 0.95 = 0.857375
After publish: 0.857375 × 0.95 = 0.81450625

So end-to-end success is 0.81450625. That is about 81.5%. Simple, no? Moving from one overloaded worker to four coherent workers adds almost twenty points. The main gain is not magic intelligence. It is cleaner scope, a cleaner handoff, and a cleaner memo format. That is why the formula comes after the picture. Reliability compounds both ways.
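A short sketch puts the two pipelines side by side and runs the formula backwards. It makes the same independence assumption as the worked example. `chain_success` and `required_stage_rate` are illustrative helper names, and the 0.90 target is only an example value.

```python
import math


def chain_success(rates: list[float]) -> float:
    """End-to-end success for stages that fail independently."""
    return math.prod(rates)


def required_stage_rate(target: float, n_stages: int) -> float:
    """Uniform per-stage rate needed to reach a target end-to-end rate."""
    return target ** (1 / n_stages)


overloaded = chain_success([0.92, 0.88, 0.85, 0.90])
specialists = chain_success([0.95] * 4)
gap_points = (specialists - overloaded) * 100

print(f"overloaded:  {overloaded:.1%}")
print(f"specialists: {specialists:.1%}")
print(f"gap:         {gap_points:.1f} points")

# Running it backwards: what must each of four stages hit for 90% overall?
needed = required_stage_rate(0.90, 4)
print(f"per-stage rate for 90% end-to-end: {needed:.3f}")
```

Small per-stage gains add up across the chain, and so do small per-stage losses.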

4) Signals for splitting vs staying single¶

Look. Do not split because multi-agent sounds fashionable. Split when the structure demands it. Stay single when coherence is still intact.

When to split¶

Conflicting tools exist, like broad search and strict publishing controls.
Different models are needed, like cheap retrieval and strong review.
Independent evaluation per stage matters, so failures can be isolated.
Natural parallelism exists, like two researchers gathering sources simultaneously.
Context pollution keeps hurting later steps with earlier raw material.
The handoff can be written clearly in the memo format.
The org chart has real roles, not decorative titles.

When to stay single¶

The task is short, bounded, and easy to inspect.
Only a few simple tools are involved.
No intermediate evaluation is needed.
Latency matters more than modularity.
One prompt can still keep priorities coherent.
The CEO can state one clear goal without internal conflict.
The cost of extra handoff steps would exceed the gain.

So what to do? Use one agent until roles start fighting. Then split at the fight lines. Not before. Not after the outage.

5) The structural reason — separation of concerns¶

This is just software engineering again. One module should not hold unrelated responsibilities. One prompt should not juggle unrelated cognitive modes. That is the same principle. Separation of concerns is the deeper reason behind the department.

Narrow prompts are easier to optimize. Narrow prompts are easier to evaluate. Narrow prompts are easier to replace. If the reviewer prompt is bad, swap only that unit. If the publisher tool schema changes, update only that unit. The rest of the system stays untouched.

See. This is not about agent count. It is about coherence per unit. A two-agent system can still be messy. A five-agent system can still be elegant. The test is simple. Does each unit have one stable job, one clean input, and one clean output? If yes, the org chart is doing useful work. If not, you only drew boxes.

This is also why the memo format matters. A specialist is only useful when the next specialist can understand the output. This is also why the handoff matters. A good handoff removes raw clutter and preserves decision-grade facts. And this is why the CEO should assign goals, not micromanage every token. Departments work because interfaces work. Agents work for the same reason.
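The "one stable job, one clean input, one clean output" test can be written down as a tiny check. This is a sketch under assumptions: the `Unit` record and the artifact names ("brief", "draft", and so on) are hypothetical labels, not a real framework's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One specialist: a single job, one input artifact, one output artifact."""
    job: str
    consumes: str
    produces: str


def broken_links(units: list[Unit]) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs whose output and input do not line up."""
    return [
        (a.job, b.job)
        for a, b in zip(units, units[1:])
        if a.produces != b.consumes
    ]


pipeline = [
    Unit("research", consumes="request", produces="brief"),
    Unit("write", consumes="brief", produces="draft"),
    Unit("review", consumes="draft", produces="approved_draft"),
    Unit("publish", consumes="approved_draft", produces="live_article"),
]

# A messy variant: the writer expects raw notes that nobody produces.
messy = [pipeline[0], Unit("write", consumes="raw_notes", produces="draft")]

print(broken_links(pipeline))
print(broken_links(messy))
```

An empty result means every interface lines up. Any pair in the list marks a handoff that was never really defined.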

Where this lives in the wild¶

GitHub Copilot agents — software engineer: file search, code edits, test runs, and debugging compound errors when one loop handles everything.
Support automation platforms — support operations manager: policy rules, CRM lookup, refund logic, and customer messaging compete inside one prompt.
Content generation pipelines — content strategist: research, drafting, SEO checks, and CMS publishing overload one agent quickly.
Enterprise RAG systems — compliance analyst: retrieval, synthesis, citation control, and policy filtering collide inside one context window.
Financial report generators — finance manager: data pull, variance analysis, narrative writing, and compliance review do not share the same optimal prompt.

Pause and recall¶

Why does research-heavy context often damage later writing quality?
What is evaluation fog, and why is it expensive?
In the worked example, how did 0.92, 0.88, 0.85, and 0.90 become about 62%?
What signals tell you to stay single-agent for now?

Interview Q&A¶

Q: Why split a workflow into specialists instead of just giving one stronger model a bigger prompt? A: Bigger prompts increase raw capacity, but they do not remove conflicting goals, tool mismatches, or blurry evaluation boundaries. Common wrong answer to avoid: "Because multi-agent is always smarter" — the real reason is cleaner scope and cleaner debugging.

Q: Why is separation of concerns a better explanation than saying single agents are weak? A: A strong model can still fail when one unit mixes research, judgment, drafting, and release control in one place. Common wrong answer to avoid: "Because LLMs cannot do multi-step work" — they can, but reliability falls when roles are incoherent.

Q: Why evaluate each stage separately instead of only measuring final output quality? A: Stage metrics show where failure starts, so teams can fix retrieval, review, or publishing without guessing. Common wrong answer to avoid: "Because more metrics always help" — useless metrics add noise unless they map to real stages.

Q: Why keep some workflows single-agent even after learning multi-agent design? A: Short, bounded tasks often lose more to coordination overhead than they gain from specialization. Common wrong answer to avoid: "Because multi-agent is too advanced for production" — the real issue is cost-benefit, not fear.

Apply now (5 min)¶

Exercise: Take one workflow you use today. Split it into research, decision, execution, and review. Now mark which step has the noisiest context. Then mark which step needs the strictest tool behavior. That is your first candidate split.

Sketch from memory: Draw the overloaded single worker box. Then draw four specialists with arrows for the handoff between them. Mark where context must compress as work moves from one specialist to the next.

Bridge. The single employee is overwhelmed. The cure is departments — but adding departments without an org chart just spreads the chaos. The most common starting org chart is one CEO directing many specialists. That is orchestrator-worker. → 02-orchestrator-worker.md