# 02. MCP & Multi-Agent Systems — Narrative Explainer
Companion to 03_study_material.md. This file gives you the picture in your head. The study material gives you the compact reference version.
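## Table of contents

- ELI5 — the company story
- Chapter 1: The single-agent failure that pushes teams toward multi-agent
  - 1.1 The workflow looks simple on paper
  - 1.2 Where the single agent breaks
  - 1.3 A concrete failure story
  - 1.4 Why production systems move toward multiple agents
  - 1.5 But do not swing too far
  - 1.6 Signals that you should consider splitting
  - 1.7 Signals that you should stay single-agent
- Chapter 2: MCP — the memo format that standardizes cooperation
  - 2.1 What MCP standardizes
  - 2.2 The simplest mental picture
  - 2.3 Why a protocol matters more than a framework
  - 2.4 Server, client, and transport
  - 2.5 MCP primitives
  - 2.6 Tools
  - 2.7 Resources
  - 2.8 Prompts
  - 2.9 A tiny MCP-flavored server sketch
  - 2.10 MCP helps single-agent and multi-agent systems
  - 2.11 MCP is about interoperability, not magic
  - 2.12 The hidden production advantage
- Chapter 3: Multi-agent topologies — choosing the org chart
  - 3.1 Why topology matters
  - 3.2 Orchestrator-worker pattern
  - 3.3 Pipeline pattern
  - 3.4 Debate or critique pattern
  - 3.5 Hierarchical pattern
  - 3.6 Peer-to-peer pattern
  - 3.7 Which topology fits which problem
  - 3.8 Start with the smallest viable org chart
  - 3.9 A practical decision ladder
- Chapter 4: Orchestration mechanics — how the CEO actually runs the company
  - 4.1 Task decomposition
  - 4.2 Example decomposition
  - 4.3 Result aggregation
  - 4.4 One good aggregation pattern
  - 4.5 Error propagation
  - 4.6 Shared state vs message passing
  - 4.7 Memory management
  - 4.8 A practical handoff template
  - 4.9 The agent loop still exists
  - 4.10 Budgeting and stopping conditions
  - 4.11 Retry strategy
  - 4.12 Observability hooks
- Chapter 5: Production patterns — specialization, cost, evaluation, debugging
  - 5.1 Agent specialization
  - 5.2 Cost control
  - 5.3 Cheap router, expensive generator
  - 5.4 Latency control
  - 5.5 Evaluating a multi-agent system
  - 5.6 Debugging a multi-agent system
  - 5.7 Safety and approval patterns
  - 5.8 Retrieval prompts you can actually reuse
  - 5.9 Honest admission
  - 5.10 The four things Module 11 assumes from here
- Chapter 6: Recap and next-step readiness
  - 6.1 Failure-fix chain
  - 6.2 Key points to remember
  - 6.3 Foundation-gap audit
  - 6.4 Important interview questions
  - 6.5 Production experience to internalize
  - 6.6 Apply now — exercises
  - 6.7 Bridge to Module 11
  - 6.8 Final memory hook
- Appendix — quick topology snapshots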
## ELI5 — the company story
Imagine one busy company inside your computer. It has research, writing, review, and publishing departments.
Each department is good at one kind of work. No department is good at every kind of work.
In this story:
- A department is one agent.
- The memo format is MCP.
- The CEO is the orchestrator.
- The org chart is the topology.
- A handoff is agent-to-agent communication.
Now picture the bad version first. One employee tries doing all company work alone. Research slows them down. Writing distracts them. Review makes them second-guess earlier decisions. Publishing introduces tool mistakes.
So the company hires departments. Research gathers facts. Writing turns facts into prose. Review checks accuracy and tone. Publishing formats and sends the result.
But departments need a shared language. If research writes its notes chaotically, review cannot use them. If publishing expects a checklist, writing cannot send paragraphs.
So the company introduces the memo format. Every department must write the same kind of memo. The memo always says task, inputs, constraints, output, and status. Now departments understand each other.
That is what MCP does for AI systems. It standardizes how tools, data, and prompts are described and requested.
The CEO does not personally do every task. The CEO breaks work into parts. The CEO sends each part to the right department. The CEO reads the memos coming back. The CEO makes the final decision.
Sometimes departments speak only through the CEO. Sometimes departments talk directly to each other. That choice is the org chart.
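A simple org chart puts the CEO at the top and every department directly below it. Work goes down to a department, and its memo comes back up to the CEO.

A direct department handoff skips the CEO. Research passes its memo straight to writing, writing passes its memo straight to review, and so on.

The important thing is not just the departments. The important thing is the memo format between them. Without that format, every handoff becomes a custom integration. With that format, specialists can cooperate cleanly.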
That is the heart of this module. Do not memorize only names. Keep the company picture in your head.
## Chapter 1: The single-agent failure that pushes teams toward multi-agent

### 1.1 The workflow looks simple on paper
Suppose you build one agent for a content workflow. The workflow sounds ordinary. Research the topic. Write the draft. Review the draft. Publish the final version.
On paper, that feels elegant. One prompt. One agent. One loop. One place to debug.
In Week 9 terms, this looks attractive. You already know the agent loop concept. You think one loop can simply do more.
Your first version often works on toy tasks. It may even work on short demos. That is why this failure is dangerous. It stays hidden until the workload becomes realistic.
### 1.2 Where the single agent breaks
The first break is context overload. Research brings too much raw material. The writing stage inherits all of it. The review stage inherits even more. Soon the context window becomes cluttered.
Then comes attention dilution. The model sees sources, draft paragraphs, style rules, tool schemas, and instructions together. Important details compete for attention. Quality falls long before the context window fully overflows.
Then comes tool conflict. Research tools want broad search and long snippets. Publishing tools want strict metadata and short validated inputs. One giant agent prompt must describe both worlds. That increases ambiguity.
Then comes horizon failure. Long tasks need stable intermediate state. A single agent keeps re-deriving that state. It forgets why an earlier decision happened. It reopens settled choices.
Then comes evaluation fog. If the final output is weak, where exactly did failure start? Was research shallow? Was writing careless? Was review too strict? Did publishing mutate the final copy?
One blob gives one blurry answer. That answer is usually, “Something somewhere went wrong.” That is not production debugging. That is educated guessing.
### 1.3 A concrete failure story
Imagine the task is:

> Write a market brief on Indian UPI trends, cite sources, review for factual confidence, then publish into the company CMS.

The agent begins well. It retrieves many articles and reports.
Now the bad pattern starts. It drags too many snippets into the next step. It writes a decent draft, but source attribution becomes messy. It revises paragraphs while also deciding citation style. It calls the CMS tool before the reviewer fully signs off.
Sometimes it works. In practice, it might succeed only about 60 percent of the time. That number is believable. That number is also operationally useless.
At 60 percent, users stop trusting the workflow. Operators add manual checks. Latency rises. The team loses the very automation it wanted.
### 1.4 Why production systems move toward multiple agents
Production systems scale by separation of concerns. Databases specialize. Queues specialize. Caches specialize. AI systems follow the same logic.
A specialist agent can carry a narrower prompt. A narrower prompt is easier to optimize. A narrower prompt is easier to evaluate. A narrower prompt is easier to replace later.
This is the first serious reason multi-agent matters. It is not because multiple agents feel futuristic. It matters because decomposition is how systems survive scale.
### 1.5 But do not swing too far
Do not hear a false lesson here. Multi-agent is not automatically better. A bad split can increase cost, latency, and confusion.
The real question is narrower. Where does a single loop stop being coherent? Where does a clean split reduce failure enough to justify coordination cost?
Module 11 will ask the hard proof question. How do you know the split helped? This module builds the mental model first.
### 1.6 Signals that you should consider splitting
- The agent needs mutually conflicting tool descriptions.
- Intermediate outputs deserve separate quality gates.
- Different subtasks need different models or prompts.
- Some subtasks parallelize naturally.
- Debugging requires step-level visibility.
- Context from one stage pollutes the next stage.
- Failures cluster around handoffs that are currently implicit.
### 1.7 Signals that you should stay single-agent
- The task is short, bounded, and mostly one reasoning mode.
- Tool count is small and tools are semantically similar.
- Intermediate outputs do not need independent evaluation.
- End-to-end latency matters more than modular elegance.
- The main problem is prompt quality, not orchestration.

This split-vs-keep judgment is a foundation gap. Module 11 silently assumes you can already make it.
## Chapter 2: MCP — the memo format that standardizes cooperation

### 2.1 What MCP standardizes
MCP stands for Model Context Protocol. Think of it as a standard contract between models and capabilities.
It standardizes how a client discovers capabilities. It standardizes how those capabilities are described. It standardizes how data and prompts are exposed. It standardizes how requests and responses are shaped.
That matters because tool use is otherwise fragile. Every framework invents its own wrapper. Every wrapper creates migration pain. Every migration pain slows real systems.
Protocols matter when ecosystems get bigger. TCP mattered for networks. HTTP mattered for the web. MCP matters for model-to-context integration.
### 2.2 The simplest mental picture

An MCP server exposes capabilities. An MCP client consumes them. The model lives in the host application, such as a chat app or an IDE, that runs the client.
ASCII picture:

```text
          user
            |
            v
  [MCP client + model]
            |
    protocol messages
            |
            v
       [MCP server]
      /     |      \
     /      |       \
[tool] [resource] [prompt]
```
### 2.3 Why a protocol matters more than a framework
Frameworks help you build quickly. Protocols help ecosystems interoperate.
A framework can disappear next year. A durable protocol survives tool churn. That is why learning MCP is high-leverage.
If you learn only one framework API, your knowledge is partly rented. If you learn the protocol shape beneath it, your knowledge compounds across clients and servers.
### 2.4 Server, client, and transport
The server owns the capabilities. The client requests and uses them. The transport moves messages between both sides.
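The common transports are stdio and HTTP-based transports. stdio is simple for local development: the client launches the server as a subprocess and exchanges messages over standard input and output. HTTP-based transports help when you need remote connectivity. Earlier versions of the specification used HTTP with Server-Sent Events (SSE), and newer versions use Streamable HTTP.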
A transport is not the capability itself. It is just the road. You should keep road and business logic conceptually separate.
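Here is a minimal sketch of what travels on that road. It assumes MCP's JSON-RPC 2.0 message shape and the stdio convention of one message per newline-terminated line. `make_tool_call`, `frame_for_stdio`, and `read_frames` are hypothetical helpers, and the arguments are made up for illustration. A real client would use an SDK and would begin with the `initialize` handshake, which this sketch skips.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Business logic: which capability to invoke, with which inputs."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def frame_for_stdio(message: dict) -> bytes:
    """Transport: serialize one message as a single newline-terminated line.

    json.dumps without indent escapes newlines inside strings, so one
    message never spans more than one line on the wire.
    """
    return (json.dumps(message) + "\n").encode("utf-8")


def read_frames(stream: bytes) -> list[dict]:
    """Transport: split a byte stream back into messages, one per line."""
    return [json.loads(line) for line in stream.decode("utf-8").splitlines() if line]


request = make_tool_call(1, "search_reports", {"query": "UPI trends", "top_k": 3})
wire = frame_for_stdio(request)
```

Notice that swapping stdio for an HTTP transport would replace only `frame_for_stdio` and `read_frames`. `make_tool_call`, the part that carries meaning, would stay untouched.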
### 2.5 MCP primitives

The three primitives you should remember first are:
- Tools: actions the model can invoke.
- Resources: data the model can read.
- Prompts: reusable prompt templates or workflows.

The specification also defines client-side features such as sampling, in which a server asks the client to run a model completion on its behalf. Do not let those distract you from the core three.
### 2.6 Tools
A tool is an action. It usually has a name, description, and input schema. The schema matters a lot. Models behave better when inputs are explicit.
Bad tool definition: a generic name, a description that never says when the tool is safe to call, and an untyped, free-form input. That is too vague. It invites tool misuse.

Better tool definition:

```json
{
  "name": "publish_brief",
  "description": "Publishes an approved brief to the CMS. Use only after review is complete.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "title": {"type": "string"},
      "summary": {"type": "string"},
      "body_markdown": {"type": "string"},
      "review_status": {"type": "string", "enum": ["approved"]}
    },
    "required": ["title", "summary", "body_markdown", "review_status"]
  }
}
```
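The `"enum": ["approved"]` trick only helps if something enforces it. The sketch below assumes a hypothetical `validate_arguments` helper that checks a small subset of JSON Schema (required keys, string type, enum) before the tool body runs. It is illustrative only. A production server should use a full JSON Schema validator.

```python
PUBLISH_BRIEF_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "body_markdown": {"type": "string"},
        "review_status": {"type": "string", "enum": ["approved"]},
    },
    "required": ["title", "summary", "body_markdown", "review_status"],
}


def validate_arguments(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed.

    Supports only: required keys, "type": "string", and "enum".
    Unknown fields are ignored, matching JSON Schema's default behavior.
    """
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
    return errors
```

A draft with `review_status: "draft"` is rejected before anything reaches the CMS. That is the schema doing the work a prompt sentence cannot guarantee.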
### 2.7 Resources
A resource is readable context. Think files, documents, configuration, notes, or database-backed views.
Resources reduce prompt bloat when structured well. Instead of stuffing everything into one system prompt, the client can fetch relevant material as needed.
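Each resource is addressed by a URI, such as `policy://editorial/citation-rules` in the server sketch in 2.9. Notice the shape. A scheme plus a path feels like a clear address space. That predictability helps both humans and clients.

### 2.8 Prompts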
A prompt primitive captures reusable prompt structure. This is useful when teams repeat a pattern often.
Example:
```json
{
  "name": "review_brief",
  "description": "Checks a draft for factual support, clarity, and publish readiness.",
  "arguments": [
    {"name": "draft_markdown", "required": true},
    {"name": "citations_json", "required": true}
  ]
}
```
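### 2.9 A tiny MCP-flavored server sketch

Below is a simple illustrative sketch using the `FastMCP` helper from the official Python SDK. The exact API may differ across SDK versions, and the in-memory stores are placeholders. The architectural idea is what matters.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("briefing-assistant")

# Placeholder in-memory stores; a real server would query a search index and a database.
REPORTS: list[dict] = []
NOTES: list[dict] = []

@mcp.tool()
def search_reports(query: str, top_k: int = 5) -> list[dict]:
    """Searches the report index for relevant market reports."""
    hits = [r for r in REPORTS if query.lower() in r.get("title", "").lower()]
    return hits[:top_k]

@mcp.tool()
def save_brief_note(title: str, body: str, tags: list[str]) -> dict:
    """Stores a research note for later retrieval and summarization."""
    note = {"id": len(NOTES) + 1, "title": title, "body": body, "tags": tags}
    NOTES.append(note)
    return note

@mcp.resource("policy://editorial/citation-rules")
def citation_rules() -> str:
    return "Every claim needs a source and confidence label."

@mcp.prompt()
def summarize_findings(topic: str) -> str:
    return f"Summarize the top findings on {topic} in five bullets."

if __name__ == "__main__":
    mcp.run()  # uses the stdio transport by default
```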
### 2.10 MCP helps single-agent and multi-agent systems
Do not wrongly file MCP under only “multi-agent topics.” A single agent also benefits from clean capability interfaces.
Multi-agent systems benefit even more. Different agents can rely on the same server contract. They do not each need custom wrappers.
This is where MCP and multi-agent naturally meet. One standard capability layer can serve many specialists.
### 2.11 MCP is about interoperability, not magic
A bad tool behind MCP remains a bad tool. A vague prompt behind MCP remains vague. A protocol does not replace good system design.
It solves a different problem. It makes capabilities legible and portable. That is already a big win.
### 2.12 The hidden production advantage
When capabilities become standardized, you can swap clients more easily. You can test servers independently. You can reason about permissions more clearly.
This matters in teams. Infra teams like stable boundaries. Application teams like reusable interfaces. Security teams like inspectable contracts.
That is why protocol knowledge looks senior. It shows systems thinking, not only prompting skill.
## Chapter 3: Multi-agent topologies — choosing the org chart

### 3.1 Why topology matters
Multiple agents alone are not a design. They need a coordination pattern. That pattern is your topology.
Topology answers practical questions. Who assigns work? Who can talk to whom? Who stores shared state? Who decides when the system is done?
### 3.2 Orchestrator-worker pattern
This is the default starting pattern. One orchestrator delegates to specialist workers. Workers usually report back upward.
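A minimal sketch of the shape, assuming workers are plain Python functions standing in for specialist agents. The worker names mirror the billing, refund, and technical routing case mentioned in this section, but `orchestrate` and `WORKERS` are hypothetical. Work flows down, results flow back up, and workers never call each other.

```python
from typing import Callable


# Hypothetical workers: each stands in for a specialist agent with its own prompt and tools.
def billing_worker(case: str) -> str:
    return f"billing handled: {case}"


def refund_worker(case: str) -> str:
    return f"refund handled: {case}"


def technical_worker(case: str) -> str:
    return f"technical handled: {case}"


WORKERS: dict[str, Callable[[str], str]] = {
    "billing": billing_worker,
    "refund": refund_worker,
    "technical": technical_worker,
}


def orchestrate(subtasks: list[tuple[str, str]]) -> dict:
    """Delegate each (worker_name, payload) pair and collect results centrally.

    Every result returns to the orchestrator, which makes it the single
    place to log, enforce budgets, and synthesize the final answer.
    """
    results, log = [], []
    for worker_name, payload in subtasks:
        worker = WORKERS.get(worker_name)
        if worker is None:
            log.append(f"no worker for {worker_name!r}; escalating")
            continue
        results.append(worker(payload))
        log.append(f"{worker_name} done")
    return {"results": results, "log": log}
```

The central log is the main payoff. If the orchestrator grows its own long prompt of special cases, it is turning back into the one giant agent.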
Use it when:
- You need clear central control.
- Specialists are independent enough to delegate cleanly.
- Final synthesis belongs in one place.

Strengths:
- Easy to reason about.
- Easy to log and audit.
- Good place for budgets and safety checks.

Weaknesses:
- The orchestrator can become a bottleneck.
- Too much central logic can recreate one giant agent.

Very common production example: a support supervisor routes billing, refund, and technical cases.

### 3.3 Pipeline pattern
A pipeline sends outputs stage by stage. Each agent transforms work and passes it forward.
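A minimal pipeline sketch, assuming each stage is a function that takes a memo dict and returns an enriched copy. The stage bodies are placeholders for real agents. `run_pipeline` and its `required` map are hypothetical. They act as checkpoints: each stage must produce the key it promised before the next stage runs, so a bad handoff stops the line instead of propagating silently.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def research(memo: dict) -> dict:
    # Placeholder: a real research agent would call search tools here.
    return {**memo, "claims": [f"claim about {memo['topic']}"]}


def write(memo: dict) -> dict:
    return {**memo, "draft": " ".join(memo["claims"])}


def review(memo: dict) -> dict:
    approved = bool(memo["draft"])
    return {**memo, "review_status": "approved" if approved else "rejected"}


def run_pipeline(stages: list[Stage], memo: dict, required: dict[str, str]) -> dict:
    """Run stages in a fixed order; after each one, verify its promised output."""
    for stage in stages:
        memo = stage(memo)
        key = required[stage.__name__]
        if key not in memo:
            raise ValueError(f"{stage.__name__} did not produce {key!r}")
    return memo


final = run_pipeline(
    [research, write, review],
    {"topic": "UPI"},
    {"research": "claims", "write": "draft", "review": "review_status"},
)
```

Latency here is the sum of all stage latencies, which is the sequential cost this section warns about.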
Use it when:
- The task has a stable order.
- Each stage has clear input and output shapes.
- Backtracking is rare or cheap.

Strengths:
- Very understandable.
- Good for checkpoints and approval gates.
- Easy to benchmark stage-level quality.

Weaknesses:
- Sequential latency can become painful.
- Early mistakes propagate downstream.
- Rigidity hurts when workflows branch often.

A pipeline loves good memo formats. Bad handoffs ruin pipelines fast.

### 3.4 Debate or critique pattern
Here, multiple agents produce or critique candidate answers. A judge or mediator then decides.
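The phrase "meaningful judging rubric" is easiest to see in code. The sketch below assumes each debating agent returns a `Candidate` that already carries counts of cited sources and unsupported claims, for example from a fact-check step. The rubric inside the hypothetical `judge` function is an illustrative assumption, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    author: str
    answer: str
    cited_sources: int
    unsupported_claims: int


def judge(candidates: list[Candidate], min_sources: int = 1) -> Optional[Candidate]:
    """Pick a winner with an explicit rubric rather than a vibe.

    Rubric (assumed): drop candidates below the source threshold, then
    prefer fewer unsupported claims, then more cited sources.
    Returns None when nothing qualifies, so the caller can escalate.
    """
    eligible = [c for c in candidates if c.cited_sources >= min_sources]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (c.unsupported_claims, -c.cited_sources))
```

The judge can only rank what it is shown. If every agent makes the same mistake with the same confidence, this rubric will still pick one of them, which is why the pattern needs evaluation discipline.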
Use it when:
- You want error checking through disagreement.
- Quality matters more than minimal latency.
- There is a meaningful judging rubric.

Strengths:
- Can catch reasoning mistakes.
- Good for high-stakes drafting and review.
- Encourages explicit justification.

Weaknesses:
- Cost rises quickly.
- Weak judges create expensive noise.
- Agents can converge on the same wrong answer.

This pattern connects directly to Module 11. You need evaluation discipline to trust it.

### 3.5 Hierarchical pattern
A hierarchy creates managers under managers. This helps when one orchestrator is too overloaded.
ASCII diagram:
```text
                 [Chief Orchestrator]
                 /                  \
                /                    \
     [Research Manager]        [Delivery Manager]
       /           \             /           \
      /             \           /             \
 [Search]      [Fact Check] [Writer]      [Publisher]
```
### 3.6 Peer-to-peer pattern
Peer-to-peer systems allow agents to talk directly. No single boss mediates every exchange.
Use it when:
- Specialists truly need iterative collaboration.
- The system benefits from local negotiation.
- Central orchestration would become too constraining.

Strengths:
- Flexible local cooperation.
- Good for discovery and refinement tasks.
- Can reduce central bottlenecks.

Weaknesses:
- Harder to trace.
- Harder to bound loops.
- Shared assumptions can drift silently.

Peer-to-peer sounds powerful. It also demands stronger protocol discipline. Without crisp handoffs, chaos grows quickly.

### 3.7 Which topology fits which problem
| Problem shape | Recommended topology | Why |
|---|---|---|
| One boss, many specialists | Orchestrator-worker | Clear central control |
| Stable stage order | Pipeline | Predictable handoffs |
| Need disagreement before answer | Debate / critique | Error detection through comparison |
| Large program of work | Hierarchical | Scales coordination |
| Local specialist negotiation | Peer-to-peer | Flexible collaboration |
### 3.8 Start with the smallest viable org chart

This is an engineering rule worth memorizing. Start with the smallest topology that fits the error pattern.
Do not start hierarchical because it feels sophisticated. Do not start debate because it sounds intelligent. Do not start peer-to-peer because it resembles human teams.
Start simple. Measure. Then add structure only where failure demands it.
### 3.9 A practical decision ladder
Ask these questions in order:
1. Can one bounded agent do this reliably?
2. If not, is the task naturally stage-based?
3. If not, is there one obvious coordinator?
4. If not, do specialists need critique or negotiation?
5. If scale grows, do sub-managers become necessary?

That ladder prevents premature complexity.
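The ladder can be written as a function that stops at the first rung that fits. This is a sketch that assumes you have already answered each question yourself and recorded the answers as booleans in a hypothetical `TaskProfile`. The fallback to orchestrator-worker when no rung fits is an assumption, based on this chapter calling it the default starting pattern.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Each flag answers one rung of the ladder; the values come from your own analysis.
    single_agent_reliable: bool
    stage_based: bool
    obvious_coordinator: bool
    needs_critique_or_negotiation: bool
    coordinator_overloaded: bool = False


def choose_topology(task: TaskProfile) -> str:
    """Walk the decision ladder in order and stop at the first rung that fits."""
    if task.single_agent_reliable:
        return "single agent"
    if task.stage_based:
        topology = "pipeline"
    elif task.obvious_coordinator:
        topology = "orchestrator-worker"
    elif task.needs_critique_or_negotiation:
        topology = "debate / critique or peer-to-peer"
    else:
        topology = "orchestrator-worker"  # assumed default starting pattern
    # Rung 5 applies on top of the chosen shape once scale grows.
    if task.coordinator_overloaded:
        topology = f"hierarchical ({topology} under sub-managers)"
    return topology
```

The order of the `if` checks is the point. A task that one agent handles reliably never reaches the multi-agent branches, however sophisticated they sound.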
## Chapter 4: Orchestration mechanics — how the CEO actually runs the company

### 4.1 Task decomposition
Decomposition is not “split randomly.” Good decomposition respects interfaces, goals, and evaluation points.
A useful decomposition recipe is:
1. Define the final output clearly.
2. Find the major failure modes.
3. Split along those failure boundaries.
4. Give each subtask a crisp done condition.
5. Keep the handoff payload minimal but sufficient.

Bad decomposition creates chatty agents with vague jobs. Good decomposition creates specialists with testable outputs.
### 4.2 Example decomposition
For a research-to-publish workflow, you might define:
- Research agent: returns claims, sources, and confidence.
- Writer agent: returns draft sections linked to claims.
- Reviewer agent: returns issues, fixes, and publish decision.
- Publisher agent: posts only approved material.

Notice the pattern. Each output is shaped for the next step. That is intentional orchestration.
### 4.3 Result aggregation
Once workers finish, somebody must combine results. Sometimes that is simple concatenation. Sometimes it is ranking, synthesis, or conflict resolution.
Aggregation questions include:
- What if workers disagree?
- What if one worker returns nothing useful?
- What if outputs overlap?
- What if the final context budget cannot fit all results?

An orchestrator needs aggregation rules before runtime. Otherwise it improvises poorly under pressure.
### 4.4 One good aggregation pattern
Ask each worker to return four fields:
- `answer`
- `evidence`
- `uncertainty`
- `recommended_next_action`
Now the orchestrator can compare outputs structurally. It is not forced to interpret free-form prose blindly.
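A sketch of structural aggregation over those four fields. It assumes `uncertainty` is a float from 0.0 (confident) to 1.0 (no idea), which is a convention chosen for illustration. The rules in the hypothetical `aggregate` function are fixed before runtime: results without evidence are dropped, results above an uncertainty ceiling are dropped, and any remaining disagreement is surfaced rather than smoothed over.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerResult:
    worker: str
    answer: str
    evidence: list[str] = field(default_factory=list)
    uncertainty: float = 1.0  # assumed scale: 0.0 confident, 1.0 no idea
    recommended_next_action: str = "none"


def aggregate(results: list[WorkerResult], max_uncertainty: float = 0.5) -> dict:
    """Combine worker outputs using rules decided before runtime."""
    usable = [r for r in results if r.evidence and r.uncertainty <= max_uncertainty]
    if not usable:
        return {"status": "escalate", "reason": "no usable worker output"}
    answers = {r.answer for r in usable}
    if len(answers) > 1:
        # Disagreement is reported explicitly, never averaged away.
        return {"status": "conflict", "answers": sorted(answers)}
    best = min(usable, key=lambda r: r.uncertainty)
    evidence = sorted({e for r in usable for e in r.evidence})
    return {"status": "ok", "answer": best.answer, "evidence": evidence}
```

Exact string matching on `answer` is deliberately crude. A real system might compare normalized claims instead, but the structure of the decision stays the same.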
### 4.5 Error propagation
Errors should not vanish into vague text. They should propagate in typed or at least explicit form.
Useful error categories include:
- `retryable_tool_error`
- `validation_error`
- `missing_dependency`
- `insufficient_confidence`
- `human_review_required`
Why does this matter? Because a downstream agent should respond differently to each case.
A retryable network error may deserve a retry. A validation error may deserve prompt correction. Insufficient confidence may deserve escalation.
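A sketch of typed error propagation using the five categories above. The `AgentError` record and the action names in `POLICY` are hypothetical. The idea is that the response to each error type is decided once, at design time, instead of being improvised from free-form error text.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    RETRYABLE_TOOL_ERROR = "retryable_tool_error"
    VALIDATION_ERROR = "validation_error"
    MISSING_DEPENDENCY = "missing_dependency"
    INSUFFICIENT_CONFIDENCE = "insufficient_confidence"
    HUMAN_REVIEW_REQUIRED = "human_review_required"


@dataclass
class AgentError:
    agent: str
    kind: ErrorType
    detail: str


# One policy table, decided at design time. The action names are hypothetical.
POLICY = {
    ErrorType.RETRYABLE_TOOL_ERROR: "retry_same_agent",
    ErrorType.VALIDATION_ERROR: "fix_input_and_rerun",
    ErrorType.MISSING_DEPENDENCY: "run_upstream_first",
    ErrorType.INSUFFICIENT_CONFIDENCE: "escalate_to_reviewer",
    ErrorType.HUMAN_REVIEW_REQUIRED: "pause_for_human",
}


def next_action(error: AgentError) -> str:
    """Route a typed error to the response the downstream agent should take."""
    return POLICY[error.kind]
```

Because `ErrorType` also parses from its string value, a worker can return plain JSON such as `{"kind": "validation_error"}`, and the orchestrator can still route it precisely.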
### 4.6 Shared state vs message passing
This is one of the most important architecture choices.
Shared state means agents read and write a common store. Message passing means agents exchange explicit payloads.
Shared state advantages:
- Easy global visibility.
- Good for dashboards and checkpoints.
- Convenient for long-running workflows.

Shared state risks:
- Hidden coupling.
- Race conditions or stale reads.
- Harder reasoning about who changed what.

Message passing advantages:
- Cleaner interfaces.
- Easier replay and audit.
- Better discipline for handoff design.

Message passing risks:
- More serialization overhead.
- Potential duplication of context.
- Developers may overstuff messages instead of summarizing.

In practice, many systems mix both. Use shared state for durable workflow records. Use message passing for precise handoffs.
### 4.7 Memory management
Memory in multi-agent systems is not one thing. It has at least three layers:
- Working memory for the current step.
- Workflow memory for current run state.
- Long-term memory for persistent facts or episodes.

If every agent reads the full history every time, latency and token cost explode.
So memory management becomes summarization management. You are deciding what survives each handoff.
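The sketch below shows "deciding what survives" as a function. It assumes each worker returns a dict, and it keeps only the keys the next agent needs. The key names echo the handoff-summary prompt in 5.8. `HANDOFF_KEYS`, `compress_for_handoff`, and the `max_facts` cap are hypothetical choices.

```python
# Keys the next agent is allowed to see; everything else stays in the workflow record.
HANDOFF_KEYS = ("facts", "open_risks", "constraints", "next_action")


def compress_for_handoff(worker_output: dict, max_facts: int = 5) -> dict:
    """Keep only what should survive the handoff, and cap the fact list.

    Raw transcripts, tool-call logs, and intermediate reasoning are dropped,
    so the next agent's context does not grow with every step.
    """
    memo = {k: worker_output[k] for k in HANDOFF_KEYS if k in worker_output}
    if "facts" in memo:
        memo["facts"] = memo["facts"][:max_facts]
    return memo
```

Truncating to the first five facts assumes the worker already ranked its facts by importance. If it did not, ranking should happen before this cut.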
### 4.8 A practical handoff template
A very reusable handoff memo looks like this:
```yaml
task: Draft the executive summary
goal: Convert validated claims into concise prose
inputs:
  - validated_claims.json
constraints:
  - Use only approved claims
  - Maximum 180 words
done_definition:
  - Exactly 3 paragraphs
  - Every claim traceable to a citation
open_risks:
  - Claim 4 confidence is medium
budget:
  tokens: 3000
  latency_ms: 5000
```
### 4.9 The agent loop still exists
Do not think multi-agent replaces the agent loop concept. It composes multiple loops.
Each worker may still do Reason, Act, Observe. The orchestrator runs a higher-order coordination loop.
That is a hidden assumption before Module 11. If the agent loop concept is fuzzy, multi-agent evaluation will also feel fuzzy.
### 4.10 Budgeting and stopping conditions
A multi-agent system needs explicit limits. Otherwise loops and retries multiply silently.
Useful limits include:
- Max turns per agent
- Max total tool calls
- Max orchestration depth
- Max end-to-end latency
- Max spend per workflow

Stopping conditions should be part of the design, not an afterthought.
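A sketch of those five limits as one guard object that the orchestrator calls after every agent turn. The default numbers are placeholders rather than recommendations, and `Budget` and `BudgetExceeded` are hypothetical names. The key design choice is that crossing any limit raises immediately, so a runaway loop cannot keep spending quietly.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when any workflow limit is crossed; the orchestrator must stop."""


@dataclass
class Budget:
    # Placeholder limits; tune them per workflow.
    max_turns_per_agent: int = 5
    max_tool_calls: int = 20
    max_depth: int = 2
    max_latency_s: float = 60.0
    max_spend_usd: float = 1.00
    turns: dict[str, int] = field(default_factory=dict)
    tool_calls: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, agent: str, depth: int, tool_calls: int = 0, cost_usd: float = 0.0) -> None:
        """Record one agent turn, then check every limit."""
        self.turns[agent] = self.turns.get(agent, 0) + 1
        self.tool_calls += tool_calls
        self.spend_usd += cost_usd
        checks = {
            f"turns for {agent}": self.turns[agent] > self.max_turns_per_agent,
            "tool calls": self.tool_calls > self.max_tool_calls,
            "orchestration depth": depth > self.max_depth,
            "latency": time.monotonic() - self.started_at > self.max_latency_s,
            "spend": self.spend_usd > self.max_spend_usd,
        }
        for limit, exceeded in checks.items():
            if exceeded:
                raise BudgetExceeded(f"stopped: {limit} limit exceeded")
```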
### 4.11 Retry strategy
Retries are not free. They can repeat bad reasoning and increase cost.
Good retry design asks:
- What type of error happened?
- Should the same agent retry?
- Should another specialist inspect first?
- Should a human be asked instead?

Mechanical retries are for flaky infrastructure. Cognitive retries need changed context or changed instructions.
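A sketch of a mechanical retry that refuses to become a cognitive one. It assumes infrastructure failures are raised as a hypothetical `RetryableToolError`. Only that error is retried, with exponential backoff. Any other exception propagates at once, because repeating the same call with the same context would usually repeat the same bad reasoning. The `sleep` parameter is injectable so the delays can be observed in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableToolError(Exception):
    """Hypothetical error type for flaky infrastructure (timeouts, 503s)."""


def retry_mechanical(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry only infrastructure failures, waiting base, 2*base, 4*base, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableToolError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error as typed
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```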
### 4.12 Observability hooks
Every handoff should be traceable. At minimum, log:
- agent name
- task id
- input summary
- output summary
- tool calls
- latency
- token usage
- error type

This sounds operational. It is also conceptual. If you cannot observe the handoff, you cannot truly understand the topology.
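A minimal sketch of that checklist as one structured record per handoff, emitted as a JSON line so it can be filtered and replayed later. `HandoffTrace` and `log_handoff` are hypothetical names. Splitting token usage into input and output counts is an assumption, and a real system would send the line to a tracing backend instead of printing it.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class HandoffTrace:
    agent: str
    task_id: str
    input_summary: str
    output_summary: str
    tool_calls: list[str] = field(default_factory=list)
    latency_ms: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    error_type: Optional[str] = None  # one of the typed error categories, or None


def log_handoff(trace: HandoffTrace) -> str:
    """Emit one JSON line per handoff; returns the line for testing."""
    line = json.dumps({"ts": time.time(), **asdict(trace)})
    print(line)
    return line
```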
## Chapter 5: Production patterns — specialization, cost, evaluation, debugging

### 5.1 Agent specialization
Specialization is the main production reason for multiple agents. Give each agent a narrow charter. Give each agent tools that match that charter.
Good specialist prompts often answer:
- What exact role do I play?
- What inputs am I allowed to trust?
- What tools may I call?
- What output format must I produce?
- When should I refuse or escalate?

A specialist should not sound like a superhero. A specialist should sound like a disciplined function.
### 5.2 Cost control
Multi-agent systems create token multiplication. Every handoff is more context. Every worker is more inference. Every retry is more spend.
So cost control is not optional. It is part of the design brief.
Common cost patterns:
- Cheap model for routing.
- Stronger model only for hard synthesis.
- Summaries instead of raw transcripts at handoff.
- Parallel calls only when latency benefit is real.
- Cache stable context like policies and system rules.

A strong production instinct is this: spend expensive tokens only where they change quality materially.
### 5.3 Cheap router, expensive generator
This is a very common pattern. A smaller model classifies the task or routes the work. A larger model handles generation only when needed.
Why it works:
- Routing is often simpler than generation.
- Misusing a large model for trivial routing wastes money.
- Specialized workers can stay on smaller models if tasks are narrow.

But measure this carefully. A fancy router that saves nothing is just extra latency.
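A sketch of the routing shape with every model call stubbed out. `cheap_classify`, `small_model`, and `large_model` are hypothetical stand-ins. The keyword classifier here replaces what would be a small-model call in practice, and the two "models" just tag their output so the routing decision is visible.

```python
from typing import Callable


def cheap_classify(task: str) -> str:
    # Stand-in for a small, cheap model; keyword rules here, an LLM call in practice.
    hard_words = ("synthesize", "compare", "analyze")
    return "hard" if any(w in task.lower() for w in hard_words) else "easy"


def small_model(task: str) -> str:
    return f"[small] {task}"  # placeholder for a cheap generation call


def large_model(task: str) -> str:
    return f"[large] {task}"  # placeholder for an expensive generation call


def route(task: str, classify: Callable[[str], str] = cheap_classify) -> str:
    """Spend the expensive model only on tasks the router marks as hard."""
    return large_model(task) if classify(task) == "hard" else small_model(task)
```

The router pays off only if `classify` costs much less than the large-model calls it avoids, and if it rarely sends hard tasks to the small model. Both need measuring.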
### 5.4 Latency control
Latency grows from sequence depth. If four agents wait on each other, users will feel it.
Ways to reduce latency:
- Parallelize independent workers.
- Pre-fetch likely resources.
- Use smaller prompts for intermediate steps.
- Stop early when confidence is already sufficient.
- Keep human approval only at meaningful gates.

There is no free lunch. Parallelism can lower latency but raise cost. This cost-latency tradeoff is a foundation gap for Module 11.
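A sketch of "parallelize independent workers" using `asyncio.gather`, with `asyncio.sleep` standing in for model or tool calls of a given duration. Run sequentially, the wall-clock time is roughly the sum of the durations. Run in parallel, it is roughly the longest one. The worker names and durations are illustrative.

```python
import asyncio
import time


async def worker(name: str, seconds: float) -> str:
    # asyncio.sleep stands in for an independent model or tool call.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential(jobs: list[tuple[str, float]]) -> list[str]:
    return [await worker(n, s) for n, s in jobs]


async def run_parallel(jobs: list[tuple[str, float]]) -> list[str]:
    # gather starts every worker at once and returns results in input order.
    return list(await asyncio.gather(*(worker(n, s) for n, s in jobs)))


def timed(runner, jobs):
    start = time.perf_counter()
    results = asyncio.run(runner(jobs))
    return results, time.perf_counter() - start
```

In this sketch both versions make the same three calls, so cost is unchanged. Cost rises when parallelism tempts you to fan out work you would otherwise have skipped, for example by launching every worker before an early-stopping check could cancel the rest.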
### 5.5 Evaluating a multi-agent system
You do not evaluate only the final answer. You evaluate the coordination pattern too.
Useful evaluation layers:
1. Per-agent output quality
2. Handoff quality
3. End-to-end task success
4. Cost and latency budgets
5. Failure recovery behavior

For example:
- Does the research agent return grounded claims?
- Does the writer preserve evidence correctly?
- Does the reviewer catch unsupported statements?
- Does the whole system finish within budget?

Notice how this differs from simple prompt evaluation. You are evaluating an interacting system.
### 5.6 Debugging a multi-agent system
The debugging move is almost always the same. Reduce the blur.
Make the failing handoff explicit. Inspect one step at a time. Look at structured outputs, not only polished prose.
A practical debugging checklist:
- Reproduce the same task with tracing enabled.
- Identify the first bad intermediate artifact.
- Check whether the artifact is under-specified or overlong.
- Check whether tool use matched the task charter.
- Check whether the next agent misread the handoff.
- Check whether the orchestrator used a weak aggregation rule.

Debugging is easier when outputs are shaped. Debugging is miserable when every step returns essays.
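A sketch of "identify the first bad intermediate artifact". It assumes a traced run is recorded as an ordered list of `{"agent": ..., "output": ...}` records, and that each agent has a cheap structural check. `first_failing_step` and the `CHECKS` table are hypothetical. The function stops at the first failure, because later failures are often downstream symptoms of that one.

```python
from typing import Callable, Optional, Tuple

Check = Callable[[dict], bool]

# Hypothetical per-agent checks on shaped outputs.
CHECKS: dict[str, Check] = {
    "research": lambda out: all(c.get("source") for c in out["claims"]),
    "writer": lambda out: bool(out.get("sections")),
}


def first_failing_step(trace: list[dict], checks: dict[str, Check]) -> Optional[Tuple[int, str]]:
    """Walk a recorded run in order and return (index, agent) of the first failed check."""
    for i, step in enumerate(trace):
        check = checks.get(step["agent"])
        if check is not None and not check(step["output"]):
            return i, step["agent"]
    return None
```

These checks only work because the outputs are shaped. A step that returns an essay gives you nothing to assert on.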
### 5.7 Safety and approval patterns
Some tasks should never fully automate. Publishing externally is one example. Payments and deletions are obvious others.
In such cases, design a human approval gate. Do not hide it behind prompt wording alone. Make it architectural.
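A sketch of what "architectural" means here. The only code path to the CMS requires an approval record tied to the exact artifact, whatever any prompt says. `Approval`, `ApprovalRequired`, and `publish` are hypothetical, and the CMS call is left as a comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Approval:
    reviewer: str  # a human identity, not an agent name
    artifact_id: str


class ApprovalRequired(PermissionError):
    """Raised when an action reaches the gate without a matching approval."""


def publish(artifact_id: str, body: str, approval: Optional[Approval]) -> str:
    """The single path to the CMS: no matching approval, no publish."""
    if approval is None or approval.artifact_id != artifact_id:
        raise ApprovalRequired(f"{artifact_id} has no matching human approval")
    # A real system would call the CMS client with `body` here; this sketch only reports success.
    return f"published {artifact_id} (approved by {approval.reviewer})"
```

In production, agents should not be able to mint an `Approval` object. It would come from a separate review service the agents cannot write to.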
### 5.8 Retrieval prompts you can actually reuse
These prompts help agents retrieve or compress context well. Copy them and adapt.
Prompt 1: split or stay single

```text
You are an orchestration analyst.
Given the task, decide whether one agent is enough
or whether the task should split into specialists.
Return: recommended topology, reasons, expected cost impact, expected latency impact.
```

Prompt 2: compress a handoff

```text
Summarize this worker output for the next agent.
Keep only facts, open risks, constraints, and required next action.
Do not include raw chain-of-thought or irrelevant detail.
```

Prompt 3: minimal retrieval

```text
Retrieve the smallest set of documents needed to support the task.
For each document, return claim relevance, source type, and confidence.
Prefer fewer high-signal documents over many weak ones.
```

Prompt 4: resolve a disagreement

```text
Two agents disagree.
Summarize the disagreement as: contested claim, evidence from side A,
evidence from side B, decision rule needed, and what extra data would resolve it.
```
### 5.9 Honest admission
Now the important adult sentence. Multi-agent adds complexity. Sometimes a single agent is the right answer.
If a task is short, if the tools are simple, if the reasoning mode is unified, and if the error rate is already acceptable, then splitting may be foolish.
You pay for coordination in tokens, latency, tracing, and maintenance. You also pay in human comprehension. A junior team can drown in orchestration complexity fast.
So remember the mature rule: single agent first, multi-agent only when the failure pattern earns it.
This honest admission is interview gold. It shows you are not intoxicated by architecture theatre.
### 5.10 The four things Module 11 assumes from here

By next week, you should already understand:
- The agent loop concept
- Multi-agent coordination basics
- When to split versus keep single
- Cost and latency tradeoffs

If these four feel shaky, Module 11 will feel much harder than it should.
## Chapter 6: Recap and next-step readiness

### 6.1 Failure-fix chain
| Failure | Symptom | Fix pattern |
|---|---|---|
| Single prompt does too much | Quality degrades on long tasks | Split by specialist role |
| Context window bloats | Model misses important details | Summarize and stage handoffs |
| Tools conflict semantically | Wrong tool chosen | Narrow tool definitions and charters |
| Early mistakes poison later stages | Final answer looks incoherent | Add review stage or gated pipeline |
| One orchestrator becomes overloaded | Slow and messy coordination | Add sub-managers or hierarchy |
| Disagreement is hidden | False confidence in wrong answer | Use critique or debate pattern |
| Retries become random | Cost rises without recovery | Propagate explicit error types |
| Logs are vague | Debugging takes forever | Trace each handoff structurally |
| Every agent gets all history | Cost and latency explode | Manage memory per stage |
| Architecture feels smart but not useful | Complexity without benefit | Prefer single agent until measured need |
Read that table twice. It is the compressed mental model of the whole module.

### 6.2 Key points to remember

1. Multi-agent is a scaling strategy, not a fashion statement.
2. MCP is the interface layer, not the orchestration layer.
3. Topology is the org chart of your system.
4. Handoffs deserve as much design as prompts.
5. Summarization is memory management in disguise.
6. Cost and latency compound across agents.
7. Evaluation must cover both outputs and coordination.
8. Single agent often remains the correct choice.

### 6.3 Foundation-gap audit

This section is important because Module 11 quietly assumes these foundations.

| Assumed foundation | If weak, what breaks next week? | How to patch it now |
| --- | --- | --- |
| Agent loop concept | You cannot localize failures in a workflow | Revisit Week 9 ReAct and trace one loop manually |
| Multi-agent coordination | You confuse more agents with better design | Draw topologies and explain information flow aloud |
| Split vs keep single | You over-engineer or under-split | Use the decision ladder from Chapter 3 |
| Cost and latency tradeoffs | You cannot judge production viability | Estimate tokens and sequence depth for one workflow |

If these feel comfortable, you are properly prepared for evaluation frameworks.

### 6.4 Important interview questions

1. What problem does MCP solve that a framework alone does not?
2. When is orchestrator-worker better than pipeline?
3. When would you reject a multi-agent design and keep one agent?
4. How would you control cost in a multi-agent production system?
5. Shared state versus message passing: when would you choose each?
6. How do you debug the first failing handoff?
7. Why can debate patterns fail even with multiple strong agents?
8. What assumptions must a handoff memo include?

If you can answer these crisply, your understanding is no longer only tutorial-deep.

### 6.5 Production experience to internalize

- Real wins come from specialization plus discipline, not agent count alone.
- Teams underestimate trace design until failures arrive.
- Cheap routing saves money only when routing itself stays cheap.
- Long workflows live or die on handoff summaries.
- MCP helps teams reuse capability layers across clients.
- Evaluation pressure increases with each new coordination edge.
- Human approvals remain necessary for high-stakes actions.

These are not theory-only lessons. They are the texture of operating such systems.

### 6.6 Apply now — exercises

#### Exercise 1

Take one Week 9 agent you built earlier. Write down the exact point where it feels overloaded. Decide whether the overload is prompt, tool, or topology related.

#### Exercise 2

Design two versions of the same workflow. One should be single-agent. One should be orchestrator-worker. Estimate cost, latency, and likely failure modes for each.

#### Exercise 3

Create a handoff schema for research to writing. Keep it under ten fields. Make every field earn its place.

#### Exercise 4

Take a vague tool from an older project. Rewrite it as a crisp MCP-style capability definition. Add constraints that prevent misuse.

#### Exercise 5

Choose one topology from Chapter 3. Defend why it is wrong for a different task shape. Good engineers must recognize misfit, not only fit.

### 6.7 Bridge to Module 11

The next module, 00_ai_evals_release_gates, answers the hard question: how do you know any of this actually works? It covers evaluation frameworks for LLMs, agents, and RAG in production.
That is the right next question. Once you have orchestration, you need proof.
### 6.8 Final memory hook

Keep three pictures in your head:
- the company with departments
- the memo format between them
- the CEO choosing the org chart

If those pictures stay stable, the vocabulary of MCP and multi-agent will stay stable too.
## Appendix — quick topology snapshots

### Snapshot 1 — router plus workers
A router decides which specialist should handle the case. This differs from a full orchestrator. The router chooses once, then steps back.
Use this when classification is the real problem. Do not use it when many specialists must collaborate deeply.

### Snapshot 2 — map-reduce style research
One orchestrator fans out search tasks in parallel. Then one reducer composes the final answer.
This pattern helps when sub-searches are independent. It hurts when the reducer gets overwhelmed by noisy evidence.

### Snapshot 3 — reviewer as gatekeeper
A reviewer can be placed after generation but before action. This is common for publish, refund, and delete actions.
Think of the reviewer as a quality and safety checkpoint. This is often the highest-leverage extra agent.

### Snapshot 4 — peer review ring
Three agents can also review each other cyclically. This is useful only when bounded tightly. Otherwise the ring becomes a loop trap.
The ring must have a stop condition. Without one, cost drifts upward quietly.

### Snapshot 5 — human in the middle
Sometimes the most mature topology includes a human gate. That is not a weakness. That is production judgment.
When stakes are high, architecture should show restraint. Reliability often beats autonomy theatre.