
03. Agent selection and routing — matching executors to step requirements

~18 min read. The workflow graph has steps. Each step needs an executor — a model, an agent, a tool, or a human. Routing is the policy that assigns executors based on step requirements, risk class, and cost constraints. Get it wrong and premium models waste money on trivial reads while cheap models fumble critical decisions.

Builds on the first-principles overview in 00-first-principles.md. The dispatch loop (read state → pick next step → execute → write result) needs a routing policy to pick the right executor for each node. Coordination cost is the pressure: every routing decision adds overhead, but wrong routing wastes far more.


What file 02 established and what remains

File 02 gave each step a typed contract: input, output, risk class, recovery mode. The workflow graph has shape and boundaries. But a step contract does not name who executes the step. A risk-scoring step could run on a $0.002/call classifier, a $0.06/call mid-tier model, or a $0.15/call frontier model. A document-parsing step could use an LLM, an OCR pipeline, or a deterministic parser. The dispatch loop needs a routing policy that maps step requirements to the cheapest executor that clears the quality bar.


The team that routed everything to the frontier model

A B2B SaaS company ships a customer-onboarding workflow. Seven steps: parse uploaded documents, validate fields, check compliance, score risk, generate welcome email, create account records, notify the sales team. Every step runs on the same frontier model. Cost per onboarding: $0.47. Latency p50: 34 seconds. Monthly volume: 8,000 onboardings. Monthly bill: $3,760.

A second engineer audits step by step:

| Step | Actual requirement | Routed to | Better executor | Savings |
|---|---|---|---|---|
| Parse documents | Structured extraction | Frontier ($0.08) | OCR + small model ($0.01) | 87% |
| Validate fields | Schema check | Frontier ($0.06) | Deterministic validator ($0.00) | 100% |
| Check compliance | Policy lookup | Frontier ($0.08) | RAG + small model ($0.02) | 75% |
| Score risk | Complex reasoning | Frontier ($0.08) | Frontier ($0.08) | 0% |
| Generate email | Template fill | Frontier ($0.06) | Small model ($0.01) | 83% |
| Create records | API call | Frontier ($0.06) | Deterministic tool ($0.00) | 100% |
| Notify sales | Template message | Frontier ($0.05) | Deterministic tool ($0.00) | 100% |

After re-routing: cost per onboarding drops to $0.12. Monthly bill: $960 — a 74% reduction with no quality loss on any step. The frontier model was needed for exactly one step (risk scoring). The other six were over-served.
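
The arithmetic behind those totals can be checked with a short script. This is only a sketch of the audit calculation: the per-step figures are copied from the table, and the variable names are illustrative.

```python
# Per-step cost in cents: (all-frontier routing, re-routed executor).
# Figures are copied from the audit table; integer cents avoid float rounding.
STEP_COSTS_CENTS = {
    "parse_documents": (8, 1),
    "validate_fields": (6, 0),
    "check_compliance": (8, 2),
    "score_risk": (8, 8),
    "generate_email": (6, 1),
    "create_records": (6, 0),
    "notify_sales": (5, 0),
}
MONTHLY_VOLUME = 8_000  # onboardings per month

before_cents = sum(frontier for frontier, _ in STEP_COSTS_CENTS.values())
after_cents = sum(rerouted for _, rerouted in STEP_COSTS_CENTS.values())

monthly_before = before_cents * MONTHLY_VOLUME / 100  # dollars
monthly_after = after_cents * MONTHLY_VOLUME / 100
reduction = 1 - monthly_after / monthly_before

print(f"per onboarding: ${before_cents / 100:.2f} -> ${after_cents / 100:.2f}")
print(f"monthly bill:   ${monthly_before:,.0f} -> ${monthly_after:,.0f}")
print(f"reduction:      {reduction:.0%}")
```

Running it reproduces the $0.47 and $0.12 per-onboarding costs and the $3,760 and $960 monthly bills quoted in the text.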

Teacher voice. Routing is an economic decision, not a quality decision. The question is never "what is the best model?" It is "what is the cheapest executor that clears this step's quality bar?" Over-routing to premium models is the most common invisible cost in agent workflows.


The invariant: route by step requirement, not by default

The chapter protects one rule: each step is routed to the cheapest executor that satisfies its typed contract and quality bar.

A routing policy that sends all steps to one executor violates this invariant. A routing policy that picks executors by step name rather than step requirements will drift as step content changes. The dispatch loop must evaluate: what does this step need (context length, reasoning depth, tool access, determinism) and which executor provides exactly that at minimum cost?


Routing dimensions — what the dispatch loop evaluates

Four dimensions determine the correct executor for a step:

┌─────────────────────────────────────────────────────┐
│  ROUTING DECISION                                   │
├─────────────────────────────────────────────────────┤
│  1. Reasoning depth  — does this step need          │
│     multi-step inference or just pattern matching?  │
│  2. Context requirement — how much input state      │
│     must the executor see?                          │
│  3. Determinism requirement — must the output be    │
│     exactly reproducible on retry?                  │
│  4. Risk class — what blast radius does a wrong     │
│     output have?                                    │
└─────────────────────────────────────────────────────┘

| Dimension | Low → cheap executor | High → premium executor |
|---|---|---|
| Reasoning depth | Schema validation, template fill | Multi-factor risk assessment, synthesis |
| Context requirement | Small structured input | 50-page document + conversation history |
| Determinism | Approximate is fine | Must be reproducible for audit |
| Risk class | Read-only, reversible | Irreversible-write, financial |

When all four dimensions are low, use a deterministic tool or a small model. When reasoning depth and risk are both high, use the strongest available model with an approval gate. The middle ground — moderate reasoning, moderate risk — is where cost-optimised routing pays the largest dividend.
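
As a rough illustration of "cheapest executor that satisfies the requirements", the sketch below filters a catalogue by capability and takes the minimum cost. Everything in it is an assumption made for the example: the `Executor` and `StepRequirement` types, the three catalogue entries, their costs and context limits, and the choice to let risk class add an approval gate rather than change the model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Executor:
    name: str
    cost_per_call: float      # USD, illustrative only
    reasoning: int            # 0 = none, 1 = pattern matching, 2 = multi-step inference
    max_context_tokens: int
    deterministic: bool

@dataclass(frozen=True)
class StepRequirement:
    reasoning: int
    context_tokens: int
    needs_determinism: bool
    risk_class: str           # e.g. "read-only" or "irreversible-write"

# Hypothetical catalogue; real capabilities and prices will differ.
EXECUTORS = [
    Executor("deterministic_tool", 0.00, 0, 1_000_000, True),
    Executor("small_model", 0.01, 1, 16_000, False),
    Executor("frontier_model", 0.08, 2, 200_000, False),
]

def route(req, executors=EXECUTORS):
    """Return (executor, needs_approval_gate) for the cheapest executor that fits."""
    viable = [
        e for e in executors
        if e.reasoning >= req.reasoning
        and e.max_context_tokens >= req.context_tokens
        and (e.deterministic or not req.needs_determinism)
    ]
    if not viable:
        raise LookupError("no executor satisfies this step's requirements")
    cheapest = min(viable, key=lambda e: e.cost_per_call)
    # In this sketch, risk class adds a gate instead of selecting the model.
    needs_gate = req.risk_class == "irreversible-write"
    return cheapest, needs_gate
```

With this catalogue, a schema check (`StepRequirement(0, 200, True, "read-only")`) resolves to the deterministic tool, while a high-reasoning irreversible write resolves to the frontier model with the gate flag set.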


Routing strategies

Static routing. Each step type maps to a fixed executor. Simple. Predictable. No runtime overhead. Works well when step types are stable and volume is low enough that over-routing doesn't matter.

Rule-based routing. Conditions evaluate step metadata at dispatch time. "If risk_class == 'irreversible-write', use frontier model + human gate. If input_tokens < 500 and risk_class == 'read-only', use small model." This is the production default for most teams.
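
A minimal sketch of the two quoted rules as an ordered, first-match-wins list. The step metadata keys follow the quoted conditions; the executor labels and the `DEFAULT` route are assumptions, since the text does not say what happens when no rule matches.

```python
# Ordered rules: the first matching predicate wins.
# Each rule is (predicate over step metadata, executor label, human gate required).
RULES = [
    (lambda s: s["risk_class"] == "irreversible-write", "frontier_model", True),
    (lambda s: s["input_tokens"] < 500 and s["risk_class"] == "read-only",
     "small_model", False),
]

# Assumption: unmatched steps go to a mid-tier model with no gate.
DEFAULT = ("mid_tier_model", False)

def route_by_rules(step):
    """Return (executor, human_gate) for a step described by a metadata dict."""
    for predicate, executor, human_gate in RULES:
        if predicate(step):
            return executor, human_gate
    return DEFAULT

print(route_by_rules({"risk_class": "read-only", "input_tokens": 120}))
print(route_by_rules({"risk_class": "irreversible-write", "input_tokens": 120}))
```

Keeping the rules in an ordered list makes the precedence explicit: the risk rule is checked before the token-count rule, so a short irreversible write still gets the gate.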

Learned routing. A classifier predicts the best executor from step features. Trains on historical success/failure/cost data. Useful at scale (>100k steps/day) where marginal cost improvement justifies the classifier's operational complexity.

Fallback routing. Primary executor fails → fall back to a stronger model or a different path. The durable checkpoint ensures the step restarts cleanly on the fallback executor without replaying prior steps.

step arrives at dispatch loop
  evaluate routing dimensions
       ├── static map? ────────────→ executor A
       ├── rule match? ────────────→ executor B
       ├── classifier score? ──────→ executor C
       └── all fail? ─────────────→ fallback executor D
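
The fallback path can be sketched as a loop over executors ordered from cheapest to strongest. This is an illustration under stated assumptions: `ExecutorError`, `run_with_fallback`, and the two toy executors are hypothetical names, and the checkpoint is represented simply by passing the same `step_input` to every attempt.

```python
class ExecutorError(Exception):
    """Raised by an executor that cannot complete the step."""

def run_with_fallback(step_input, executors):
    """Try executors in order; return (name, output) from the first that succeeds.

    `executors` is a list of (name, callable) pairs, primary first. Every attempt
    starts from the same checkpointed `step_input`, so no prior step is replayed.
    """
    errors = []
    for name, run in executors:
        try:
            return name, run(step_input)
        except ExecutorError as exc:
            errors.append(f"{name}: {exc}")
    raise ExecutorError("all executors failed: " + "; ".join(errors))

# Toy executors for illustration only.
def small_model(doc):
    if len(doc) > 20:
        raise ExecutorError("input too long")
    return doc.upper()

def frontier_model(doc):
    return doc.upper()

CHAIN = [("small_model", small_model), ("frontier_model", frontier_model)]
print(run_with_fallback("short input", CHAIN)[0])
print(run_with_fallback("a much longer input document", CHAIN)[0])
```

Returning the executor name alongside the output lets the caller log which tier actually served the step, which is the signal needed later to detect how often the fallback is used.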

Threaded example — loan workflow routing

From file 02, the loan-approval workflow has steps with typed contracts. Now assign executors:

| Step | Requirement | Routed to | Why |
|---|---|---|---|
| 1. Parse docs | Structured extraction from PDF | OCR + extraction model | Deterministic pipeline is cheaper and more reliable than LLM for structured docs |
| 2. Check eligibility | Rule evaluation against criteria | Deterministic rules engine | No reasoning needed: pure conditional logic |
| 3a. Credit bureau data | External API call | Tool wrapper (no model) | Zero reasoning: just an HTTP call with auth |
| 3b. Internal history | Database query | Tool wrapper (no model) | Same: pure data retrieval |
| 3c. Compute risk score | Multi-factor reasoning | Frontier model | This is the one step that genuinely needs strong reasoning |
| 3d. Validate against policy | Policy rule check | Small model + RAG | Policy lookup with bounded reasoning |
| 4. Human review | Approval gate | Human (async) | Required by compliance: no model can substitute |
| 5. Send offer | Template generation | Small model | Low reasoning, template-shaped output |

Result: only one step (3c) uses the expensive model. Total cost drops dramatically. Quality is unchanged because each step gets exactly the capability it needs.
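
Expressed as a static routing table, the assignment above might look like the sketch below. The step identifiers and executor labels are illustrative shorthand for the table rows, not names taken from any framework.

```python
# Static routing map for the loan workflow: step id -> executor label.
LOAN_ROUTES = {
    "1_parse_docs": "ocr_extraction_pipeline",
    "2_check_eligibility": "rules_engine",
    "3a_credit_bureau": "tool_wrapper",
    "3b_internal_history": "tool_wrapper",
    "3c_risk_score": "frontier_model",
    "3d_policy_validation": "small_model_rag",
    "4_human_review": "human_async",
    "5_send_offer": "small_model",
}

def executor_for(step_id):
    """Look up the executor; an unknown step is an error, not a silent default."""
    try:
        return LOAN_ROUTES[step_id]
    except KeyError:
        raise KeyError(f"no route configured for step {step_id!r}") from None

frontier_steps = [s for s, e in LOAN_ROUTES.items() if e == "frontier_model"]
print(frontier_steps)
```

Raising on an unknown step is a deliberate choice in this sketch: defaulting unmapped steps to the frontier model is one way over-routing creeps back in unnoticed.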


State-aware routing — when the workflow's history changes the decision

Static and rule-based routing evaluate step metadata before execution. But sometimes the routing decision depends on what previous steps discovered. This is state-aware routing — the dispatch loop reads shared workflow state and adjusts the executor.

Example in the loan workflow: step 3d (policy validation) normally routes to a small model. But if step 3c produced a borderline risk score (within 5 points of the threshold), the routing policy escalates 3d to the frontier model for more careful reasoning. The shared state (risk score value) changes the routing decision dynamically.

shared state after step 3c:
  risk_score: 72  (threshold: 70)

routing rule for step 3d:
  if abs(risk_score - threshold) < 5:
    route to frontier model + flag for extra scrutiny
  else:
    route to small model (standard path)

This is where the handoff contract matters most — step 3c's output must include the risk score in a typed field that the routing policy can read. If the output is unstructured text, state-aware routing is impossible without parsing.
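
A runnable version of that rule, assuming step 3c's output arrives as a typed record. The `RiskScoreOutput` type, the function name, and the executor labels are hypothetical; the 5-point margin comes from the routing rule above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskScoreOutput:
    """Typed output of step 3c; the routing policy reads these fields directly."""
    risk_score: float
    threshold: float

BORDERLINE_MARGIN = 5  # points, as in the routing rule for step 3d

def route_policy_validation(state: RiskScoreOutput):
    """Return (executor, extra_scrutiny) for step 3d based on step 3c's output."""
    if abs(state.risk_score - state.threshold) < BORDERLINE_MARGIN:
        return "frontier_model", True
    return "small_model", False

print(route_policy_validation(RiskScoreOutput(risk_score=72, threshold=70)))
print(route_policy_validation(RiskScoreOutput(risk_score=90, threshold=70)))
```

Because the router only touches `risk_score` and `threshold`, it needs no text parsing; if step 3c ever stops emitting those fields, constructing `RiskScoreOutput` fails loudly instead of the router guessing.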


Failure modes in routing

Capability mismatch. A step requiring long-context reasoning (30k tokens of document evidence) is routed to a model with a 4k context window. The executor truncates input silently. Output quality collapses without any error signal.
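
One way to turn a silent truncation into a visible error is a pre-dispatch size check. The sketch below is hedged heavily: the 4-characters-per-token estimate is a crude assumption that should be replaced by the executor's real tokenizer, and `ContextOverflow`, `check_fits`, and the output reserve are illustrative.

```python
class ContextOverflow(Exception):
    """The step's input does not fit the chosen executor's context window."""

def estimate_tokens(text):
    # Crude assumption: about 4 characters per token (rounded up).
    # Swap in the executor's actual tokenizer for real use.
    return (len(text) + 3) // 4

def check_fits(text, context_window, reserved_for_output=512):
    """Return the estimated token need, or raise instead of truncating silently."""
    needed = estimate_tokens(text) + reserved_for_output
    if needed > context_window:
        raise ContextOverflow(
            f"step needs ~{needed} tokens, executor window is {context_window}"
        )
    return needed

print(check_fits("a" * 400, context_window=4_096))
```

A router can catch `ContextOverflow` and escalate to an executor with a larger window, which keeps the failure inside the routing policy rather than in the output quality.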

Cost mismatch. Every step routes to the premium tier because the team never profiled which steps actually need it. Monthly bill is 4× what it should be. No quality improvement on the over-served steps.

Feedback blindness. The routing policy never updates. A model that was adequate six months ago now fails 15% of the steps routed to it because step complexity drifted. No mechanism re-evaluates the policy or retrains the router.

Fallback storms. The primary executor degrades under load. Every step falls back to the secondary. The secondary (a larger model) costs 5× more. The monthly bill spikes. The team notices days later from the invoice.
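
A fallback storm is easier to catch from a rate than from an invoice. The following is a minimal sliding-window sketch; the class name, the window size, and the 20% alarm threshold are assumptions chosen for the example, not recommended values.

```python
from collections import deque

class FallbackMonitor:
    """Tracks the share of recent steps that ran on a fallback executor."""

    def __init__(self, window=100, max_rate=0.2):
        self.events = deque(maxlen=window)  # True means the step used the fallback
        self.max_rate = max_rate

    def record(self, used_fallback):
        self.events.append(bool(used_fallback))

    @property
    def rate(self):
        return sum(self.events) / len(self.events) if self.events else 0.0

    def storming(self):
        # Alarm only once the window is full, to avoid noise at startup.
        return len(self.events) == self.events.maxlen and self.rate > self.max_rate

monitor = FallbackMonitor(window=10, max_rate=0.2)
for _ in range(10):
    monitor.record(False)
for _ in range(3):
    monitor.record(True)
print(monitor.rate, monitor.storming())
```

In practice the `storming()` signal would feed an alert or a circuit breaker, so the team hears about the cost spike within minutes of the primary degrading.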


Boundary of applicability

Strong fit: workflows with heterogeneous steps (mix of retrieval, reasoning, side-effects, human gates) where different executors have clearly different cost-quality profiles. Routing pays for itself immediately.

Marginal fit: workflows where every step requires the same strong reasoning and context length. Routing adds dispatch overhead without meaningful cost savings. Use a single model.

Pathology: learned routing without enough training data. The classifier makes worse decisions than a simple rule. Operational complexity increases for no gain. Start with rules; add learned routing only at scale.


Where this lives in the wild

  • OpenAI Agents SDK — handoffs — the handoff mechanism explicitly transfers control between specialised agents, each with different tool sets and instructions. Routing is the handoff decision.
  • LangGraph — conditional edges — edge functions evaluate state and route to different nodes (executors) based on runtime conditions. This is state-aware routing in code.
  • Anthropic multi-model architectures — Haiku for classification/routing, Sonnet for moderate tasks, Opus for complex reasoning. The router is a tiny Haiku call that dispatches to the appropriate tier.
  • Temporal activity routing — different activities can run on different task queues with different worker pools (different models, different hardware, different cost profiles).
  • Vercel AI SDK — model selection — applications choose models per request based on complexity signals, routing simple queries to small models and complex ones to frontier models.

Recall

  1. What four dimensions determine the correct executor for a step?
  2. In the onboarding example, why did only one of seven steps actually need the frontier model?
  3. What is state-aware routing and when does it trigger?
  4. What failure mode does "fallback storm" describe?
  5. When is routing overhead not worth the complexity?
  6. How does the handoff contract between steps enable state-aware routing?

Interview Q&A

Q: Why route by step requirements rather than by task importance?

A: Because "importance" is a property of the overall workflow, not of individual steps. A high-importance workflow (loan approval) still has steps that need only deterministic validation. Routing by step requirements assigns capability proportional to need. Routing by importance over-serves the easy steps without improving the hard ones.

Common wrong answer to avoid: "Because cheaper is always better." Cheaper is better only when quality is preserved. The rule is cheapest-that-clears-the-bar, not cheapest-regardless.

Q: When would you choose a deterministic tool over any LLM for a step?

A: When the step requires exact reproducibility (schema validation, API calls, field extraction from known formats), when the output must be auditable (compliance checks), or when the step has no reasoning component (HTTP calls, database queries, template fills). An LLM adds non-determinism and cost for no benefit.

Common wrong answer to avoid: "When cost matters." Cost is one reason, but the stronger reason is that non-determinism in a deterministic task is a liability, not just an expense.

Q: How do you know if your routing policy is stale?

A: Monitor per-step success rates and per-step costs over time. If a step's failure rate increases without workflow changes, the executor may no longer match the step's evolved requirements. If a step's cost is consistently high relative to its reasoning demand (measured by output quality at lower tiers), the routing is over-serving.

Common wrong answer to avoid: "Check it quarterly." Staleness is continuous and workload-dependent — monitor the signals, don't rely on calendar reviews.

Q: Why is fallback routing better than retrying on the same executor?

A: Because if the primary executor failed due to a capability mismatch (input too complex, context too long, reasoning too difficult), retrying the same executor repeats the same failure. A fallback changes the capability profile — stronger model, different tool set, human escalation — giving the step a genuine chance to succeed.

Common wrong answer to avoid: "Because retries are slow." Speed is secondary. The primary reason is that diversity of capability on retry addresses root causes that repetition cannot.


Design/debug exercise (10 min)

Modeled example. The loan-approval workflow has 8 sub-steps (including 3a–3d). For each, the routing decision recorded the executor type, the reasoning depth needed, and why that executor is the cheapest viable option. The frontier model is used exactly once, for multi-factor risk reasoning.

Your turn. Take a workflow with 5+ steps. For each step, fill in: reasoning depth (low/medium/high), context size, determinism requirement, risk class. Then assign the cheapest executor that satisfies all four. Calculate total cost vs "route everything to frontier."

From memory. Draw the four routing dimensions. Sketch the rule-based routing decision tree. Label which loan-approval step triggers state-aware routing and why.


Operational memory

This chapter established that routing is an economic optimisation governed by one invariant: each step routes to the cheapest executor that clears its quality bar. The four routing dimensions (reasoning depth, context requirement, determinism, risk class) determine the correct executor. Static routing works for simple systems; rule-based routing is the production default; learned routing pays off only at scale; fallback routing provides resilience.

The onboarding example proved that most workflows over-route: only 1 of 7 steps needed the frontier model. The loan-approval example showed state-aware routing: a borderline risk score escalates a downstream step to a stronger executor dynamically. Routing failures (capability mismatch, cost mismatch, fallback storms) are operational problems that surface as cost spikes or quality drops without obvious errors.

Remember:

  • Route by step requirement, not by workflow importance or model loyalty.
  • The cheapest executor that clears the quality bar is always the correct choice.
  • Most workflows over-route — audit step by step before assuming frontier models are needed everywhere.
  • State-aware routing uses typed handoff outputs from prior steps to adjust dispatch decisions dynamically.
  • Deterministic tools beat LLMs for steps with no reasoning component — cheaper, faster, reproducible.
  • Fallback routing changes capability on failure; retries repeat the same weakness.
  • Monitor per-step success rate and cost. Staleness in routing policy shows as gradual quality decline or cost drift.

Bridge. Each step now has an executor. But how do steps connect? Sequential? Parallel? Conditional? The shape of the workflow graph — its edges and branching logic — determines latency, failure propagation, and recovery complexity. Next: workflow patterns. → 04-workflow-patterns.md