01. The stack the bot has to operate — why a fluent voice agent still strands every caller

~16 min read. The bot's voice is perfect. It still gets escalated within three weeks. The reason has nothing to do with how it talks.

Built on 00-first-principles.md. This chapter draws the system the AI is dropped into — and exposes the integration seam and the warm vs cold transfer failure that the whole module exists to fix. Every later chapter plugs into a box drawn here.


What the demo proved and what it quietly skipped

The voice-agents module taught you to make a model hear, think, and speak inside a sub-second budget. That skill is real and necessary. But it answers exactly one question: can the agent hold a fluent spoken turn? It says nothing about whether the agent can find out who is calling, route them to the right place, hand them to a human without losing the thread, or leave a record that the business can act on.

A contact center is the machinery that answers those other questions. It existed long before AI and runs the business today: it greets the caller, identifies why they called, decides where they should go, holds them while resources free up, connects them to whoever can close the issue, and writes down what happened. The AI is a new occupant in a house that already has plumbing. This chapter is the floor plan. Without it, you will wire the bot to the faucet and wonder why the bathtub never fills.

By the end you will be able to point at any box in the stack — IVR, ACD, queue, agent desktop, CRM, recorder — say what it does, and name where the AI inserts and what it must respect at that seam.


What this file solves

A voice bot can pass its demo and still fail in production because passing the demo only proves it can talk. This file shows you the rest of the call path — greeting, routing, queueing, agent handling, disposition — so you can see the three jobs a real bot must also do (authenticate, transfer with context, log to CRM) and recognize, the moment someone shows you a "voice AI" that does none of them, that you are looking at a fluent mouth with no hands.


The call path the bot lives inside

Trace one inbound call to the billing line before any AI exists. The caller dials a number. That number terminates on a carrier/SIP trunk that hands the call to the platform. An IVR (interactive voice response) greets them and collects intent — "press 1 for billing, 2 for technical support" or, in newer systems, "tell me why you're calling." Based on that intent and business rules, the ACD (automatic call distributor) picks a queue. The caller waits in queue, listening to hold music, until a qualified agent is available. The ACD connects them. The agent has an agent desktop — a softphone plus a CRM screen — and during or after the call writes a disposition (an outcome code) and notes back into the CRM. A call recorder captures audio for quality and compliance. Workforce management software forecasts how many agents to staff so the queue does not explode.

                          THE CONTACT CENTER STACK
                          (one inbound billing call)

  Caller
    │  dials
┌─────────────┐   SIP/RTP   ┌──────────────────────────────────────────┐
│ Carrier /   │────────────▶│  CCaaS PLATFORM                            │
│ SIP trunk   │             │                                            │
└─────────────┘             │   ┌──────┐   intent    ┌──────┐  pick      │
                            │   │ IVR  │────────────▶│ ACD  │──queue──┐  │
                            │   └──────┘             └──────┘         │  │
                            │      ▲                                  ▼  │
                            │      │ greet/collect              ┌─────────┐
                            │      │                            │  QUEUE  │
                            │   ┌──┴───────────────────┐        │ (hold)  │
                            │   │  ★ where AI inserts ★ │        └────┬────┘
                            │   └──────────────────────┘             │
                            │                                        ▼
                            │   ┌───────────────┐  connect   ┌─────────────┐
                            │   │ AGENT DESKTOP │◀───────────│   AGENT     │
                            │   │ softphone+CRM │            │  (human)    │
                            │   └───────┬───────┘            └─────────────┘
                            │           │ disposition + notes              │
                            └───────────┼──────────────────────────────────┘
                                        ▼                       ┌──────────┐
                                  ┌──────────┐                  │ RECORDER │
                                  │   CRM    │◀── log ───────── │ (audio)  │
                                  └──────────┘                  └──────────┘

Three facts about this picture decide everything that follows. First, every box is a separate system with its own API, latency, and failure mode — the integration seam between them is where work and risk concentrate. Second, the caller's identity and intent are established early (IVR) but needed late (agent, CRM, payment) — so context must travel the whole path or get re-collected. Third, nothing is real to the business until the disposition is written; an un-logged call is a call that, financially and legally, did not happen.

Teacher voice. Notice the AI does not replace this diagram. It slots into the box marked "where AI inserts" — usually between IVR and ACD (a virtual agent that handles the call) or beside the agent desktop (agent assist). The stack stays. The bot is a tenant, not the landlord. The whole reason contact-center AI is hard is that a tenant must respect the building's plumbing.


Where exactly the AI can insert

There are only three insertion points, and they answer different problems.

Self-service virtual agent — the AI sits at the IVR/front-door position and handles the entire call autonomously: greet, authenticate, answer, transact, transfer if needed. This is our running example's role. Highest value, highest blast radius, because the AI is alone with the caller and the systems.

Agent assist — the AI sits beside a human agent, listening to the live call, surfacing knowledge and next-best-action, drafting the after-call summary. The human stays in control; the AI never speaks to the caller. Lower blast radius, faster to deploy, because a human catches its mistakes. This is chapter 05.

Post-call analytics — the AI never touches the live call. It processes recordings and transcripts in bulk afterward for QA, sentiment, and compliance. Zero latency pressure, pure scale and cost. This is chapter 06.

flowchart LR
    C[Caller] --> IVR[IVR / front door]
    IVR -->|self-service| VA[★ Virtual Agent<br/>AI handles the call]
    VA -->|complex| ACD[ACD routing]
    IVR -->|straight to human| ACD
    ACD --> AG[Human agent]
    AA[★ Agent Assist<br/>AI listens, suggests] -.live.-> AG
    AG --> REC[(Recordings)]
    VA --> REC
    REC --> PCA[★ Post-call Analytics<br/>AI scores in bulk]
    AG --> CRM[(CRM)]
    VA --> CRM

Most real deployments run all three. The virtual agent deflects simple calls, agent assist speeds the humans who get the hard ones, and analytics grades everything afterward. They share components — the same ASR, the same CRM, the same recordings — which is why this module teaches them in one arc.


When fluent is not the same as functional

Here is the minimal scenario that exposes the trap. Two bots take the same billing call.

Caller: "I think I was double-charged this month, and I want to pay
         what I actually owe with my card."

BOT A (fluent, no hands):
  "I'm sorry to hear about the double charge! I'd be happy to help
   you with your payment today."   ← warm, instant, useless
  ...cannot see the account, cannot take the card, transfers cold.

BOT B (functional):
  [authenticated caller via DOB + account PIN against CRM]
  "I see your account ending 4417. There's a duplicate $59 charge on
   the 3rd — I've flagged it for credit. Your remaining balance is
   $59. I can take that payment now. I'll connect you to a secure
   line to enter your card."   ← does the actual jobs

Bot A scores higher on a voice-quality eval. Bot B is the one the business needs. The difference is not intelligence or prosody. It is whether the bot is wired to the three boxes that make a billing call real: identity (CRM auth), payment (a PCI path), and continuity (warm transfer + disposition). A bot that talks well into a stack it cannot operate is the single most common contact-center-AI failure.

So the real problem is not "the model isn't smart enough" and not "the voice isn't natural enough." It is that the bot has no hands — no authenticated grip on the CRM, no compliant grip on the payment gateway, no grip on the transfer that carries context. How do we even talk about a bot's hands precisely? By naming the jobs the stack demands at each box.


Rule: a contact-center bot is judged by what it can finish, not how it sounds

The load-bearing truth of this whole module: a voice agent's value is bounded by the systems it can actually operate, not by the quality of its speech. Fluency is table stakes. Finishing the call — authenticating, transacting, transferring with context, logging the outcome — is the product. Every later chapter is one of those hands.

Why this rule exists. The primitive is that a phone call is a transaction against business systems, not a conversation. The constraint is that those systems (CRM, payment, ACD, recorder) are real, deterministic, and unforgiving. A fluent answer that does not move state in those systems leaves the caller exactly where they started — except now annoyed, because the bot sounded like it could help.


1) The CCaaS landscape — who owns the boxes

You rarely build this stack from scratch. You rent it from a CCaaS (contact center as a service) vendor, who owns the telephony, IVR, ACD, queues, recording, and agent desktop, and exposes hooks where your AI plugs in. Knowing the landscape matters because the seams the vendor gives you determine what your AI can do.

As of 2026, the market (~$7–9B, growing ~18–21% a year) is led by a handful of platforms. NICE (CXone Mpower) holds the largest revenue share (~22%) and the most mature workforce-engagement suite. Genesys Cloud (~20%) embeds AI across the whole interaction lifecycle. Amazon Connect (~14–15% mindshare) is the AWS-native, pay-per-use option with deep Contact Lens analytics. Five9 (~13%) leads on outbound dialing and its Genius AI agent-assist stack. Twilio (Flex + Programmable Voice) is the developer-first, build-it-yourself option. Cisco Webex Contact Center rounds out the enterprise field.

| Platform | AI hook you'll actually use | Strength for our billing bot |
|---|---|---|
| Amazon Connect | Kinesis Video Streams (customer audio), Lambda flow blocks, Contact Lens | tight AWS + Salesforce Service Cloud Voice integration |
| Genesys Cloud | Audiohook/AudioConnect, Architect flows, open messaging | lifecycle AI orchestration, strong routing |
| Twilio Flex/Voice | Media Streams (fork raw audio over WebSocket/SIPREC) | most control; you build the bot yourself |
| NICE CXone | Enlighten AI, real-time interaction guidance | best-in-class WEM + QA at scale |
| Five9 | Genius AI, IVA, Agent Assist | outbound + TCPA/DNC compliance |

Mini-FAQ. "Do I pick the platform for its built-in AI or for its hooks?" For a serious custom voice agent, pick for the hooks — specifically, can it fork you live audio (Twilio Media Streams, Connect KVS, Genesys AudioConnect) and let you take over and release the call cleanly? Built-in AI is convenient but rigid; the hook is what lets you bring your own ASR/LLM/TTS stack and still transfer back into the platform's queues.

For the billing bot, the choice cascades: if you want full control of the latency budget (chapter 04) and your own ASR (chapter 03), you lean toward Twilio Flex with Media Streams or raw Amazon Connect with KVS. If you want the platform to do more and you to do less, you lean toward Genesys or Five9 turnkey agents and accept their seams.


2) Picture the seams as a context relay

The mental model that prevents the cold-transfer disaster: think of the call as a relay race carrying a context baton. The baton holds who the caller is, why they called, what's been verified, and what's been done. Every box must hand the baton forward.

   IVR              VIRTUAL AGENT          ACD/QUEUE           HUMAN AGENT
 collects   ──────▶  authenticates  ──────▶  routes   ──────▶  closes
 "billing"          "verified, acct          "to billing       "reads baton,
                     4417, dup charge          tier-2"           no re-ask"
                     flagged, balance $59"
        the BATON  ───────────────────────────────────────────▶
        (intent, identity, auth state, work done, transcript)

   DROP the baton anywhere  ⇒  caller re-explains  ⇒  cold transfer  ⇒  1★

The warm vs cold transfer distinction is exactly whether the baton survives the last handoff. A cold transfer drops it: the human answers blind. A warm transfer passes it: account already on screen, transcript attached, intent known. Chapter 07 is entirely about building this baton in CRM/CTI terms. Keep the relay picture; everything else hangs on it.
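To make the relay concrete, here is a minimal sketch of the baton as a data structure. The class name `ContextBaton`, its fields, and the `is_warm` rule are illustrative assumptions, not a platform schema; chapter 07 defines the real CRM/CTI shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextBaton:
    """Hypothetical container for what must survive every handoff."""
    intent: str                                           # why the caller called
    account_ref: str = ""                                 # masked account reference
    verified: bool = False                                # auth state set by the virtual agent
    work_done: List[str] = field(default_factory=list)    # actions already taken
    transcript: List[str] = field(default_factory=list)   # turns so far

    def is_warm(self) -> bool:
        """Treat a handoff as warm only if intent and verified identity both travel."""
        return self.verified and bool(self.intent) and bool(self.account_ref)


# The baton fills up as the call moves through the boxes.
baton = ContextBaton(intent="billing")                    # IVR collects intent
baton.account_ref = "4417"                                # virtual agent authenticates
baton.verified = True
baton.work_done.append("duplicate charge flagged")
baton.transcript.append("Caller: I think I was double-charged this month.")

cold = ContextBaton(intent="")                            # dropped baton: human answers blind

print(baton.is_warm(), cold.is_warm())
```

The point of the sketch is that "warm" is a property you can check before the transfer fires, rather than something you discover from the CSAT survey.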


3) The running example, threaded: one billing call, end to end

Here is the call we carry through all nine chapters, mapped onto the stack so you can see which chapter owns which box.

1. Caller dials billing line          → carrier/SIP trunk        (ch 02)
2. AI greets, hears "double-charged"  → media fork + ASR         (ch 02, 03)
3. AI authenticates (DOB + PIN)       → CRM lookup mid-call       (ch 07)
4. AI explains dup charge, balance    → orchestration + LLM      (ch 04)
5. AI takes card payment              → PCI DTMF path            (ch 08)
6. Complex dispute → warm transfer    → baton to human + ACD     (ch 01, 07)
7. AI/agent writes summary to CRM     → disposition              (ch 06, 07)
8. Recording stored, card redacted    → retention + redaction    (ch 08)
9. Nightly QA + sentiment scoring     → post-call analytics      (ch 06)

Every step is a box on the floor plan and a hand the bot needs. The voice quality lives in step 2. The product lives in steps 3–8. This is why "the bot sounds great" and "the bot works" are different claims.


4) Why insert at the IVR, not replace the ACD — choosing the seam under this workload

The tempting alternative is to make the AI the whole platform — let it route, queue, and manage agents too. Under a billing-line workload that span of control is wrong. The ACD already encodes years of routing rules, skill-based assignment, SLA targets, and overflow logic that the business depends on and auditors check. Re-implementing it inside an LLM means re-implementing determinism the business already trusts, and getting it probabilistically wrong.

The right seam for a self-service agent is at the front door: the AI handles the conversation, and when it must transfer, it hands back to the existing ACD with a target queue and an attached baton. The AI decides whether and where to escalate; the ACD decides how to route within the building's rules.

  • AI replaces the platform — total control, but you rebuild routing, recording, WFM, and compliance from scratch and own every audit. Almost never correct.
  • AI inserts at the front door, defers to ACD — the AI owns the conversation and the escalation decision; the platform owns deterministic routing, recording, and staffing. Correct for nearly all deployments.

For the billing bot, this means the AI never picks the human agent. It decides "this dispute needs a tier-2 human," packs the baton, and hands transfer(queue="billing-tier2", context=baton) back to the ACD.
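The split of responsibilities can be sketched as follows. This is an illustration under assumptions: `TransferRequest`, `decide_escalation`, the intent label `billing_dispute`, and the fallback queue `general` are hypothetical names, and real queue names come from the platform's routing configuration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TransferRequest:
    queue: str                 # target queue only; the ACD picks the agent
    context: Dict[str, str]    # the baton, attached to the call


def decide_escalation(intent: str, resolved: bool,
                      baton: Dict[str, str]) -> Optional[TransferRequest]:
    """Return a transfer request, or None if the AI finished the call itself."""
    if resolved:
        return None
    # Illustrative mapping from intent to queue (an assumption, not a standard).
    queue_for_intent = {"billing_dispute": "billing-tier2"}
    queue = queue_for_intent.get(intent, "general")
    return TransferRequest(queue=queue, context=baton)


request = decide_escalation(
    intent="billing_dispute",
    resolved=False,
    baton={"account_last4": "4417", "auth_verified": "true"},
)
print(request)
```

Notice what is absent: no agent ID, no skill matching, no overflow logic. The AI names a queue and attaches context; everything after that stays inside the ACD's deterministic rules.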


5) The property that changes the design: deflection rate vs containment quality

The dimension that reshapes everything is the difference between deflection (calls the AI keeps off human queues) and containment quality (whether those deflected calls actually got resolved). A bot can show 60% deflection by simply not transferring — including the calls it failed. Those callers hang up, call back angrier, and the rework lands on humans anyway, now with worse sentiment.

   Deflection rate alone:   60% "handled" by AI   ← looks great
   But containment quality:  of those, 25% were just abandoned by frustrated callers
   True resolution:         60% × 75% = 45% actually resolved
   Hidden cost:             15% of all calls became repeat calls + bad CSAT

Optimize deflection alone and you reward the bot for stranding people. The honest metric is resolved-without-escalation-and-without-callback. This tension recurs in chapter 04 (when to fall back to a human) and chapter 06 (how analytics catches false containment).
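The arithmetic above is small enough to keep as a helper next to your dashboards. A minimal sketch, where the function name and the returned keys are assumptions chosen for this example:

```python
def containment_breakdown(deflection_rate: float, abandoned_share: float) -> dict:
    """Split headline deflection into real resolutions and stranded callers.

    deflection_rate: share of all calls the AI kept off human queues.
    abandoned_share: share of those deflected calls that were abandoned unresolved.
    """
    true_resolution = deflection_rate * (1 - abandoned_share)
    hidden_repeat_share = deflection_rate * abandoned_share
    return {
        "true_resolution": round(true_resolution, 4),
        "hidden_repeat_share": round(hidden_repeat_share, 4),
    }


# The chapter's numbers: 60% deflection, 25% of deflected calls abandoned.
result = containment_breakdown(0.60, 0.25)
print(result)
```

With the chapter's inputs this gives 45% true resolution and 15% of all calls hidden as future repeat calls, which is the gap the headline deflection number conceals.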


6) One failure walked through: the cold transfer that re-asks everything

Production incident, week three of the billing bot. The AI authenticates the caller, flags a duplicate charge, then hits a genuine billing dispute it cannot resolve. It says "let me transfer you to a specialist" and calls the platform's transfer API — transferToQueue("billing-tier2"). The call lands on a human. The human's screen shows: an inbound call from a phone number. Nothing else.

The human says "Hi, this is billing, how can I help?" The caller — who just spent two minutes authenticating and explaining — explodes. They re-authenticate (the human now has to verify identity again, because the AI's verification did not travel). They re-explain the dispute. Average handle time for that call doubles. CSAT is 1 star. And the post-call survey blames the human agent, who did nothing wrong.

The root cause is not the transfer API. It is that the transfer carried no baton: the verified identity, the flagged charge, and the transcript stayed inside the AI's session and never attached to the call as it moved. The fix (chapter 07) is to write the context to the CRM contact record and attach a contact-attributes payload to the transfer so the ACD pops it on the human's screen. Same API call, plus a baton. That single change is the difference between a one-star and a four-star transfer.
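A hedged sketch of that fix, with the platform calls injected as plain callables so nothing here pretends to be a real vendor SDK. The attribute keys, `build_contact_attributes`, `warm_transfer`, and the `crm-note-123` reference are all hypothetical; the only assumption about the platform is that it accepts string key/value attributes on a transfer.

```python
from typing import Callable, Dict, List


def build_contact_attributes(verified: bool, account_last4: str,
                             work_done: List[str], transcript_ref: str) -> Dict[str, str]:
    """Flatten the baton into string key/value pairs for the transfer."""
    return {
        "auth_verified": "true" if verified else "false",
        "account_last4": account_last4,
        "work_done": "; ".join(work_done),
        "transcript_ref": transcript_ref,   # pointer to the CRM record, not the full text
    }


def warm_transfer(queue: str, attributes: Dict[str, str],
                  write_crm_note: Callable[[Dict[str, str]], None],
                  transfer_to_queue: Callable[[str, Dict[str, str]], None]) -> None:
    """Persist context first, then transfer with the same context attached."""
    write_crm_note(attributes)             # 1. context lands on the contact record
    transfer_to_queue(queue, attributes)   # 2. same transfer, now carrying a baton


# Stand-ins for the CRM and the platform transfer API, for illustration only.
crm_notes: List[Dict[str, str]] = []
transfers: List[tuple] = []

attrs = build_contact_attributes(
    verified=True,
    account_last4="4417",
    work_done=["duplicate $59 charge flagged for credit"],
    transcript_ref="crm-note-123",
)
warm_transfer("billing-tier2", attrs, crm_notes.append,
              lambda queue, a: transfers.append((queue, a)))
print(transfers[0][0], transfers[0][1]["auth_verified"])
```

Writing to the CRM before transferring is a deliberate ordering in this sketch: if the transfer fails, the context is still on the record for whoever picks the call up next.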


7) Cost movement: build-your-own seams vs turnkey platform AI

Two ways to give the billing bot its hands, with rough first-year numbers for a 40k-calls/day center (illustrative; real pricing varies by contract).

| | Turnkey platform AI (e.g., Genesys/Five9 IVA) | Build-your-own on Twilio/Connect hooks |
|---|---|---|
| Time to first call | weeks | months |
| Per-minute media + AI cost | bundled, ~$0.08–0.15/min | unbundled: telephony ~$0.01 + ASR ~$0.01–0.02 + LLM ~$0.01–0.05 + TTS ~$0.01–0.03 |
| Latency control | vendor-set, limited | full control of the turn budget |
| Integration depth | what the vendor exposes | anything you can code |
| Who owns compliance | shared with vendor | you |
| What it fixes | speed to launch | control, cost ceiling at scale |
| What it costs you | flexibility, latency tuning, margin | engineering time, on-call, audit ownership |

The trade-off in pressure: turnkey relieves engineering effort but creates latency and flexibility pressure that the product experience absorbs. Build-your-own relieves latency and cost-at-scale pressure but creates engineering and compliance-ownership pressure that your team absorbs. For a high-volume billing line at 40k calls/day, where every 200 ms of turn budget and every cent per minute matter, build-your-own usually wins once volume passes a threshold, typically a few thousand calls/day.
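To see how the per-minute ranges turn into daily spend, here is a rough calculation. The per-minute figures are the illustrative ones from the comparison above; the 4-minute average call length is an assumption introduced only for this sketch, and engineering, on-call, and audit costs are deliberately left out.

```python
CALLS_PER_DAY = 40_000
MINUTES_PER_CALL = 4.0   # assumption for illustration, not a figure from this chapter

TURNKEY_PER_MIN = (0.08, 0.15)           # bundled (low, high) in USD per minute
UNBUNDLED_PER_MIN = {                    # build-your-own (low, high) in USD per minute
    "telephony": (0.01, 0.01),
    "asr": (0.01, 0.02),
    "llm": (0.01, 0.05),
    "tts": (0.01, 0.03),
}


def daily_cost(per_min_low: float, per_min_high: float,
               calls: int = CALLS_PER_DAY,
               minutes: float = MINUTES_PER_CALL) -> tuple:
    """Return the (low, high) media + AI spend per day in USD."""
    total_minutes = calls * minutes
    return (round(per_min_low * total_minutes, 2),
            round(per_min_high * total_minutes, 2))


byo_low = sum(low for low, _ in UNBUNDLED_PER_MIN.values())
byo_high = sum(high for _, high in UNBUNDLED_PER_MIN.values())

turnkey_daily = daily_cost(*TURNKEY_PER_MIN)
byo_daily = daily_cost(byo_low, byo_high)

print("turnkey per day:", turnkey_daily)
print("build-your-own per day:", byo_daily)
```

The two ranges overlap (the top of the unbundled range is above the bottom of the bundled one), so the per-minute price alone does not settle the decision. What tips it at this volume is where your actual contract and model choices land inside those ranges, plus the latency control that the table lists separately.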


8) Signals that the bot is stranding callers, not serving them

Healthy: high resolved-without-callback rate, low repeat-call rate within 48 hours, warm-transfer rate high relative to cold, CSAT stable across AI and human segments.

First metric to degrade: 48-hour repeat-call rate. When the bot starts failing silently — answering fluently but not resolving — callers hang up and call back. Repeat rate climbs before deflection or CSAT moves, because deflection counts the failed calls as wins and CSAT surveys undersample the angriest hang-ups.

Misleading metric people watch: raw deflection rate. It goes up when the bot strands people. A rising deflection rate with a rising repeat-call rate is the signature of false containment, not success.

First graph an expert opens: deflection rate and 48-hour repeat-call rate on the same time axis. If they rise together, the bot is shedding load onto the future and onto humans, not removing it. The second graph: CSAT split by AI-only vs AI-then-transferred — a gap reveals cold-transfer pain.
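Both series on that first graph can be computed from a plain call log. A minimal sketch, assuming each log row carries a caller identifier, a start time, and a flag for whether the AI ended the call without transferring; the `Call` record and both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Call(NamedTuple):
    caller: str
    start: datetime
    contained: bool   # True if the AI ended the call without transferring


def deflection_rate(calls: List[Call]) -> float:
    """Share of all calls the AI kept off human queues."""
    return sum(c.contained for c in calls) / len(calls) if calls else 0.0


def repeat_call_rate(calls: List[Call],
                     window: timedelta = timedelta(hours=48)) -> float:
    """Share of AI-contained calls followed by another call from the same caller in the window."""
    contained = [c for c in calls if c.contained]
    if not contained:
        return 0.0
    repeats = sum(
        any(o.caller == c.caller and c.start < o.start <= c.start + window for o in calls)
        for c in contained
    )
    return repeats / len(contained)


t0 = datetime(2026, 1, 5, 9, 0)
log = [
    Call("A", t0, True),
    Call("A", t0 + timedelta(hours=5), False),    # A calls back and reaches a human
    Call("B", t0, True),
    Call("C", t0, True),
    Call("C", t0 + timedelta(hours=72), True),    # outside the 48-hour window
]
print(deflection_rate(log), repeat_call_rate(log))
```

In this toy log the bot shows 80% deflection while a quarter of its contained calls came back within 48 hours. The nested scan is quadratic, which is fine for a sketch; a production job would group by caller first.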


9) Boundary: where front-door AI fits, where it turns pathological

Front-door virtual agents fit unusually well on high-volume, low-variance, transactional lines: balance checks, payment due dates, appointment confirmations, password resets — exactly the bulk of a billing line. The work is structured, the systems are queryable, and a clean transfer covers the tail.

It becomes pathological on high-emotion or high-variance calls: a customer in a billing crisis who is also reporting a death in the family, a fraud victim, a regulated complaint that must be logged a specific way. Here a fluent bot reads as cold and obstructive, and the deflection metric tempts you to keep it in the path. The scale limit that invalidates intuition: a bot that works great at 1,000 calls/day can become a containment disaster at 100,000/day because the absolute number of stranded edge-case callers — even at a small percentage — becomes a flood of repeat calls and complaints that swamps the human queue you were trying to relieve.


10) Wrong assumption: "if the conversation is good, the call is handled"

The seductive idea: a great transcript means a great outcome. It does not. The transcript measures the conversation; the outcome is whether state moved in the CRM, the payment cleared, and the right disposition was written. A call can have a flawless transcript and zero business effect — the caller felt heard and accomplished nothing.

Replace it with: the conversation is the interface, not the result. Grade the bot on systems touched and outcomes recorded, not on how the dialogue reads. This single correction reorders your whole eval suite, and it is why chapter 06 scores resolution and compliance, not just sentiment.


11) Other ways the stack bites the bot

  • DTMF deafness — the bot ignores keypad presses because it only listened for speech; callers mashing "0 for agent" get stuck.
  • Hold-music collision — the bot transfers into a queue and keeps "listening," then tries to transcribe hold music and loops.
  • Caller-ID assumption — the bot trusts ANI (caller ID) as identity; spoofed numbers walk straight past auth (chapter 07/08).
  • No barge-in on the IVR side — the bot talks over a caller who already knows what they want, inflating handle time.
  • Orphaned sessions — the carrier drops the call mid-flow; the bot's session leaks, holding a CRM lock or a half-written disposition.
  • Queue overflow blind spot — the bot transfers to a queue with zero staffed agents at 2 a.m.; the caller waits forever.
  • Disposition skipped on transfer — the AI transfers and never writes why, so analytics can't tell resolved from abandoned.
  • WFM mismatch — deflection shifts call volume and skill mix, but staffing forecasts still assume the old pattern; queues thrash.
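Two items in that list, orphaned sessions and skipped dispositions, share one defensive pattern: make cleanup unconditional. A minimal sketch using a context manager; `CallDropped`, `call_session`, the disposition codes, and the log messages are all hypothetical stand-ins for whatever your media layer and CRM actually expose.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


class CallDropped(Exception):
    """Raised by the (hypothetical) media layer when the carrier hangs up mid-flow."""


@contextmanager
def call_session(call_id: str, audit_log: List[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Guarantee a disposition is written and the CRM lock released, whatever happens."""
    state: Dict[str, Optional[str]] = {"disposition": None}
    audit_log.append(f"{call_id}: crm lock acquired")
    try:
        yield state
    except CallDropped:
        # A carrier drop is an outcome to record, not an error to propagate.
        state["disposition"] = "abandoned_carrier_drop"
    finally:
        if state["disposition"] is None:
            state["disposition"] = "unclassified"
        audit_log.append(f"{call_id}: disposition={state['disposition']}")
        audit_log.append(f"{call_id}: crm lock released")


events: List[str] = []

with call_session("call-1", events) as session:
    raise CallDropped()                      # carrier drops mid-flow

with call_session("call-2", events) as session:
    session["disposition"] = "transferred_billing_tier2"

for line in events:
    print(line)
```

Other exceptions still propagate to the caller, but the `finally` block runs first, so analytics can always tell a dropped call from a transferred one.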

12) Pattern transfer

  • Front door defers to deterministic core — same shape as an API gateway that handles auth and shaping but defers business logic to backend services. The AI is the gateway; the ACD is the deterministic core. The shared pressure: keep probabilistic components out of the deterministic decision path.
  • The context baton — same failure geometry as a distributed trace losing its span context across a service boundary: the work happens, but the why and who don't propagate, so the next hop starts blind. Warm transfer is span propagation for phone calls.
  • False containment — structurally identical to a cache "hit rate" that counts stale or wrong responses as hits: the headline metric rewards the failure. Watch the second-order metric (repeat calls / cache-staleness), not the headline.

13) Design test

  1. Can your bot, on a hard call, transfer with the verified identity and transcript attached — or does the human answer blind?
  2. Does any AI insertion re-implement routing, recording, or staffing the platform already owns?
  3. Do you measure resolved-without-callback, or only deflection?
  4. When the carrier drops a call mid-flow, do the session and any half-written disposition get cleaned up?
  5. Can you name the platform hook that forks you live audio and lets you transfer back into a real queue?

If any answer is "no" or "not sure," the bot is a fluent mouth with missing hands.


Where this appears in production

  • Amazon Connect — cloud ACD + IVR + recording; Lambda flow blocks are the seam where a custom AI takes over and releases the call.
  • Genesys Cloud Architect — visual IVR/routing flows; AudioConnect forks live audio to a bot mid-flow.
  • Twilio Flex — programmable agent desktop; Media Streams forks raw call audio to your AI over WebSocket.
  • NICE CXone — enterprise ACD + the most mature workforce-engagement suite; Enlighten supplies real-time guidance.
  • Five9 IVA — turnkey intelligent virtual agent at the front door, with outbound TCPA/DNC compliance built in.
  • Cisco Webex Contact Center — enterprise routing and queueing with AI agent and assist add-ons.
  • Salesforce Service Cloud Voice — fuses the agent desktop with the CRM so the baton lands on one screen.
  • Zendesk Talk — CTI inside the Zendesk ticketing desktop; screen pop on the ticket.
  • Twilio TaskRouter — the routing/queue engine you defer to after the AI decides to escalate.
  • AWS Contact Lens — scores deflection and containment quality so false containment surfaces.
  • Verint / NICE WFM — workforce-management forecasting that must be re-tuned once the AI shifts call mix.
  • Talkdesk Autopilot — virtual-agent front door that hands complex calls to human queues.
  • LivePerson — conversational front door across voice and messaging into the same routing core.
  • Vapi / Bland — developer platforms that wrap telephony + ASR + LLM + TTS so you build the front-door agent fast.
  • PagerDuty/Opsgenie call routing — the same IVR→route→escalate shape applied to on-call incident calls.

Recall

  1. Name the six boxes an inbound call passes through before it's "handled."
  2. What are the three places AI can insert, and how does blast radius differ across them?
  3. What is the context baton, and which transfer type drops it?
  4. Why does optimizing deflection rate alone reward the bot for stranding callers?
  5. Why should the AI defer routing to the ACD instead of routing itself?
  6. Which metric degrades first when a bot fails silently, and why before deflection or CSAT?
  7. For a 40k-calls/day billing line, what pushes the build-vs-buy decision toward build-your-own?

Interview Q&A

Q1. A stakeholder says "our new voice bot has 65% deflection, it's a huge success." What do you ask next? Ask for the 48-hour repeat-call rate and CSAT split by AI-only vs AI-then-transferred. Deflection counts failed calls that callers abandoned as wins. If repeat calls rose alongside deflection, the bot is shedding load onto the future and onto humans — false containment, not resolution. The honest metric is resolved-without-escalation-and-without-callback. Common wrong answer to avoid: congratulating them and asking how to push deflection to 80% — that optimizes the metric that rewards stranding callers.

Q2. Why not let the AI replace the ACD and do routing, queueing, and staffing too? Because the ACD encodes deterministic, audited routing rules, skill-based assignment, SLA logic, and recording the business already trusts. Re-implementing that inside an LLM means making deterministic guarantees probabilistic. The correct seam is the front door: the AI owns the conversation and the escalation decision; the ACD owns deterministic routing. Common wrong answer to avoid: "the AI is smarter so it should handle everything" — span of control isn't about intelligence, it's about which guarantees must stay deterministic and auditable.

Q3. The human agents complain every transferred call "starts from zero." Where is the bug? The transfer carries no context baton. Verified identity and the transcript live in the AI's session and never attach to the call. Fix: write context to the CRM contact record and attach contact attributes to the transfer so the ACD screen-pops it. It's a cold-transfer / context-propagation failure, not a model failure. Common wrong answer to avoid: "retrain the bot to summarize better" — the summary may be fine; the problem is it never traveled to the human.

Q4. Build-your-own on Twilio Media Streams vs a turnkey Five9 IVA — how do you decide? On volume and control needs. Turnkey wins on time-to-launch and shared compliance; build-your-own wins on latency control (you own the full turn budget), unbundled per-minute cost at scale, and integration depth. Past a few thousand calls/day on a latency-sensitive transactional line, build-your-own's cost and control usually justify the engineering and audit ownership. Common wrong answer to avoid: "always build for control" or "always buy for speed" — it's a volume-and-latency threshold, not a dogma.

Q5. Where would you insert AI first in a center that's never used it — virtual agent or agent assist? Usually agent assist first. A human stays in control and catches AI mistakes, so blast radius is low, it deploys faster, and it builds organizational trust and a labeled dataset of good interactions. Then graduate to a front-door virtual agent on the high-volume, low-variance call types. Common wrong answer to avoid: "virtual agent, it deflects the most calls" — highest deflection is also highest blast radius and the riskiest place to start with an unproven stack.

Q6. The bot works at 1,000 calls/day but causes chaos at 100,000/day. What changed? The absolute volume of stranded edge cases. A 3% mishandle rate is 30 stranded callers/day at small scale and 3,000/day at large scale — a flood of repeat calls and complaints that swamps the human queue. Containment quality, not deflection rate, is the metric that exposes this, and it's why edge-case routing matters more as you scale. Common wrong answer to avoid: "the model degraded under load" — the per-call behavior is unchanged; it's the absolute count of the unchanged failure rate that crossed a threshold.

Q7. (Cumulative, looks ahead) A transferred call also lost the caller's PCI-safe payment state. Is that a chapter 1 transfer problem or a chapter 8 compliance problem? Both, and that's the point. The transfer (ch 01/07) must carry payment status (paid / amount / auth token) but must not carry the card number (ch 08) — the baton has to be designed to propagate context while keeping cardholder data out of scope. It's a composition of the warm-transfer mechanism and PCI scope, which is why the module teaches them as connected, not separate. Common wrong answer to avoid: "just put the card number in the transfer payload so the human can finish the payment" — that drags the human, the screen pop, and the recording into PCI scope.
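The Q7 answer can be sketched as a guard on the baton: payment status travels, anything that looks like a card number is rejected before it reaches the transfer payload. This is an illustration only. The field names and both functions are hypothetical, and the digit-run pattern is a crude heuristic, not a PCI control; chapter 08 covers the real scope boundary.

```python
import re
from typing import Dict

# 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_PATTERN = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def reject_card_numbers(attributes: Dict[str, str]) -> None:
    """Raise if any value looks like it contains a card number."""
    for key, value in attributes.items():
        if PAN_PATTERN.search(value):
            raise ValueError(f"{key} looks like it contains a card number")


def payment_fields_for_baton(status: str, amount: str, auth_token: str) -> Dict[str, str]:
    """Build the payment part of the baton: outcome only, never the card itself."""
    fields = {
        "payment_status": status,          # e.g. "paid"
        "payment_amount": amount,          # e.g. "59.00"
        "payment_auth_token": auth_token,  # opaque reference from the payment path
    }
    reject_card_numbers(fields)
    return fields


safe = payment_fields_for_baton("paid", "59.00", "tok_9f2c")
print(safe)

try:
    reject_card_numbers({"notes": "caller read out 4111 1111 1111 1111"})
except ValueError as err:
    print("blocked:", err)
```

A pattern check like this will produce false positives on long numeric references, which is the acceptable direction to fail: a blocked transfer payload is cheaper than a screen pop that drags the agent desktop into PCI scope.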


Design/debug exercise (10 min)

Step 1 — Modeled example. Map the duplicate-charge billing call onto the stack with the owning chapter for each box (done above in section 3). For each box, write the one thing that breaks if the baton doesn't reach it. Example: "Human agent — breaks: re-authentication, caller re-explains, AHT doubles."

Step 2 — Your turn. Take a different call type for the same billing line — "I want to cancel my account." Walk it through the six boxes. Where does the AI authenticate? Where must it not deflect (a cancellation is a retention/compliance moment)? What goes in the baton if it transfers? Write the disposition you'd expect.

Step 3 — Reproduce from memory. Redraw the contact-center stack diagram (section 0) and the context-baton relay (section 2) cold. Then connect them: mark on the stack exactly where the baton gets dropped in a cold transfer, and where chapter 07 will reattach it.


Operational memory

This chapter explained why a voice bot can ace its demo and still fail in production within weeks. The important idea is that a contact-center bot is judged by the systems it can operate and the calls it can finish — authenticate, transact, transfer-with-context, log — not by how fluently it speaks. Fluency is the interface; finishing the call is the product.

You learned to read the stack as a fixed floor plan (carrier → IVR → ACD → queue → agent → CRM/recorder) that the AI inserts into at one of three points, and to picture the call as a relay carrying a context baton that a cold transfer drops. That solves the opening failure because the stranded-caller disaster was never about voice quality — it was a bot with no hands and a dropped baton.

Carry this diagnostic forward: when a "successful" bot generates complaints, put deflection and 48-hour repeat-call rate on the same axis before celebrating. If they rise together, you have false containment. Inspect the transfer path for a missing baton before blaming the model.

Remember:

  • A bot's value is bounded by the systems it can operate, not the quality of its speech.
  • The AI is a tenant in the stack; defer routing/recording/staffing to the platform, own the conversation and the escalation decision.
  • Warm vs cold transfer is whether the context baton (identity, intent, transcript, work done) survives the handoff.
  • Deflection rate alone rewards stranding callers; measure resolved-without-callback and watch repeat-call rate.
  • Three insertion points: virtual agent (high blast radius), agent assist (low blast radius, start here), post-call analytics (no live-call latency pressure).

Bridge. We have the floor plan and we know the AI inserts at the front door. But to insert at all, the bot has to actually hear the call and speak into it — which means getting raw audio off a phone network that was built for humans, not models, and putting audio back without stepping on the caller. That media plumbing, with its own latency and jitter, is the first integration seam. → 02-telephony-and-audio-integration.md