05. Putting the AI beside the human instead of alone with the caller¶
~16 min read. The hardest calls don't get deflected — they get escalated to a human. Now the AI's job flips: it never speaks, never drives, never takes a turn. It listens to a live call it cannot control and tries to make the human faster and more correct without getting in the way.
Built on 04-bot-orchestration-and-latency-budget.md. This is the lower-blast-radius insertion point named in chapter 01 — the AI sits beside the human, not in front of the caller. It reuses chapter 03's live ASR and chapter 02's media fork, but the turn budget pressure is gone: the human owns the conversation. A new pressure replaces it — operator attention.
Note: agent assist shares the streaming ASR (chapter 03) and media fork (chapter 02) of the virtual agent. This chapter focuses on the seam unique to assist: surfacing the right thing at the right moment to a human who is mid-sentence with a real customer, and the new failure mode — distracting the agent — that has no analogue in the autonomous bot.
Why the safest AI in the contact center never speaks to the caller¶
Chapter 04 built an autonomous bot that handles the call alone. That's the highest-value, highest-blast-radius insertion — when it's wrong, it's wrong at the caller, with money and compliance on the line. Chapter 01 named a quieter alternative: the AI sits beside a human agent, listens to the live call, and surfaces help. The human stays in control. The AI never opens its mouth on the line.
This flips every pressure from the last three chapters. There is no turn budget, because the human takes the turns. There is no barge-in to manage and no end-of-turn to detect before speaking. The AI is a passive listener with a side channel to one person: the agent's screen. What replaces the turn budget is operator attention: the agent has finite cognitive room while talking to an upset customer, and anything the AI shows competes for it. Surface the wrong thing, or the right thing at the wrong moment, and you've made the agent slower.
By the end you can name what agent assist surfaces (live transcript, next-best-action, knowledge articles, auto-filled fields), why timing and ranking matter more than raw capability, and why the metric that kills bad assist is not accuracy but distraction.
What this file solves¶
A live human agent is already overloaded: talking, typing, navigating five systems, reading the customer's mood. An AI that dumps every possibly-relevant article and suggestion onto the screen makes this worse, not better. This file shows how agent assist listens to the live call, surfaces the one next-best-action or article at the right moment, and auto-fills the wrap-up, so the human is faster and more consistent without being buried. It also shows how to measure whether the assist is helping or distracting.
Why "show everything relevant" buries the agent¶
The obvious build: stream the live transcript, run it through retrieval and an LLM, and show the agent every relevant knowledge article, every possible next action, the customer's full history, sentiment, and suggested phrasing — all updating in real time. A smart team builds this first because every piece is genuinely useful somewhere.
It fails the moment a real agent uses it on a hard call. The agent is talking to a customer disputing a charge. The assist panel is a waterfall: six articles, three suggested actions, a sentiment gauge swinging, a live transcript scrolling, all re-ranking every few seconds. The agent's eyes flick to the panel, the agent misses the customer's last sentence, and has to ask the customer to repeat it. The "help" added cognitive load and lengthened the call. Agents quietly stop looking at the panel, and the assist's adoption, the metric every other benefit depends on, collapses.
So the real problem is not "the AI isn't surfacing enough." It is that operator attention is the scarce resource, not information — the agent can act on at most one or two things per moment, and a panel that shows ten competes with the customer for the agent's mind. How can the AI surface the single most useful thing at the moment it's actually needed, and stay silent otherwise?
That question defines good agent assist: ruthless ranking and timing. One next-best-action, surfaced when the transcript shows the trigger, dismissed when it's no longer relevant. Salesforce Einstein, NICE Enlighten, and similar systems are built around the same idea: analyze the live transcript and surface a recommended action or article at the inflection point rather than a wall of options.
Rule: surface one thing, at the right moment, dismissible — never a wall¶
The load-bearing rule of agent assist: the AI's output competes with the customer for the agent's attention, so it must surface the single highest-value item at the moment of need and get out of the way — ranking and timing beat completeness. An assist that's right but late, or right but buried in nine others, is a distraction.
Why this rule exists. The primitive is that a live agent is a single-threaded processor already at capacity — speaking, listening, typing, navigating. The constraint is that anything on the assist panel draws from the same attention pool that the customer needs. Showing N items doesn't give the agent N× help; past one or two, it gives negative help by stealing focus. The rule forces the AI to spend the agent's attention as carefully as chapter 04 spent the turn budget — both are scarce pots, drawn from by every addition.
1) What agent assist actually surfaces — and when¶
Four things, each with a moment that justifies it. The discipline is the when, not the what.
LIVE CALL (human agent + disputing customer)
        │  transcript streams (ch 03 ASR)
        ▼
┌──────────────── ASSIST ENGINE (listens, never speaks) ─────────────┐
│                                                                    │
│  1. LIVE TRANSCRIPT      → always on, so the agent need not        │
│                            take notes (frees attention)            │
│  2. KNOWLEDGE SURFACING  → when transcript hits a question         │
│                            the KB answers ("refund policy?")       │
│  3. NEXT-BEST-ACTION     → when state implies one step             │
│                            ("offer the loyalty credit")            │
│  4. AUTO-FILL / SUMMARY  → during + after the call, drafts the     │
│                            disposition the agent confirms          │
└────────────────────────────────────────────────────────────────────┘
        │  agent confirms / edits (human stays in control)
        ▼
CRM disposition + summary (ch 06, ch 07)
The live transcript is "always on" because it removes load — the agent stops note-taking and listens better. The other three are event-triggered: knowledge surfaces when the customer asks something the KB answers; next-best-action surfaces when the call state implies a specific step; auto-fill drafts the wrap-up so the agent confirms instead of types. Salesforce's Service Cloud Voice auto-summarizes the call after disposition and adds it to the contact record — the agent edits, doesn't author.
For the billing dispute, this means: the transcript scrolls quietly, and at the moment the customer says "this is the third time I've called about this," the assist surfaces one thing — "Offer the goodwill credit; this customer has 2 prior unresolved contacts" — sourced from the CRM history. One item, exactly when it's actionable.
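The "event-triggered, one item" discipline can be sketched as a small rule table. This is a minimal illustration, not any vendor's API: `Card`, `TRIGGERS`, `on_utterance`, the regular expressions, and the priority values are all hypothetical names and numbers chosen for this example. A real engine would work from ASR output and richer call state than a single utterance.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Card:
    kind: str      # "knowledge" or "next_best_action"
    text: str
    priority: int  # higher wins when several triggers fire at once


# Hypothetical trigger table: (pattern over the customer's utterance, card).
TRIGGERS = [
    (re.compile(r"\brefund policy\b", re.I),
     Card("knowledge", "Refund policy article", priority=1)),
    (re.compile(r"\bthird time\b.*\bcalled\b", re.I),
     Card("next_best_action", "Offer the goodwill credit", priority=2)),
]


def on_utterance(utterance: str, prior_unresolved_contacts: int) -> Optional[Card]:
    """Return at most one card for this utterance, or None to stay silent."""
    fired = [card for pattern, card in TRIGGERS if pattern.search(utterance)]
    # Assumption: the goodwill-credit action needs CRM evidence behind it.
    fired = [c for c in fired
             if c.kind != "next_best_action" or prior_unresolved_contacts >= 2]
    if not fired:
        return None  # silence is the default
    return max(fired, key=lambda c: c.priority)


card = on_utterance("This is the third time I've called about this", 2)
print(card.text if card else "(silent)")
```

The return type carries the design choice: the function can only ever hand the panel one card or nothing, so "a wall of options" is not expressible.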
2) Picture: the assist as a second pair of eyes, not a second voice¶
The mental model that keeps assist from becoming noise: the AI is a quiet expert sitting behind the agent, watching the same call, who taps the agent's shoulder once when there's something worth saying — not a co-pilot grabbing the controls, and not a junior reading the whole manual aloud.
CUSTOMER ◀──────── talks ────────▶ AGENT (owns the call)
                                     ▲
                                     │ taps shoulder ONCE
                                     │ "offer the credit"
                           ┌─────────┴──────────┐
                           │  ASSIST (the AI)   │
                           │  watches silently  │
                           │  ranks, waits,     │
                           │  surfaces ONE thing│
                           └────────────────────┘

Autonomous bot (ch 04):  AI IS the voice, owns the turn budget
Agent assist   (ch 05):  AI is the eyes, owns nothing, spends attention
Bad assist = a junior reading the whole manual aloud during the call
Contrast this cleanly with chapter 04: the autonomous bot owns the conversation and spends the turn budget; agent assist owns nothing and spends the agent's attention. Same shared ASR underneath, opposite control model. This is why agent assist is the safe place to start (chapter 01): the human sits between the AI and the caller and can catch its mistakes, so a wrong suggestion usually costs a glance, not a misrouted call or a leaked card.
3) The running example: assisting the human who got the escalated billing dispute¶
Thread the call forward. The chapter-04 bot hit a genuine dispute it couldn't resolve, packed the baton (chapter 07), and warm-transferred to a tier-2 human. Now the AI's role flips from driver to assistant. The human picks up with the account already on screen and the transcript attached.
Attempt A — the firehose panel¶
The assist shows the full prior transcript, six refund-policy articles, four suggested actions, a sentiment graph, and the customer's 18-month history, all live-updating. The agent, mid-apology to the customer, glances over to find the right policy, scrolls, loses the thread, and asks the customer to repeat the charge date. The customer, already on their third call, gets angrier. The "help" lengthened the call and worsened sentiment. Post-call, the agent reports the panel is "noise" and starts ignoring it.
Attempt B — one tap at the right moment¶
The assist stays quiet, transcript scrolling. When the customer says "the duplicate charge on the third," the engine surfaces one card: the exact refund-policy clause for duplicate charges, with a one-click "apply credit" action pre-filled with $59 and the account. The agent glances, confirms, and says "I've credited the duplicate $59 back to you." When the call ends, the assist has already drafted the disposition ("Duplicate-charge dispute, $59 credit applied, resolved"), which the agent confirms in two seconds.
The hard part hiding here: the engine had to know the moment. Surfacing the refund clause at call-open (before the customer explained) would be premature; surfacing it after the agent already handled it is too late. The trigger is the live transcript reaching the relevant state — which is exactly chapter 03's perception layer feeding a ranking-and-timing problem instead of a turn-taking one.
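The three timing cases above (too early, too late, and a trigger gone stale) can be written as a single gate. This is a sketch under stated assumptions: `CallState`, `should_surface`, and the 10-second `max_age_s` default are hypothetical, and a production engine would derive these timestamps from the streaming transcript, not receive them ready-made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallState:
    # Seconds into the call; None means "has not happened yet".
    trigger_at: Optional[float] = None  # customer stated the issue
    handled_at: Optional[float] = None  # agent already resolved it


def should_surface(state: CallState, now: float, max_age_s: float = 10.0) -> bool:
    """Gate one card on timing. max_age_s is an assumed staleness cutoff."""
    if state.trigger_at is None:
        return False  # premature: the customer has not explained yet
    if state.handled_at is not None:
        return False  # too late: the agent already dealt with it
    return (now - state.trigger_at) <= max_age_s  # stale triggers stay silent


print(should_surface(CallState(), now=5.0))                 # call-open
print(should_surface(CallState(trigger_at=42.0), now=44.0)) # moment of need
```

The gate says nothing about what to show; it only decides whether now is the moment, which keeps timing testable separately from ranking.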
4) Why retrieval-and-rank instead of letting the LLM free-generate advice — choosing under an attention workload¶
The tempting alternative: let an LLM read the live transcript and freely generate advice and answers for the agent in natural language, like a chat copilot.
- LLM free-generates advice — flexible, conversational, handles any situation. But it can hallucinate a policy that doesn't exist, and a hallucinated answer the agent repeats to the customer is now a wrong commitment the company made. It also tends toward verbose paragraphs that cost attention.
- Retrieval-and-rank from the approved KB + CRM — surfaces actual approved articles and actual allowed actions, ranked, with provenance the agent can verify in a glance. Less flexible, but every surfaced item is grounded and short.
For a contact center where the agent may repeat the AI's output to the customer as a commitment, grounding wins decisively. The assist surfaces the approved refund clause (with a link to verify), not a freely-worded paraphrase that might invent a policy. The deciding question: could the agent act on or repeat this to the customer? If yes, it must be grounded in approved content, not generated. (The LLM still helps — to summarize the call and to rank what to surface — but the surfaced facts come from the KB.)
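A toy version of retrieve-and-rank shows the property that matters: the function can only return an article that already exists in the approved set, with an ID the agent can verify, or nothing. Everything here is an assumption for illustration: the `Article` type, the two `APPROVED_KB` entries and their IDs, and the token-overlap score, which stands in for whatever ranking a real system uses.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Article:
    article_id: str  # provenance the agent can check at a glance
    title: str
    body: str


# Hypothetical approved knowledge base; IDs and text are invented.
APPROVED_KB = [
    Article("KB-101", "Duplicate charge refunds",
            "duplicate charge refund credit billing"),
    Article("KB-202", "Loyalty credit eligibility",
            "loyalty goodwill credit eligibility"),
]


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_one(query: str, kb: Optional[List[Article]] = None,
                 min_overlap: int = 2) -> Optional[Article]:
    """Return the single best approved article, or None if nothing is close."""
    kb = APPROVED_KB if kb is None else kb
    if not kb:
        return None
    q = _tokens(query)
    # Token overlap is a deliberately simple stand-in for a real ranker.
    scored = [(len(q & _tokens(a.title + " " + a.body)), a) for a in kb]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= min_overlap else None


hit = retrieve_one("the duplicate charge on the third")
print(hit.article_id if hit else "(nothing grounded to show)")
```

When nothing clears the threshold the function returns `None` instead of composing an answer, which is the grounded counterpart of staying silent.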
5) The property that changes the design: assist value is measured at the agent, not at the model¶
The dimension people get wrong is where you measure success. Model accuracy ("the surfaced article was relevant 90% of the time") is not the product metric. The product metric is whether the agent's outcomes improved — handle time, first-contact resolution, consistency, new-agent ramp time — and whether agents actually use it.
Model metric:      "surfaced article relevant"  = 90%   ← feels great
But agent metric:  panel adoption rate          = 20%   ← agents ignore it
                   AHT change with assist on    = +8s   ← it SLOWED them
A relevant-but-untimely panel can be 90% accurate and net-negative.
A 90%-accurate assist that agents ignore (because it's noisy or mistimed) has zero value. A 75%-accurate assist surfaced at exactly the right moment, that agents trust and use, has high value. This asymmetry should change your design priority: optimize timing, ranking, and trust before raw retrieval accuracy — and measure at the agent (adoption, AHT, FCR, consistency, ramp), not at the model. New-agent ramp is often the biggest win: a tenured agent knows the policy, but a week-one agent leans on the assist heavily, so assist value concentrates in the newest cohort.
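Measuring at the agent can be made concrete with two small calculations. This is a sketch with made-up inputs that mirror the illustrative numbers above; the function names and the 50% `min_adoption` threshold are assumptions, not an industry standard.

```python
from statistics import mean
from typing import Sequence


def adoption_rate(cards_shown: int, cards_used: int) -> float:
    """Share of surfaced cards the agent actually clicked or applied."""
    return cards_used / cards_shown if cards_shown else 0.0


def aht_delta_s(aht_assist_on: Sequence[float],
                aht_assist_off: Sequence[float]) -> float:
    """Mean handle time with assist minus without; positive means slower."""
    return mean(aht_assist_on) - mean(aht_assist_off)


def assist_is_helping(adoption: float, delta_s: float,
                      min_adoption: float = 0.5) -> bool:
    """Agent-level verdict. Model relevance is deliberately not an input."""
    return adoption >= min_adoption and delta_s <= 0


# Hypothetical sample: 20 of 100 cards used, calls about 8 s longer with assist.
adoption = adoption_rate(cards_shown=100, cards_used=20)
delta = aht_delta_s([308, 312], [300, 304])
print(adoption, delta, assist_is_helping(adoption, delta))
```

Note that `assist_is_helping` has no parameter for retrieval accuracy: a 90%-relevant panel with these inputs still fails the verdict.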
6) One failure walked through: the assist that made tenured agents slower¶
Incident: agent assist rolls out. New agents love it — ramp time drops. But tenured agents' handle time rises ~8 seconds per call, and CSAT on their calls dips. The retrieval accuracy dashboard is green at 90%.
The chain: the panel surfaced articles and actions for every call, including the routine ones tenured agents could handle in their sleep. For those agents, the surfaced item wasn't new information — it was a thing to glance at, evaluate, and dismiss, every single call. That glance cost ~8 seconds and pulled attention off the customer. The assist was accurate; it was just unwanted for that cohort on those calls, and there was no suppression.
The root cause is not bad retrieval — the items were relevant. It's that the assist spent attention it didn't need to spend, treating a tenured agent on a routine call like a new agent on a hard one. The fix: suppress low-value surfacing (don't show the obvious to an expert), make the panel pull-not-push for routine calls (available if the agent looks, silent otherwise), and segment the rollout — push assist hard for new agents, light for tenured. This is the same "spend the scarce pot only when it pays" discipline as chapter 04's "not every turn deserves the LLM," now spending attention instead of latency.
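The suppression fix can be sketched as a policy that picks push or pull per call. The names `Mode` and `surfacing_mode` and the 12-month tenure cutoff are hypothetical; a real deployment would tune the cutoff and would need its own definition of a "routine" call.

```python
from enum import Enum


class Mode(Enum):
    PUSH = "push"  # card appears unprompted
    PULL = "pull"  # panel is silent; content is there if the agent looks


def surfacing_mode(tenure_months: int, call_is_routine: bool,
                   tenured_after_months: int = 12) -> Mode:
    """Pick how assertive the panel is for this agent on this call.

    tenured_after_months is an assumed cutoff, not a recommended value.
    """
    is_tenured = tenure_months >= tenured_after_months
    if is_tenured and call_is_routine:
        return Mode.PULL  # do not spend an expert's attention on the obvious
    return Mode.PUSH      # new agents, or anyone on a hard call


print(surfacing_mode(tenure_months=36, call_is_routine=True).value)
print(surfacing_mode(tenure_months=1, call_is_routine=True).value)
```

A tenured agent on a hard call still gets push in this sketch, since the failure described above was specific to routine calls.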
7) Cost and value movement: what assist buys, what it costs¶
Effects of agent assist on a contact center (illustrative; varies by deployment and call mix):
| What it does | Who benefits most | What it buys | What it costs |
|---|---|---|---|
| Live transcript (always on) | all agents | no note-taking, better listening | screen real estate |
| Knowledge surfacing (triggered) | new agents | faster correct answers, consistency | attention if mistimed |
| Next-best-action (triggered) | new + mid agents | consistent offers, fewer missed steps | trust erodes if wrong |
| Auto-fill / summary | all agents | seconds saved per wrap-up at scale | review burden (must verify) |
| Sentiment alert to supervisor | supervisors | catch at-risk calls live | false-alarm fatigue |
The headline economics: assist doesn't cut headcount the way deflection does. It makes existing agents faster and more consistent and shrinks ramp time. The pressure moves rather than disappears. Assist relieves the knowledge-lookup and note-taking load (and the inconsistency between agents) but creates attention-management pressure, because every surfaced item competes with the customer. Good ranking and timing logic absorbs that pressure; when the logic is poor, the agent's focus absorbs it instead. The auto-summary relieves wrap-up typing but creates a review burden: the agent must verify the draft, because an un-reviewed wrong summary becomes a wrong CRM record (chapter 07) and a wrong analytics input (chapter 06).
8) Signals that the assist is helping or hurting¶
Healthy: high panel adoption (agents actually use surfaced items), handle time flat-or-down with assist on, first-contact resolution up, new-agent ramp time down, and a low but non-zero summary-edit rate (drafts are accurate and agents are still reading them).
First metric to degrade: panel adoption / interaction rate. When the assist is noisy or mistimed, agents stop clicking and start ignoring — adoption drops before AHT or CSAT visibly move, because agents route around the noise silently.
Misleading metric people watch: retrieval/model accuracy. A 90%-accurate panel that agents ignore is worthless; accuracy at the model says nothing about value at the agent.
First graph an expert opens: panel adoption and AHT change, segmented by agent tenure. The signature of the section-6 failure is adoption and AHT improving for new agents while AHT worsens for tenured agents, which means the assist is over-surfacing to experts. The second graph: summary-edit rate over time. Rising edits mean the auto-draft is drifting and agents are correcting it; a rate falling toward zero is the worse sign, because it usually means agents are rubber-stamping drafts rather than reading them.
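Both graphs reduce to simple aggregations. The sketch below segments AHT change by tenure cohort and classifies the summary-edit rate; the record shape, the cohort labels, and the `floor` and `ceiling` thresholds are assumptions for illustration, and the sample data is invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def aht_delta_by_cohort(
        calls: Iterable[Tuple[str, bool, float]]) -> Dict[str, float]:
    """calls: (cohort, assist_on, handle_time_s). Positive delta = slower."""
    buckets = defaultdict(lambda: {True: [], False: []})
    for cohort, assist_on, handle_time_s in calls:
        buckets[cohort][assist_on].append(handle_time_s)
    # Only report cohorts that have calls both with and without assist.
    return {cohort: mean(b[True]) - mean(b[False])
            for cohort, b in buckets.items() if b[True] and b[False]}


def summary_review_signal(edited: int, total: int,
                          floor: float = 0.02, ceiling: float = 0.30) -> str:
    """Classify the summary-edit rate. floor and ceiling are assumed values."""
    if total == 0:
        return "no data"
    rate = edited / total
    if rate < floor:
        return "check for rubber-stamping"  # near zero: nobody is reading
    if rate > ceiling:
        return "check for draft drift"      # agents keep fixing the drafts
    return "ok"


sample = [
    ("new", True, 400), ("new", True, 420),
    ("new", False, 440), ("new", False, 460),
    ("tenured", True, 308), ("tenured", True, 312),
    ("tenured", False, 300), ("tenured", False, 304),
]
print(aht_delta_by_cohort(sample))
print(summary_review_signal(edited=0, total=500))
```

In this invented sample the blended AHT improves while the tenured cohort gets slower, which is the reason to segment before trusting an average.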
9) Boundary: where assist shines, where it turns into noise¶
Agent assist shines on knowledge-heavy, policy-driven, high-variance work with a wide skill gap — complex billing disputes, technical support, regulated processes — where the right answer is in a KB the agent can't memorize and consistency matters. It shines hardest for new and ramping agents, who lean on it as a live mentor.
It turns into noise on simple, repetitive, low-variance calls handled by tenured agents — exactly where the section-6 failure lives. Here the surfaced item is rarely new, every glance is pure cost, and over-surfacing slows the experts. The scale limit that inverts intuition: as your agent pool becomes more tenured (which good assist-driven training produces), the assist's value falls unless it learns to suppress — the better your agents get, the quieter the assist should become, which is the opposite of how teams instinctively tune it (they push harder when adoption drops, making it noisier).
10) Wrong assumption: "agent assist is just a worse virtual agent"¶
The seductive idea: agent assist is a stepping stone — a half-built bot that still needs a human babysitter, and the "real" goal is full automation. This misframes it. Agent assist and the autonomous bot solve different problems with opposite control models. The bot owns the conversation under a turn budget; assist owns nothing and spends attention. The bot's failures hit the caller; assist's failures cost a glance. They coexist permanently — the bot handles the easy 60%, assist makes humans better on the hard 40% that should never be automated.
Replace it with: agent assist is a distinct, permanent product — augmenting the human — not a transitional half-bot. The high-emotion, high-stakes, high-variance calls (chapter 01's pathology zone for front-door bots) are exactly where you want a human, made faster by assist. This reframing changes investment: you don't sunset assist as automation grows; you aim it at the calls automation should never touch.
11) Other ways agent assist bites¶
- Over-surfacing to experts — every call shows the obvious; tenured agents slow down and stop looking (section 6).
- Stale knowledge — the KB article surfaced is outdated; the agent repeats a retired policy as a commitment.
- Hallucinated advice — free-generated (not retrieved) guidance invents a policy the agent voices to the customer.
- Mistimed surfacing — the right article appears after the agent already handled the issue, pure distraction.
- Summary rubber-stamping — agents confirm auto-drafts without reading; wrong summaries pollute the CRM and analytics.
- Sentiment false alarms — the supervisor gets pinged on calm calls; alert fatigue, real escalations missed.
- Transcript lag — the live transcript trails the conversation, so surfaced items reference what was said 10 seconds ago.
- PII on the panel — the transcript or surfaced record shows a card number or SSN the agent shouldn't see unmasked (chapter 08).
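For the last item, a tripwire on the panel's input can at least detect that cardholder data leaked in from upstream. This is a minimal sketch, not a compliance control: `contains_card_number` and the candidate pattern are hypothetical, and detection at display time does not shrink PCI scope, since the data has already entered the transcript. The Luhn checksum itself is the standard one used to validate card numbers.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Assumed pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"(?:\d[ -]?){13,19}")


def contains_card_number(text: str) -> bool:
    """True if the text holds a digit run that passes the Luhn check."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            return True
    return False


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(contains_card_number("my card is 4111 1111 1111 1111"))
print(contains_card_number("order number 1234567890123"))
```

A hit here is best treated as an alert that the capture path upstream needs fixing, not as a cue to mask the text and carry on.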
12) Pattern transfer¶
- Attention is a scarce pot, like the turn budget — agent assist spends operator attention the way chapter 04 spends the turn budget: every addition draws from a fixed pool, and past a small number, more items give negative return. Same constraint shape (a bounded resource consumed by every addition), different resource.
- Surface-one-thing is rate limiting / admission control — structurally identical to load-shedding under backpressure: you can't admit every relevant item, so you admit the highest-priority one and drop the rest. The panel is an admission controller for the agent's attention.
- Measure at the agent, not the model — same as measuring a system by user-perceived outcomes (SLOs) rather than internal component metrics. Model accuracy is a component metric; adoption and AHT are the SLOs. A green component dashboard with a failing SLO is the recurring trap (chapters 02, 04).
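The admission-control analogy can be shown with a one-slot controller: a new item is admitted only if it outranks what is already on screen, and everything else is shed. `AttentionSlot`, its methods, and the priority numbers are hypothetical names for this sketch.

```python
from typing import Optional, Tuple


class AttentionSlot:
    """A single-slot admission controller for the agent's attention."""

    def __init__(self) -> None:
        self.current: Optional[Tuple[int, str]] = None  # (priority, text)
        self.shed_count = 0  # items dropped without ever being shown

    def offer(self, priority: int, text: str) -> bool:
        """Admit the item only if the slot is empty or it outranks the holder."""
        if self.current is None or priority > self.current[0]:
            if self.current is not None:
                self.shed_count += 1  # the displaced item is dropped
            self.current = (priority, text)
            return True
        self.shed_count += 1
        return False

    def dismiss(self) -> None:
        """The agent acted on or waved away the card; free the slot."""
        self.current = None


slot = AttentionSlot()
slot.offer(1, "Refund policy article")
slot.offer(3, "Apply $59 credit")
slot.offer(2, "Loyalty credit eligibility")
print(slot.current, slot.shed_count)
```

As with load-shedding, the dropped items are not queued for later: by the time the slot frees up, their moment has usually passed.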
13) Design test¶
- Does the assist surface one ranked item at the moment of need, or a wall of relevant-but-untimely options?
- Is anything the agent might repeat to the customer grounded in approved content, or free-generated?
- Do you measure adoption, AHT, FCR, and ramp time at the agent — segmented by tenure — not just model accuracy?
- Does the assist suppress the obvious for tenured agents instead of over-surfacing on routine calls?
- Do agents review auto-summaries, and do you track the summary-edit rate so drift surfaces?
Where this appears in production¶
- Salesforce Einstein (Service Cloud Voice) — analyzes the live transcript to surface knowledge articles, next-best-actions, and sentiment shifts; auto-summarizes after disposition into the contact record.
- NICE Enlighten (CXone) — real-time interaction guidance and next-best-action surfaced to agents live.
- Google Cloud Agent Assist (CCAI) — live transcript, suggested articles, and smart replies for human agents.
- Amazon Connect agent assist (Q in Connect) — real-time recommendations and answers pulled from connected knowledge sources during the call.
- Five9 Genius / Agent Assist — live guidance and after-call summary for Five9 agents.
- Cresta — real-time coaching and next-best-action tuned on a contact center's own winning behaviors.
- Cognigy / Kore.ai agent assist — live knowledge and action surfacing alongside the agent desktop.
- Genesys Agent Assist — knowledge surfacing and summarization in the Genesys agent workspace.
- Zendesk AI (agent copilot) — surfaces suggested replies and knowledge in the ticketing desktop.
- Talkdesk Copilot — real-time assist and automated after-call work.
- Sentiment-to-supervisor alerting — routes at-risk live calls to a supervisor before they go fully wrong.
- Auto-summary / wrap-up drafting — drafts the disposition the agent confirms, cutting after-call work at scale.
- Knowledge-base retrieval (RAG over the help center) — grounds surfaced answers in approved articles, not free generation.
- New-agent ramp programs — assist used as a live mentor so week-one agents perform closer to tenured ones.
Recall¶
- How does the AI's role change from chapter 04 to agent assist, and what scarce resource replaces the turn budget?
- Why does surfacing every relevant item make a live agent slower, not faster?
- What are the four things agent assist surfaces, and what triggers each?
- Why must anything the agent might repeat to the customer be grounded, not free-generated?
- Why is model retrieval accuracy a misleading success metric, and what should you measure instead?
- Why can good assist slow down tenured agents, and what's the fix?
- Why is agent assist a permanent product rather than a transitional half-bot?
Interview Q&A¶
Q1. Your agent-assist panel has 90% retrieval accuracy but agents say it's useless. What's wrong? Accuracy at the model is not value at the agent. The panel is probably surfacing too much, too often, or at the wrong moment, so it competes with the customer for the agent's attention and they route around it. Measure adoption, AHT, FCR, and ramp time at the agent, and fix ranking and timing — surface one item at the moment of need — before touching retrieval accuracy. Common wrong answer to avoid: "improve retrieval to 95%" — the problem isn't relevance, it's that a relevant-but-untimely wall of options costs more attention than it saves.
Q2. Should agent assist generate advice with an LLM or retrieve from the knowledge base? Retrieve and rank from approved content for anything the agent might act on or repeat to the customer — a hallucinated policy the agent voices becomes a wrong company commitment. Use the LLM for ranking what to surface and for drafting the post-call summary, but ground the surfaced facts in the KB with provenance the agent can verify. The test: could the agent repeat this to the customer? Then it must be grounded. Common wrong answer to avoid: "let the LLM freely advise, it's more flexible" — flexibility that invents policy is a liability when the agent repeats it as a commitment.
Q3. After rollout, new agents are faster but tenured agents are 8 seconds slower per call. Diagnose it. The assist over-surfaces to experts. For a tenured agent on a routine call, the surfaced item is rarely new information — it's a thing to glance at and dismiss every call, costing attention off the customer. Fix: suppress the obvious for experts, make routine-call assist pull-not-push, and segment the rollout (heavy for new agents, light for tenured). It's the same "spend the scarce resource only when it pays" rule as not routing every turn through the LLM. Common wrong answer to avoid: "push the panel harder so they adopt it" — pushing harder makes it noisier and slows experts more; the answer is suppression, not insistence.
Q4. Why start a new center's AI journey with agent assist instead of an autonomous bot? Lowest blast radius. The human stays in control and can catch AI mistakes before they reach the caller, so a wrong suggestion usually costs a glance, not a misrouted call or a leaked card. It deploys faster, builds organizational trust, and generates a labeled dataset of good interactions you can later use to train a front-door bot. Then graduate to autonomous handling on the high-volume, low-variance call types. Common wrong answer to avoid: "start with the autonomous bot, it deflects more". Highest deflection is highest blast radius and the riskiest first move on an unproven stack.
Q5. As your agents get more tenured over time, assist adoption drops. Is the assist failing? Not necessarily — it may be succeeding into its own irrelevance. Assist value concentrates in new and ramping agents; as the pool tenures (partly because assist trained them), the surfaced items are increasingly obvious and adoption naturally falls. The fix is to make the assist quieter for experts and aim it at the hard, high-variance calls and the newest cohort — not to push harder. The better your agents get, the quieter assist should become. Common wrong answer to avoid: "adoption dropped so the assist is broken, crank up surfacing" — that inverts the right response and slows your now-expert agents.
Q6. The auto-summary feature is live, but CRM data quality is getting worse. How is that possible? Agents are rubber-stamping the auto-drafts without reading them, so summarization errors flow straight into the CRM as wrong dispositions — which then poison chapter-06 analytics. The auto-summary relieves typing but creates a review burden; if agents skip the review, an un-verified wrong summary is worse than a hand-typed one. Track summary-edit rate; a rate near zero means agents aren't reviewing, not that the drafts are perfect. Common wrong answer to avoid: "the summarization model regressed" — the drafts may be the same quality; the data degraded because the human review step collapsed.
Q7. (Cumulative) A transferred call's assist panel shows the caller's full card number to the human agent. Whose problem is this? A composition of agent assist (ch 05), the transcript/perception layer (ch 03), and compliance (ch 08). The card number should never have reached the transcript or the panel unmasked — it should have been captured via a PCI-safe DTMF path (ch 08) that keeps it out of the transcript entirely, so there's nothing for assist to surface. The assist is showing it because an earlier layer let cardholder data into scope. Fix the capture path, not the panel. Common wrong answer to avoid: "just redact it on the assist panel" — redacting at the display is too late; the card already entered the transcript and recording, expanding PCI scope to everything that touched it.
Design/debug exercise (10 min)¶
Step 1 — Modeled example. Walk the escalated-dispute assist (section 3, Attempt B): transcript scrolls quietly → customer says "duplicate charge on the third" → engine surfaces one card (refund clause + pre-filled $59 credit action) → agent confirms → auto-summary drafted at call end. For each surfaced item, write the trigger (what in the transcript fires it) and the one failure if it surfaces too early or too late.
Step 2 — Your turn. A different escalated call: "I'm cancelling because your service has been down for three days." Design the assist: what does it surface and when (retention offer? outage-credit policy? — and at which transcript moment), what stays suppressed for a tenured agent, and what does the auto-summary draft? Note where you'd measure adoption and AHT to know it's helping.
Step 3 — Reproduce from memory. Redraw the "second pair of eyes" diagram (section 2) cold, contrasting it with chapter 04's autonomous bot (AI is the voice / AI is the eyes; owns turn budget / owns nothing, spends attention). Then connect to chapter 03: show that the same live ASR feeds a ranking-and-timing problem here instead of a turn-taking one.
Operational memory¶
This chapter explained why the safest, most reliable AI in a contact center never speaks to the caller: it sits beside a human, listens to a live call it can't control, and tries to make the human faster without getting in the way. The important idea is that the scarce resource is now operator attention, not the turn budget — so the AI must surface one ranked item at the moment of need and stay silent otherwise, because every surfaced thing competes with the customer for the agent's mind.
You learned that agent assist surfaces a live transcript (always on, to free attention), knowledge and next-best-action (event-triggered), and an auto-summary the agent confirms — all grounded in approved content so the agent can safely repeat it. That solves the opening firehose failure because the failure was never weak retrieval; it was spending attention the agent couldn't spare.
Carry this diagnostic forward: when an assist "isn't used," look at adoption and AHT segmented by tenure before retrieval accuracy. When tenured agents slow down, the assist is over-surfacing the obvious — suppress, don't push. When CRM quality drops despite auto-summary, check whether agents are actually reviewing the drafts.
Remember:
- The scarce resource is operator attention; surface one thing at the right moment, dismissible, never a wall.
- Ground anything the agent might repeat to the customer; free-generated advice can invent a policy.
- Measure at the agent (adoption, AHT, FCR, ramp), segmented by tenure — not at the model.
- The better your agents get, the quieter assist should become; suppress the obvious for experts.
- Agent assist is a permanent product augmenting humans on the calls automation should never touch, not a half-bot.
Bridge. Whether the AI drove the call (chapter 04) or assisted a human (chapter 05), every call now leaves behind a transcript and a recording. The live pressure is gone — but a new one appears: at 40,000 calls a day, all of that has to be transcribed, scored, mined, and compliance-checked in bulk, where the constraint is no longer latency but scale, cost, and data quality. → 06-post-call-analytics.md