08. What the AI is allowed to hear, store, and record¶
~18 min read. The bot can verify, look up, transfer, and log. Now a caller reads a card number aloud — and in that instant the question stops being "is the bot correct?" and becomes "is this legal?" A card number in a recording is a reportable breach. A recorded call in a two-party-consent state without a disclosure is a wiretap. The constraint here isn't latency or coordination. It's law, and it constrains everything the earlier chapters built.
Built on 07-crm-cti-and-systems-integration.md. Chapter 07's auth gate decided who the AI may expose data to. This chapter decides what the AI may hear, store, and record at all — the blast radius named in chapter 01 at its most extreme, the PCI scope that decides whether a card number can structurally reach your systems, and a new pressure: audit. These constraints reach backward and re-bind chapters 02, 03, and 06.
Note: this is the chapter the running example has been deferring since chapter 00, where "nobody wired a PCI-compliant DTMF path, so the card numbers are now sitting in plaintext call recordings — a reportable breach." Here we build the path that makes that breach structurally impossible, and the consent and redaction layers around it.
What every earlier layer was allowed to do, and the law it never checked¶
Every chapter so far optimized for correctness and speed. The bot forked audio (chapter 02) so it could hear the call. ASR transcribed every word (chapter 03). Analytics stored and mined every transcript (chapter 06). The CRM logged every disposition (chapter 07). Each of those is a recording, a store, or a transmission of someone's words — and the law has rules about all three that no earlier chapter checked.
Two of those rules are sharp enough to end a deployment. First, recording consent: in roughly a dozen US states, recording a call without all parties' consent is a crime, and if any party is in the EU, GDPR demands explicit consent regardless of where your servers sit. The bot that records every call to feed chapter 06's analytics is committing a wiretap the moment a Californian calls and isn't told. Second, cardholder data: the moment a caller speaks a card number into a call the AI is forking and recording, that number lands in the audio, the transcript, and the analytics store, and every system that touched it falls under PCI DSS, the standard that governs cardholder data. A card number in a recording isn't a bug; it's a reportable breach and a failed audit.
So the pressure here is not correctness and not speed. It is blast radius at its legal maximum and a new one, audit: can you prove, after the fact, that consent was obtained, that the card number never entered your systems, that PII was redacted, and who accessed what? The earlier chapters built capability. This chapter builds the constraints that keep that capability legal — and they reach backward: redaction re-binds chapter 06's store, consent re-binds chapter 02's recorder, PCI scope re-binds chapter 03's transcript.
By the end you can reason about recording-consent law across jurisdictions, build the PCI-safe payment path that keeps a card number structurally out of scope, redact PII before it's stored, and produce the audit trail that proves all of it.
What this file solves¶
A contact-center AI that records every call to feed its analytics, hears a spoken card number, and stores the raw transcript is — without anyone intending it — committing a wiretap in two-party-consent states, holding cardholder data in plaintext, and unable to prove who accessed what. This file shows how to obtain and log recording consent across jurisdictions, build the DTMF-masked payment path that keeps card numbers out of the audio, transcript, recording, and CRM entirely (so those systems fall out of PCI scope), redact residual PII before storage, and keep the audit trail that makes all of it provable.
Why "record everything and redact later" is a breach, not a control¶
The obvious build: record every call (analytics needs the audio), transcribe it, and run a redaction pass to scrub card numbers and PII before anyone looks. It feels responsible — there's a redaction step, after all. A diligent team ships exactly this.
It fails the moment a caller speaks a card number, and it fails structurally, not occasionally. The card number enters the live audio the instant it's spoken. Chapter 02's recorder captures it. Chapter 03's ASR transcribes it. It's written to the recording store and the transcript store before any redaction runs. Redaction-after-the-fact means the card number already existed in plaintext in your recording, your transcript, your backups, and possibly your analytics pipeline — and PCI scope is defined by every system that stores, processes, or transmits cardholder data. You didn't avoid scope by redacting later; you pulled your entire telephony, recording, transcript, and CRM stack into scope, then deleted some of it. A breach of a system that briefly held a card is still a breach.
The same structural failure hits consent. "Record everything, sort out consent later" means you recorded a two-party-consent-state caller without their consent — the recording itself is the violation, and deleting it afterward doesn't un-commit the wiretap.
So the real problem is not "our redaction isn't good enough" and not "we need a better scrubber." It is that redaction-after-capture and consent-after-recording both let the sensitive thing exist before the control runs — and for cardholder data and wiretap law, the violation is the existence, not the retention. How do we make the card number, and the unconsented recording, never exist in our systems in the first place?
That question flips the whole design from cleanup to prevention. Get consent before the recording starts. Keep the card number out of the audio the AI ever hears, so there's nothing to redact. Redact residual PII before storage, not after. PCI's whole game, as the standards council and every CCaaS vendor frame it, is descope: if cardholder data never enters a system, that system isn't in scope — and your audit shrinks from SAQ D's 329 controls to SAQ A's 22.
Rule: keep the sensitive thing from ever existing in your systems, and prove it¶
The load-bearing rule of compliance: obtain consent before recording, keep cardholder data out of every system the AI touches so those systems fall out of PCI scope, redact PII before storage rather than after, and log enough to prove all of it — because for wiretap and cardholder-data law, the violation is that the sensitive thing existed, not that you kept it. Prevention and provability, not cleanup.
Why this rule exists. The primitive is that recording, storing, and transmitting are each regulated acts. The constraints: a recording made without consent is already a wiretap; a system that ever held a card number is already in PCI scope; PII in a stored transcript is already a privacy exposure; and a regulator will ask you to prove none of that happened. Cleanup can't undo existence — a deleted card number was still stored, an erased recording was still made. So the rule moves every control before the sensitive thing exists, and adds an audit trail because "we didn't violate" is worthless if you can't demonstrate it.
1) Recording consent — the law that re-binds chapter 02's recorder¶
The moment chapter 02 forked the call audio to the recorder, you started recording a conversation — and that act is governed by where the parties are, not where your servers are.
WHO IS ON THE CALL?             WHAT THE LAW DEMANDS
┌───────────────────────────┐   ┌────────────────────────────────────┐
│ all parties in 1-party    │──▶│ one party's consent (the company   │
│ states (federal baseline) │   │ counts); disclosure still wise     │
├───────────────────────────┤   ├────────────────────────────────────┤
│ ANY party in a 2-party    │──▶│ ALL parties must consent           │
│ state (CA, FL, IL, MD, …  │   │ → "this call may be recorded"      │
│ about a dozen states)     │   │ disclosure + ability to opt out    │
├───────────────────────────┤   ├────────────────────────────────────┤
│ ANY party in the EU       │──▶│ GDPR: explicit, specific, informed │
│                           │   │ consent + a lawful basis           │
└───────────────────────────┘   └────────────────────────────────────┘
Cross-jurisdiction rule: when parties are in different states,
apply the STRICTER law. So a multi-state center records as if 2-party.
Federal law sets a one-party baseline, but about a dozen states require all parties to consent. The states most commonly cited are California, Connecticut, Delaware, Florida, Illinois, Maryland, Massachusetts, Montana, Nevada, New Hampshire, Pennsylvania, and Washington; published lists differ at the margins (Michigan and Oregon, for example, are treated inconsistently), so check the current statutes rather than relying on any fixed list. When the agent and the caller are in different states, the stricter rule governs. A national contact center can't know in advance which state a caller is in, so the safe design is to treat every call as two-party: open with "this call may be recorded for quality and training," give a real way to decline, and log the consent against the call. If any caller might be in the EU, GDPR layers on a requirement for explicit, specific, informed consent with a documented lawful basis.
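The stricter-law rule can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not a legal reference: `Regime`, `ALL_PARTY_STATES`, and `required_regime` are hypothetical names, the state set is a deliberately tiny subset taken from the diagram above, and modelling GDPR as a layer on top of all-party consent is an assumption made for the example. An unknown location falls back to all-party, which matches the "treat every call as two-party" default.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Regime(IntEnum):
    """Consent regimes, ordered so a larger value is stricter (an assumption)."""
    ONE_PARTY = 1
    ALL_PARTY = 2
    ALL_PARTY_PLUS_GDPR = 3


# Illustrative subset only; a production table has to come from legal review.
ALL_PARTY_STATES = frozenset({"CA", "FL", "IL", "MD"})
EU = "EU"


def regime_for_location(location: Optional[str]) -> Regime:
    """Map one party's location to a regime; unknown is treated as all-party."""
    if location is None:
        return Regime.ALL_PARTY
    if location == EU:
        return Regime.ALL_PARTY_PLUS_GDPR
    if location in ALL_PARTY_STATES:
        return Regime.ALL_PARTY
    return Regime.ONE_PARTY


def required_regime(party_locations: Iterable[Optional[str]]) -> Regime:
    """Apply the strictest regime found among the parties on the call."""
    return max(
        (regime_for_location(loc) for loc in party_locations),
        default=Regime.ALL_PARTY,  # no location data at all: assume all-party
    )
```

With these assumptions, `required_regime(["TX", "CA"])` resolves to `Regime.ALL_PARTY`, and a call whose caller location is `None` does too, which is why a national center ends up playing the disclosure on every call.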
The AI's role: the consent disclosure is the bot's first utterance, before any recording-dependent processing, and the consent decision is written to the call record (chapter 07's disposition store) as part of the audit trail. For the billing line, the bot's greeting carries the disclosure, the consent is logged with a timestamp, and if the caller declines, the recording path is disabled for that call — the recorder from chapter 02 is gated on a logged consent, not on by default.
Teacher voice. Consent is not a checkbox you tick once in the admin console. It is a per-call fact you obtain at the start and can prove afterward. "We always play the disclosure" is not provable; "here is the timestamped consent logged against call X" is. The audit is the difference between a policy and a defense.
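A recorder gated on logged consent might look like the sketch below. `ConsentLog`, `ConsentRecord`, and `Recorder` are hypothetical names invented for this example, and the in-memory dictionaries stand in for chapter 07's disposition store and chapter 02's recording path. The point it illustrates is the default: with no logged consent, audio frames are dropped rather than recorded.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConsentRecord:
    call_id: str
    granted: bool
    disclosure_id: str      # which disclosure wording was played
    logged_at: datetime     # UTC timestamp, kept as audit evidence


class ConsentLog:
    """In-memory stand-in for the per-call consent store."""

    def __init__(self) -> None:
        self._records: Dict[str, ConsentRecord] = {}

    def log(self, call_id: str, granted: bool, disclosure_id: str) -> ConsentRecord:
        record = ConsentRecord(call_id, granted, disclosure_id,
                               datetime.now(timezone.utc))
        self._records[call_id] = record
        return record

    def get(self, call_id: str) -> Optional[ConsentRecord]:
        return self._records.get(call_id)


class Recorder:
    """Records a frame only if consent for that call is logged and granted."""

    def __init__(self, consent_log: ConsentLog) -> None:
        self._consent_log = consent_log
        self.frames: Dict[str, List[bytes]] = {}

    def write(self, call_id: str, frame: bytes) -> bool:
        record = self._consent_log.get(call_id)
        if record is None or not record.granted:
            return False    # off by default: missing or declined consent drops the frame
        self.frames.setdefault(call_id, []).append(frame)
        return True
```

The design choice worth noticing is that the gate reads the log, not a flag set by the dialogue flow. If the consent row was never written, the recording cannot exist, so every stored recording has matching evidence.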
2) Picture: the card number that never enters the building¶
The mental model for PCI: the contact center is a building, and PCI scope is "every room the card number walks through." Redaction-after-capture lets the card walk through every room and then mops the floors. The right design builds a sealed pipe from the caller's keypad straight to the payment gateway — the card never enters any room the AI, the recorder, the transcript, or the CRM lives in.
┌─────────────────── YOUR CONTACT CENTER (out of scope) ───────────────────┐
│                                                                          │
│   AI / bot ──── recorder (ch02) ──── ASR (ch03) ──── CRM (ch07)          │
│      ▲                 ▲                 ▲               ▲               │
│      │  hears FLAT TONES only, never the digits                          │
│      │                                                                   │
└──────┼───────────────────────────────────────────────────────────────────┘
       │
CALLER ── types card on keypad ──▶ ┌──────────────────────────────┐
          (DTMF tones)             │ DTMF MASKING (in scope:      │
                                   │ the ONLY thing that touches  │ ──▶ PAYMENT
                                   │ the digits). Masks tones,    │     GATEWAY
                                   │ routes digits to gateway     │     (Level 1 SP)
                                   └──────────────────────────────┘
The card number's entire journey: keypad → masking → gateway.
It never enters the AI, recorder, transcript, or CRM → those are OUT of scope.
The card number's whole life is keypad → masking layer → payment gateway. The AI, recorder, ASR, and CRM hear flat tones, never the digits. Because none of those systems ever store, process, or transmit the card number, none of them are in scope: the audit collapses from SAQ D (329 controls across your whole stack) to SAQ A (22 controls), the ~93% scope reduction (307 of 329 controls removed) that a clean masking deployment is built to produce. The sealed pipe is the entire game: keep the card out of the building, and the building isn't a PCI problem.
3) The running example: taking the payment without ever hearing the card¶
Thread the billing call's payment moment — the exact step that, done wrong, was chapter 00's breach. The caller wants to pay the balance after the dispute.
Attempt A — the caller reads the card aloud¶
The bot says "what's your card number?" The caller reads sixteen digits. The bot's audio fork (chapter 02) records them, ASR (chapter 03) transcribes them, the transcript is stored, and the analytics pipeline (chapter 06) ingests them. The bot then "pauses recording" while the CVV is read — except the digits already spoken are in the recording, the agent (or bot) heard them, and pause-and-resume relies on the pause firing perfectly every time. Under PCI DSS v4.0.1, this is no longer an acceptable control: any recording that retains card data, and any pause that occasionally misses, is a failure. Every system that touched the audio is now in scope, and the card sits in plaintext across recordings, transcripts, and backups. Reportable breach, failed audit.
Attempt B — DTMF capture, masked, descoped¶
When it's time to pay, the bot says "please type your card number on your keypad." The caller types; the digits travel as DTMF tones. The DTMF masking layer intercepts those tones in the live stream, replaces them with a flat monotone in everything downstream systems can hear (the bot, the recorder, the transcript), and routes the real digits straight to the payment gateway, a Level 1 service provider. The bot never hears the number; the recording captures flat tones; the transcript shows [card entered]. The CVV and expiry go the same way. The gateway authorizes and returns a token. The bot confirms "payment of $X received" and logs the token and outcome (chapter 07's disposition), never the card.
The hard part hiding here: the card number's journey must be physically isolated, not logically scrubbed. DTMF masking works because the digits are intercepted before they reach any system the AI or recorder lives in — the masking layer is the only thing in scope, and it's a hardened, certified component. This is the prevention-over-cleanup rule made concrete: there is nothing to redact because the card never entered the systems that would need redacting. Contrast Attempt A, where the card existed everywhere and you mopped up after.
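Attempt B can be modelled as a toy event stream. Everything here is an assumption made for illustration: `DtmfMasker`, `FakeGateway`, the `(kind, payload)` event tuples, and the `tok_` token format are hypothetical, and a real masking layer is a certified telephony component, not application code. The sketch only shows the shape of the isolation: digits go to the gateway, and downstream listeners receive a flat tone in their place.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str]             # (kind, payload), e.g. ("dtmf", "4")
MASKED_TONE: Event = ("tone", "flat")


class FakeGateway:
    """Hypothetical stand-in for the payment gateway (the in-scope endpoint)."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def authorize(self, pan: str, amount_cents: int) -> str:
        self.received.append(pan)
        # The token is an opaque reference and is not derived from the PAN.
        return f"tok_{len(self.received):06d}"


class DtmfMasker:
    """Sits between the caller leg and everything the AI and recorder hear."""

    def __init__(self, gateway: FakeGateway,
                 downstream: Callable[[Event], None]) -> None:
        self._gateway = gateway
        self._downstream = downstream
        self._digits: List[str] = []
        self._capturing = False

    def start_capture(self) -> None:
        self._digits.clear()
        self._capturing = True

    def feed(self, event: Event) -> None:
        kind, _payload = event
        if self._capturing and kind == "dtmf":
            self._digits.append(event[1])    # real digit stays inside the masker
            self._downstream(MASKED_TONE)    # downstream hears only a flat tone
        else:
            self._downstream(event)          # speech and non-payment DTMF pass through

    def finish_capture(self, amount_cents: int) -> str:
        pan = "".join(self._digits)
        self._digits.clear()
        self._capturing = False
        return self._gateway.authorize(pan, amount_cents)
```

In this model the bot's side of the call is whatever `downstream` receives, and the only value handed back to it is the token, which is what gets written to the disposition record.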
Mini-FAQ. "Why not just redact the card from the transcript with a good PII model?" Because by the time the transcript exists, the card was already spoken into the audio, recorded, and transmitted to ASR — every one of those systems is now in scope, and the redaction only cleans the transcript, not the recording, the backups, or the audit finding. Redaction handles residual PII (section 4); it cannot descope a card the caller spoke aloud. The only way to keep the card out of scope is to keep it out of the audio — DTMF.
4) Redacting residual PII before storage — re-binding chapter 06's transcript store¶
DTMF handles the card. But calls are full of other sensitive data the caller will simply say: name, address, date of birth, SSN, account numbers, health details. You can't keypad-capture all of it — people answer security questions out loud. So this PII does enter the audio and the transcript. The control here is redaction, but the rule still holds: redact before storage, not after.
The naive alternative — store the raw transcript, redact a copy for display — fails the same way Attempt A did: the raw transcript with the SSN in it exists in your store, in scope and exposed. The fix is a redaction pass between transcription and storage: ASR produces the transcript, a PII-detection model (Amazon Connect Contact Lens redaction, Genesys sensitive-data redaction, Presidio) tags and masks the entities, and only the redacted transcript is written to the store. The raw transcript exists only transiently in memory.
- Redact before storage — the stored transcript never contains the SSN; analytics (ch 06) and assist (ch 05) only ever see masked data. Costs a processing step and risks over-redacting useful context.
- Store raw, redact on display — simpler pipeline, but the raw PII sits in the store in scope; a store breach exposes it, and "we redact on display" is no defense for what's stored.
For a regulated billing line, redact-before-storage wins decisively: the same logic as keeping the card out of the audio, applied to spoken PII. The redaction also feeds chapter 06 — the analytics store from that chapter holds redacted transcripts, which is why a card number found in the analytics store (chapter 06's Q7) means the capture path leaked: redaction should have caught residual PII, and DTMF should have meant there was no card to find.
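The redact-before-storage ordering can be sketched with a deliberately simple detector. The regular expressions below are a stand-in for a real PII model (they are not how Contact Lens, Genesys, or Presidio work internally), and `redact`, `TranscriptStore`, and `transcribe_and_store` are hypothetical names. What matters is the data flow: the store's only write path receives already-redacted text.

```python
import re
from typing import Dict, Tuple

# Toy patterns standing in for a PII-detection model; US-style formats assumed.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("DOB", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
]


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each detected entity with a [LABEL] tag and count the hits."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, hits = pattern.subn(f"[{label}]", text)
        counts[label] = hits
    return text, counts


class TranscriptStore:
    """In-memory stand-in for the transcript store from chapter 06."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.redaction_counts: Dict[str, Dict[str, int]] = {}

    def put(self, call_id: str, redacted_text: str, counts: Dict[str, int]) -> None:
        self.rows[call_id] = redacted_text
        self.redaction_counts[call_id] = counts   # evidence that redaction ran


def transcribe_and_store(call_id: str, raw_transcript: str,
                         store: TranscriptStore) -> None:
    """Redact in memory, then persist; the raw transcript is never written."""
    redacted, counts = redact(raw_transcript)
    store.put(call_id, redacted, counts)
```

Storing the per-call counts alongside the text is a small design choice that pays off later: it doubles as evidence that the redaction pass ran on that call.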
5) The property that changes the design: scope is decided by what touches the data, not by intent¶
The dimension teams get wrong: they think compliance is about policy and intent — "we have a data-handling policy, we intend to protect cards." PCI scope doesn't care about intent. It is decided mechanically: any system that stores, processes, or transmits cardholder data — plus anything connected to it — is in scope. A card number that passes through a system for one millisecond puts that system in scope, policy or no policy.
Intent-based thinking:  "we protect card data"            → feels compliant
Scope-based reality:    does the card TOUCH this system?  → yes = in scope

Attempt A: card touches telephony, recorder, ASR, transcript, CRM, backups
           → ALL in scope → SAQ D, 329 controls, every system audited
Attempt B: card touches ONLY the masking layer + gateway
           → everything else OUT of scope → SAQ A, 22 controls
This asymmetry should drive every design decision: the goal is to minimize the set of systems the card touches, not to protect a large set well. Protecting a 329-control SAQ-D environment is enormously more expensive and fragile than descoping to a 22-control SAQ-A environment — and the descoped environment is structurally safer because a breach of an out-of-scope system can't expose a card it never held. This is why DTMF masking's value isn't "better protection"; it's scope reduction. The surprising consequence: the cheapest and safest path is the one where your AI, recorder, and CRM are deliberately kept ignorant of the card. Making your systems know less makes them safer and your audit smaller — the opposite of the usual "more data is better" instinct.
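The mechanical nature of scope, and the arithmetic behind the control counts, fit in a few lines. This is a simplified sketch: the system names are hypothetical labels for the components in this chapter, the "connected systems" clause of the scope definition is ignored, and real SAQ eligibility depends on more than a set intersection. The control counts are the ones quoted in this chapter.

```python
from typing import FrozenSet, Set

# Systems that make up "your contact center" in this chapter's picture.
CONTACT_CENTER: FrozenSet[str] = frozenset(
    {"telephony", "recorder", "asr", "transcript_store", "crm", "backups"}
)
SAQ_CONTROLS = {"SAQ D": 329, "SAQ A": 22}   # counts as quoted in this chapter

ATTEMPT_A = {"telephony", "recorder", "asr", "transcript_store", "crm", "backups"}
ATTEMPT_B = {"dtmf_masking", "payment_gateway"}


def systems_in_scope(card_touches: Set[str]) -> Set[str]:
    """A system is in scope if the card touches it; intent plays no part."""
    return set(CONTACT_CENTER & card_touches)


def saq_for(card_touches: Set[str]) -> str:
    """Simplified mapping: any in-scope contact-center system means SAQ D."""
    return "SAQ D" if systems_in_scope(card_touches) else "SAQ A"


def scope_reduction(before: int, after: int) -> float:
    """Fraction of controls removed when moving from `before` to `after`."""
    return (before - after) / before
```

Under these assumptions `saq_for(ATTEMPT_A)` is `"SAQ D"` and `saq_for(ATTEMPT_B)` is `"SAQ A"`, and `scope_reduction(329, 22)` is 307/329, a little over 93%.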
6) One failure walked through: the audit you couldn't pass even though you did nothing wrong¶
Incident: an assessor reviews the contact center. The card path is clean — DTMF masking, descoped, no card in any recording. Consent disclosures play on every call. Yet the audit fails. Nothing leaked. Why?
The chain: the team could not prove their controls worked. They couldn't produce per-call consent logs (they "always play the disclosure," but there's no record that call X consented). They couldn't show who accessed the redacted transcript store or when (no access logs). They couldn't demonstrate that the masking layer actually masked on a given date (no evidence trail). PCI DSS Requirement 10 demands comprehensive logging and monitoring of in-scope systems; consent law demands provable consent. The controls existed; the evidence didn't. To a regulator, an unprovable control is no control.
The root cause is not weak controls — the controls were strong. It is that compliance is provability, not just correctness: a control you can't demonstrate is a control you don't have. The fix: log everything the audit will ask about — per-call consent with timestamps, access logs on every store holding PII (who, what, when), evidence that masking and redaction ran on each call, retention-and-deletion records. This is the audit pressure: the system must generate its own proof continuously. It's the same shape as chapter 06's validation probe (you must be able to show the metric's error bar) and chapter 07's idempotent write log (you must be able to show exactly one record per call) — provability as a first-class requirement, not an afterthought.
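One way to make "the system must generate its own proof" concrete is an append-only event log plus a check for missing evidence. The names `AuditEvent`, `AuditLog`, and `evidence_gaps`, and the three evidence kinds, are assumptions for this sketch; a production version would sit on tamper-resistant storage and a real logging pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set


@dataclass(frozen=True)
class AuditEvent:
    call_id: str
    kind: str        # "consent", "masking", "redaction", or "access"
    actor: str       # component or user that produced the event
    at: datetime     # UTC timestamp


class AuditLog:
    """Append-only, in-memory stand-in for the audit store."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, call_id: str, kind: str, actor: str) -> AuditEvent:
        event = AuditEvent(call_id, kind, actor, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def kinds_for(self, call_id: str) -> Set[str]:
        return {e.kind for e in self._events if e.call_id == call_id}

    def evidence_gaps(self, call_id: str, took_payment: bool) -> List[str]:
        """Return the evidence an assessor would ask for that is missing."""
        required = ["consent", "redaction"]
        if took_payment:
            required.append("masking")
        seen = self.kinds_for(call_id)
        return [kind for kind in required if kind not in seen]
```

Run over every call, a non-empty `evidence_gaps` result is exactly the failure in this incident: the control may have worked, but nothing shows that it did.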
7) Cost movement: what compliance-by-prevention costs and saves¶
Effects of prevention-and-provability versus record-everything-redact-later (illustrative; varies by jurisdiction and vendor):
| Control | What it saves | What it costs | Who absorbs it |
|---|---|---|---|
| DTMF masking (descope) | SAQ D → SAQ A: 329 → 22 controls, ~93% scope cut | masking-layer license + a keypad step in the flow | the payment moment (one extra turn), vendor fee |
| Consent disclosure + logging | avoids wiretap liability in 2-party states / EU | a few seconds of greeting + a consent log row | the turn budget, the audit store |
| Redact PII before storage | store holds no raw SSN/DOB; breach can't expose it | a redaction pass per transcript; over-redaction risk | compute + some lost context for analytics |
| Audit trail (consent, access, masking evidence) | passable audits; provable controls | continuous logging + retention infrastructure | storage + an SIEM/compliance pipeline |
| Retention + deletion policy | data minimized; less to breach, less to prove | lifecycle plumbing, deletion guarantees | engineering + legal |
The pressure evolution: prevention relieves the breach-and-wiretap risk (the card and unconsented recording never exist) but creates an audit burden — you must now continuously produce proof — absorbed by logging, access controls, and a compliance pipeline. Descoping relieves the audit scope (22 controls beats 329) but creates a dependency on a certified masking layer and a small flow change, absorbed at the payment moment. The net is strongly favorable: a descoped, provable environment is cheaper to run and to audit than a sprawling in-scope one you must protect everywhere and still can't easily prove.
8) Signals that compliance is the failing layer¶
Healthy: zero card numbers in any recording, transcript, or store (spot-checked); per-call consent logged with timestamps; PII-redaction running before storage with a low false-negative rate on a labeled sample; complete access logs on every PII store; an audit scope that's SAQ A, not SAQ D.
First metric to degrade: the rate of sensitive entities found in stored transcripts/recordings. When the masking layer misfires, a redaction model regresses, or someone adds a new "say your card" flow, card numbers and SSNs start appearing in stores that should be clean — and this surfaces before an external breach, if you're scanning your own stores for it (which you must).
Misleading metric people watch: "we have a compliance policy" / "we play the disclosure." A policy is not a control and a played disclosure is not a logged consent. The comforting fact (we intend to comply) says nothing about whether the card actually stayed out of scope or whether you can prove consent on call X.
First graph an expert opens: a continuous scan of all stores for unredacted PII/PAN (should be flat zero; any nonzero is an active scope leak), and consent-log completeness (every recorded call has a logged consent — gaps are wiretap exposure). The second graph: access-log coverage on PII stores (every read attributable to a who/when) — gaps mean the section-6 audit failure is coming. The misleading green dashboard is "redactions performed today"; the number that matters is "sensitive entities that reached storage despite redaction."
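The two graphs above reduce to two small computations. The sketch below is an assumption-laden illustration: `count_pans`, `scan_store`, and `consent_completeness` are hypothetical helpers, the regular expression is a rough candidate filter for 13 to 19 digit runs, and the Luhn checksum is used only to cut false positives such as order numbers. A real scanner would also need to cover audio and backups.

```python
import re
from typing import Dict, Iterable, Set

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
PAN_CANDIDATE = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:          # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def count_pans(text: str) -> int:
    """Count digit runs that look like a card number and pass Luhn."""
    hits = 0
    for match in PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            hits += 1
    return hits


def scan_store(store: Dict[str, str]) -> Dict[str, int]:
    """Map record id to number of suspected PANs; an empty result is healthy."""
    findings: Dict[str, int] = {}
    for record_id, text in store.items():
        hits = count_pans(text)
        if hits:
            findings[record_id] = hits
    return findings


def consent_completeness(recorded: Iterable[str], consented: Set[str]) -> float:
    """Share of recorded calls that have a logged consent (1.0 is healthy)."""
    recorded_ids = list(recorded)
    if not recorded_ids:
        return 1.0
    return sum(1 for call_id in recorded_ids if call_id in consented) / len(recorded_ids)
```

The scan reports counts per record rather than the matched digits, so the monitoring output does not itself become a place where card numbers are stored.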
9) Boundary: where prevention shines, where compliance gets genuinely hard¶
Prevention-by-descope shines on structured, capturable sensitive data — card numbers, CVV, SSN-as-keypad-entry, account numbers — where you can route the data through an isolated capture path (DTMF) and keep every other system ignorant. Here the design is clean: seal the pipe, descope the building, prove it. This is the strongest, most settled part of contact-center compliance.
It gets genuinely hard with spoken, unstructured, unavoidable PII and cross-jurisdiction calls. A caller will say their name, address, and health details out loud; you can't keypad-capture a complaint about a medical bill. Redaction helps but isn't perfect — it has false negatives, and over-redaction destroys the context analytics needs. And jurisdiction is genuinely ambiguous: you often don't know which state or country a caller is physically in, VoIP makes area codes lie, and the stricter-law rule means one EU caller subjects the call to GDPR. The scale limit that inverts intuition: as your call volume and geographic spread grow, the probability that some call hits a two-party state, an EU resident, or a spoken-PII edge case approaches certainty — so at scale you must design for the strictest applicable regime on every call, because you can't reliably detect which calls need it. The naive "handle the special cases specially" breaks; at scale, the special case is every case.
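The "approaches certainty" claim is ordinary probability. As a rough sketch, assume each call independently has probability p of falling under a strict regime (an all-party state, an EU resident, or a spoken-PII edge case); then the chance that at least one of n calls does is 1 - (1 - p)^n. The independence assumption and the example numbers are illustrative, not measured.

```python
def p_at_least_one(p_per_call: float, calls: int) -> float:
    """Probability that at least one of `calls` independent calls is a strict-regime call."""
    if not 0.0 <= p_per_call <= 1.0:
        raise ValueError("p_per_call must be between 0 and 1")
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return 1.0 - (1.0 - p_per_call) ** calls
```

Even if only one call in a thousand were a strict-regime call, `p_at_least_one(0.001, 10_000)` is above 0.9999, so a center handling ten thousand calls should plan on meeting that case rather than hoping to detect it.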
10) Wrong assumption: "we redact the sensitive data, so we're compliant"¶
The seductive idea: we run a redaction pass that scrubs card numbers and PII, therefore the sensitive data is handled and we're compliant. This conflates redaction with scope and cleanup with prevention. Redaction acts after the data exists — which means for cardholder data, the card already entered your recording, transcript, and backups before redaction ran, pulling all of them into scope. Redaction is the right tool for residual spoken PII, but it is the wrong tool for cardholder data, and it does nothing for consent (you can't redact your way out of having recorded a two-party caller without consent).
Replace it with: descope by prevention (DTMF for cards, consent before recording, redact-before-storage for residual PII), and treat redaction as the last line for unavoidable spoken PII — never as the primary control for cards or consent. This reframing changes the architecture: you don't build a great scrubber and feel safe; you build sealed capture paths so the sensitive thing never enters scope, then redact only what genuinely must be spoken, then prove all of it. It's the same prevention-over-cleanup correction as keeping the card out of the audio — applied to the whole compliance posture.
11) Other ways compliance bites¶
- Pause-and-resume relied on for cards — a missed pause leaves card digits in the recording; under PCI v4.0.1 this is a control failure, not a control (use DTMF).
- Consent not logged — disclosures play but aren't recorded per call; you can't prove consent in a wiretap dispute (section 6).
- Cross-jurisdiction miss — a one-party-state assumption applied to a California or EU caller; the recording is an illegal wiretap.
- Redact-after-storage — raw PII sits in the store in scope; a store breach exposes it despite "redaction on display."
- Redaction false negatives — a card or SSN slips past the PII model into the store, an active scope leak (scan for it).
- Over-redaction: masking destroys context that analytics (ch 06) and assist (ch 05) need; the data is safe but useless.
- Missing access logs — no record of who read the PII store; fails Requirement 10 even with clean data (section 6).
- Retention overrun — recordings/transcripts kept past the policy window; more data to breach and to prove deletion of.
- Backups and analytics in scope — the card reached a backup or the analytics pipeline; descoping the front door but not the copies.
- Deepfake / synthetic-voice consent spoofing — emerging surface where voice-biometric consent or auth is forged (chapter 09).
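Of the items above, retention overrun is one of the easiest to check mechanically. The sketch below assumes a hypothetical `StoredArtifact` record and a caller-supplied retention window; the actual window is a legal and policy decision, and provable deletion needs more than this listing step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class StoredArtifact:
    artifact_id: str
    kind: str               # "recording" or "transcript"
    created_at: datetime    # timezone-aware UTC timestamp


def overdue(artifacts: Iterable[StoredArtifact], retention: timedelta,
            now: datetime) -> List[str]:
    """Return ids of artifacts held longer than the retention window."""
    return [a.artifact_id for a in artifacts if now - a.created_at > retention]
```

A non-empty result is both data that should no longer exist and a deletion you will later be asked to prove, so the output of a check like this belongs in the audit trail as well.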
12) Pattern transfer¶
- Descope = minimize the trust boundary, the least-privilege of data — keeping the card out of every system but the masking layer is structurally the same as minimizing the blast radius of a credential: the fewer systems that hold the secret, the smaller the breach surface. PCI scope reduction is least-privilege applied to cardholder data. Same shape as chapter 01's authority gating: bound what one component can touch.
- Prevention-before-existence = validate-at-the-boundary, not after — redact-before-storage and DTMF-before-capture are the same instinct as rejecting bad input at the edge instead of cleaning a corrupted store later: it's cheaper and structurally safer to never admit the bad thing than to clean it up. The mirror of chapter 06's "redact before the transcript is stored."
- Audit trail = provability as a first-class requirement — the same discipline as chapter 06's validation probe (prove the metric's error) and chapter 07's idempotent write log (prove exactly one record). A control you can't demonstrate is, to a regulator, a control you don't have — the recurring "green dashboard, failing reality" trap, now with legal teeth.
13) Design test¶
- Is cardholder data captured via an isolated path (DTMF masking) so the AI, recorder, transcript, and CRM never touch it — descoping them?
- Is recording consent obtained before recording starts, logged per call, and sized to the strictest applicable jurisdiction (two-party / GDPR)?
- Is residual spoken PII redacted before storage, with the raw transcript never persisted?
- Can you prove every control — produce per-call consent logs, store access logs, and evidence that masking and redaction ran?
- Do you continuously scan your own recording/transcript/analytics stores for unredacted card numbers and PII to catch scope leaks before a breach?
Where this appears in production¶
PCI scope and payment capture
- DTMF masking (Sycurio, PCI Pal, Paytia) — intercepts keypad tones, masks them in the recorded/heard audio, routes digits to the gateway; descopes the contact center from SAQ D to SAQ A.
- PCI DSS v4.0.1: the current version of the standard; its future-dated requirements became mandatory on 31 Mar 2025, and pause-and-resume on the agent leg is no longer an accepted control for spoken card data.
- SAQ D → SAQ A descope: 329 controls collapse to 22 when cardholder data never enters your environment (~93% scope reduction).
- Level 1 payment service provider (gateway) — the in-scope endpoint the masked digits route to, keeping your stack out of scope.
- Tokenization — the gateway returns a token your CRM logs instead of the card, so even the stored outcome holds no PAN.
Consent, redaction, and audit
- Two-party-consent disclosure ("this call may be recorded") — the bot's first utterance, gating chapter 02's recorder on a logged consent.
- Per-call consent logging — timestamped consent written to the call record as audit evidence (chapter 07's store).
- Amazon Connect Contact Lens redaction — strips PII, financial account numbers, and PINs from transcripts and audio before storage, across multiple languages.
- Genesys Cloud sensitive-data redaction — redacts PCI entities (PAN, expiry, CVV) and PII from recordings and transcripts.
- Microsoft Presidio / open redaction — entity detection and masking for the redact-before-storage pass.
- GDPR lawful-basis + explicit consent — the regime that attaches the moment any caller is in the EU, regardless of server location.
- PCI DSS Requirement 10 logging / SIEM — comprehensive access and event logs on in-scope systems, the audit-evidence backbone.
- Store-scanning for residual PAN/PII — continuous scans of recording/transcript/analytics stores to catch redaction false negatives and scope leaks.
- Retention and deletion policy — bounded retention windows with provable deletion, minimizing data held and audit surface.
- HIPAA BAA / SOC 2 Type II (e.g., Cognigy, now part of NICE) — the equivalent provable-control regimes for health data and enterprise trust.
Recall¶
- Why is "record everything and redact later" a structural breach for cardholder data, not just a weak control?
- What determines recording-consent obligations — server location or party location — and what's the cross-jurisdiction rule?
- How does DTMF masking keep a card number out of PCI scope, and what's the SAQ D → SAQ A consequence?
- Why must residual spoken PII be redacted before storage rather than on display?
- How does PCI scope get decided, and why does minimizing what touches the card beat protecting a large environment?
- How can a center with clean controls still fail an audit, and what does "compliance is provability" mean?
- At scale, why must you design for the strictest applicable regime on every call rather than handling special cases specially?
Interview Q&A¶
Q1. A caller is about to pay by card. Walk through how the AI takes the payment without creating a PCI breach. The AI never hears the card. It asks the caller to type the number on the keypad; a DTMF masking layer intercepts the tones, masks them to flat tones in everything the bot, recorder, and transcript can hear, and routes the real digits straight to the payment gateway. The gateway returns a token, which the CRM logs instead of the card. Because no system in your environment ever stored, processed, or transmitted the PAN, those systems are out of PCI scope — the audit is SAQ A, not SAQ D. Prevention, not redaction. Common wrong answer to avoid: "record the card then redact it / pause-and-resume the recording" — the card already entered the audio, recording, and transcript before redaction ran, pulling everything into scope; pause-and-resume is no longer an accepted control under v4.0.1.
Q2. Why isn't a good PII-redaction model enough to be PCI compliant? Redaction runs after the data exists. For cardholder data, the card was already spoken into the audio, recorded, transcribed, and possibly backed up before redaction touched it — every one of those systems is now in scope, and you can't redact your way back out. Redaction is the right control for residual spoken PII (names, DOB), applied before storage. It is the wrong control for cards (use DTMF to keep them out of the audio) and does nothing for consent. Descope by prevention; redact only what must be spoken. Common wrong answer to avoid: "redaction scrubs the card, so we're fine" — scrubbing a card that already entered your systems doesn't undo that they held it; the systems are still in scope.
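For the residual spoken PII that Q2 assigns to redaction, "before storage" can be sketched as a single write path that never sees the raw text. This is an illustration only: it assumes the transcript has already been normalized so that an SSN appears as `123-45-6789` and a date of birth as `M/D/YYYY`, which real ASR output will not reliably do, and `redact` and `store_transcript` are hypothetical names.

```python
import re

# Assumption: digit-normalized transcript text; real spoken PII needs a trained model.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DOB_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def redact(text: str) -> str:
    """Replace SSN-like and date-like spans with placeholders."""
    text = SSN_PATTERN.sub("[SSN]", text)
    return DOB_PATTERN.sub("[DOB]", text)


def store_transcript(raw: str, store: list) -> None:
    """The only write path into the store: the raw text is never persisted."""
    store.append(redact(raw))
```

The design point is that `store_transcript` is the sole entry to the store, so there is no code path that writes unredacted text and cleans it up afterward.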
Q3. Your servers are in Virginia and your agents in Texas (one-party). A customer calls from California. Do you need two-party consent? Yes. Consent law follows the parties' locations, not the server's, and when parties are in different states the stricter law governs. California is a two-party (all-party) state, so all parties must consent — play the disclosure and log the consent. Because a national center can't reliably know where each caller is, the safe design treats every call as two-party (and as GDPR if any caller might be in the EU). At scale the special case is every case. Common wrong answer to avoid: "the agent and servers are in one-party states, so one-party rules apply" — the California caller subjects the call to all-party consent; the stricter jurisdiction wins.
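The rule in Q3 can be sketched as a lookup that ignores server location and returns the strictest regime across the parties. The state set below is an illustrative subset, not a complete or current legal table, and `consent_regime` and its return labels are hypothetical; a party whose location is unknown is treated as the strictest case, matching the "special case is every case" design.

```python
from typing import Optional

# Illustrative subset of all-party-consent states; a real table needs legal review.
ALL_PARTY_STATES = {"CA", "FL", "PA", "WA"}


def consent_regime(party_regions: list[Optional[str]]) -> str:
    """Return the strictest consent regime across all parties to a call.

    Each entry is a US state code, "EU", or None when the location is unknown.
    Server location is deliberately not an input.
    """
    if any(region is None for region in party_regions):
        return "all_party_with_gdpr"  # unknown location: assume the strictest case
    if any(region == "EU" for region in party_regions):
        return "all_party_with_gdpr"
    if any(region in ALL_PARTY_STATES for region in party_regions):
        return "all_party"
    return "one_party"
```

For the scenario in the question, `consent_regime(["TX", "CA"])` returns `"all_party"`; the Virginia servers never enter the decision.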
Q4. Your card path is clean and consent disclosures play on every call, but you failed the audit. How? You couldn't prove the controls worked. No per-call consent logs (you "always play it," but can't show call X consented), no access logs on the PII store, no evidence that masking/redaction ran on a given day. Compliance is provability: a control you can't demonstrate is, to a regulator, a control you don't have. Fix it by logging consent per call, access on every PII store, and masking/redaction evidence — PCI Requirement 10 and consent law both demand demonstrable proof, not just correct behavior. Common wrong answer to avoid: "but we did everything right, the audit is wrong" — doing it right without evidence fails; the audit tests provability, and unprovable controls don't count.
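One way to find a provability gap before the auditor does is to check each call record for the evidence it should carry. The sketch below assumes a hypothetical record shape (`call_id`, `consent_logged`, `redaction_ran`, `payment_taken`, `masking_ran`); the field names are illustrative, not a standard schema.

```python
def provability_gaps(call_records: list[dict]) -> dict[str, list[str]]:
    """Map each call_id to the evidence fields it cannot show."""
    gaps = {}
    for record in call_records:
        required = ["consent_logged", "redaction_ran"]
        if record.get("payment_taken"):
            required.append("masking_ran")  # masking evidence only applies to payment calls
        missing = [name for name in required if not record.get(name)]
        if missing:
            gaps[record["call_id"]] = missing
    return gaps
```

A call where the disclosure played but nothing was logged shows up here as missing `consent_logged`, which is the failure Q4 describes.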
Q5. Why is descoping (DTMF) framed as cheaper and safer, not just compliant? Because PCI scope is decided mechanically by what touches the card. An in-scope SAQ D environment is 329 controls across your whole stack: expensive to maintain and audit, and a breach anywhere can expose a card. Descoping to SAQ A leaves 22 controls, and a breach of an out-of-scope system can't expose a card it never held. Making your systems deliberately ignorant of the card shrinks both cost and breach surface; less data is safer here, the opposite of the usual instinct. Common wrong answer to avoid: "protect the in-scope systems really well". Maintaining 329 controls is more expensive and more fragile than removing the card so that only 22 apply; descope beats defend.
Q6. Where does redaction genuinely struggle, and how do you design around it? On unavoidable spoken PII: names, addresses, DOB, and health details that a caller says out loud and that you can't keypad-capture. Redaction has false negatives (a slipped SSN is a scope leak), and over-redaction destroys the context that analytics and assist need. Design around it: redact before storage so the raw text never persists, scan your own stores continuously for what slipped, tune the false-negative/over-redaction balance against a labeled sample, and accept that spoken PII is a harder, less-settled problem than card descope. That is why store-scanning is mandatory, not optional. Common wrong answer to avoid: "the redaction model is 95% accurate, that's fine". A 5% false-negative rate on SSNs is thousands of leaked entities at scale; you must scan stores for what slipped past.
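The "thousands at scale" claim is simple arithmetic, sketched here with loudly hypothetical inputs: one million calls a year and one spoken SSN per ten calls are assumptions chosen for illustration, not figures from this chapter. Only the 5% false-negative rate comes from the wrong answer above.

```python
def expected_leaks(calls_per_year: int, entities_per_call: float, false_negative_rate: float) -> float:
    """Expected number of sensitive entities that slip past redaction in a year."""
    return calls_per_year * entities_per_call * false_negative_rate


# Hypothetical volumes: 1,000,000 calls/year, 0.1 spoken SSNs per call, 5% missed.
leaks = expected_leaks(calls_per_year=1_000_000, entities_per_call=0.1, false_negative_rate=0.05)
print(round(leaks))  # 5000
```

Plugging in your own call volume and entity rate turns "95% accurate" into a count of stored SSNs, which is the number the store scan has to catch.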
Q7. (Cumulative) Chapter 06's analytics store turned up a full card number. Trace which controls failed across chapters. Chapter 08's capture and chapter 02's recorder primarily, surfacing in 06. The card should never have been spoken into audio — it should have gone through DTMF masking (this chapter), so chapter 02's fork and recorder captured only flat tones. Because the card reached the audio, ASR (chapter 03) transcribed it, redaction-before-storage (this chapter, chapter 06 step 1) failed to catch it, and it landed in the analytics store — which is now in PCI scope. The fix is the capture path (DTMF), not the dashboard: descope so there's no card to find, and scan the store to confirm. Common wrong answer to avoid: "redact it from the analytics store now" — it's already stored in scope; deleting it doesn't undo that the recording, transcript, and pipeline held a card. Fix capture, not cleanup.
Design/debug exercise (10 min)¶
Step 1 — Modeled example. Walk the billing payment (section 3, Attempt B): consent disclosure logged at greeting → "type your card on the keypad" → DTMF masking intercepts tones, routes digits to gateway, bot/recorder/transcript hear flat tones → gateway returns token → CRM logs token + outcome, never the PAN → store scanned for residual PAN. For each step, write the one failure if it's skipped (skip DTMF → card in recording; skip consent log → unprovable consent).
Step 2 — Your turn. A new flow: the bot must take a health-related complaint in which the caller will speak their condition, DOB, and member ID aloud, and the callers are in CA and NY, plus one in the EU. Design the compliance layer: which regimes govern (consent + GDPR + HIPAA), what's keypad-captured vs spoken, where redaction runs and what it must not over-redact, and what audit evidence you log. Note where store-scanning catches a redaction miss.
Step 3 — Reproduce from memory. Redraw the "card that never enters the building" diagram (section 2) cold — keypad → masking (in scope) → gateway, everything else out of scope — and label the SAQ D vs SAQ A consequence. Then connect to chapter 06: show that the analytics store holds redacted transcripts because redaction ran before storage, and to chapter 02: show that the recorder captured flat tones because DTMF masked the digits upstream.
Operational memory¶
This chapter explained why the most fluent, correct, well-integrated voice AI can still be a reportable breach and a failed audit the moment a caller speaks a card number or a two-party-consent caller isn't told they're recorded. The important idea is that for cardholder data and wiretap law, the violation is that the sensitive thing existed in your systems — so the control must run before it exists (consent before recording, DTMF before the card is spoken, redaction before storage), and you must be able to prove it ran.
You learned to keep the card out of every system the AI touches via DTMF masking — descoping your stack from SAQ D's 329 controls to SAQ A's 22 — obtain and log consent per call against the strictest applicable jurisdiction, redact residual spoken PII before storage, and generate the audit trail (consent logs, access logs, masking evidence) that makes the controls provable. That solves chapter 00's opening breach because the card now never enters the recording, the transcript, or the CRM — there's nothing to leak.
Carry this diagnostic forward: when you think about compliance, ask "does the sensitive thing ever exist in my systems?" not "do I clean it up?" When a card or SSN turns up in a store, the capture path leaked — fix capture, not the dashboard. When controls are clean but the audit fails, you have a provability gap — log what the audit will ask for. And design for the strictest regime on every call, because at scale you can't tell which call needs it.
Remember:
- The violation is existence, not retention: control before the sensitive thing exists, don't clean up after.
- DTMF masking keeps the card out of every system the AI touches — descope SAQ D (329) → SAQ A (22).
- Consent follows party location, not server location; treat every call as two-party / GDPR at scale.
- Redact residual spoken PII before storage; redaction is the wrong tool for cards and useless for consent.
- Compliance is provability — a control you can't demonstrate (per-call consent, access logs, masking evidence) is a control you don't have.
Bridge. We've now walked the entire call — telephony, ASR, orchestration, assist, analytics, integration, and compliance — and at each seam the mechanism worked under its own pressure. But the seams interact, the laws shift, the deepfakes improve, and the metrics lie in ways no single chapter resolved. What's still genuinely open, contested, or breaking at scale — and where the textbook answer and the production reality diverge — is the synthesis the final chapter has to hold. → 09-boundary-tradeoff-review.md