04. PII detection and redaction: the redaction tray removes contraband before anyone travels
~14 min read. Sensitive data leaks in by accident and leaks out by negligence.
Built on the ELI5 in 00-eli5.md. The redaction tray — the place where risky items are removed — matters because privacy failures happen on both the way in and the way out.
PII is a two-way safety problem
Most teams first notice privacy risk on input. A user pastes a phone number, account number, PAN, or medical note. Good catch. But look again. Output can leak too. Logs can leak too. Retrieval indexes can leak too.
So what to do? Treat PII like contraband moving through the airport. The redaction tray should sit before storage, before retrieval indexing, before model calls when possible, and again before the answer leaves arrival customs.
incoming text
        │
        ▼
┌──────────────────┐
│  redaction tray  │  detect PII entities
└───────┬──────────┘
        ├── masked copy ──→ model / logs / index
        └── secure vault ─→ original value if business needs it

model output
        │
        ▼
┌──────────────────┐
│ arrival customs  │  detect leaked PII again
└───────┬──────────┘
        ├── pass safe output
        └── block or mask leaked values
Why two passes? The model may copy the original secret back. A retrieval chunk may contain an old address. A tool response may include more than the user should see. One pass is helpful. Two passes are safer.
Now what counts as PII? Names alone are tricky. But email, phone, card number, bank account, passport number, PAN, Aadhaar-like identifiers, home address, medical record numbers, and combinations of quasi-identifiers all matter in practice.
Detection methods: regex, dictionaries, NER, and specialized tools
No single detector is enough. Use layered detection, just like other guardrails.
Regex is fast and precise for known formats. Credit cards, SSNs, PAN-like IDs, dates, email addresses, IPv4 addresses, and phone numbers are good candidates. The weakness is obvious. Regex only catches what has a stable visible shape.
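A minimal sketch of regex recognizers, assuming simplified patterns for email, Indian-format phone numbers, PAN, and card numbers. The pattern set and the names `PATTERNS`, `luhn_ok`, and `find_regex_entities` are illustrative, not taken from any library. The card pattern is paired with a Luhn checksum, a common way to cut false positives on long digit runs.

```python
import re

# Illustrative patterns only; real deployments need locale-specific tuning.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+91[ -]?\d{5}[ -]?\d{5}\b"),
    "PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),  # 13 to 19 digits
}


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_regex_entities(text: str):
    """Return sorted (start, end, label) spans for every pattern hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # A digit run that fails Luhn is unlikely to be a card number.
            if label == "CARD" and not luhn_ok(m.group()):
                continue
            hits.append((m.start(), m.end(), label))
    return sorted(hits)
```

The checksum step shows the limit stated above: it only helps because card numbers have a stable, checkable shape.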
Dictionary matching is useful for known customer names, internal project names, or high-risk terms. It works well when your domain has controlled vocabularies. It also creates false positives if used carelessly.
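One way to sketch dictionary matching is a single compiled alternation with word-boundary guards, so a listed term does not fire inside a longer word. The helper name `build_dictionary_recognizer` and the sample terms are hypothetical.

```python
import re


def build_dictionary_recognizer(terms, label):
    """Build a recognizer that finds whole-word, case-insensitive term matches."""
    # Longest terms first so a longer phrase wins over its own prefix.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )

    def recognize(text):
        return [(m.start(), m.end(), label) for m in pattern.finditer(text)]

    return recognize
```

With the term `Falcon`, this flags "Falcon launch" but not "falconry". Dropping the boundary guards is one easy way to create the careless false positives mentioned above.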
NER-style entity models catch softer patterns. They can recognize names, locations, organizations, and sometimes medical or financial entities even when formatting is irregular. The tradeoff is that they produce more misses and more false alarms than crisp regex rules do on well-formatted identifiers, and their behavior is harder to predict.
Specialized frameworks combine several methods. Microsoft Presidio is a common open-source example. It lets teams run recognizers for email, phone, IBAN, credit card, and custom entities, then choose anonymization operators such as replace, mask, hash, or redact. That is practical engineering, not theoretical purity.
A useful mental picture is this.
text chunk
    │
    ├── regex recognizers ───────→ exact formats
    ├── dictionary recognizers ──→ known words and IDs
    ├── NER recognizers ─────────→ fuzzy entities
    └── policy rules ────────────→ who may see what
                │
                ▼
      merge + score + redact
Simple, no? Fast rules catch easy cases. Softer models catch messy cases. Policy decides whether to mask, hash, tokenize, or allow.
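The "merge + score + redact" step can be sketched as follows, assuming each recognizer emits spans with a confidence score. The `Span` type and the rule "higher score wins on overlap, longer span breaks ties" are assumptions for illustration; real systems may resolve conflicts differently.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    score: float


def merge_spans(spans):
    """Keep the best-scoring span wherever detections overlap."""
    kept = []
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    for s in ranked:
        # Accept a span only if it does not overlap anything already kept.
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


def redact(text, spans):
    """Replace each surviving span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for s in reversed(merge_spans(spans)):
        out = out[:s.start] + f"[{s.label}]" + out[s.end:]
    return out
```

Here a weak guess that overlaps a confident phone match is dropped before masking, so the text is never redacted twice at the same position.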
Worked example: support chat with mixed identifiers
Suppose a user sends this message.
"Hi, I am Riya Sharma. My phone is +91 98765 43210. My PAN is ABCDE1234F. I was charged twice on card 4111 1111 1111 1111. Please fix account AC-99281."
Now walk it through the redaction tray.
Step one: regex finds the phone, PAN, and card number. Step two: dictionary or account rules detect the internal account ID. Step three: a name recognizer marks Riya Sharma as a personal name.
A masking result might look like this.
Original
Riya Sharma / +91 98765 43210 / ABCDE1234F / 4111 1111 1111 1111 / AC-99281
Masked for model
[NAME] / [PHONE] / [PAN] / [CARD] / AC-99281
Notice one deliberate choice. We kept AC-99281. Why? Because the tool may need an internal account handle, and your policy may classify it as a safe operational key rather than a value to mask. That decision belongs to policy, not only to detection.
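The walk-through above can be sketched in a few lines. This is an illustration under stated assumptions: the patterns are simplified, the `NAME` recognizer is a literal stand-in for a real name model, and the `POLICY` table is hypothetical. The point it shows is that detection finds the account ID but policy decides to let it through.

```python
import re

MESSAGE = (
    "Hi, I am Riya Sharma. My phone is +91 98765 43210. My PAN is ABCDE1234F. "
    "I was charged twice on card 4111 1111 1111 1111. Please fix account AC-99281."
)

RECOGNIZERS = [
    ("NAME", re.compile(r"Riya Sharma")),  # stand-in for a name recognizer
    ("PHONE", re.compile(r"\+91[ -]?\d{5}[ -]?\d{5}")),
    ("PAN", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")),
    ("CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("ACCOUNT_ID", re.compile(r"\bAC-\d{5}\b")),
]

# Hypothetical policy: the account ID is treated as a safe operational key.
POLICY = {
    "NAME": "mask",
    "PHONE": "mask",
    "PAN": "mask",
    "CARD": "mask",
    "ACCOUNT_ID": "allow",
}


def mask_for_model(text):
    """Return the masked working copy that is safe to send to the model."""
    spans = []
    for label, pattern in RECOGNIZERS:
        # Labels missing from the policy default to "mask" (fail closed).
        if POLICY.get(label, "mask") == "allow":
            continue
        spans += [(m.start(), m.end(), label) for m in pattern.finditer(text)]
    # Replace right to left so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

Calling `mask_for_model(MESSAGE)` yields "Hi, I am [NAME]. My phone is [PHONE]. My PAN is [PAN]. I was charged twice on card [CARD]. Please fix account AC-99281."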
Now what is the output risk? The model saw only the masked copy, but a tool response or retrieved record can still carry the raw value. Suppose the model replies, "I found your duplicate charge on card ending 1111 under PAN ABCDE1234F." Bad move. The arrival customs layer should catch that leaked PAN and either mask it or block the message.
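A hedged sketch of the egress check, assuming the same simplified PAN and card patterns. The function name `check_output`, the `mode` parameter, and the refusal text are hypothetical choices for illustration.

```python
import re

EGRESS_PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def check_output(reply, mode="mask"):
    """Scan a model reply; return (safe_reply, leaked_labels)."""
    leaked = [label for label, p in EGRESS_PATTERNS.items() if p.search(reply)]
    if not leaked:
        return reply, leaked
    if mode == "block":
        # Replace the whole message rather than risk a partial leak.
        return "Sorry, I cannot share that detail here.", leaked
    for label, p in EGRESS_PATTERNS.items():
        reply = p.sub(f"[{label}]", reply)
    return reply, leaked
```

On the reply above, mask mode keeps "card ending 1111" (four digits do not match the full-card pattern) and turns the PAN into `[PAN]`. The returned `leaked` list is what you would count as an output-leak event.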
That gives us a simple rule. Detect on ingress. Detect again on egress. Keep one safe working copy. Keep originals only in a secure vault or authorized system.
Masking strategy is a product decision, not just an ML trick
Teams say, "We will redact." Good. Redact how?
Full masking removes the value entirely. Example: 4111 1111 1111 1111 → [CARD]. This is safest for many prompts and logs.
Partial masking preserves some utility. Example: 4111 1111 1111 1111 → **** **** **** 1111. This helps support workflows confirm identity without exposing the full number.
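Partial masking is simple enough to sketch directly. The helper below is illustrative (the name `mask_card_partial` and the `keep` parameter are assumptions); it hides every digit except the last few and leaves separators alone.

```python
def mask_card_partial(card: str, keep: int = 4) -> str:
    """Replace all but the last `keep` digits with '*', preserving separators."""
    total = sum(c.isdigit() for c in card)
    seen = 0
    out = []
    for c in card:
        if c.isdigit():
            seen += 1
            # Only the final `keep` digits remain visible.
            out.append(c if seen > total - keep else "*")
        else:
            out.append(c)
    return "".join(out)
```

`mask_card_partial("4111 1111 1111 1111")` returns `**** **** **** 1111`.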
Tokenization replaces the value with a stable token. Example: riya@example.com → <EMAIL_42>. Later systems can rehydrate through a secure mapping service if policy allows. This is useful when the model must refer consistently to the same entity across several turns.
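A toy sketch of stable tokenization follows. The `TokenVault` class is hypothetical, and its in-memory dictionaries stand in for the secure mapping service; a real vault would need encrypted storage, access control, and audit logging.

```python
class TokenVault:
    """In-memory stand-in for a secure token mapping service."""

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def tokenize(self, label, value):
        """Return the same token every time the same value is seen."""
        key = (label, value)
        if key not in self._by_value:
            token = f"<{label}_{len(self._by_value) + 1}>"
            self._by_value[key] = token
            self._by_token[token] = value
        return self._by_value[key]

    def rehydrate(self, token, authorized):
        """Return the original value, but only for authorized callers."""
        if not authorized:
            raise PermissionError("caller may not rehydrate tokens")
        return self._by_token[token]
```

Because the same email always maps to the same token, the model can say "the owner of `<EMAIL_1>`" across turns without ever seeing the address.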
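Hashing helps for join operations or deduplication, but it is not user-friendly for conversation. Also, unsalted hashes of common PII can be reversed by guessing: the space of valid phone numbers or card numbers is small enough to enumerate and hash. So use a salted or keyed hash, and use it with care.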
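One common mitigation for guessable hashes is a keyed hash (HMAC), sketched here with Python's standard library. The function name `pseudonymize` and the normalization step are assumptions; key storage and rotation are out of scope for this sketch.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest that is stable for joins and dedup."""
    # Normalize so trivially different spellings map to the same digest.
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

Without the key, an attacker cannot simply hash every possible phone number and compare. Anyone who holds the key still can, so the key needs the same protection as the vault.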
The redaction tray should know the business purpose. A summarization assistant may need aggressive masking. A fraud investigation assistant may need reversible tokens for authorized reviewers. The same detector can feed different operators.
Storage, retrieval, and evaluation matter as much as detection
Now what is the common mistake? Teams mask the live prompt but forget the logs, embeddings, and analytics sink. Then the data is still everywhere.
Mask before you log by default. Mask before you index for retrieval unless business policy requires otherwise. If you store originals, isolate them with strong access controls and clear retention rules.
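"Mask before you log by default" can be approximated with a filter from Python's standard `logging` module. This is a sketch, assuming two simplified patterns; the names `scrub` and `RedactingFilter` are illustrative.

```python
import logging
import re

_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "[CARD]"),
]


def scrub(text):
    """Mask every pattern hit in text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before any handler writes it."""

    def filter(self, record):
        # Format first so values passed as %-style args are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to each handler, for example `handler.addFilter(RedactingFilter())`, is usually the safer placement, because logger-level filters do not see records that propagate up from child loggers.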
Evaluate the detector like any other production subsystem. Measure precision and recall by entity type. A phone detector can be strong while a name detector is weak. A healthcare bot may care more about diagnosis terms than personal names. Product context matters.
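Per-entity evaluation can be sketched as below, assuming labeled spans are available as `(doc_id, start, end, label)` tuples and that only exact span matches count as correct. Both are simplifying assumptions; some teams also score partial overlaps.

```python
from collections import defaultdict


def per_entity_metrics(gold, predicted):
    """Compute precision and recall per label from sets of labeled spans."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for item in predicted:
        counts[item[3]]["tp" if item in gold else "fp"] += 1
    for item in gold - predicted:
        counts[item[3]]["fn"] += 1

    report = {}
    for label, c in counts.items():
        p_den = c["tp"] + c["fp"]
        r_den = c["tp"] + c["fn"]
        report[label] = {
            "precision": c["tp"] / p_den if p_den else 0.0,
            "recall": c["tp"] / r_den if r_den else 0.0,
        }
    return report
```

A report like this makes the "strong phone detector, weak name detector" case visible, where a single global number would average it away.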
The control tower should watch privacy metrics too. Spikes in detected PII, rising output-leak counts, or repeated unmasked tool fields are incident signals. Privacy is not only a preprocessing function. It is an ongoing operating discipline.
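As a rough illustration of watching for spikes, the check below compares today's count (for example, output leaks) with recent history. The function name `is_spike` and the three-sigma threshold are assumptions; a production monitor would more likely live in your existing alerting stack.

```python
from statistics import mean, pstdev


def is_spike(history, today, min_sigma=3.0):
    """Return True if today's count is far above the recent baseline."""
    if len(history) < 2:
        return False  # not enough data to define a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma >= min_sigma
```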
See. PII redaction is not about making the text ugly. It is about reducing harm while preserving enough task utility to keep the system useful.
Where this lives in the wild
- Microsoft Presidio deployments — privacy engineer: combine regex and recognizers to anonymize support tickets before LLM summarization.
- Microsoft 365 Copilot with Purview — compliance architect: applies sensitivity labels and DLP policies before enterprise content flows into generated answers.
- Twilio contact-center AI — customer operations engineer: masks caller phone numbers and payment details before transcripts reach analytics or assistants.
- Nuance DAX medical scribe workflows — healthcare security lead: must protect names, dates, and record identifiers while still generating clinically useful notes.
- Intercom support summarization — trust engineer: needs outbound scans so generated summaries do not echo full card or bank details back to agents.
Pause and recall
- Why is PII redaction a two-way problem, not only an input problem?
- What are the strengths and weaknesses of regex versus NER for PII detection?
- When would partial masking be better than full masking?
- Why is masking prompts alone insufficient for real privacy protection?
Interview Q&A
Q: Why combine regex with NER instead of choosing one approach globally? A: Because crisp identifiers like cards and emails benefit from exact patterns, while names and messy entities need softer contextual detection. Common wrong answer to avoid: "Because NER is only for languages other than English."
Q: Why tokenize some identifiers instead of always deleting them completely? A: Because some workflows need stable references across steps, and reversible tokens preserve utility without exposing the raw secret broadly. Common wrong answer to avoid: "Because tokenization automatically makes the data anonymous in every legal sense."
Q: Why scan outputs again if inputs were already redacted? A: Because tools, retrieval, or cached context may still contain sensitive values, and models can reconstruct or echo them in final responses. Common wrong answer to avoid: "Because output scanning is only useful when the model is malicious."
Q: Why evaluate PII detection by entity type instead of one global accuracy number? A: Because operational risk differs sharply across entities, and one strong category can hide dangerous weakness in another. Common wrong answer to avoid: "Because precision and recall do not apply to privacy systems."
Apply now (5 min)
Exercise. Write one fake customer message with five identifiers. Include an email, a phone number, a payment-like number, an internal account ID, and one personal name. Decide which values to fully mask, partially mask, or tokenize in the redaction tray.
Sketch from memory. Draw the two-pass path. First pass before logs and the model. Second pass at arrival customs before the answer goes out. Add one note on when a secure vault is needed.
Bridge. Input data may now be clean. But the model still has one more way to break downstream systems: by returning outputs in the wrong structure. So next we harden the output passport desk. → 05-output-schema-validation.md