09. Right to be forgotten
Retention deletes data at the boundary of its window. The right-to-be-forgotten is the discipline that deletes data on the data subject's request, before the window — and proves it. The workflow has to reach live data, audit, backups, embeddings, and every derived store.
A compliance lead at a Bengaluru SaaS company gets an erasure request from a customer under the DPDP Act: "delete all my data." The team's first reaction is "we can do that from production in an hour." After more careful work, the list of places the customer's data lives turns out to be longer than expected: the primary database, the vector index of past conversations, the analytics warehouse, the eval sample store, two backups, three log streams, and the agent's session memory across an active conversation. Each requires its own deletion path. The team takes three weeks to complete the erasure and one more week to verify. At the end, the customer's data is verifiably gone from every store; the proof is a list of deletions with timestamps and a signed verification record. The next erasure request, two months later, takes four days because the workflow now exists.
This is the right-to-be-forgotten in practice. The legal right is clear; the engineering work is finding every store, deleting cleanly, and proving it.
What the right is
The right to be forgotten (erasure right) gives the data subject the ability to request deletion of their personal data, subject to lawful exceptions, with the controller obligated to delete and to prove the deletion across every store.
The right comes from regulation: GDPR Article 17, DPDP Act, CCPA's right to delete, sectoral rules. The exact contours vary; the engineering work is similar.
Exceptions exist — data may need to be retained for fraud prevention, legal hold, regulatory requirement, public interest. Each exception is documented; the deletion workflow respects them but flags exceptions explicitly in the response to the data subject.
The map of where the data lives
The first time the workflow runs, the team produces a map. It is reusable across erasure requests; each new tenant or new data type updates it.
For a typical agent platform:
| Store | What it holds | Erasure mechanism |
|---|---|---|
| Primary database | Records keyed on the data subject | DELETE by key; cascade deletes on related rows |
| Vector index | Embeddings derived from the data subject's text | DELETE by metadata filter; rebuild affected partitions |
| Analytics warehouse | Aggregates and event history | DELETE by key; an individual contribution cannot be removed from pre-computed aggregates, so document which aggregates remain |
| Sample store (audit) | Full prompt/response for samples involving the subject | DELETE by key; re-anonymise where needed |
| Per-call access audit | Records of every access for/by the subject | Erase or pseudonymise (jurisdiction-dependent — see below) |
| Session memory | Active conversations | End session; delete memory |
| Backups | Mirror of primary stores | Forward rotation (age out at backup retention), restore-with-skip, or full backup re-cycle |
| Logs | Application and platform logs | Pattern-based deletion; or rotate-out at retention boundary |
| Caches | Materialised views, semantic caches | Invalidate by key |
Each store needs its own deletion path; the workflow runs them in sequence and verifies each.
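One way to keep the map reusable is to hold it as data rather than as a document. The sketch below is illustrative only: the store names, the `Mechanism` values, and the `phase` numbers are assumptions chosen to mirror the table, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    DELETE_BY_KEY = "delete_by_key"
    DELETE_BY_FILTER = "delete_by_metadata_filter"
    PSEUDONYMISE = "pseudonymise"
    END_SESSION = "end_session"
    AGE_OUT = "age_out"
    INVALIDATE = "invalidate"


@dataclass(frozen=True)
class StoreEntry:
    name: str             # hypothetical store identifier
    holds: str            # what the store contains about the subject
    mechanism: Mechanism  # how erasure reaches it
    phase: int            # 1 = live, 2 = sample/derived, 3 = audit and backups


# Assumed entries mirroring the table above; a real platform will differ.
DATA_MAP = [
    StoreEntry("primary_db", "records keyed on the subject", Mechanism.DELETE_BY_KEY, 1),
    StoreEntry("vector_index", "embeddings of the subject's text", Mechanism.DELETE_BY_FILTER, 1),
    StoreEntry("analytics_warehouse", "aggregates and event history", Mechanism.DELETE_BY_KEY, 2),
    StoreEntry("sample_store", "prompt/response samples", Mechanism.DELETE_BY_KEY, 2),
    StoreEntry("access_audit", "per-call access records", Mechanism.PSEUDONYMISE, 3),
    StoreEntry("session_memory", "active conversations", Mechanism.END_SESSION, 1),
    StoreEntry("backups", "mirror of primary stores", Mechanism.AGE_OUT, 3),
    StoreEntry("logs", "application and platform logs", Mechanism.AGE_OUT, 2),
    StoreEntry("caches", "materialised views, semantic caches", Mechanism.INVALIDATE, 1),
]


def deletion_plan(data_map):
    """Return store names ordered by phase; sorted() is stable, so map order holds within a phase."""
    return [entry.name for entry in sorted(data_map, key=lambda e: e.phase)]
```

Adding a new data type then means adding one entry, and the deletion order falls out of the `phase` field instead of being remembered by whoever ran the last request.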
The workflow
A reasonable structured workflow.
1. Receive request.
   - Authenticate the requester (data subject themselves, or authorised representative).
   - Confirm jurisdiction; apply that jurisdiction's exceptions and timeline.
   - Open an erasure case with a case_id.
2. Identify the subject's footprint.
   - Query each store by subject identifier (user_id, customer_id, email-hash, etc.).
   - Catalogue: which rows, which embeddings, which audit records, which samples.
   - Apply exceptions: which items must be retained for fraud, legal hold, regulatory minimum.
3. Execute deletions in dependency order.
   - Live stores first (primary DB, vector index, caches, session memory).
   - Sample and derived stores next.
   - Backups last (or rotate out at the next backup boundary).
   - Audit log: per the jurisdiction's rule for the audit (often pseudonymise, not delete; see below).
4. Verify deletions.
   - Re-query each store by subject identifier.
   - Expect zero results (modulo retained-for-exception items).
   - Log the verification with timestamps.
5. Notify the data subject.
   - What was deleted, what was retained (with the legal basis), and when.
   - Provide a case reference.
6. Audit the erasure.
   - Every step is itself audited (the erasure is a privileged operation).
Steps 2–4 are the engineering work; steps 1, 5, 6 are the legal and operational wrapper.
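Steps 2 to 4 can be sketched over in-memory stand-ins for the real stores. Everything here is an assumption made for illustration: each store is a list of dicts carrying a `subject_id` field, and exceptions are applied per store (via `retained`) rather than per item, which a real workflow would need to refine.

```python
from datetime import datetime, timezone


def footprint(stores, subject_id):
    """Step 2: count the subject's records in each store."""
    return {
        name: sum(1 for row in rows if row["subject_id"] == subject_id)
        for name, rows in stores.items()
    }


def erase(stores, subject_id, order, retained=frozenset()):
    """Step 3: delete in the given order, skipping stores held under an exception."""
    log = []
    for name in order:
        if name in retained:
            log.append((name, "retained"))
            continue
        before = len(stores[name])
        stores[name] = [r for r in stores[name] if r["subject_id"] != subject_id]
        log.append((name, f"deleted {before - len(stores[name])}"))
    return log


def verify(stores, subject_id, retained=frozenset()):
    """Step 4: re-query every store and record the result with a timestamp."""
    checked_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "store": name,
            "remaining": count,
            "ok": count == 0 or name in retained,
            "checked_at": checked_at,
        }
        for name, count in footprint(stores, subject_id).items()
    ]
```

The deletion log and the verification report are kept separate on purpose: the first records what the workflow did, the second records what an independent re-query found afterwards.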
The audit-erasure paradox
The right-to-be-forgotten requires deletion of personal data. The audit log of past accesses contains identifiers — sometimes the very identifiers being erased. Two principles conflict:
- The data subject wants their data forgotten.
- The platform's regulators and contracts require the audit log for accountability.
Different jurisdictions resolve this differently.
GDPR. Generally, audit records can be retained where necessary for compliance with a legal obligation; the obligation must be documented. Personal data in the audit beyond what is necessary should be pseudonymised.
DPDP. Similar reasoning; the data subject's identifier can often be replaced with a one-way hash post-erasure, preserving the audit's structure without retaining the link to the person.
Sectoral (healthcare, finance). Retention often required for years; deletion may be refused with documented reason; in some cases, anonymisation may be acceptable.
The engineering pattern: at erasure time, replace direct identifiers in the audit with a one-way hash (no reverse mapping retained). The audit's structure is preserved; the link to the now-erased subject is broken. The data subject is notified that audit records have been pseudonymised rather than fully deleted, with the legal basis.
This is a per-jurisdiction policy; consult legal counsel for the specifics.
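A minimal sketch of the pseudonymisation step, under stated assumptions: audit records are dicts with a `subject_id` field, and the one-way hash is a keyed HMAC-SHA256 whose key is generated per erasure and never stored. The keyed variant is a choice made here, not something the pattern mandates; it is used because a plain hash of a guessable identifier (an email, a sequential ID) could be reversed by hashing candidates.

```python
import hashlib
import hmac
import secrets


def pseudonymise_audit(records, subject_id, field="subject_id"):
    """Replace the subject's identifier with a keyed one-way hash, then let the key go."""
    key = secrets.token_bytes(32)  # per-erasure key; deliberately not persisted
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()
    token = "erased:" + digest
    rewritten = []
    for record in records:
        if record.get(field) == subject_id:
            record = {**record, field: token}  # copy; leave the caller's dict untouched
        rewritten.append(record)
    return rewritten
```

All of the subject's records receive the same token within one call, so "actions taken on this hash" remains queryable, while nothing retained can map the token back to the person.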
Backups
Backups are the hardest part. A primary-store deletion does not affect existing backups. If a backup is restored later, the deleted data reappears.
Three patterns.
Forward rotation. New backups exclude the deleted data; existing backups age out per the backup retention policy. Until the last pre-erasure backup passes its retention, the data could technically be restored. Document this as a known property; it is often accepted in practice when backup retention is reasonably short, but confirm with counsel for the jurisdiction.
Selective skip on restore. Backups are not modified; a "skip list" of erased identifiers is consulted on every restore, excluding their data. This adds engineering complexity to every restore path and is worth it for few platforms.
Backup re-cycling. Existing backups are deleted and re-taken from the post-erasure live data. Operationally expensive but produces clean backups immediately. Sometimes required for high-stakes regulatory environments.
Most platforms use forward rotation with a short backup retention (30-90 days). The erasure response to the data subject documents that "your data is removed from primary stores; backups are kept for X days, so no backup will contain it after Y date."
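Under forward rotation, the date quoted to the subject is simple arithmetic: the last backup that can contain the data was taken no later than the erasure date, so it expires at most one retention period afterwards. The sketch below assumes backups expire on schedule and that retention is counted in whole days; the function names are illustrative.

```python
from datetime import date, timedelta


def backup_clear_date(erasure_date: date, retention_days: int) -> date:
    """Date by which every backup taken on or before the erasure has aged out."""
    return erasure_date + timedelta(days=retention_days)


def backup_disclosure(erasure_date: date, retention_days: int) -> str:
    """Wording for the response to the subject, with X and Y filled in."""
    clear = backup_clear_date(erasure_date, retention_days)
    return (
        f"Your data was removed from primary stores on {erasure_date.isoformat()}. "
        f"Backups are kept for {retention_days} days, so no backup will contain it "
        f"after {clear.isoformat()}."
    )
```

For an erasure on 2024-01-15 with 30-day retention, `backup_clear_date` returns 2024-02-14.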
Embeddings and derived models
Embeddings derived from the subject's text are personal data in some interpretations, because an embedding can sometimes be inverted to recover part of the original content. The erasure should reach them.
For vector indexes, deletion is straightforward — DELETE by metadata filter pointing to the subject. The index may need re-optimisation after; some engines compact lazily.
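The delete-by-metadata-filter idea can be shown with a toy in-memory index. `ToyVectorIndex` and its methods are hypothetical and stand in for whatever engine the platform uses; real engines expose their own filter syntax, and the sketch relies on each vector having been stored with a `subject_id` in its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVectorIndex:
    vectors: dict = field(default_factory=dict)   # vector_id -> embedding
    metadata: dict = field(default_factory=dict)  # vector_id -> metadata dict

    def upsert(self, vector_id, embedding, meta):
        self.vectors[vector_id] = embedding
        self.metadata[vector_id] = meta

    def _matching(self, filters):
        return [
            vid for vid, meta in self.metadata.items()
            if all(meta.get(k) == v for k, v in filters.items())
        ]

    def count_where(self, **filters):
        """Used for both footprint discovery and post-deletion verification."""
        return len(self._matching(filters))

    def delete_where(self, **filters):
        """Remove every vector whose metadata matches all filters; return how many."""
        doomed = self._matching(filters)
        for vid in doomed:
            del self.vectors[vid]
            del self.metadata[vid]
        return len(doomed)
```

The practical consequence is an ingestion-time requirement: if embeddings are written without a subject identifier in their metadata, there is nothing to filter on at erasure time.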
For fine-tuned models trained on the subject's data, the situation is harder. A model cannot easily be "untrained." Two patterns:
- Avoid the issue. Do not train production models on personal data. Use anonymised or aggregated data.
- Document and disclose. If models are trained on personal data, the data subject's right to erasure may include re-training the model without their data. This is expensive; treat it as a high-cost operation reserved for high-stakes cases.
Most agent platforms can avoid this by not training on personal data — using the data only for inference and never as training input. The fine-tuning case is exceptional and requires legal alignment.
Verification
The erasure is only complete when it can be verified. The workflow's step 4 is the verification:
For each store:
- Query for data matching the subject's identifiers.
- Expected result: zero (modulo retained items).
- If non-zero: investigate; the deletion was incomplete; redo.
- Log the verification: store, query, result, timestamp.

For backups:
- Confirm the backup retention boundary; record the date past which backups no longer contain the subject's data.
- This date is part of the response to the subject.
The verification produces a record. The record is signed (for high-stakes regulatory environments) and retained as the proof.
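As a sketch of what "signed" could mean, the record can be serialised canonically and authenticated with an HMAC. This is an assumption for illustration: HMAC uses a shared secret, so it shows tamper-evidence only to parties holding the key, and an environment that needs third-party verifiability would more likely use an asymmetric signature. The record fields shown in the usage are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def check_record(record: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the stored signature against a fresh one."""
    return hmac.compare_digest(sign_record(record, key), signature)
```

Canonical serialisation matters here: without sorted keys, two equal records could serialise differently and fail verification for no substantive reason.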
The response to the data subject
Within the regulatory window (one month under GDPR, extendable for complex requests; confirm the applicable period for DPDP and other regimes with counsel), the data subject receives:
- Confirmation of erasure
- A list of what was deleted (categorised, not row-by-row)
- A list of what was retained, with the legal basis
- A reference case ID for any future follow-up
The response is courteous, specific, and verifiable. The case ID lets the subject return with questions or audit requests.
Common mistakes
Not finding all the stores. The first request reveals stores the team did not know about. The map is a deliverable; subsequent requests are faster.
Not deleting from backups (or not documenting the constraint). Backups are quietly retained beyond the deletion. The data subject's expectation is "gone everywhere"; the engineering reality is "gone from live; backups age out." Disclose.
Not handling the audit-erasure conflict. Either over-delete the audit (losing compliance value) or under-delete (failing the data subject). The pseudonymisation pattern is the usual answer.
Treating each request from scratch. The workflow should be a process the team runs many times. Each request improves the workflow.
No verification. Deletion executed but not verified. A subsequent regulator audit finds the data; the platform's response of "we deleted it" cannot be defended.
How right-to-be-forgotten interacts with the other surfaces
- Retention (chapter 06) — RTBF is the explicit erasure that cuts through normal retention.
- Classification (chapter 02) — tier drives the strictness of the deletion path; regulated data needs the strongest verification.
- Audit (chapter 07) — every erasure operation is audited.
- Cross-region (chapter 10) — erasure must reach all regions where the data was processed.
- Incident response (chapter 11) — an erasure that fails or is incomplete is itself a regulatory event.
Interview Q&A
Q1. The data subject requests erasure; the audit log contains their identifier. What do you do? Apply the pseudonymisation pattern: at erasure time, the direct identifier in audit records is replaced with a one-way hash. The audit's structure is preserved (you can still query "actions taken on this hash" for compliance); the link to the now-erased subject is broken. Document the jurisdiction's basis for retaining the audit (legal obligation for accountability). Notify the subject that audit records were pseudonymised rather than fully deleted, with the legal basis. Different jurisdictions vary; consult counsel for specifics. Wrong-answer notes: "delete the audit too" loses compliance value; "ignore the audit" fails the right.
Q2. The platform uses backups. How do you handle erasure across backups? Pick a policy. The most common is forward rotation: primary deletion is immediate; existing backups age out per their retention (typically 30-90 days). Disclose to the subject that "your data is removed from live stores; backups are kept for X days, so no backup will contain it after Y date." Higher-stakes environments may require backup re-cycling (deleting all backups and re-taking them) or selective skip on restore (at the cost of engineering complexity). The choice is documented per platform; the subject is informed. Wrong-answer notes: "we'll delete the backups too" is incomplete if it ignores the operational cost and the disclosure framing.
Q3. The agent's session memory has an active conversation with the data subject. What happens? End the session; delete the memory. If the conversation is ongoing, the subject's request implies they no longer want to be in conversation with the agent (otherwise they would not request erasure). Confirm through the channel the request arrived on; close the session; verify the memory is gone. Future conversations from the same subject create new memory (which they may also request to erase later). Wrong-answer notes: "wait until the conversation ends" undermines the right; the subject's request is the authorisation to terminate.
Q4. The team has not run an erasure before. The first request arrives. How do you approach it? Build the map (see "The map of where the data lives") explicitly for this request. List every store. Identify the subject's data in each. Plan the deletion order (live first, then derived, then backups). Execute. Verify. Document the workflow as it runs, because the next request will use it. The first erasure is expensive (three to four weeks is typical); the second is days; the tenth is hours with full automation. The first is also where the team often discovers stores nobody documented. Wrong-answer notes: "we'll figure it out on the first request" without building the map produces an incomplete erasure and a regulatory event.
What to do differently after reading this
- Build the data-location map. Every store. Update with every new data type.
- Define the erasure workflow as a process; document it; verify it on the first request.
- Resolve the audit-erasure conflict per jurisdiction in advance, with legal counsel.
- Set backup retention windows short enough that "data ages out of backups in X days" is a satisfying answer to data subjects.
- Verify every erasure; sign the verification record.
Bridge. Right-to-be-forgotten is the most explicit form of the data subject's control. The next concern is cross-tenant and cross-region — how the discipline holds when the platform serves many tenants in many jurisdictions, with the model gateway routing accordingly. The next chapter brings the two together. → 10-cross-tenant-and-cross-region.md