06. Retention and jurisdiction

PII discipline addresses what content lives in storage. Retention addresses how long it lives, and jurisdiction addresses where. Regulators, contracts, and customer expectations impose limits; the discipline is time-bounded storage with lawful deletion on schedule.


A compliance lead at a Bengaluru SaaS company audits the agent platform's storage. The agent's audit log retains everything for two years; the access logs retain for seven; the prompt-and-response sample store retains forever (nobody set a TTL). Some of the audit log holds data from a customer who exercised their right to be forgotten eight months ago; the customer's primary records were erased but the audit retained references. Some of the sample store holds data classified as regulated by a regulator that requires deletion at five years; it has been there for eight. The platform is, technically, in breach of its own commitments and at least two regulations. Nothing has gone wrong in production; the audit reveals the breach.

This is the retention problem. Data accumulates by default; the discipline is to set explicit windows, enforce them automatically, and verify enforcement. Jurisdiction adds a second dimension: regulators in different places impose different rules, and the platform must know which rules apply where.


What retention discipline is

Retention discipline assigns each data category a maximum lifetime, supported by automatic deletion at the boundary, with provable enforcement and per-jurisdiction handling.

Three components.

Per-category retention windows. Each data type — primary records, audit logs, access logs, sample stores, embeddings, backups — has its own window. The window is driven by regulation, contract, and business need.

Automatic deletion at the boundary. Records past their window are deleted by an automated process, not by human attention. Human attention is unreliable; the boundary is enforced by code.

Provable enforcement. Regular audits demonstrate that retention windows are honoured. "We have a policy" is not enough; "here is evidence the policy is enforced" is.


The retention matrix

A typical multi-tenant agent platform has many retention windows. A matrix helps.

| Data | Tier | Window | Source of window | Jurisdictional variation |
|---|---|---|---|---|
| Primary records (customer profile) | sensitive | Active + 7y | Business + regulation | GDPR: customer can request erasure |
| Audit log (per-call) | metadata | 1y standard / 7y regulated | Regulation by sector | Healthcare 10y; finance 7y; SaaS 1-2y |
| Application logs | metadata | 30-90 days | Operational; standard practice | Some sectors require longer for security |
| Sample store (full prompt/response) | sensitive | 90 days | Eval + investigation need | Tighten where regulated data appears |
| Embeddings | derived; tier per source | Same as source | Indirect identifier | Same as source |
| Backup snapshots | mirror of source | Source window + recovery overlap | Operational | Same as source |
| Conversation memory | sensitive | Session + 30 days | Product UX | Per-tenant policy |
| Cache | derived | Hours to days | Operational | Same as source |
| Eval set | sensitive (synthetic) | Indefinite (synthetic) | None for synthetic | Real PII forbidden |

Three patterns emerge.

  • The longer windows are usually for regulated data; shorter for operational data.
  • Derived data generally inherits the source's window. Embeddings are the clearest case: they carry a re-identification risk that mirrors the source, so they get the source's window.
  • Backups must align with the policy; a "we deleted from production but the backup has it" stance fails the regulatory test.
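One way to make the matrix enforceable is to hold it as data that deletion jobs read, rather than as a document. The sketch below is illustrative only: the `RetentionRule` type, the category names, and the window values are assumptions for the example, and the choice of `sample_store` as the source of the embeddings is hypothetical. It shows how a derived category can resolve its window through its source.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    category: str
    tier: str
    window: Optional[timedelta]          # None means "inherit from the source"
    inherits_from: Optional[str] = None  # source category for derived data


# Illustrative values only; real windows come from regulation and contract.
RULES = {
    r.category: r
    for r in [
        RetentionRule("audit_log", "metadata", timedelta(days=365)),
        RetentionRule("application_logs", "metadata", timedelta(days=90)),
        RetentionRule("sample_store", "sensitive", timedelta(days=90)),
        # Assumption: these embeddings are built from the sample store.
        RetentionRule("embeddings", "derived", None, inherits_from="sample_store"),
    ]
}


def effective_window(category: str) -> timedelta:
    """Resolve a category's window, following inheritance for derived data."""
    seen = set()
    rule = RULES[category]
    while rule.window is None:
        if rule.inherits_from is None or rule.category in seen:
            raise ValueError(f"no window resolvable for {category!r}")
        seen.add(rule.category)
        rule = RULES[rule.inherits_from]
    return rule.window
```

Keeping inheritance explicit means that shortening the source's window shortens the derived data's window in the same change.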

Where windows come from

Three sources, in order of binding force.

Regulation. GDPR Article 5 (storage limitation), the DPDP Act (data principal rights), HIPAA (medical records retention), PCI-DSS (cardholder data), sectoral rules. Some specify minimum retention (financial transaction records must be kept at least 7 years); some maximum (PII must be deleted after its purpose is fulfilled). Often both at once.

Contract. Customer agreements often specify retention. "Your data is kept for X years; you can request deletion at any time." The contract creates a duty.

Business need. Some retention is operational — for debugging, for analytics, for product features (e.g., conversation memory). Business needs justify retention only within the bounds set by the higher sources.

The platform's policy is the intersection: at least the regulatory minimum, no more than the regulatory or contractual maximum, sized within those bounds by business need.
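The intersection can be written as a clamp. The following is a minimal sketch, assuming each bound is expressed as a `timedelta` and that a missing bound is passed as `None`; the function name `resolve_window` and its parameters are hypothetical. A minimum that exceeds a maximum is treated as a conflict to escalate, not something code should resolve.

```python
from datetime import timedelta
from typing import Optional


def resolve_window(
    business_need: timedelta,
    regulatory_min: Optional[timedelta] = None,
    regulatory_max: Optional[timedelta] = None,
    contractual_max: Optional[timedelta] = None,
) -> timedelta:
    """Clamp the business-need window into the bounds set by higher sources."""
    floor = regulatory_min or timedelta(0)
    ceilings = [m for m in (regulatory_max, contractual_max) if m is not None]
    ceiling = min(ceilings) if ceilings else None
    if ceiling is not None and floor > ceiling:
        # Conflicting obligations need a legal decision, not a default.
        raise ValueError("conflicting bounds: minimum exceeds maximum")
    window = max(business_need, floor)   # at least the regulatory minimum
    if ceiling is not None:
        window = min(window, ceiling)    # no more than the tightest maximum
    return window
```

For example, a 90-day business need under a one-year regulatory minimum resolves to one year, and a ten-year business need under a two-year contractual maximum resolves to two years.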


Implementing automatic deletion

For each data type, a deletion job runs on a schedule (daily for short windows, weekly for longer). The job:

1. Query for records older than the window.
2. Delete them.
3. Verify the deletion (subsequent query returns no rows).
4. Log the deletion event (count, type, window).
5. Update the retention dashboard.

The deletion is a hard delete, not a soft delete: the data is removed from the queryable store. For some data types, the deletion must propagate to backups as well, because a backup carrying past-window data is itself a violation; backup retention is part of the policy.
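The five steps can be sketched against SQLite from the Python standard library. This is a sketch under stated assumptions, not a production job: it assumes each table has a `created_at` column holding Unix epoch seconds, that the table name comes from trusted configuration (it is interpolated into the SQL), and that the caller appends the returned event to the deletion log and pushes it to the dashboard. The function name `run_deletion_job` is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def run_deletion_job(conn, table, window, now=None):
    """Hard-delete rows older than `window`; return a deletion event."""
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - window).timestamp())
    where = f"FROM {table} WHERE created_at < ?"  # table: trusted config only

    # 1. Query for records older than the window.
    expected = conn.execute(f"SELECT COUNT(*) {where}", (cutoff,)).fetchone()[0]

    # 2. Delete them (rows are removed, not flagged).
    cur = conn.execute(f"DELETE {where}", (cutoff,))
    conn.commit()

    # 3. Verify: the same query must now return zero rows.
    remaining = conn.execute(f"SELECT COUNT(*) {where}", (cutoff,)).fetchone()[0]

    # 4 and 5. Build the event (count, type, window); the caller is assumed
    # to write it to the deletion log and update the retention dashboard.
    return {
        "table": table,
        "window_days": window.days,
        "expected": expected,
        "deleted": cur.rowcount,
        "remaining": remaining,
        "ok": remaining == 0 and cur.rowcount == expected,
        "ran_at": now.isoformat(),
    }
```

Passing `now` explicitly keeps the job deterministic under test; the scheduler would normally omit it.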

Two common patterns:

TTL on storage. Some stores natively support time-to-live (DynamoDB, Redis, MongoDB via TTL indexes). The TTL is set per record; the store deletes automatically. Lowest overhead.

Scheduled deletion job. A batch process runs at a cadence and deletes past-window records. Works for stores without native TTL.

The choice depends on the store; the discipline is that deletion happens, not how it happens.
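For the TTL pattern, the expiry is usually stamped on the record at write time. The sketch below assumes a store that reads an absolute expiry in Unix epoch seconds from a numeric attribute, which is the shape DynamoDB's TTL feature expects; the attribute name `expires_at` and the helper `with_expiry` are hypothetical. Native TTL deletion can lag behind the expiry time in some stores, so the verification step still applies.

```python
from datetime import datetime, timedelta, timezone


def with_expiry(item: dict, window: timedelta, written_at: datetime) -> dict:
    """Return a copy of `item` carrying an absolute expiry in epoch seconds."""
    if written_at.tzinfo is None:
        # Naive datetimes would be interpreted in local time; refuse them.
        raise ValueError("written_at must be timezone-aware")
    expires_at = int((written_at + window).timestamp())
    return {**item, "expires_at": expires_at}
```

Computing the expiry from the write time and the category's window keeps the TTL consistent with the retention matrix instead of hard-coding a number in each writer.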


Jurisdiction

Different jurisdictions impose different rules. A multi-region platform handles this by routing data to its right jurisdiction and applying that jurisdiction's policies.

The four common dimensions:

| Dimension | Examples |
|---|---|
| Residency | GDPR (EU data in EU); DPDP (sensitive data in India); state-level US |
| Retention min/max | Healthcare 10y; financial 7y; PII as-short-as-needed |
| Data subject rights | GDPR right to erasure; DPDP right of correction; CCPA right to know |
| Cross-border transfer | EU adequacy regime; SCCs for transfers; explicit consent in some sectors |

For an agent platform, the residency is enforced by the gateway (02_ai_infrastructure/01 chapter 10); the retention is enforced by this chapter's discipline; the data subject rights are enforced by chapter 09 (right to be forgotten). The cross-border transfer is the interaction between residency and the data flow — covered in chapter 10.


Jurisdiction tagging on records

Records carry a jurisdiction label at write time. The label is sourced from the tenant's configured residency:

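# Per-tenant residency; the jurisdiction is copied onto every record at write time.
tenants:
  acme-corp:
    jurisdiction: in                     # India
    retention_policy: in-policy-v3

  globex-eu:
    jurisdiction: eu                     # European Union
    retention_policy: eu-policy-gdpr-v2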

Each record stored on behalf of a tenant carries the jurisdiction. The deletion job consults the jurisdiction-specific policy when computing the deletion date.

Records for a tenant must not change jurisdiction over their lifetime — that would be a data transfer event with its own regulatory implications. If a tenant migrates jurisdictions (e.g., a customer moves their primary entity to a different country), the migration is an explicit operation with audit, not a silent re-tag.
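A minimal sketch of write-time tagging follows, assuming the tenant configuration is loaded into a dictionary and that each named policy maps data categories to windows. The `StoredRecord` type, the `POLICIES` table, and the window values (including the 30-day EU figure) are assumptions for illustration, not the contents of the real policies. The dataclass is frozen so the label cannot be silently re-tagged after the write.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TENANTS = {
    "acme-corp": {"jurisdiction": "in", "retention_policy": "in-policy-v3"},
    "globex-eu": {"jurisdiction": "eu", "retention_policy": "eu-policy-gdpr-v2"},
}

# Hypothetical policy contents: data category -> window.
POLICIES = {
    "in-policy-v3": {"sample_store": timedelta(days=90)},
    "eu-policy-gdpr-v2": {"sample_store": timedelta(days=30)},
}


@dataclass(frozen=True)  # frozen: assignment after creation raises an error
class StoredRecord:
    tenant_id: str
    category: str
    jurisdiction: str
    retention_policy: str
    written_at: datetime


def tag_record(tenant_id: str, category: str, written_at: datetime) -> StoredRecord:
    """Copy the tenant's jurisdiction and policy onto the record at write time."""
    tenant = TENANTS[tenant_id]
    return StoredRecord(
        tenant_id, category, tenant["jurisdiction"],
        tenant["retention_policy"], written_at,
    )


def delete_after(record: StoredRecord) -> datetime:
    """Deletion date under the record's own jurisdiction-specific policy."""
    return record.written_at + POLICIES[record.retention_policy][record.category]
```

Two records written at the same moment for different tenants can therefore have different deletion dates, and a tenant migration has to create new records through an explicit, audited path.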


Proving enforcement

A regulatory audit asks "show me that you deleted X at the right time." The proof is the retention dashboard and the deletion log.

+-----------------------------------------------------------------+
|  Retention status — last 30 days                                |
+-----------------------------------------------------------------+
|  Audit log:                                                     |
|    expected deletions:  1,420,318                               |
|    actual deletions:    1,420,318                               |
|    delta:                       0    [OK]                       |
|    next run:            tomorrow 02:00 IST                      |
|                                                                 |
|  Sample store (regulated, 90d):                                 |
|    expected deletions:    34,217                                |
|    actual deletions:      34,217                                |
|    delta:                       0    [OK]                       |
|                                                                 |
|  Primary records (per-tenant, 7y):                              |
|    expected deletions:     1,082                                |
|    actual deletions:       1,082                                |
|    delta:                       0    [OK]                       |
|                                                                 |
|  Backups (mirrored):                                            |
|    backup of stale records?    NO    [OK]                       |
+-----------------------------------------------------------------+

The dashboard shows that the policy is being executed. Discrepancies (expected vs actual) are alarms — either the deletion ran short, or records were missed.

A regulator's question becomes "show me the dashboard for the last twelve months." The platform can produce it in minutes.
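The expected-versus-actual check behind each dashboard row is small enough to sketch. The example assumes the deletion jobs report an expected and an actual count per data type; the `RetentionStatus` type and the `render` and `alarms` helpers are hypothetical names, and the output format is simplified from the panel above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionStatus:
    data_type: str
    expected: int   # records that were past their window
    actual: int     # records the job actually deleted

    @property
    def delta(self) -> int:
        return self.expected - self.actual

    @property
    def ok(self) -> bool:
        return self.delta == 0


def render(statuses):
    """One summary line per data type, flagged OK or ALARM."""
    lines = []
    for s in statuses:
        flag = "OK" if s.ok else "ALARM"
        lines.append(
            f"{s.data_type}: expected={s.expected:,} "
            f"actual={s.actual:,} delta={s.delta:,} [{flag}]"
        )
    return lines


def alarms(statuses):
    """Statuses whose delta is non-zero and should page someone."""
    return [s for s in statuses if not s.ok]
```

Any non-zero delta is surfaced in both directions: a positive delta means records were missed, and a negative one means the job deleted more than the policy expected.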


Common mistakes

No retention policy. Data accumulates indefinitely. Storage costs grow; risk grows; regulatory exposure compounds.

Policy exists but no enforcement. The policy is documentation; the actual deletion never runs. The gap is discovered mid-audit.

Backups left out. Production data deleted; backup data not. A regulator's question "is the data gone" gets "yes from production, no from backup" — which is "no" by regulation.

Retention windows too short for business need or too long for regulation. Both kinds of misfit produce pain. The intersection is the right answer.

Cross-jurisdictional uniformity. Treating every tenant by the strictest policy is operationally expensive; treating by the loosest is non-compliant. Per-jurisdiction policy is the discipline.

No deletion proof. Mid-audit, the team scrambles to find evidence and cannot produce it. The dashboard is the substrate.


How retention interacts with the other surfaces

  • Classification (chapter 02) — tier drives the window.
  • PII (chapter 05) — PII fields often have shorter windows than non-PII metadata; minimise.
  • Audit (chapter 07) — the audit itself has a retention; not infinite.
  • Right to be forgotten (chapter 09) — explicit erasure cuts through normal retention; the policies coexist.
  • Cross-region (chapter 10) — jurisdiction drives the policy applied.
  • Incident response (chapter 11) — a breach changes the retention story (you may need to preserve evidence beyond normal windows for litigation hold).

Interview Q&A

Q1. The platform retains audit logs for "as long as the application is in production." Why is that a problem?

Indefinite retention violates most data-protection regimes for any audit log containing personal data (the audit log usually contains identifiers even when redacted; metadata is not always exempt). It also produces unbounded storage and risk. The right discipline is a window driven by regulation and operational need: typically 1-2 years for general SaaS, longer for regulated sectors. Past the window, deletion. The "indefinite" framing is an undeclared decision that ends up being wrong by default.

Wrong-answer notes: "we'll keep it just in case" is the failure mode.

Q2. How do you prove to a regulator that data was deleted on time?

The retention dashboard with daily/weekly deletion logs. For each data type, the dashboard shows the expected number of deletions (from records past their window), the actual number, the delta (should be zero), and the timestamp. Twelve months of this is the auditable trail. For high-stakes regulated data, the dashboard's logs may be cryptographically signed so they cannot be altered retroactively. "We have a policy" is documentation; "here is the dashboard" is proof.

Wrong-answer notes: ad-hoc spreadsheets are not proof; the dashboard backed by audited automated jobs is.
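As one illustration of tamper-evident deletion logs, the sketch below chains each entry to the previous one with an HMAC from the Python standard library. This is a keyed MAC, not a public-key signature, so it is a substitute technique: a verifier needs the same secret key. The function names are hypothetical, and the scheme assumes the key is held outside the log store. It detects edits and reordering, but detecting removal of the most recent entries would additionally require anchoring the latest tag somewhere external.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, prev_sig: str, event: dict) -> str:
    """MAC over the previous entry's tag plus this event (canonical JSON)."""
    payload = prev_sig.encode() + json.dumps(event, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_event(log: list, key: bytes, event: dict) -> None:
    """Append an event, chaining it to the tag of the last entry."""
    prev_sig = log[-1]["sig"] if log else ""
    log.append({"event": event, "sig": sign_entry(key, prev_sig, event)})


def verify_log(log: list, key: bytes) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_sig = ""
    for entry in log:
        expected = sign_entry(key, prev_sig, entry["event"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True
```

Because each tag covers the previous tag, changing one historical count invalidates every later entry, which is what makes retroactive edits visible.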

Q3. A tenant in India has data going to a vector index hosted in Singapore. What jurisdiction applies?

The most restrictive of the source's and the destination's rules applies. In this case, that is India's DPDP for the tenant's data, with the cross-border transfer rule applying to its presence in Singapore. The discipline: either ensure the vector index has equivalent protections to the source (contractual; sometimes acceptable under DPDP) or move the vector index to India. The model gateway's privacy zone enforcement (02_ai_infrastructure/01 chapter 10) prevents the transfer at the routing layer if the policy forbids it; this chapter ensures the retention applies even after the transfer. Mixed-jurisdiction storage requires explicit policy.

Wrong-answer notes: "the destination's jurisdiction is fine" misses the source's continuing obligations.

Q4. The team wants to keep the sample store "forever for eval purposes." What is your response?

The sample store contains real production prompts/responses with redacted PII. Even redacted, indirect identifiers and sensitive content (medical questions, financial details, support conversations) persist. Forever retention compounds the risk surface. Right discipline: a window of 90 days to 1 year for the high-fidelity sample; aggregated metrics (token counts, latency, model usage) can be kept longer because they do not contain content. For eval specifically, the eval set itself uses synthetic data (chapter 05's discipline); the sample store is for production traffic eval, not the eval set. Both have their own retention.

Wrong-answer notes: "we need it for evals" is the reason for the sample store; "forever" is not the right retention for it.


What to do differently after reading this

  • Build the retention matrix for every data type in the system. Drive each window from regulation, contract, and need (in that order of binding force).
  • Implement automatic deletion. Verify with a daily/weekly dashboard showing expected vs actual.
  • Tag every record with jurisdiction at write time. The deletion job applies per-jurisdiction policy.
  • Include backups in the retention discipline.
  • Be ready to produce twelve months of deletion logs on demand for audits.

Bridge. Retention bounds how long data lives. The audit is what proves the discipline — and is the substrate for incident investigation, leak detection, and right-to-be-forgotten verification. The next chapter is the access audit's specific shape for the agent platform. → 07-access-audit.md