07. The access audit
Retention bounds how long data lives. The access audit is the per-call record that lets every claim in this module be verified, every incident be investigated, and every regulator's question be answered. It is the substrate for the rest of the discipline.
A security engineer at a Hyderabad finance company gets a call from compliance: a customer has reported that their data may have been mishandled, and the regulator has asked for a complete log of every access to that customer's data over the past 90 days. The engineer queries the access audit. The query returns 412 events across 14 services and 6 agent operations, with purpose, actor, scope, fields-read, and outcome on every row. The query completes in 90 seconds. The export goes to the compliance team in the morning; the regulator's deadline is met two days early. Without the audit at this fidelity, the same investigation would have taken weeks and produced a partial answer.
The access audit is what makes "who accessed what" answerable. Every chapter above this one relies on the audit existing and being queryable.
What the access audit captures
A per-call record. Every read, every write, every retrieval emits one. Fields:
audit_id: aud_01HNF...  # unique identifier
ts_started: 2026-05-25T11:14:02.371Z
ts_completed: 2026-05-25T11:14:02.412Z
duration_ms: 41
actor:
  type: agent  # or "user", "service", "batch"
  identity: agent.support.v3
acting_on_behalf_of:
  user_id: u_42
  session_id: sess_01HNF...
tenant_id: acme-corp
jurisdiction: in
purpose: support:read_own_orders  # chapter 03
scope:  # chapter 04
  customer_id: u_42
  resource_constraint: per-user
operation:
  type: read
  resource: orders
  resource_id: ord_99
  fields_requested: [order_id, items, total_minor, status]
  fields_tier_summary:  # chapter 02
    internal: [order_id, status]
    sensitive: [items, total_minor]
    regulated: []
result:
  ok: true
  rows_returned: 1
  bytes_returned: 482
scope_check:  # chapter 04
  enforced_by: row-level-security
  passed: true
audit_id_parent: trc_01HNF...  # trace correlation
audit_id_caller: aud_01HNF...  # if this was triggered by another audit
What this captures, conceptually:
- Who — actor identity, acting-on-behalf-of, tenant, jurisdiction
- What — the operation, the resource, the fields, with tier summary
- Why — the purpose
- Where — the scope and its enforcement
- When — timestamps
- How — duration, bytes returned
- Outcome — success or refusal
The audit is structured (JSON or similar), not free-text. Every field is queryable.
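As a rough illustration of what "structured, every field queryable" can look like in code, the sketch below assembles one read record as a plain dictionary and serializes it to JSON. The `FIELD_TIERS` map, the helper names, and the choice to treat unclassified fields as regulated are assumptions made for the example, not part of the module's schema.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical classification map; real labels would come from chapter 02.
FIELD_TIERS = {
    "order_id": "internal",
    "status": "internal",
    "items": "sensitive",
    "total_minor": "sensitive",
}


def tier_summary(fields):
    """Group requested field names by classification tier."""
    summary = {"internal": [], "sensitive": [], "regulated": []}
    for name in fields:
        # Assumption: an unclassified field is treated as regulated (fail closed).
        summary[FIELD_TIERS.get(name, "regulated")].append(name)
    return summary


def iso_utc(dt):
    """Format a timezone-aware datetime as ISO 8601 UTC with millisecond precision."""
    text = dt.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def build_read_record(*, audit_id, actor_identity, user_id, tenant_id, purpose,
                      scope, resource, resource_id, fields,
                      rows_returned, bytes_returned, started, completed):
    """Assemble the metadata for one read; no field values are included."""
    return {
        "audit_id": audit_id,
        "ts_started": iso_utc(started),
        "ts_completed": iso_utc(completed),
        "duration_ms": (completed - started) // timedelta(milliseconds=1),
        "actor": {"type": "agent", "identity": actor_identity},
        "acting_on_behalf_of": {"user_id": user_id},
        "tenant_id": tenant_id,
        "purpose": purpose,
        "scope": scope,
        "operation": {
            "type": "read",
            "resource": resource,
            "resource_id": resource_id,
            "fields_requested": list(fields),
            "fields_tier_summary": tier_summary(fields),
        },
        "result": {"ok": True, "rows_returned": rows_returned,
                   "bytes_returned": bytes_returned},
    }


started = datetime(2026, 5, 25, 11, 14, 2, 371000, tzinfo=timezone.utc)
record = build_read_record(
    audit_id="aud_example_1", actor_identity="agent.support.v3", user_id="u_42",
    tenant_id="acme-corp", purpose="support:read_own_orders",
    scope={"customer_id": "u_42"}, resource="orders", resource_id="ord_99",
    fields=["order_id", "items", "total_minor", "status"],
    rows_returned=1, bytes_returned=482,
    started=started, completed=started + timedelta(milliseconds=41),
)
print(json.dumps(record, indent=2))
```

Because the record is a nested mapping with fixed keys, each dimension (actor, purpose, tier summary, outcome) can be indexed and filtered without parsing free text.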
What the audit does not capture
The audit captures metadata, not the data itself. The fields returned by the read are not in the per-call audit — they are in the sample store (chapter 05) for a small fraction of calls, or in the application's own data plane.
The reason: an audit log that contains the full data multiplies the surface area of every sensitive field by every access. The audit should be queryable for who/what/why/when; the actual data lives in its proper store with its proper protection.
Two exceptions where the data does enter audit:
- Refusals. When a call is refused, the attempted arguments are captured (with redaction), so investigations can see what the agent tried to do.
- Write operations. The payload of writes is sometimes captured (redacted) because the write itself is the audit-relevant event; "what was written" is the question.
For reads, the audit knows fields requested and counts returned; the values are not in the audit.
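One way to keep values out of the audit is to derive the result metadata from the rows and then discard the rows before the record is built. The sketch below assumes rows arrive as dictionaries and that `bytes_returned` is measured as the size of the compact JSON serialization; both are illustrative choices, and `summarize_read` is a hypothetical helper name.

```python
import json


def summarize_read(rows, fields_requested):
    """Return audit metadata for a read without copying any field values."""
    # Project each row onto the requested fields, only to measure the payload.
    projected = [
        {name: row[name] for name in fields_requested if name in row}
        for row in rows
    ]
    payload = json.dumps(projected, separators=(",", ":")).encode("utf-8")
    # Only names and counts leave this function; `projected` is dropped.
    return {
        "fields_requested": list(fields_requested),
        "rows_returned": len(projected),
        "bytes_returned": len(payload),
    }


rows = [{"order_id": "ord_99", "items": "SECRET-ITEM", "total_minor": 129900,
         "status": "shipped", "email": "someone@example.com"}]
summary = summarize_read(rows, ["order_id", "items", "total_minor", "status"])
print(summary)
```

The returned dictionary can be merged into the audit record's operation and result sections; the field values themselves never enter it.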
Refusal audit
Refusals are first-class events. A scope violation, a purpose violation, a rate-limit refusal, a credential refusal — each produces an audit record with the same fields, with result.ok = false and the violation type captured.
result:
  ok: false
  refusal_code: SCOPE_VIOLATION
  refusal_detail: resource_id mismatch with scope
  attempted_resource: ord_99999
  expected_scope: { customer_id: u_42 }
Refusals are the most interesting events for security review. A high refusal rate from one actor is a leading indicator of probing, prompt issues, or scope mismatches the platform team should investigate.
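A simple way to surface that signal is to compute each actor's refusal rate over a window of records and flag the outliers. The sketch below works on in-memory dictionaries shaped like the record above; the 20% threshold and the minimum of 20 calls are arbitrary illustrative values, not recommendations from this module.

```python
from collections import Counter


def refusal_rates(records, min_calls=20):
    """Map actor identity to refusal rate, skipping actors with too few calls."""
    totals, refused = Counter(), Counter()
    for rec in records:
        actor = rec["actor"]["identity"]
        totals[actor] += 1
        if not rec["result"]["ok"]:
            refused[actor] += 1
    return {actor: refused[actor] / count
            for actor, count in totals.items() if count >= min_calls}


def flag_actors(records, threshold=0.2, min_calls=20):
    """Return actors whose refusal rate exceeds the threshold, sorted by name."""
    rates = refusal_rates(records, min_calls)
    return sorted(actor for actor, rate in rates.items() if rate > threshold)


def _calls(actor, total, refusals):
    """Build synthetic records: `refusals` refused calls out of `total`."""
    return [{"actor": {"identity": actor}, "result": {"ok": i >= refusals}}
            for i in range(total)]


sample = (_calls("agent.support.v3", 30, 10)
          + _calls("agent.billing.v1", 30, 1)
          + _calls("agent.new.v0", 5, 5))
print(flag_actors(sample))
```

The `min_calls` floor keeps a brand-new actor with a handful of refusals from drowning out actors with a sustained pattern.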
How the audit is queried
The audit is built for fast queries on the dimensions investigations care about. Typical queries:
- Show me every access to customer X in the last 90 days: `WHERE acting_on_behalf_of.user_id = 'X' AND ts >= now() - 90d`
- Show me refusals by purpose, last 24 hours: `WHERE result.ok = false AND ts >= now() - 24h GROUP BY result.refusal_code`
- Show me every cross-tenant access attempt: `WHERE scope_check.passed = false AND result.refusal_code = 'TENANT_MISMATCH'`
- Show me agents reading regulated-tier fields: `WHERE operation.fields_tier_summary.regulated IS NOT EMPTY`
- Show me write operations in the last hour by tenant: `WHERE operation.type = 'write' AND ts >= now() - 1h GROUP BY tenant_id`
The audit's storage is chosen so these queries are fast. Time-series stores, OLAP databases, append-only logs with indices — implementation varies; the property is fast time-bounded scoped queries.
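To make the "fast time-bounded scoped query" property concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the audit store. The flattened column names (for example `obo_user_id` for `acting_on_behalf_of.user_id`) and the table layout are assumptions for the example; a production store would more likely be one of the options listed above.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def iso(dt):
    """ISO 8601 UTC with fixed millisecond precision, so strings sort by time."""
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds")


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit (
        audit_id     TEXT PRIMARY KEY,
        ts           TEXT NOT NULL,     -- iso() output
        tenant_id    TEXT NOT NULL,
        obo_user_id  TEXT,              -- acting_on_behalf_of.user_id, flattened
        purpose      TEXT NOT NULL,
        ok           INTEGER NOT NULL,  -- 1 = success, 0 = refusal
        refusal_code TEXT
    )
""")
# Composite index matching the most common investigation query.
conn.execute("CREATE INDEX idx_audit_user_ts ON audit (obo_user_id, ts)")
conn.execute("CREATE INDEX idx_audit_ok_ts ON audit (ok, ts)")


def accesses_for_customer(conn, user_id, now, days=90):
    """Every audit row for one customer inside the time window, oldest first."""
    cutoff = iso(now - timedelta(days=days))
    return conn.execute(
        "SELECT audit_id, ts, purpose, ok FROM audit "
        "WHERE obo_user_id = ? AND ts >= ? ORDER BY ts",
        (user_id, cutoff),
    ).fetchall()


def refusals_by_code(conn, now, hours=24):
    """Refusal counts grouped by refusal_code inside the time window."""
    cutoff = iso(now - timedelta(hours=hours))
    return conn.execute(
        "SELECT refusal_code, COUNT(*) FROM audit "
        "WHERE ok = 0 AND ts >= ? GROUP BY refusal_code ORDER BY refusal_code",
        (cutoff,),
    ).fetchall()
```

The composite index on `(obo_user_id, ts)` mirrors the shape of the customer query: an equality match on the customer followed by a range on time.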
Append-only and tamper-evident
The audit is append-only. Records are never modified after creation. Deletion happens only via the retention boundary (chapter 06).
For high-stakes data (regulated tier), the audit is tamper-evident: each record's hash chains to the previous record's hash, so a retroactive modification breaks the chain. Some platforms additionally sign records or use append-only logs backed by Merkle trees.
The tamper-evidence is for regulatory and forensic purposes — if a breach is investigated months later, the audit must be defensibly intact. Tamper-evidence does not prevent compromise; it makes compromise detectable.
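Hash chaining can be sketched in a few lines. The version below uses SHA-256 from Python's `hashlib` over a canonical JSON serialization of each record concatenated with the previous hash; the entry layout, the all-zero genesis value, and the function names are assumptions for the example. Note that a chain by itself only detects edits by someone who does not recompute every later hash, which is why the head hash is typically signed or anchored somewhere the writer cannot alter.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous record"


def record_hash(record, prev_hash):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, record):
    """Append a record, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev,
                  "hash": record_hash(record, prev)})


def verify(chain):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for index, entry in enumerate(chain):
        expected = record_hash(entry["record"], prev)
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return index
        prev = entry["hash"]
    return None


chain = []
for n in range(3):
    append(chain, {"audit_id": f"aud_{n}", "purpose": "support:read_own_orders"})
print(verify(chain))  # intact chain

chain[1]["record"]["purpose"] = "marketing:export"  # retroactive edit
print(verify(chain))  # index of the first entry that no longer verifies
```

Verification walks the chain from the start, so a retroactive edit is reported at the position of the altered record rather than silently accepted.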
What separates the audit from application logs
A common confusion: the audit and the application's debug log look like they do the same job. They do not.
| Property | Application logs | Access audit |
|---|---|---|
| Audience | Engineers debugging | Compliance, security, investigation |
| Retention | 30-90 days | 1-10 years, depending on tier |
| Schema | Free-text, semi-structured | Strictly structured |
| Mutability | Often rotated, sometimes redacted retroactively | Append-only |
| Access | Many engineers | Few people, and their reads are audited |
| Contents | Whatever the application logged | Per-call governance metadata |
| Reads | Engineer's query | Compliance query |
The two coexist. They serve different purposes. Confusing them — putting access events in the application log — produces audits that cannot answer compliance questions and application logs that hold too much sensitive data.
Building the audit, in practice
A reasonable implementation:
- Emission. Every data-access library call (the mediator from chapter 04) emits an audit record asynchronously to a queue.
- Pipeline. A worker consumes the queue and writes to the audit store. Delivery guarantee: a durable queue with at-least-once delivery, plus dedup on `audit_id`, so each record is stored once even when a message is redelivered.
- Storage. The audit store is separate from the application database. A managed log service or a dedicated cluster.
- Indexes. Time, tenant, actor, purpose, refusal_code, regulated-tier-touched.
- Access. Read-only for engineers; audited reads (queries against the audit are themselves logged).
- Retention. Per chapter 06 — automatic deletion at the boundary.
The async emission is critical: synchronous audit emission can add latency to every call and ties each call's availability to the audit store's. The cost is a small risk of an audit miss if the queue itself fails; a durable queue keeps that risk low, and it is tolerable for most workloads.
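The emission and pipeline steps can be sketched with the standard library. In the example below an in-process `queue.Queue` stands in for the durable queue and a dictionary keyed by `audit_id` stands in for the audit store; neither is durable, so this only illustrates the control flow (non-blocking emit, background worker, dedup on `audit_id`), and the `AuditEmitter` class is a hypothetical name.

```python
import queue
import threading


class AuditEmitter:
    """Non-blocking emit; a background worker writes to the store with dedup."""

    def __init__(self, store):
        self._queue = queue.Queue()   # stand-in for a durable queue
        self._store = store           # stand-in for the audit store
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, record):
        """Called on the hot path: enqueue and return immediately."""
        self._queue.put(record)

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                if record is None:    # shutdown sentinel
                    return
                # Dedup on audit_id: a redelivered record does not overwrite
                # or duplicate the one already stored.
                self._store.setdefault(record["audit_id"], record)
            finally:
                self._queue.task_done()

    def close(self):
        """Drain the queue and stop the worker."""
        self._queue.put(None)
        self._worker.join()


store = {}
emitter = AuditEmitter(store)
emitter.emit({"audit_id": "aud_1", "result": {"ok": True}})
emitter.emit({"audit_id": "aud_1", "result": {"ok": True}})  # simulated redelivery
emitter.emit({"audit_id": "aud_2", "result": {"ok": False}})
emitter.close()
print(sorted(store))
```

Keying the store on `audit_id` is what makes at-least-once delivery safe: the worker can process the same message twice without producing two audit rows.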
Common mistakes
Capturing too little. Audit fields that look obvious are often missing — the actor, the purpose, the scope. Without these, the audit cannot answer compliance questions.
Capturing too much. An audit that includes full request/response bodies is large, slow to query, and a privacy risk. Keep metadata in the audit and content in the sample store.
Conflating audit and application logs. Same destination, same access pattern. The audit needs its own storage with its own access controls.
Synchronous audit on the hot path. Every call pays the audit write's latency, and an audit-store failure becomes a call failure. Async with durable queues is the right pattern.
No tamper evidence for regulated data. Without it, the audit can be retroactively modified by anyone with write access. For regulated data, append-only with hash chaining or Merkle trees is the discipline.
Audit queries are not themselves audited. A breach investigation that involves an audit query should produce its own audit. Otherwise, who-investigated-when is invisible.
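A minimal way to avoid this gap is to route every audit query through a wrapper that writes its own record. The sketch below filters in-memory records with a caller-supplied predicate and appends a query record to a separate list; the function name, the `access_audit` resource label, and the record layout are assumptions modelled on the per-call schema above.

```python
import uuid
from datetime import datetime, timezone


def audited_query(records, predicate, investigator, reason, query_log):
    """Run a query against the audit and record who ran it, when, and why."""
    matches = [rec for rec in records if predicate(rec)]
    query_log.append({
        "audit_id": f"aud_{uuid.uuid4().hex}",
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "actor": {"type": "user", "identity": investigator},
        "purpose": reason,
        "operation": {"type": "read", "resource": "access_audit"},
        # Counts only: the matched records are not copied into the log.
        "result": {"ok": True, "rows_returned": len(matches)},
    })
    return matches


audit = [
    {"audit_id": "aud_1", "acting_on_behalf_of": {"user_id": "u_42"}},
    {"audit_id": "aud_2", "acting_on_behalf_of": {"user_id": "u_7"}},
]
query_log = []
hits = audited_query(
    audit,
    lambda rec: rec["acting_on_behalf_of"]["user_id"] == "u_42",
    investigator="sec.engineer.1",
    reason="incident:customer_access_review",
    query_log=query_log,
)
print(len(hits), len(query_log))
```

The query record follows the same metadata-only rule as every other audit record: it stores who asked, why, and how many rows matched, not the rows themselves.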
How audit interacts with the other surfaces
- Every other chapter writes to the audit and reads from it. Audit is the substrate.
- Classification (chapter 02) — fields_tier_summary uses the labels.
- Purpose (chapter 03) — purpose is a first-class audit field.
- Scope (chapter 04) — scope and scope_check.passed are audited.
- PII (chapter 05) — the redactor applies to the audit at write.
- Retention (chapter 06) — the audit has its own retention.
- Leak detection (chapter 08) — reads from the audit.
- Right to be forgotten (chapter 09) — erasure operations are audited, and audit may be subject to erasure.
- Incident response (chapter 11) — investigations begin in the audit.
Interview Q&A
Q1. Walk through the fields an access audit record must capture.
Identity: audit_id, timestamps, duration. Actor: identity, type, acting-on-behalf-of, tenant, jurisdiction. Purpose: the registered purpose name. Scope: the resolved per-call scope and how it was enforced. Operation: type (read/write), resource, resource_id, fields_requested, fields_tier_summary. Result: ok or refusal with refusal_code. Correlation: audit_id_parent (the trace ID) and audit_id_caller. Together the fields answer who, what, why, where, when, how, and outcome: the questions a compliance investigation always asks. Wrong-answer notes: missing purpose or scope is the most common gap.
Q2. Why is content not in the per-call audit?
Storage and risk. An audit that includes the full content of every read multiplies the privacy surface across millions of records. The audit's job is "who accessed what" (metadata); the content's job is "what was the actual data" (the source of truth in the data plane, with appropriate retention). For investigation needs, a sample of full content lives in a separate store with stricter access. Skipping content in the per-call audit is not a compromise; it is the correct boundary. Wrong-answer notes: "capture everything for completeness" produces the chapter 05 leak surface.
Q3. The compliance team asks for every access to a specific customer in the last 90 days. How long does this take?
Minutes if the audit is well-indexed on acting_on_behalf_of.user_id and ts. The query returns the structured records; the team formats them per the regulator's required form; the result goes to compliance. Without the audit at this fidelity, the answer requires cross-referencing multiple system logs, often with partial coverage, taking days or weeks. The audit is the substrate that makes this fast. Wrong-answer notes: "we'd build a query" without considering whether the audit captures the right fields produces a slow answer.
Q4. Why must the audit be append-only?
Because mutability defeats every other property the audit provides. A modifiable audit can have records altered retroactively to hide an incident; backups can be selectively rewritten; investigations cannot trust historical records. Append-only with tamper-evidence (for regulated data) gives cryptographic evidence: it does not prevent alteration, but it makes any alteration detectable. The audit's value is its trustworthiness over time; mutability undermines exactly that. Wrong-answer notes: "we trust our engineers" is the argument that produces the next breach.
What to do differently after reading this
- Define the audit's structured schema. The fields are the answers to compliance questions; design them with those questions in mind.
- Emit asynchronously through a durable queue. Synchronous audit on the hot path adds latency to every call and makes the audit store a point of failure for the application.
- Index on actor, tenant, purpose, time, refusal_code. Those are the query dimensions.
- For regulated data, add tamper-evidence (hash chaining or signed records).
- Audit the queries against the audit. Investigations need their own footprint.
Bridge. The audit captures what happened. The next discipline is detecting when what happened is anomalous — when a sequence of legitimate-looking accesses actually represents a leak. The next chapter is leak detection. → 08-leak-detection.md