08. Leak detection

The audit captures what happened. Leak detection is what notices when what happened is anomalous — when individually legitimate-looking accesses combine into something that is not. The discipline turns the audit from a static record into a live signal.

A security engineer at a Mumbai consumer-tech company sets up a leak-detection dashboard for the agent platform. Within a week, an alarm fires: an agent identity made 1,400 reads of customer emails in two hours (about 700 per hour), while the baseline for that agent is around 80 reads per hour. The reads all carried valid purpose and scope; nothing was refused. The investigation finds that a new feature, rolled out two days earlier, made the agent fetch emails on every conversation turn rather than once per session. No leak occurred, since the reads were within scope, but the access pattern was unprecedented and worth flagging. The team adjusts the feature to fetch once per session; the access rate returns to baseline.

Leak detection's job is to surface the unusual. Not every signal is a breach; many are benign. The discipline is to make the signals visible and to make response fast when one of them is real.

What leak detection is

Leak detection is the set of monitors that surface anomalous access patterns from the audit log, so a small fraction of attention catches the cases that matter.

Three kinds of signal.

Volume anomalies. An actor reads more than usual, faster than usual, or against more resources than usual.

Pattern anomalies. An access happens at an unusual time, from an unusual location, against an unusual combination of resources, or with an unusual purpose for that actor.

Outcome anomalies. Refusals spike (probing); regulated-tier accesses appear in unexpected contexts; cross-tenant scope checks fail.

Each is a query against the audit (chapter 07). Together they catch most leak-class events early.

The leak signals to monitor

A starting set for an agent platform.

Signal	What triggers it	What it might indicate
Access volume per actor	Rate > N × baseline for last hour	New feature rolled out; bug producing extra reads; exfiltration
Resource fan-out per session	A session accessing > N distinct resources	Agent reading more records than the conversation warrants
Cross-resource access	Accesses spanning resources not normally co-touched	Probing; coreference drift
Regulated-tier reads outside normal purpose	Reads of regulated data with a purpose not normally invoking it	Privilege escalation; misclassification
Refusal rate spike	Refusals > N per minute on a specific actor	Probing; prompt issue
Scope-check failures	Scope validation refused > N times	Argument-against-context mismatch; injection attempt
Tenant-boundary touches	Any cross-tenant scope failure	Multi-tenant breach attempt
Off-hours access	Reads outside normal hours for the actor	Compromised credential; automation gone wrong
New-actor activity	First access from an actor identity	New service deployed; or unauthorised credential
Bulk download patterns	Sequential reads of N+ records in a short window	Exfiltration; legitimate batch

Each signal has a threshold (often dynamic, baseline ± 3σ) and a response policy (alarm, page, auto-throttle).
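One way to make the threshold-plus-policy pairing concrete is to describe each signal as data. The sketch below is illustrative only: the `Signal` and `Response` names, the field layout, and the specific numbers are assumptions, not part of any platform described here. It checks only the upper side of the baseline, which is the side that matters for read volume.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Response(Enum):
    ALARM = "alarm"                  # surface on the dashboard
    PAGE = "page"                    # notify the on-call engineer
    AUTO_THROTTLE = "auto_throttle"  # rate-limit the actor automatically


@dataclass(frozen=True)
class Signal:
    """Hypothetical signal definition: a threshold rule plus a response policy."""
    name: str
    sigma: Optional[float]        # dynamic threshold: baseline + sigma * stddev
    fixed_limit: Optional[float]  # static threshold, used when sigma is None
    response: Response

    def threshold(self, baseline: float, stddev: float) -> float:
        if self.sigma is not None:
            return baseline + self.sigma * stddev
        if self.fixed_limit is None:
            raise ValueError(f"signal {self.name!r} has no threshold")
        return self.fixed_limit

    def fires(self, observed: float, baseline: float = 0.0, stddev: float = 0.0) -> bool:
        return observed > self.threshold(baseline, stddev)


SIGNALS = [
    # Dynamic: fires when reads exceed baseline + 3 sigma.
    Signal("access_volume_per_actor", sigma=3.0, fixed_limit=None, response=Response.ALARM),
    # Static: any cross-tenant scope failure at all should page.
    Signal("tenant_boundary_touch", sigma=None, fixed_limit=0, response=Response.PAGE),
]
```

With an assumed standard deviation of 20 reads per hour, `SIGNALS[0].fires(700, baseline=80, stddev=20)` is true for the email-read incident at the top of the chapter, while 120 reads per hour stays under the threshold of 140.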

Computing baselines

Most signals are anomaly-based. The baseline is computed from historical audit data — typically a rolling 30-day window — and updated on a cadence.

For each signal, the baseline captures:

Median value per actor / per tenant / per purpose
Standard deviation
Time-of-day patterns (some actors are diurnal; others are batch)
Day-of-week patterns

Anomalies are deviations that exceed thresholds (commonly 3σ from the median, with adjustment for time-of-day).

Baseline computation runs daily; the anomaly checker runs continuously against the latest baseline.
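The baseline and the check can be sketched in a few lines. This is a minimal illustration, assuming audit events are available as `(actor, timestamp)` pairs; the function names, the `min_sigma` floor, and the choice to treat a missing baseline as anomalous are all assumptions. Hours with zero reads never appear in the counts, so a production version would need to fill those in.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median, pstdev


def compute_baselines(events):
    """Return {(actor, hour_of_day): (median, stddev)} of reads per hour."""
    per_hour = defaultdict(int)
    for actor, ts in events:
        per_hour[(actor, ts.replace(minute=0, second=0, microsecond=0))] += 1
    samples = defaultdict(list)
    for (actor, hour_start), count in per_hour.items():
        samples[(actor, hour_start.hour)].append(count)
    return {key: (median(vals), pstdev(vals)) for key, vals in samples.items()}


def is_anomalous(baselines, actor, hour_of_day, observed, k=3.0, min_sigma=1.0):
    """True when observed deviates more than k sigma from the median."""
    if (actor, hour_of_day) not in baselines:
        return True  # no history for this actor at this hour (assumed policy)
    med, sd = baselines[(actor, hour_of_day)]
    # min_sigma stops a perfectly flat history from alarming on tiny changes.
    return abs(observed - med) > k * max(sd, min_sigma)


# Illustrative history: 30 days of an agent reading 78 to 82 emails at 12:00.
history = []
for day in range(30):
    hour_start = datetime(2024, 1, 1, 12) + timedelta(days=day)
    reads = 78 + day % 5
    history += [("agent-a", hour_start + timedelta(seconds=i)) for i in range(reads)]

baselines = compute_baselines(history)
print(is_anomalous(baselines, "agent-a", 12, observed=700))  # True
print(is_anomalous(baselines, "agent-a", 12, observed=81))   # False
```

Keying the baseline on hour of day is what provides the time-of-day adjustment; a day-of-week dimension could be added to the key in the same way.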

Reducing false positives

The biggest practical challenge is false positives. A platform that pages on every anomaly produces alarm fatigue; engineers start ignoring real alarms.

Three disciplines reduce false positives.

Per-actor baselines. Different agents have different access patterns. A batch agent reads 10,000 records per hour normally; a chat agent reads 200. A single baseline aggregated across agents fits neither: it can flag the batch agent's normal load and hide a chat agent that has jumped to ten times its usual rate.

Time-of-day calibration. Many platforms have predictable diurnal patterns. The baseline accounts for "12:00 IST" being different from "03:00 IST."

Composite signals. A high read volume alone is often a feature change. A high read volume combined with new-resource fan-out combined with an off-hours timestamp is more likely a real concern. Composite signals (multiple conditions) reduce false positives at the cost of missing single-signal cases — usually the right trade for production alarming.
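A composite signal can be as simple as counting how many independent conditions hold in the same window. The sketch below is an assumption about how that might be wired; the `WindowObservation` type and the mapping from hit count to severity are illustrative, not prescribed by the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowObservation:
    """Hypothetical per-actor, per-window summary of three single signals."""
    volume_anomaly: bool
    new_resource_fanout: bool
    off_hours: bool


def composite_severity(obs: WindowObservation) -> str:
    """Escalate only when several conditions coincide."""
    hits = sum([obs.volume_anomaly, obs.new_resource_fanout, obs.off_hours])
    if hits == 3:
        return "page"
    if hits == 2:
        return "alarm"
    return "log"  # single-signal cases stay visible without paging anyone


print(composite_severity(WindowObservation(True, False, False)))  # log
print(composite_severity(WindowObservation(True, True, True)))    # page
```

Logging the single-signal cases rather than dropping them keeps them available to a later review, which softens the cost of not alarming on them live.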

What the team does when an alarm fires

A reasonable response flow:

Triage. Look at the audit for the alarming actor in the time window. What was accessed? Was it within scope (all ok: true) or refused (any ok: false)? Are the accessed resources within the actor's normal pattern?
Correlate. Cross-reference with deploys, feature flags, known maintenance, customer-impact reports. Many alarms have benign causes that surface immediately.
Classify. Three buckets: benign (feature change, batch operation, known cause), suspicious (no benign cause; needs investigation), confirmed (clear breach signal).
Respond.
Benign: silence the alarm for this actor for a defined window; update the baseline if the new pattern is the new normal.
Suspicious: open an investigation; potentially throttle the actor while investigating.
Confirmed: chapter 11 incident response.
Tune. If the alarm fired benignly, document why and adjust the baseline or the threshold. Repeated false positives reduce trust in the system.
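The correlate and classify steps above can be partly mechanised. The following is a rough sketch under stated assumptions: the `ChangeEvent` record, the three-day lookback, and the two boolean inputs to `classify` are hypothetical, and the judgment of whether a candidate cause really explains the alarm stays with a human.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Verdict(Enum):
    BENIGN = "benign"
    SUSPICIOUS = "suspicious"
    CONFIRMED = "confirmed"


@dataclass(frozen=True)
class ChangeEvent:
    kind: str         # e.g. "deploy", "feature_flag", "maintenance" (assumed labels)
    description: str
    at: datetime


def correlate(alarm_at, changes, lookback=timedelta(days=3)):
    """Return changes that landed shortly before the alarm."""
    return [c for c in changes if alarm_at - lookback <= c.at <= alarm_at]


def classify(candidate_causes, analyst_accepts_cause, breach_confirmed):
    """Map triage findings onto the three buckets."""
    if breach_confirmed:
        return Verdict.CONFIRMED
    if candidate_causes and analyst_accepts_cause:
        return Verdict.BENIGN
    return Verdict.SUSPICIOUS


RESPONSE = {
    Verdict.BENIGN: "silence for a defined window; update baseline if this is the new normal",
    Verdict.SUSPICIOUS: "open an investigation; consider throttling the actor",
    Verdict.CONFIRMED: "hand off to incident response (chapter 11)",
}

alarm_at = datetime(2024, 6, 12, 10, 0)
changes = [
    ChangeEvent("deploy", "agent fetches emails on every turn", datetime(2024, 6, 10, 9, 0)),
    ChangeEvent("maintenance", "index rebuild", datetime(2024, 5, 1, 2, 0)),
]
causes = correlate(alarm_at, changes)
verdict = classify(causes, analyst_accepts_cause=True, breach_confirmed=False)
print(verdict.value, "->", RESPONSE[verdict])
```

The sample data mirrors the opening incident: a deploy two days before the alarm is surfaced as a candidate cause, and the old maintenance event is ignored.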

The offline review

In addition to live alarms, a periodic offline review catches patterns the live alarms miss. Quarterly, a security analyst (or a security-engineering team) runs broader audit queries:

All accesses to regulated-tier data, grouped by actor and purpose; review for any pattern that looks off.
All cross-tenant scope failures (should be zero); investigate any non-zero.
Top actors by access volume; review for unexpected entries.
New purposes added since the last review; verify their usage matches the registered scope.

Offline review is the catch for the long tail that live alarms cannot tune for without producing false positives.
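The four review queries can be sketched over audit records held as dictionaries. The `ok` field follows the audit convention used in triage; the other field names (`actor`, `purpose`, `tier`, `reason`) and the `"regulated"` and `"cross_tenant"` values are assumptions about the audit schema from chapter 07, chosen only for illustration.

```python
from collections import Counter


def regulated_by_actor_purpose(records):
    """Count regulated-tier accesses per (actor, purpose)."""
    return Counter((r["actor"], r["purpose"]) for r in records if r["tier"] == "regulated")


def cross_tenant_failures(records):
    """Every refused access whose reason is a tenant mismatch; should be empty."""
    return [r for r in records if not r["ok"] and r.get("reason") == "cross_tenant"]


def top_actors(records, n=10):
    """Actors ranked by total access count."""
    return Counter(r["actor"] for r in records).most_common(n)


def new_purposes(records, known_purposes):
    """Purposes seen in the audit that were not known at the last review."""
    return {r["purpose"] for r in records} - set(known_purposes)


records = [
    {"actor": "agent-a", "purpose": "support_reply", "tier": "regulated", "ok": True},
    {"actor": "agent-a", "purpose": "support_reply", "tier": "regulated", "ok": True},
    {"actor": "agent-b", "purpose": "marketing_export", "tier": "regulated", "ok": True},
    {"actor": "agent-a", "purpose": "support_reply", "tier": "internal", "ok": True},
    {"actor": "agent-c", "purpose": "support_reply", "tier": "internal", "ok": False,
     "reason": "cross_tenant"},
]

print(regulated_by_actor_purpose(records))
print(cross_tenant_failures(records))
print(top_actors(records, n=1))
print(new_purposes(records, known_purposes={"support_reply"}))
```

In practice these would more likely be queries against the audit store than in-memory scans; the shape of each question is the same.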

Detecting exfiltration patterns

Specific patterns associated with deliberate data exfiltration through agent platforms:

Slow-and-low. An attacker reads a small number of records per hour, well below any volume threshold, but over weeks accumulates substantial data. Detection: aggregate volumes over longer windows (week, month) per actor; check for accumulation against baseline.
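A small sketch of multi-window aggregation for the slow-and-low case follows. The window names and the per-window limits are made-up values for illustration; real limits would come from the per-actor baseline.

```python
from datetime import datetime, timedelta

# Hypothetical horizons; a real deployment might add day and month.
WINDOWS = {"hour": timedelta(hours=1), "week": timedelta(weeks=1)}


def windowed_counts(timestamps, now):
    """Count an actor's reads inside each trailing window ending at `now`."""
    return {
        name: sum(1 for t in timestamps if now - span < t <= now)
        for name, span in WINDOWS.items()
    }


def exceeded(counts, limits):
    """Names of the windows whose count is over its limit."""
    return [name for name, n in counts.items() if n > limits[name]]


# An actor reading 10 records every hour for a week.
now = datetime(2024, 6, 8, 0, 0)
reads = [
    now - timedelta(hours=h, minutes=6 * i)
    for h in range(24 * 7)
    for i in range(10)
]

counts = windowed_counts(reads, now)
print(counts)                                        # {'hour': 10, 'week': 1680}
print(exceeded(counts, {"hour": 200, "week": 500}))  # ['week']
```

The hourly count never approaches its limit, but the weekly total does, which is the accumulation the longer window exists to catch.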

Lateral movement. An attacker compromises one credential and then attempts other purposes or other tenants. Detection: any cross-purpose access by an actor that previously used only one purpose is a signal.
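One possible shape for this check is to keep a per-actor profile of purposes and tenants seen so far and flag anything outside it. The tuple layout `(actor, purpose, tenant)` and the signal labels are assumptions made for this sketch.

```python
from collections import defaultdict


def build_profile(history):
    """Map each actor to the purposes and tenants it has used before."""
    profile = defaultdict(lambda: {"purposes": set(), "tenants": set()})
    for actor, purpose, tenant in history:
        profile[actor]["purposes"].add(purpose)
        profile[actor]["tenants"].add(tenant)
    return dict(profile)


def lateral_signals(profile, recent):
    """Flag accesses whose purpose or tenant is new for a known actor."""
    signals = []
    for actor, purpose, tenant in recent:
        seen = profile.get(actor)
        if seen is None:
            continue  # unknown actors belong to the new-actor signal
        if purpose not in seen["purposes"]:
            signals.append((actor, "new_purpose", purpose))
        if tenant not in seen["tenants"]:
            signals.append((actor, "new_tenant", tenant))
    return signals


history = [
    ("agent-a", "support_reply", "tenant-1"),
    ("agent-a", "support_reply", "tenant-1"),
    ("agent-b", "billing_lookup", "tenant-2"),
]
recent = [
    ("agent-a", "support_reply", "tenant-1"),   # matches profile
    ("agent-a", "billing_lookup", "tenant-1"),  # new purpose
    ("agent-a", "support_reply", "tenant-2"),   # new tenant
    ("agent-c", "support_reply", "tenant-1"),   # unknown actor, skipped here
]

print(lateral_signals(build_profile(history), recent))
```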

Targeted exfiltration. A specific high-value target (e.g., a high-net-worth customer's data) is accessed by multiple actors over time. Detection: per-resource access patterns; resources accessed by unusual numbers of actors over a window.
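A per-resource view can be sketched by counting distinct actors per resource in a window and comparing against what is typical for that resource. The `factor` and `floor` parameters and the `(resource, actor)` pair layout are illustrative assumptions.

```python
from collections import defaultdict


def actors_per_resource(accesses):
    """Number of distinct actors that touched each resource in the window."""
    seen = defaultdict(set)
    for resource, actor in accesses:
        seen[resource].add(actor)
    return {resource: len(actors) for resource, actors in seen.items()}


def unusual_resources(current, typical, factor=3, floor=3):
    """Resources touched by far more distinct actors than is typical for them."""
    return sorted(
        resource
        for resource, n in current.items()
        if n >= floor and n > factor * typical.get(resource, 1)
    )


this_week = [("cust-42", f"agent-{i}") for i in range(5)] + [("cust-7", "agent-0")]
typical = {"cust-42": 1, "cust-7": 1}  # usual distinct actors per week (assumed)

current = actors_per_resource(this_week)
print(current)                              # {'cust-42': 5, 'cust-7': 1}
print(unusual_resources(current, typical))  # ['cust-42']
```

The `floor` keeps rarely touched resources from alarming when two actors instead of one read them.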

Synthesised payloads. An attacker uses the agent's response capability to summarise data into a single response that is then exfiltrated through the user-facing channel. Detection: response-size anomalies; volume of regulated-tier-derived content in responses.

These are advanced patterns; a starting platform may not have detection for all of them. The discipline grows over time.

What leak detection does not solve

A leak inside legitimate access. If an attacker has valid credentials and a valid purpose, their accesses look normal; detection has limited leverage. The defence is purpose binding (chapter 03) plus scope (chapter 04), which together narrow what "valid" means.
Out-of-band exfiltration. If data leaves through a non-agent channel (an engineer with database access copies to a personal account), this discipline does not apply; general data-access governance does.
Zero-day novel patterns. New attack patterns can evade the signals until they are codified.

The discipline catches most leaks of most shapes. The honest framing is "raises the floor," not "eliminates leaks."

How leak detection interacts with the other surfaces

Audit (chapter 07) — the substrate; detection reads from it.
Purpose (chapter 03) — anomalies are often "this purpose has not been seen from this actor before."
Scope (chapter 04) — scope failures are a first-class signal.
Classification (chapter 02) — regulated-tier access patterns get tighter signal thresholds.
Right to be forgotten (chapter 09) — erasure operations are themselves audited and monitored.
Incident response (chapter 11) — detection feeds containment.

Interview Q&A

Q1. What is the single highest-leverage leak-detection signal to wire first? Per-actor access volume against a per-actor baseline. Cheap to compute; catches the largest class of "something unusual is happening" events. New features, bugs, exfiltration, automation gone wrong — all produce volume anomalies. The composite signals come later; per-actor volume is the first one. Combined with refusal-rate spikes (also cheap), the first two signals catch a substantial fraction of leak-class incidents. Wrong-answer notes: "watch everything" is unrealistic for a first build; volume + refusal rate is the right starting pair.

Q2. The volume-anomaly alarm fires; the access turns out to be a benign feature rollout. What do you do? Triage: confirm it is benign by cross-referencing the deploy timeline and the feature flags. Update the baseline to reflect the new normal, since this is now the steady state for the actor. Tune the threshold if needed; perhaps the previous threshold was too tight given the actor's role. Document the alarm and the resolution so the next instance is faster. Repeated benign alarms erode trust; tuning is the discipline. Wrong-answer notes: "ignore the alarm next time" produces alarm fatigue and a silently lowered guard.

Q3. An attacker is exfiltrating slowly — 10 records per hour, well below volume thresholds. How does the platform catch this? Longer-window aggregation. The hourly threshold misses; the weekly aggregate (10 × 24 × 7 = 1,680 records) starts to be notable if the actor's normal baseline is lower. Per-actor over time, with cumulative-deviation tracking, catches slow-and-low. The offline review is the secondary catch — quarterly looks at top accessors of regulated-tier data may surface what live alarms missed. The discipline is not single-window monitoring; it is windowed at multiple horizons. Wrong-answer notes: "we'd notice in the audit" without a structured signal misses the systematic detection.

Q4. The team has a high false-positive rate from the alarm system. What do you do? Three moves. One: switch to per-actor baselines if currently aggregated. Two: introduce composite signals (volume + new-resource fan-out + off-hours) so single-cause benign changes do not fire. Three: silence the noisiest alarms for a defined window during known feature-rollout periods, with explicit re-enable. The goal is to maintain trust: every alarm should warrant real triage. Five alarms a week that each deserve attention are better than fifty that are mostly noise. Wrong-answer notes: "raise the threshold" without diagnosing whether the threshold is the issue is a blunt instrument.

What to do differently after reading this

Wire per-actor volume and refusal-rate signals first. Cheapest, highest-leverage.
Compute baselines per actor with time-of-day calibration. Aggregated baselines produce false positives.
Use composite signals to reduce false positives where single signals are noisy.
Run an offline quarterly review for patterns live alarms miss.
Tune alarms based on benign-vs-real outcomes; alarm fatigue is the failure mode.

Bridge. Leak detection notices when something is happening. The next discipline is the explicit right of a data subject to cause something to happen: to request that their data be erased. The next chapter is right-to-be-forgotten, the workflow that touches live data, audit, backups, and embeddings. → 09-right-to-be-forgotten.md