03. Purpose binding

With fields classified, the next discipline is the purpose of each access. A query that just says "give me this customer's email" is half a story; the access mediator should know why. Purpose binding turns access from authority into intent.

A security engineer at a Pune fintech investigates an unusual access pattern in the audit log. An employee's credential read 400 customer email addresses in three minutes. The credential is authorised — that employee handles customer support and reads emails routinely. The audit log shows what was read but not why. Was this a legitimate batch operation? An attempt at exfiltration? An automated tool gone wrong? The engineer cannot tell from the audit alone. She walks the floor, finds the employee, learns the access was a legitimate one-off — a marketing campaign needed verified emails for a tier. The legitimacy is clear after the conversation; the absence of purpose in the audit made the investigation cost an hour of work and the employee a moment of unease.

This is the purpose-binding problem. Authority alone — "this credential can read emails" — does not answer the question that matters during investigation: why was this access made? Purpose binding adds the why as a first-class field, declared per call, validated against policy, and recorded in audit.

What purpose binding is

Purpose binding requires every access to declare a named purpose that the data layer recognises and validates. Access without a purpose is refused; access whose purpose does not match the data tier or the resource is refused.

The named purpose is a registered identifier in the platform's purpose registry. Each purpose has:

A name (consultation:read_active_patient, support:read_own_orders, analytics:cohort_aggregate)
A description (one paragraph: what is this purpose, who uses it, why does it exist)
An owner (the team or person responsible)
A scope template (what data tiers and resource patterns it authorises)
An audit policy (what to capture beyond the default)
A retention reference (how long audit records of access under this purpose are kept)

A purpose is a declared intent, not a free-form string. The agent's tool wrapper or the application's data layer specifies which purpose applies, and the access mediator validates.
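To make the entry shape concrete, here is a minimal sketch of a registry entry as a Python dataclass. The attribute names, the default retention value, and the name check are assumptions for illustration; the source only lists the six properties, not a schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Purpose:
    """One registry entry. Attribute names are illustrative assumptions."""
    name: str                            # e.g. "support:read_own_orders"
    description: str                     # what it is, who uses it, why it exists
    owner: str                           # responsible team or person
    allowed_data_tiers: Tuple[str, ...]  # scope template: data tiers
    scope_type: str                      # scope template: "per-user", "per-resource", "cohort"
    scope_key: Optional[str] = None      # context key the scope is checked against
    audit_policy: dict = field(default_factory=dict)
    retention_days: int = 365            # assumed default, not from the source

    def __post_init__(self):
        # Enforce "<domain>:<operation>" so free-form strings are rejected.
        domain, sep, operation = self.name.partition(":")
        if not (domain and sep and operation) or " " in self.name:
            raise ValueError(f"not a purpose identifier: {self.name!r}")


orders_purpose = Purpose(
    name="support:read_own_orders",
    description="Support agent reads orders of the currently-active customer.",
    owner="support-platform",
    allowed_data_tiers=("internal", "sensitive"),
    scope_type="per-user",
    scope_key="active_user_id",
)
```

Freezing the dataclass is a design choice in this sketch: a purpose loaded from the registry should not be mutated by the code that uses it.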

Why this matters for agents specifically

Three concrete benefits.

1. The audit becomes investigable. With purpose recorded, "400 emails in three minutes" can be queried by purpose: for what reason did this access happen? If the purpose is marketing:campaign_export, the policy may or may not permit it (and the policy enforcement makes the call before the access succeeds). If there is no purpose registered, the access is refused.

2. Multi-purpose credentials become safe. An agent's credential authorises many operations. With purpose binding, the credential is not a wholesale grant; each call narrows itself to one purpose. A credential used for support:read_own_orders cannot suddenly be used for analytics:cohort_aggregate without declaring that purpose, and the declaration is itself checked against policy.

3. Cross-team accountability is real. When a purpose's owner is the team that uses it, the audit query "who accessed this data for what reasons" gives a per-team breakdown. Conversations about access patterns become specific.
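The first and third benefits can be sketched as two small audit queries. The record layout, the actor IDs, and the owner mapping below are hypothetical; a real audit store would be queried rather than held in a list.

```python
from collections import Counter

# Hypothetical audit records; field names are assumptions for illustration.
audit_log = [
    {"actor": "emp_17", "resource": "customer_email", "purpose": "marketing:campaign_export"},
    {"actor": "emp_17", "resource": "customer_email", "purpose": "marketing:campaign_export"},
    {"actor": "emp_17", "resource": "customer_email", "purpose": "marketing:campaign_export"},
    {"actor": "emp_17", "resource": "orders", "purpose": "support:read_own_orders"},
    {"actor": "emp_22", "resource": "orders", "purpose": "support:read_own_orders"},
]

# Purpose -> owning team, as it would be read from the registry.
purpose_owners = {
    "marketing:campaign_export": "marketing-engineering",
    "support:read_own_orders": "support-platform",
}


def accesses_by_purpose(records, actor):
    """Answer 'for what reasons did this actor access data?'"""
    return Counter(r["purpose"] for r in records if r["actor"] == actor)


def accesses_by_owner(records, owners):
    """Per-team breakdown: count accesses under each purpose owner."""
    return Counter(owners.get(r["purpose"], "unowned") for r in records)
```

With these records, `accesses_by_purpose(audit_log, "emp_17")` attributes three reads to the campaign export and one to support, which is the answer the engineer in the opening case had to walk the floor to get.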

The purpose registry

The registry is a small artefact: a directory of YAML files or a small service.

purposes/
  support_read_own_orders.yaml
    name: support:read_own_orders
    description: |
      A customer-support agent reads orders belonging to the
      currently-active customer (the user the agent is helping).
    owner: support-platform
    allowed_data_tiers: [internal, sensitive]
    resource_scope:
      type: per-user
      key: active_user_id
    audit_policy:
      capture_full_args: true
      retention_days: 365
    rate_limit:
      per_session_per_hour: 200

  consultation_read_active_patient.yaml
    name: consultation:read_active_patient
    description: |
      A clinical-assistant agent reads the medical record of the patient
      currently being consulted. Single active patient per session.
    owner: clinical-platform
    allowed_data_tiers: [internal, sensitive, regulated]
    resource_scope:
      type: per-resource
      key: active_patient_id
    audit_policy:
      capture_full_args: true
      retention_days: 3650           # regulatory
      signed: true
    rate_limit:
      per_session_per_hour: 50

  marketing_campaign_export.yaml
    name: marketing:campaign_export
    description: |
      Bulk export of customer email addresses for a named campaign.
      Requires per-campaign approval; high-blast operation.
    owner: marketing-engineering
    allowed_data_tiers: [sensitive]
    resource_scope:
      type: cohort
      requires_approval: true
      approval_role: marketing-director
    audit_policy:
      capture_full_args: true
      retention_days: 2555
      signed: true
      alert_on_each: true

The registry has the purposes the platform recognises. New purposes are added by PR with review by data-protection and the data owner. Purposes can be deprecated; calls citing a deprecated purpose are refused (chapter 09).
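A minimal sketch of the lookup side of the registry, assuming entries arrive as dictionaries (for example, after parsing the YAML files) and that deprecation is marked with a boolean `deprecated` field. Both the field name and the `support:legacy_export` purpose are hypothetical.

```python
class PurposeRefused(Exception):
    """Raised when a call cites an unknown or deprecated purpose."""


class PurposeRegistry:
    def __init__(self, entries):
        # entries: dicts as they might look after parsing the YAML files
        self._by_name = {entry["name"]: entry for entry in entries}

    def resolve(self, name):
        entry = self._by_name.get(name)
        if entry is None:
            raise PurposeRefused(f"purpose not registered: {name!r}")
        if entry.get("deprecated", False):  # assumed field name
            raise PurposeRefused(f"purpose deprecated: {name!r}")
        return entry


registry = PurposeRegistry([
    {"name": "support:read_own_orders", "owner": "support-platform"},
    {"name": "support:legacy_export", "owner": "support-platform", "deprecated": True},
])
```

Keeping deprecated entries in the registry, rather than deleting them, lets the refusal say why the call failed.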

How the call site declares purpose

Every read or write specifies the purpose. The wrapper or the data layer's client signature requires it:

# read
order = data_mediator.read(
    resource="orders",
    resource_id=order_id,
    purpose="support:read_own_orders",
    actor=current_user,
    context={"active_user_id": current_user.id},
)

# write
ticket = data_mediator.write(
    resource="support_tickets",
    payload={...},
    purpose="support:create_ticket",
    actor=current_user,
)

The mediator validates the purpose against the registry, then against the call's context: does the resource named by resource_id belong to the active_user_id? Are the requested fields within the allowed_data_tiers? Is the actor authorised to use this purpose?

A call without a purpose is refused. A call with a purpose mismatched to the data is refused.
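One way to make "the signature requires it" literal in Python is a keyword-only parameter with no default. The function below is a stand-in for `data_mediator.read`, not its real implementation, and the return value is a placeholder.

```python
def read(*, resource, resource_id, purpose, actor, context=None):
    """Stand-in for data_mediator.read; every argument is keyword-only."""
    if not isinstance(purpose, str) or not purpose.strip():
        raise ValueError("purpose must be a non-empty identifier")
    # A real mediator would validate against the registry and context here.
    return {"resource": resource, "resource_id": resource_id, "purpose": purpose}
```

Omitting `purpose` then fails with a `TypeError` at the call site, before any policy check runs, so a forgotten purpose surfaces in the first test that exercises the call.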

The validation chain

For each call:

1. Look up the purpose. If not in the registry, refuse.
2. Check actor's eligibility for this purpose.
   - Some purposes are agent-only; some are human-only; some are either with caveats.
3. Check resource scope against the purpose's scope rules.
   - per-user: the resource's owner must equal active_user_id
   - per-resource: resource_id must equal active_<resource>_id
   - cohort: requires_approval; check approval is present
4. Check requested fields against allowed_data_tiers.
   - A purpose with allowed_data_tiers=[internal] cannot return sensitive fields.
5. Check rate limit (per session, per hour) against the bucket.
6. If all pass: execute the access, emit the audit record with purpose recorded.
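The six steps can be sketched as one function. This is an illustration under stated assumptions, not the mediator's actual code: the `actor_kinds` field, the `approval_id` context key, the in-memory sliding window, and the `owner_id` argument (the mediator is assumed to know who owns the resource) are all hypothetical.

```python
import time
from collections import defaultdict, deque


class Refusal(Exception):
    def __init__(self, step, reason):
        super().__init__(f"{step}: {reason}")
        self.step = step
        self.reason = reason


# Trimmed registry entry; 'actor_kinds' is an assumed field name.
REGISTRY = {
    "support:read_own_orders": {
        "actor_kinds": {"agent", "human"},
        "allowed_data_tiers": {"internal", "sensitive"},
        "resource_scope": {"type": "per-user", "key": "active_user_id"},
        "per_session_per_hour": 200,
    },
}

_recent_calls = defaultdict(deque)  # (session_id, purpose) -> call timestamps


def validate(purpose, *, actor_kind, session_id, resource_id, owner_id,
             field_tiers, context, now=None):
    now = time.time() if now is None else now

    entry = REGISTRY.get(purpose)                            # step 1
    if entry is None:
        raise Refusal("lookup", "purpose not registered")

    if actor_kind not in entry["actor_kinds"]:               # step 2
        raise Refusal("actor", "actor not eligible for purpose")

    scope = entry["resource_scope"]                          # step 3
    active = context.get(scope.get("key"))
    if scope["type"] == "per-user" and (active is None or owner_id != active):
        raise Refusal("scope", "resource not owned by active user")
    if scope["type"] == "per-resource" and (active is None or resource_id != active):
        raise Refusal("scope", "resource is not the active resource")
    if scope["type"] == "cohort" and not context.get("approval_id"):
        raise Refusal("scope", "approval missing")

    extra = set(field_tiers) - entry["allowed_data_tiers"]   # step 4
    if extra:
        raise Refusal("tier", f"tiers not allowed: {sorted(extra)}")

    window = _recent_calls[(session_id, purpose)]            # step 5
    while window and now - window[0] >= 3600:
        window.popleft()                                     # drop calls older than an hour
    if len(window) >= entry["per_session_per_hour"]:
        raise Refusal("rate_limit", "hourly limit reached")
    window.append(now)

    # step 6: the caller executes the access and emits this audit record
    return {"purpose": purpose, "session_id": session_id,
            "resource_id": resource_id, "at": now}
```

The checks run cheapest-first and the rate-limit bucket is only charged once every earlier check has passed, so refused calls do not consume a legitimate session's quota. Whether refused calls should count is a policy choice this sketch does not settle.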

Refusals at any step produce a structured error (per module 19 chapter 05's discipline):

{
  "ok": false,
  "error": {
    "code": "PURPOSE_VIOLATION",
    "retriable": false,
    "human_hint": "This data is not available for the current operation.",
    "model_action": "Do not retry. The user's request may be outside what this agent is authorised to do.",
    "fields": {
      "purpose": "support:read_own_orders",
      "resource_id_attempted": "ord_99999",
      "active_user_id": "u_42",
      "mismatch": "resource not owned by active user"
    }
  }
}

The error tells the model what to do next; the audit records the violation; the security team's dashboard tracks the rate.
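A small helper keeps every refusal in the same shape as the payload above. The function name and signature are assumptions; the keys and message strings are copied from the example.

```python
import json


def purpose_violation(purpose, mismatch, **fields):
    """Build the structured refusal shown above as a plain dict."""
    return {
        "ok": False,
        "error": {
            "code": "PURPOSE_VIOLATION",
            "retriable": False,
            "human_hint": "This data is not available for the current operation.",
            "model_action": ("Do not retry. The user's request may be outside "
                             "what this agent is authorised to do."),
            "fields": {"purpose": purpose, **fields, "mismatch": mismatch},
        },
    }


refusal = purpose_violation(
    "support:read_own_orders",
    "resource not owned by active user",
    resource_id_attempted="ord_99999",
    active_user_id="u_42",
)
payload = json.dumps(refusal, indent=2)  # what the tool returns to the model
```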

Where the purpose comes from

Three possible sources for the purpose on a call.

Code-declared. The tool wrapper or service method has the purpose hard-coded:

def read_active_patient_record(patient_id):
    return data_mediator.read(
        resource="patient_record",
        resource_id=patient_id,
        purpose="consultation:read_active_patient",
        ...
    )

Most agent tools fit this pattern: the tool's purpose is part of its definition. Hard-coded purpose is the default and the safest.

Caller-declared. A general-purpose data API receives the purpose as a parameter:

data_mediator.read(
    resource="orders",
    resource_id=order_id,
    purpose=request.headers["x-purpose"],
    ...
)

This is more flexible but lets callers lie about their purpose. Suitable for internal APIs where callers are trusted services; not for tool wrappers exposed to LLM-driven agents.

Inferred. The mediator derives the purpose from the request context (the route, the caller identity). Useful for transitional designs where existing call sites cannot easily declare a purpose. Should be a temporary measure on the path to code-declared.

For agent-driven calls, code-declared is the right discipline.
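A sketch of what code-declared means for an agent tool: the purpose is fixed when the tool is defined, and the callable handed to the model has no purpose parameter at all. `bind_purpose` and `fake_read` are hypothetical names; `fake_read` stands in for `data_mediator.read` and simply echoes its arguments.

```python
def bind_purpose(read_fn, *, resource, purpose):
    """Return a tool whose purpose is fixed at definition time."""
    def tool(resource_id, context):
        # The model supplies resource_id only; purpose is not a parameter here.
        return read_fn(resource=resource, resource_id=resource_id,
                       purpose=purpose, context=context)
    return tool


def fake_read(**call):
    """Stand-in for data_mediator.read that echoes what it was asked."""
    return call


read_active_patient_record = bind_purpose(
    fake_read,
    resource="patient_record",
    purpose="consultation:read_active_patient",
)
```

Because the tool's signature has no `purpose` argument, a model-generated call that tries to pass one fails with a `TypeError` instead of reaching the mediator under a different purpose.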

What this prevents

Three classes of incident that purpose binding catches.

The cross-purpose access. An agent built for support reads marketing data because the credential allows it. The credential is not the boundary; the purpose is. With purpose binding, the access requires marketing:* purpose, which the support agent's tool wrappers do not declare.

The escalating purpose creep. Over months, a feature added to a tool quietly broadens what the tool reads. Without explicit purpose, the broadening is invisible. With purpose, each new read declares its purpose; the purpose's policy is checked at PR time and at runtime.

The investigative blindness from the chapter-opening case. Investigating "400 emails in three minutes" took an hour because the audit recorded what was read but not why. With purpose recorded, the audit answers the why, and the investigation completes in minutes.
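The PR-time check mentioned under purpose creep could look like the sketch below: a small linter built on Python's standard `ast` module that finds every `purpose=` keyword argument and flags values that are unregistered or not string literals. The sample source and the `marketing:adhoc_dump` purpose are made up for the example; a real check would read the changed files and the registry.

```python
import ast


def declared_purposes(source):
    """Return (line, value) for each purpose=... keyword; value is None if not a literal."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if keyword.arg != "purpose":
                continue
            value = keyword.value
            is_literal = isinstance(value, ast.Constant) and isinstance(value.value, str)
            found.append((node.lineno, value.value if is_literal else None))
    return sorted(found, key=lambda item: item[0])


def check_purposes(source, registered):
    problems = []
    for lineno, value in declared_purposes(source):
        if value is None:
            problems.append(f"line {lineno}: purpose is not a string literal")
        elif value not in registered:
            problems.append(f"line {lineno}: unregistered purpose {value!r}")
    return problems


SAMPLE = "\n".join([
    'data_mediator.read(resource="orders", purpose="support:read_own_orders")',
    'data_mediator.read(resource="emails", purpose="marketing:adhoc_dump")',
    'data_mediator.read(resource="orders", purpose=request.headers["x-purpose"])',
])
problems = check_purposes(SAMPLE, {"support:read_own_orders"})
```

Flagging non-literal purposes is deliberate in this sketch: it makes caller-declared purposes visible in review, where a human can decide whether the call site is a trusted internal API.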

What this does not solve

A purpose that lies. If the code declares a purpose that does not match what it is actually doing, the lie is in the code, not in the discipline. Code review and policy validation catch most cases; runtime cannot catch all.
A purpose that is too broad. A purpose such as general:agent_operation defeats the discipline by approving everything. The registry's review process should refuse vague purposes.
An attacker with valid credentials and a valid purpose. Purpose binding is not a defence against fully-authorised malicious actors; it is a defence against unintentional and over-broad access.

How purpose binding interacts with the other surfaces

Classification (chapter 02). A purpose declares which tiers it can access; the data layer enforces.
Per-call scope (chapter 04). Purpose narrows the scope; chapter 04 builds the per-call resolution.
PII (chapter 05). Purposes that allow sensitive or regulated tiers carry PII handling rules.
Retention (chapter 06). Purpose drives the retention window for the audit of that access.
Audit (chapter 07). The purpose is a first-class audit field.
Leak detection (chapter 08). Anomalies are detected per purpose — unusual rate, unusual targets, off-hours access for a purpose normally daytime.

Interview Q&A

Q1. Why is "the credential is authorised" not enough? Because credentials are coarse and persistent; each call's purpose is finer and ephemeral. A credential that can read emails does not say why this specific call is reading emails. Purpose binding makes the why explicit, validated, and audited. Without it, the audit answers what happened but not why, and the policy cannot tell whether a sequence of authorised accesses is a legitimate operation or a slow-motion exfiltration. Wrong-answer notes: "OAuth scopes are enough" stops at the credential level; purpose binding extends to the per-call intent.

Q2. Walk through how a new purpose gets added to the registry. A team proposing a new purpose opens a PR with the YAML entry: name, description, owner, allowed_data_tiers, resource_scope, audit_policy. The PR is reviewed by data-protection (for tier and audit appropriateness), the data owner (for scope alignment), and the owning team's tech lead. Approval requires all three. The new purpose is deployed to the registry; the access mediator picks it up. Calls citing the purpose now succeed against the policy; before deployment, calls citing it would have been refused. Wrong-answer notes: "an engineer adds it as needed" lacks the review that prevents purpose proliferation.

Q3. The team objects: "purpose binding is too much paperwork; we'll never get features shipped." How do you respond? Two moves. First, the registry should be small: most agent platforms work with 20-50 purposes, not thousands. A new purpose is added when a genuinely new operation is needed, not per feature. Second, the cost of not having purpose binding is the investigative blindness from the chapter-opening case and the cross-purpose creep that compliance later flags. Three weeks spent building the registry and adopting the discipline pay for themselves at the first incident. The "paperwork" framing collapses on examination: adding one YAML entry per genuinely new operation is discipline, not bureaucracy. Wrong-answer notes: caving to "too much paperwork" produces the chapter-opening incident.

Q4. A purpose's owner has left the company. What happens? The purpose enters a "needs review" state in the registry; calls still succeed but a dashboard surfaces the orphaned purpose to the security or data-protection team. The team identifies a new owner — usually the successor team owning the data, or the platform team. The owner is updated; the purpose's policy is reviewed in case the absence revealed a stale rule. Orphaned purposes are not deleted automatically — the deletion would refuse production calls — but they cannot remain orphaned indefinitely. Wrong-answer notes: "we just leave it" produces accountability drift; "we delete it" produces an outage.

What to do differently after reading this

Stand up the purpose registry. Start with 10-20 purposes that cover the existing agent operations.
Update tool wrappers to declare purpose per call.
Have the access mediator validate purpose, scope, tier, and rate limit before executing.
Refuse calls without a registered purpose. Treat refusals as signals to extend the registry deliberately, not to bypass.
Make purpose a first-class audit field.

Bridge. Purpose says why the access happens. The next layer says how narrow it should be. A purpose such as support:read_own_orders is a category; the per-call scope is this user's order in this conversation. The next chapter is per-call scope resolution. → 04-per-call-scope-resolution.md