00. Tool integration contracts: First-principles overview
Module 01 taught you to give an agent tools. This module teaches you to run those tools as a versioned production API boundary owned by humans, called by a non-deterministic model, and connected to systems that change without asking.
A sales-ops agent at a Bengaluru SaaS company logs leads into Salesforce twenty-four hours a day. On a Tuesday, the CRM team adds a new required field, lead_source, to the Lead object and ships it through their normal release that day. Nothing in the agent stack changes. The tool contract in the agent's registry still lists the same eight fields it always did. The model calls create_lead(...). Salesforce returns 400 Bad Request: REQUIRED_FIELD_MISSING: lead_source. The model reads the error, retries with the same payload, fails again, and moves on. For thirteen days every new lead is silently lost. The loss surfaces only when a regional sales head complains that pipeline numbers do not match the inbound email count.
The root cause is not the model. The model did exactly what the contract advertised. The root cause is that the contract was a copy-paste of a schema from six months ago, no one owned it, no test exercised it against the live CRM, and no monitor watched for a sudden rise in 4xx errors on a write tool.
This is a contract-layer failure. Module 01 taught you that a tool is a typed function the agent can call. This module teaches you to run the contract behind that function the way a backend team runs a public API — schema, versioning, idempotency, error shapes, scopes, drift detection, and audit — because that is exactly what it is.
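The missing check in the incident is a comparison between the contract's copy of the schema and what the live system requires. The sketch below shows one way that comparison could look. It is illustrative only: `find_drift`, `DriftReport`, and every field name except lead_source are hypothetical, and the "live" set is hard-coded where a real check would fetch it from the CRM's metadata endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    missing_from_contract: frozenset  # required upstream, unknown to the contract
    stale_in_contract: frozenset      # required by the contract, no longer required upstream

    @property
    def drifted(self) -> bool:
        return bool(self.missing_from_contract or self.stale_in_contract)


def find_drift(contract_required: set, live_required: set) -> DriftReport:
    """Compare the fields the contract marks required with what the live system requires."""
    return DriftReport(
        missing_from_contract=frozenset(live_required - contract_required),
        stale_in_contract=frozenset(contract_required - live_required),
    )


# Hypothetical snapshot of the registry's eight fields (names are assumptions).
CONTRACT_REQUIRED = {
    "first_name", "last_name", "company", "email",
    "phone", "country", "owner_id", "status",
}
# Assumed result of asking the live CRM what it requires after the release.
LIVE_REQUIRED = CONTRACT_REQUIRED | {"lead_source"}

report = find_drift(CONTRACT_REQUIRED, LIVE_REQUIRED)
if report.drifted:
    print("contract drift:", sorted(report.missing_from_contract))
```

Run on a schedule or in CI, a check of this shape would have flagged lead_source on the day of the release instead of thirteen days later.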
What a tool contract really is
A tool contract is the API boundary between two systems with different change cadences. On one side: an LLM caller that is non-deterministic, sometimes hallucinates argument values, retries on transient errors, and will be handed the same task again and again by thousands of users. On the other side: a real production system with owners, SLAs, auth boundaries, rate limits, version cycles, and side effects that touch money, identity, or data the company is legally responsible for.
Everything in this module follows from one observation: the model is now your API client, and it is the worst-behaved client you will ever ship to. It does not read changelogs. It does not handle errors gracefully unless you teach it. It will retry write operations if the response is ambiguous. It will pass strings where you expected enums. It will invent fields that look plausible. The contract is what protects the system on the other side from that client.
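Two of the behaviours listed above, strings where enums were expected and invented fields, can be caught before the call reaches the downstream system. A minimal sketch, assuming a hand-written parameter spec: `CREATE_LEAD_PARAMS`, `LeadStatus`, and `validate_args` are hypothetical names, and a production contract would more likely validate against JSON Schema.

```python
from enum import Enum


class LeadStatus(str, Enum):
    NEW = "new"
    QUALIFIED = "qualified"


# Hypothetical parameter spec: name -> (expected type, required?)
CREATE_LEAD_PARAMS = {
    "email": (str, True),
    "company": (str, True),
    "status": (LeadStatus, False),
}


def validate_args(args: dict, spec: dict) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    # Fields the model invented are rejected rather than silently dropped.
    for name in args:
        if name not in spec:
            problems.append(f"unknown field: {name}")
    for name, (expected, required) in spec.items():
        if name not in args:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        value = args[name]
        if issubclass(expected, Enum):
            # Enum parameters accept only the declared values.
            allowed = [member.value for member in expected]
            if value not in allowed:
                problems.append(f"{name} must be one of {allowed}, got {value!r}")
        elif not isinstance(value, expected):
            problems.append(f"{name} must be {expected.__name__}, got {type(value).__name__}")
    return problems


# A call with an invented field, a missing field, and a free-text enum value.
problems = validate_args(
    {"email": "a@example.com", "status": "hot", "priority": 1},
    CREATE_LEAD_PARAMS,
)
for problem in problems:
    print(problem)
```

Returning the full list of problems, rather than failing on the first one, gives the model everything it needs to correct the call in a single retry.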
Module 01 looked at tool contracts from the agent's side — what does the model see, how do I shape the schema so it picks the right tool? This module looks at the same surface from the platform side — how do I run this contract so the underlying system stays correct, owned, and recoverable as both sides evolve?
The six contract surfaces
Every production tool integration has exactly six surfaces. Memorise them once. The rest of the module is consequences.
| Surface | One-liner | Pressure it answers |
|---|---|---|
| The schema | The typed contract: name, parameters, types, descriptions, return shape | language: model and system must agree on what the call means |
| The class | Read vs write vs irreversible vs human-gated | blast-radius: a wrong call costs different things at different classes |
| The idempotency | Retry safety: can the same call be made twice without harm? | non-determinism: models retry, networks flap, you cannot assume once |
| The error contract | Shape of failure the model can read and recover from | recovery: a stack trace is not a contract; structured errors are |
| The scope | Per-tool credentials, tenant boundaries, allowed targets | isolation: a single agent should not hold every key in the company |
| The version | How the contract changes, who is notified, how rollouts work | drift: the system on the other side will change without asking you |
In an interview or design review, draw these six boxes. Then ask: which surface is under-designed in this tool? Almost every tool-integration incident maps to one box.
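The "six boxes" exercise can also be written down as data, so that an empty box is visible in review. The sketch below is one possible shape, not a prescribed one: `ToolContract`, `under_designed`, and all example values (the scope string, the parameter list) are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToolContract:
    """One slot per surface; None means nobody has designed that surface yet."""
    schema: Optional[dict] = None          # name, parameters, types, return shape
    tool_class: Optional[str] = None       # read, write, irreversible, human-gated
    idempotency: Optional[str] = None      # retry-safety policy
    error_contract: Optional[dict] = None  # structured error shape
    scope: Optional[str] = None            # credential and tenant boundary
    version: Optional[str] = None          # contract version tag


def under_designed(contract: ToolContract) -> list:
    """List the surfaces that are still empty, in declaration order."""
    return [f.name for f in fields(contract) if getattr(contract, f.name) is None]


# Hypothetical contract for the create_lead tool with only three boxes filled in.
create_lead = ToolContract(
    schema={"name": "create_lead", "parameters": {"email": "string"}},
    tool_class="write",
    scope="crm:leads:create",
)
print(under_designed(create_lead))
```

For this example the call reports idempotency, error_contract, and version as the surfaces still to be designed.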
The recurring vocabulary
These terms appear in every chapter. They are the module's shorthand.
| Name | Surface | What it is |
|---|---|---|
| the schema | Schema | the typed contract — names, types, enums, required-vs-optional, descriptions |
| the class | Class | the authority class of a tool: read, write (idempotent or non-idempotent), irreversible, human-gated |
| the idempotency key | Idempotency | a caller-supplied unique token that turns retries into safe no-ops |
| the dry-run | Validation | a precondition check that returns "would this succeed?" without committing |
| the structured error | Error | a typed error object the model can branch on — code, retriable, human_hint, corrective_action |
| the scope token | Scope | a credential bound to one tool, one tenant, one purpose — never a god-key |
| the version tag | Version | the contract version the agent was built against, sent on every call |
| the dual-run window | Version | the period when v1 and v2 of a tool coexist so callers can migrate |
| the contract test | Testing | a golden-input test that runs against the real system (or a high-fidelity fake) on every release |
| the drift monitor | Testing | a watcher that detects when the live system stops matching the contract |
| the audit log | Observability | per-call record: who called, on whose behalf, with what arguments, with what outcome |
| the redaction policy | Observability | what fields are stripped from logs before storage |
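Two of these terms, the idempotency key and the structured error, are easiest to see working together. The sketch below uses an in-memory stand-in for a CRM write tool. `LeadTool`, its required-field list, and the lead_id format are hypothetical; the four error fields are the ones named in the table. A real dedup store would be durable and would expire keys after a dedup window.

```python
from dataclasses import dataclass, asdict


@dataclass
class StructuredError:
    code: str
    retriable: bool          # is retrying the SAME payload worthwhile?
    human_hint: str
    corrective_action: str


class LeadTool:
    """Hypothetical in-memory stand-in for a CRM write tool."""

    REQUIRED = ("email", "company", "lead_source")

    def __init__(self):
        self._results = {}   # idempotency key -> stored response
        self.writes = 0      # number of real side effects performed

    def create_lead(self, idempotency_key: str, payload: dict) -> dict:
        # A retry with a known key replays the stored response instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        missing = [name for name in self.REQUIRED if name not in payload]
        if missing:
            # Validation failures are not stored, so a corrected call can still succeed.
            err = StructuredError(
                code="REQUIRED_FIELD_MISSING",
                retriable=False,
                human_hint=f"missing fields: {', '.join(missing)}",
                corrective_action="supply the missing fields, then call again",
            )
            return {"ok": False, "error": asdict(err)}

        self.writes += 1
        response = {"ok": True, "lead_id": f"lead-{self.writes}"}
        self._results[idempotency_key] = response
        return response


tool = LeadTool()
payload = {"email": "a@example.com", "company": "Acme", "lead_source": "webinar"}
first = tool.create_lead("key-1", payload)
second = tool.create_lead("key-1", payload)   # retry: same key, no second write
print(first == second, tool.writes)
```

With retriable set to False and a corrective_action attached, the model has a reason not to resend the identical payload, which is the loop the opening incident fell into.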
The journey: build the boundary, then operate it
This module has two acts. After Act 1 your tool contract is correct. After Act 2 it survives change.
Act 1 — Build the boundary (files 01–07). Each file constructs one surface. By file 07 you have a contract that is schema-correct, class-aware, idempotent, error-shaped, scoped, validated, and ready to ship.
Act 2 — Operate the boundary (files 08–11). Each file adds one operational discipline. The contract does not become more powerful; it becomes survivable across versions, providers, and incidents.
Synthesis (files 12–13). Architect checklist and honest admission of what contracts still cannot solve.
Memory map
| # | File | Surface | Pressure answered | What it adds |
|---|---|---|---|---|
| 01 | tool-as-production-api | — | function-call thinking vs API thinking | reframes every tool as an API boundary |
| 02 | contract-anatomy | Schema | language drift between caller and system | name, params, types, descriptions, return shape |
| 03 | read-write-irreversible-classes | Class | uniform handling vs proportional governance | the four authority classes that drive everything downstream |
| 04 | idempotency-and-retry-safety | Idempotency | retries vs duplicate side effects | idempotency keys, dedup windows, retry semantics |
| 05 | error-contracts-the-model-can-recover | Error | stack traces vs structured failure | typed error shapes, retriability flags, model-readable hints |
| 06 | scopes-and-credential-isolation | Scope | god-keys vs least privilege | per-tool tokens, tenant boundaries, target allowlists |
| 07 | validation-pre-and-post | Schema + Class | trust the model vs verify the call | preconditions, postconditions, dry-run modes |
| | milestone: contract is correct | | | |
| 08 | versioning-and-deprecation | Version | velocity vs stability | semver for tools, dual-run windows, sunset comms |
| 09 | integration-drift-detection | Version | silent breakage vs early warning | contract pacts, schema drift monitors, 4xx-rate alarms |
| 10 | contract-testing | Version + Schema | hope vs evidence | golden inputs, schema fuzz, contract tests in CI |
| 11 | observability-and-audit | All | unexplainable incident vs reproducible one | per-call audit, redaction, replay-from-log |
| | milestone: contract is operable | | | |
| 12 | architect-checklist | Synthesis | completeness | 20-item design / build / launch / operate |
| 13 | honest-admission | Boundaries | humility | what contract design still cannot answer |
Three traversal paths use this map:
- Prerequisite path: read top to bottom.
- Failure path: when an integration incident wakes you, find which surface is under-designed.
- Synthesis path: pick two rows from different surfaces and ask how they compose (e.g., Idempotency + Version: what happens to in-flight retries when the contract changes mid-window?).
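The Idempotency + Version question can be made concrete with a small sketch. It assumes each call carries a version tag and that the platform keeps a serving window per version; the dates, the `SERVING_WINDOWS` table, and `resolve_version` are all hypothetical. Under those assumptions, a retry that started against v1 is still served as long as it arrives inside the dual-run window.

```python
from datetime import date
from typing import Optional, Tuple, Dict

# Hypothetical rollout calendar: version -> (first day served, last day served or None).
SERVING_WINDOWS: Dict[str, Tuple[date, Optional[date]]] = {
    "v1": (date(2025, 1, 1), date(2025, 3, 31)),  # retired when the dual-run window ends
    "v2": (date(2025, 3, 1), None),               # current version, no sunset yet
}


def resolve_version(version_tag: str, today: date) -> str:
    """Decide how to treat a call carrying `version_tag` on `today`."""
    window = SERVING_WINDOWS.get(version_tag)
    if window is None:
        return "reject: unknown version"
    start, end = window
    if today < start:
        return "reject: not yet released"
    if end is not None and today > end:
        return "reject: version retired"
    return "serve"


# During March both versions are served, so an in-flight v1 retry still lands.
print(resolve_version("v1", date(2025, 3, 15)))
# After the window closes, the same retry gets an explicit rejection to branch on.
print(resolve_version("v1", date(2025, 4, 2)))
```

The design choice worth noting is that the retired case is an explicit, named rejection, not a silent fallthrough to v2, so a stale caller fails loudly.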
How this module relates to its neighbours
- 01_agentic_system_design: that module's 03-tool-contracts.md taught the schema as part of the toolbelt primitive. This module zooms in on the contract as a production boundary you operate. If you have not read module 01, read its tool-contracts chapter first.
- 02_durable_agent_workflows: durable workflows depend on idempotent tools. This module is where that idempotency contract is actually designed.
- 05_ai_incident_operations: most agent incidents you will run are contract incidents. This module is the preventive side; that module is the response side.
- ../03_ai_security_safety/01_prompt_injection_security: scopes and credential isolation in this module are the load-bearing defence against tool-exfiltration attacks taught there.
- ../02_ai_infrastructure/01_model_gateway_provider_ops: the gateway operates contracts against model providers; this module operates contracts against every other system the agent touches. Same discipline, different surface.
Top resources
- OpenAPI Specification 3.1 — https://spec.openapis.org/oas/v3.1.0
- JSON Schema 2020-12 — https://json-schema.org/draft/2020-12/release-notes
- Stripe API — idempotency keys — https://docs.stripe.com/api/idempotent_requests
- Anthropic — tool use overview — https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview
- OpenAI — function calling guide — https://platform.openai.com/docs/guides/function-calling
- Pact — consumer-driven contract testing — https://docs.pact.io/
- Google API Design Guide — versioning — https://cloud.google.com/apis/design/versioning
- MCP specification — tool schemas — https://modelcontextprotocol.io/specification
What's coming
- 01-tool-as-production-api.md — Reframe: a tool is not a function the model calls; it is a production API boundary that happens to have an LLM as its client.
- 02-contract-anatomy.md — The six fields of a usable tool contract — and what each one prevents.
- 03-read-write-irreversible-classes.md — Four authority classes. Every governance decision downstream — approval gates, retries, audit depth, who can deploy — is keyed off this.
- 04-idempotency-and-retry-safety.md — Idempotency keys, dedup windows, and what retry semantics actually look like when the caller is a non-deterministic model.
- 05-error-contracts-the-model-can-recover.md — Structured errors the model can branch on, instead of stack traces it parrots back to the user.
- 06-scopes-and-credential-isolation.md — One scoped token per tool, per tenant. Why a god-key in an agent is the single largest unforced security error.
- 07-validation-pre-and-post.md — Preconditions, postconditions, and dry-run modes. Verifying before the side effect and after the response.
- 08-versioning-and-deprecation.md — Semver for tools, dual-run windows, sunset comms — how a contract changes without breaking callers.
- 09-integration-drift-detection.md — The upstream system will change without telling you. The monitors that catch it.
- 10-contract-testing.md — Golden inputs, schema fuzzing, contract pacts. Hope is not a strategy; tests are.
- 11-observability-and-audit.md — What to log on every call, what to redact, how to replay a tool call from logs for incident review.
- 12-architect-checklist.md — Twenty items: design, build, launch, operate.
- 13-honest-admission.md — What contracts still cannot defend against — and where the discipline is honestly young.
Bridge. Before we design the contract, we have to dislodge one habit: thinking of a tool as a function the model calls. Tools live behind production APIs, owned by other teams, with their own SLAs and change cadences. The first chapter reframes the whole module around that fact. → 01-tool-as-production-api.md