00. Tool integration contracts: first-principles overview

Module 01 taught you to give an agent tools. This module teaches you to run those tools as a versioned production API boundary owned by humans, called by a non-deterministic model, and connected to systems that change without asking.

A sales-ops agent at a Bengaluru SaaS company logs leads into Salesforce twenty-four hours a day. In its normal Monday release, the CRM team adds a new required field to the Lead object: lead_source. Nothing in the agent stack changes. The tool contract in the agent's registry still lists the same eight fields it always has. The model calls create_lead(...). Salesforce returns 400 Bad Request: REQUIRED_FIELD_MISSING: lead_source. The model reads the error, retries with the same payload, fails again, and moves on. For thirteen days every new lead is silently lost. The loss surfaces only when a regional sales head complains that pipeline numbers do not match the inbound email count.

The root cause is not the model; the model did exactly what the contract advertised. The root cause is that the contract was copied from a six-month-old schema, no one owned it, no test exercised it against the live CRM, and no monitor watched for a sudden rise in 4xx errors on a write tool.

This is a contract-layer failure. Module 01 taught you that a tool is a typed function the agent can call. This module teaches you to run the contract behind that function the way a backend team runs a public API — schema, versioning, idempotency, error shapes, scopes, drift detection, and audit — because that is exactly what it is.
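The missing monitor from the incident above is small enough to sketch. This is a minimal, hypothetical example, assuming the tool wrapper sees the upstream HTTP status of every call; `ErrorRateAlarm` and its thresholds are illustrative, not taken from any monitoring product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateAlarm:
    """Rolling-window alarm on the share of 4xx responses for one write tool.

    Hypothetical sketch: window, threshold and min_calls are illustrative
    defaults, not recommended production values.
    """

    window: int = 200        # how many recent calls to consider
    threshold: float = 0.05  # fire when more than 5% of them are 4xx
    min_calls: int = 20      # do not judge until this many calls are seen
    _statuses: deque = field(default_factory=deque, repr=False)

    def record(self, status_code: int) -> bool:
        """Record one upstream HTTP status; return True if the alarm should fire."""
        self._statuses.append(status_code)
        if len(self._statuses) > self.window:
            self._statuses.popleft()  # keep only the most recent `window` calls
        if len(self._statuses) < self.min_calls:
            return False
        client_errors = sum(1 for s in self._statuses if 400 <= s < 500)
        return client_errors / len(self._statuses) > self.threshold
```

Under these assumptions, a tool whose every call returns 400, as create_lead did, trips the alarm as soon as `min_calls` calls have been recorded: on day one rather than day thirteen.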

What a tool contract really is

A tool contract is the API boundary between two systems with different change cadences. On one side: an LLM caller that is non-deterministic, sometimes hallucinates argument values, retries on transient errors, and will be handed the same task by thousands of users. On the other side: a real production system with owners, SLAs, auth boundaries, rate limits, version cycles, and side effects that touch money, identity, or data the company is legally responsible for.
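Because that caller retries, any tool with side effects needs retry safety. What follows is a minimal sketch of an idempotency key, a caller-supplied token that turns retries into safe no-ops, assuming a single process and an in-memory dict; `IdempotencyStore` is hypothetical. Rejecting a reused key whose payload differs is similar to the behaviour Stripe documents for its idempotency keys.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class _Record:
    args_hash: str
    result: Any
    stored_at: float


class IdempotencyStore:
    """Dedup table keyed by a caller-supplied idempotency key.

    Hypothetical, process-local sketch. A production version needs a shared
    store with an atomic insert so two concurrent retries cannot both execute.
    """

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl_seconds = ttl_seconds  # the dedup window
        self._records: Dict[str, _Record] = {}

    @staticmethod
    def _hash(args: dict) -> str:
        # Canonical JSON, so argument order does not change the hash.
        canonical = json.dumps(args, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def call(self, key: str, args: dict, side_effect: Callable[[dict], Any]) -> Any:
        now = time.time()
        record = self._records.get(key)
        if record is not None and now - record.stored_at < self.ttl_seconds:
            if record.args_hash != self._hash(args):
                # Same key, different payload: refuse rather than guess.
                raise ValueError("idempotency key reused with different arguments")
            return record.result  # a retry: replay the stored result
        result = side_effect(args)  # first use of the key, or the window expired
        self._records[key] = _Record(self._hash(args), result, now)
        return result
```

If the side effect itself raises, nothing is recorded, so a later retry executes again; that is the intended behaviour for a call that never committed.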

Everything in this module follows from one observation: the model is now your API client, and it is the worst-behaved client you will ever ship to. It does not read changelogs. It does not handle errors gracefully unless you teach it. It will retry write operations if the response is ambiguous. It will pass arbitrary strings where you expected one of a fixed set of enum values. It will invent fields that look plausible. The contract is what protects the system on the other side from that client.
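A contract can refuse the worst of that behaviour before it reaches the system. In practice this is usually a JSON Schema with `"additionalProperties": false` and `enum` constraints; the dependency-free sketch below, assuming a hypothetical three-field `create_lead` schema, performs the same three checks by hand.

```python
from typing import Any, Dict, List

# Hypothetical schema for a create_lead tool; field names are illustrative.
CREATE_LEAD_SCHEMA = {
    "required": {"email", "company", "lead_source"},
    "optional": {"phone"},
    "enums": {"lead_source": {"web", "referral", "event"}},
}


def validate_args(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return every violation found; an empty list means the call may proceed."""
    errors: List[str] = []
    provided = set(args)
    allowed = schema["required"] | schema["optional"]
    # Fields the model invented: reject, like "additionalProperties": false.
    for name in sorted(provided - allowed):
        errors.append(f"unknown field: {name}")
    # Required fields the model left out.
    for name in sorted(schema["required"] - provided):
        errors.append(f"missing required field: {name}")
    # Free-text values where only a fixed set is allowed, like "enum".
    for name, values in schema["enums"].items():
        if name in args and args[name] not in values:
            errors.append(f"{name} must be one of {sorted(values)}, got {args[name]!r}")
    return errors
```

Returning all violations at once, rather than failing on the first, gives the model one complete list to correct instead of a retry loop.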

Module 01 looked at tool contracts from the agent's side — what does the model see, how do I shape the schema so it picks the right tool? This module looks at the same surface from the platform side — how do I run this contract so the underlying system stays correct, owned, and recoverable as both sides evolve?

The six contract surfaces

Every production tool integration has exactly six surfaces. Memorise them once; the rest of the module works out their consequences.

Surface	One-liner	Pressure it answers
The schema	The typed contract: name, parameters, types, descriptions, return shape	language: model and system must agree on what the call means
The class	Read vs write vs irreversible vs human-gated	blast-radius: a wrong call costs different things at different classes
The idempotency	Retry safety: can the same call be made twice without harm?	non-determinism: models retry, networks flap, you cannot assume once
The error contract	Shape of failure the model can read and recover from	recovery: a stack trace is not a contract; structured errors are
The scope	Per-tool credentials, tenant boundaries, allowed targets	isolation: a single agent should not hold every key in the company
The version	How the contract changes, who is notified, how rollouts work	drift: the system on the other side will change without asking you
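One way to keep the six boxes from drifting apart is to declare them in one place. This is a hypothetical sketch: `ToolContract`, `ToolClass`, and the scope string are illustrative names, not a registry format from any particular framework, and `unguarded_surfaces` is only a crude review heuristic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class ToolClass(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"
    HUMAN_GATED = "human_gated"


@dataclass(frozen=True)
class ToolContract:
    # Surface 1, schema: JSON Schema for arguments and return value.
    name: str
    params_schema: dict
    returns_schema: dict
    # Surface 2, class: drives approval gates, retries, audit depth.
    tool_class: ToolClass
    # Surface 3, idempotency: must callers send an idempotency key?
    requires_idempotency_key: bool
    # Surface 4, error contract: the structured error codes the tool may return.
    error_codes: FrozenSet[str]
    # Surface 5, scope: the only permissions the tool's credential carries.
    scopes: FrozenSet[str]
    # Surface 6, version: the contract version sent on every call.
    version: str

    def unguarded_surfaces(self) -> List[str]:
        """Crude review heuristic: list surfaces that look under-designed."""
        gaps = []
        if self.tool_class is not ToolClass.READ and not self.requires_idempotency_key:
            gaps.append("idempotency")
        if not self.error_codes:
            gaps.append("error contract")
        if not self.scopes:
            gaps.append("scope")
        return gaps


# Usage: a hypothetical create_lead contract with two surfaces left empty.
create_lead = ToolContract(
    name="create_lead",
    params_schema={"type": "object", "additionalProperties": False},
    returns_schema={"type": "object"},
    tool_class=ToolClass.WRITE,
    requires_idempotency_key=False,
    error_codes=frozenset(),
    scopes=frozenset({"crm:lead:create"}),  # hypothetical scope name
    version="1.0.0",
)
print(create_lead.unguarded_surfaces())  # ['idempotency', 'error contract']
```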

In an interview or design review, draw these six boxes. Then ask: which surface is under-designed in this tool? Almost every tool-integration incident maps to one box.

The recurring vocabulary

These terms appear in every chapter. They are the module's shorthand.

Name	Surface or discipline	What it is
the schema	Schema	the typed contract — names, types, enums, required-vs-optional, descriptions
the class	Class	the authority class of a tool: read, write (idempotent or non-idempotent), irreversible, or human-gated
the idempotency key	Idempotency	a caller-supplied unique token that turns retries into safe no-ops
the dry-run	Validation	a precondition check that returns "would this succeed?" without committing
the structured error	Error	a typed error object the model can branch on — `code`, `retriable`, `human_hint`, `corrective_action`
the scope token	Scope	a credential bound to one tool, one tenant, one purpose — never a god-key
the version tag	Version	the contract version the agent was built against, sent on every call
the dual-run window	Version	the period when v1 and v2 of a tool coexist so callers can migrate
the contract test	Testing	a golden-input test that runs against the real system (or a high-fidelity fake) on every release
the drift monitor	Testing	a watcher that detects when the live system stops matching the contract
the audit log	Observability	per-call record: who called, on whose behalf, with what arguments, with what outcome
the redaction policy	Observability	what fields are stripped from logs before storage
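The structured error is the piece that would have changed the model's behaviour in the opening incident. This is a minimal sketch, assuming the upstream body arrives as the plain string quoted there (real CRM error payloads are richer); the error codes and the `translate_upstream_error` mapping are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolError:
    code: str               # stable, machine-readable identifier
    retriable: bool         # may the caller retry the same payload?
    human_hint: str         # short explanation safe to show a user
    corrective_action: str  # what the caller should do next


def translate_upstream_error(status: int, body: str) -> ToolError:
    """Map a raw upstream failure to a structured error (hypothetical mapping)."""
    if status == 400 and body.startswith("REQUIRED_FIELD_MISSING"):
        field = body.split(":", 1)[1].strip() if ":" in body else "unknown"
        return ToolError(
            code="CONTRACT_FIELD_MISSING",
            retriable=False,
            human_hint=f"The CRM now requires '{field}', which this tool does not send.",
            corrective_action="Do not retry. Escalate to the tool owner; the contract is out of date.",
        )
    if status == 429 or status >= 500:
        return ToolError(
            code="UPSTREAM_UNAVAILABLE",
            retriable=True,
            human_hint="The CRM is temporarily unavailable.",
            corrective_action="Retry with the same idempotency key after backoff.",
        )
    return ToolError(
        code="UPSTREAM_REJECTED",
        retriable=False,
        human_hint=f"The CRM rejected the request (HTTP {status}).",
        corrective_action="Do not retry the same payload.",
    )
```

With `retriable=False` and a corrective action that says to escalate, the model has something to branch on instead of retrying the same payload and moving on.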

The journey: build the boundary, then operate it

This module has two acts. After Act 1 your tool contract is correct. After Act 2 it survives change.

Act 1 — Build the boundary (files 01–07). Each file constructs one surface. By file 07 you have a contract that is schema-correct, class-aware, idempotent, error-shaped, scoped, validated, and ready to ship.

Act 2 — Operate the boundary (files 08–11). Each file adds one operational discipline. The contract does not become more powerful; it becomes survivable across versions, providers, and incidents.

Synthesis (files 12–13). An architect checklist, and an honest admission of what contracts still cannot solve.

Memory map

#	File	Surface	Pressure answered	What it adds
01	tool-as-production-api	—	function-call thinking vs API thinking	reframes every tool as an API boundary
02	contract-anatomy	Schema	language drift between caller and system	name, params, types, descriptions, return shape
03	read-write-irreversible-classes	Class	uniform handling vs proportional governance	the four authority classes that drive everything downstream
04	idempotency-and-retry-safety	Idempotency	retries vs duplicate side effects	idempotency keys, dedup windows, retry semantics
05	error-contracts-the-model-can-recover	Error	stack traces vs structured failure	typed error shapes, retriability flags, model-readable hints
06	scopes-and-credential-isolation	Scope	god-keys vs least privilege	per-tool tokens, tenant boundaries, target allowlists
07	validation-pre-and-post	Schema + Class	trust the model vs verify the call	preconditions, postconditions, dry-run modes
	— milestone: contract is correct —
08	versioning-and-deprecation	Version	velocity vs stability	semver for tools, dual-run windows, sunset comms
09	integration-drift-detection	Version	silent breakage vs early warning	contract pacts, schema drift monitors, 4xx-rate alarms
10	contract-testing	Version + Schema	hope vs evidence	golden inputs, schema fuzz, contract tests in CI
11	observability-and-audit	All	unexplainable incident vs reproducible one	per-call audit, redaction, replay-from-log
	— milestone: contract is operable —
12	architect-checklist	Synthesis	completeness	20-item design / build / launch / operate
13	honest-admission	Boundaries	humility	what contract design still cannot answer

There are three ways to traverse this map. Prerequisite path: read top to bottom. Failure path: when an integration incident wakes you, find the surface that is under-designed. Synthesis path: pick two rows from different surfaces and ask how they compose (e.g., Idempotency + Version: what happens to in-flight retries when the contract changes mid-window?).
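One possible answer to the Idempotency + Version question, sketched under stated assumptions: during a dual-run window, a retry is pinned to the contract version its idempotency key first executed under, so a cut-over never re-executes an in-flight write under v2. `DualRunRouter` is hypothetical, and it omits the payload comparison a full idempotency layer would also perform.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Handler = Callable[[dict], Any]


@dataclass
class PinnedRecord:
    version: str
    result: Any


class DualRunRouter:
    """Routes calls by version tag; retries replay under the version first used."""

    def __init__(self, handlers: Dict[str, Handler], default_version: str):
        self.handlers = handlers  # e.g. {"1": v1_impl, "2": v2_impl}
        self.default_version = default_version
        self._seen: Dict[str, PinnedRecord] = {}

    def cut_over(self, new_default: str) -> None:
        """Move new calls to another version; in-flight keys stay pinned."""
        self.default_version = new_default

    def version_for(self, idempotency_key: str) -> Optional[str]:
        pinned = self._seen.get(idempotency_key)
        return pinned.version if pinned else None

    def call(self, idempotency_key: str, args: dict, version: Optional[str] = None) -> Any:
        pinned = self._seen.get(idempotency_key)
        if pinned is not None:
            # In-flight retry: replay the original outcome, even if the
            # default has since moved to a newer version.
            return pinned.result
        chosen = version or self.default_version
        if chosen not in self.handlers:
            raise LookupError(f"version {chosen} is not served (sunset or unknown)")
        result = self.handlers[chosen](args)
        self._seen[idempotency_key] = PinnedRecord(chosen, result)
        return result
```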

How this module relates to its neighbours

01_agentic_system_design — that module's 03-tool-contracts.md taught the schema as part of the toolbelt primitive. This module zooms in on the contract as a production boundary you operate. If you have not read module 01, read its tool-contracts chapter first.
02_durable_agent_workflows — durable workflows depend on idempotent tools. This module is where that idempotency contract is actually designed.
05_ai_incident_operations — most agent incidents you will run are contract incidents. This module is the preventive side; that module is the response side.
../03_ai_security_safety/01_prompt_injection_security — scopes and credential isolation in this module are the load-bearing defence against tool-exfiltration attacks taught there.
../02_ai_infrastructure/01_model_gateway_provider_ops — the gateway operates contracts against model providers; this module operates contracts against every other system the agent touches. Same discipline, different surface.

Top resources

OpenAPI Specification 3.1 — https://spec.openapis.org/oas/v3.1.0
JSON Schema 2020-12 — https://json-schema.org/draft/2020-12/release-notes
Stripe API — idempotency keys — https://docs.stripe.com/api/idempotent_requests
Anthropic — tool use overview — https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview
OpenAI — function calling guide — https://platform.openai.com/docs/guides/function-calling
Pact — consumer-driven contract testing — https://docs.pact.io/
Google API Design Guide — versioning — https://cloud.google.com/apis/design/versioning
MCP specification — tool schemas — https://modelcontextprotocol.io/specification

What's coming

01-tool-as-production-api.md — Reframe: a tool is not a function the model calls; it is a production API boundary that happens to have an LLM as its client.
02-contract-anatomy.md — The six fields of a usable tool contract — and what each one prevents.
03-read-write-irreversible-classes.md — Four authority classes. Every governance decision downstream — approval gates, retries, audit depth, who can deploy — is keyed off this.
04-idempotency-and-retry-safety.md — Idempotency keys, dedup windows, and what retry semantics actually look like when the caller is a non-deterministic model.
05-error-contracts-the-model-can-recover.md — Structured errors the model can branch on, instead of stack traces it parrots back to the user.
06-scopes-and-credential-isolation.md — One scoped token per tool, per tenant. Why a god-key in an agent is the single largest unforced security error.
07-validation-pre-and-post.md — Preconditions, postconditions, and dry-run modes. Verifying before the side effect and after the response.
08-versioning-and-deprecation.md — Semver for tools, dual-run windows, sunset comms — how a contract changes without breaking callers.
09-integration-drift-detection.md — The upstream system will change without telling you. The monitors that catch it.
10-contract-testing.md — Golden inputs, schema fuzzing, contract pacts. Hope is not a strategy; tests are.
11-observability-and-audit.md — What to log on every call, what to redact, how to replay a tool call from logs for incident review.
12-architect-checklist.md — Twenty items: design, build, launch, operate.
13-honest-admission.md — What contracts still cannot defend against — and where the discipline is honestly young.
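As a preview of file 11, a per-call audit line with a redaction policy might look like the sketch below. It assumes flat argument dicts and a hypothetical field list and key; nested payloads and key management are left out.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Dict

# Hypothetical redaction policy and key; a real key would come from a secret store.
REDACT_FIELDS = {"email", "phone", "api_key"}
REDACTION_KEY = b"replace-with-a-managed-secret"


def redact(args: Dict[str, Any]) -> Dict[str, Any]:
    """Replace sensitive top-level values with a keyed hash.

    The same value always maps to the same token, so calls stay correlatable
    during incident review without storing the raw value.
    """
    out: Dict[str, Any] = {}
    for name, value in args.items():
        if name in REDACT_FIELDS:
            digest = hmac.new(REDACTION_KEY, str(value).encode(), hashlib.sha256).hexdigest()
            out[name] = "redacted:" + digest[:12]
        else:
            out[name] = value
    return out


def audit_record(tool: str, contract_version: str, caller: str, on_behalf_of: str,
                 args: Dict[str, Any], outcome: str) -> str:
    """Build one JSON audit line: who called, for whom, with what, and the result."""
    return json.dumps(
        {
            "ts": time.time(),
            "tool": tool,
            "contract_version": contract_version,
            "caller": caller,
            "on_behalf_of": on_behalf_of,
            "args": redact(args),
            "outcome": outcome,
        },
        sort_keys=True,
    )
```

Hashing with a secret key rather than a plain SHA-256 makes redacted emails harder to recover by guessing. The trade-off is that redacted fields cannot be replayed verbatim from the log.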

Bridge. Before we design the contract, we have to dislodge one habit: thinking of a tool as a function the model calls. Tools live behind production APIs, owned by other teams, with their own SLAs and change cadences. The first chapter reframes the whole module around that fact. → 01-tool-as-production-api.md