01. A tool is not a function. It is a production API boundary.
Module 01 framed a tool as a typed thing the model can call. That framing was correct for the agent side. It is wrong for the integration side. Until you flip the frame, the rest of this module will read like checklist hygiene instead of system design.
A payments engineer at a Mumbai logistics startup spends a Thursday wiring a refund tool into the support agent. The wrapper is forty lines of Python: take a payment ID and amount, validate basic shape, call the internal payments-svc REST endpoint with a service-account token, return the response. It works in dev. It works in staging. It ships on Friday at 16:00 IST. By Friday 21:00 the on-call channel is on fire: the agent has issued forty-seven refunds against payment IDs that do not exist in production. The reason is mundane. payments-svc had migrated to a new ID format two months earlier, and the staging environment was still running the old service. The agent's tool wrapper was authored against staging. The contract — the only artefact that recorded what shape payment_id actually had — was a docstring on a Python function, and it was six months stale.
This is not a model failure. It is not even, strictly speaking, an agent failure. It is the same failure that has happened in microservices, mobile clients, and partner integrations for thirty years: a client was built against an outdated contract, the contract was never versioned, no one detected the drift, and a production system absorbed the consequences. The new thing is the client. The old thing is everything else.
The single most important reframe in this module: a tool is a production API boundary between two systems with different change cadences, and the new client on that boundary is a non-deterministic LLM. Once you see it that way, every chapter of this module is recognisable engineering. If you keep seeing tools as "functions the model can call," you will keep making mistakes that the API-design community solved decades ago.
Life without the reframe
Tool wiring written under function-call thinking looks like this:
```python
def create_lead(name: str, email: str, company: str) -> dict:
    """Create a new Salesforce lead."""
    return sf.Lead.create({"Name": name, "Email": email, "Company": company})
```
The wrapper does its job. The model can call it. In a notebook, this is fine. In production, it carries every defect that "just a function" framing produces:
- No contract version. When Salesforce adds a required field next quarter, this function silently 400s and nothing in the agent platform knows.
- No idempotency. If the model retries on a timeout, two leads are created. There is no key for the receiving side to deduplicate.
- No structured error. The model receives whatever string `sf.Lead.create` raises. If it is `salesforce.exceptions.SalesforceMalformedRequest: REQUIRED_FIELD_MISSING: lead_source`, the model parrots it back to a customer.
- No scope boundary. The `sf` client is a process-wide singleton bound to a service account that can write any object in the org. If the agent is convinced via prompt injection to call `update_user(...)` for the CEO, it can.
- No audit. When the postmortem asks "which call did this?", there is one log line: `create_lead called`. No trace ID, no caller identity, no payload, no outcome.
- No drift signal. If `payments-svc` migrates from integer IDs to UUIDs, this code runs successfully until it hits a real ID and then fails opaquely.
Every one of these is a contract surface the function-as-function framing does not even ask about. The damage is not in any single one. The damage is that the production team owning Salesforce, payments-svc, and the org's audit log has no client they recognise on the other side of these calls. They cannot version it, throttle it, scope it, page it, or roll it back, because it does not present like an API client. It presents like an opaque process making calls.
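To make the audit gap concrete, here is a minimal sketch of what a per-call record could look like in place of the single `create_lead called` line. The function name `build_audit_record` and every field name are assumptions chosen for illustration, not an API defined by this module.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def build_audit_record(
    *,
    tool: str,
    version: str,
    trace_id: str,
    caller: str,
    on_behalf_of: str,
    arguments: dict[str, Any],
    outcome: dict[str, Any],
    at: Optional[datetime] = None,
) -> str:
    """Serialise one tool invocation as a single JSON log line (hypothetical shape)."""
    timestamp = (at or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "tool": tool,
        "contract_version": version,
        "trace_id": trace_id,
        "caller": caller,              # which agent or service made the call
        "on_behalf_of": on_behalf_of,  # the end user the agent was acting for
        "arguments": arguments,        # the payload as received
        "outcome": outcome,            # what the downstream system reported
    }
    return json.dumps(record, sort_keys=True)


line = build_audit_record(
    tool="create_lead",
    version="1.0.0",
    trace_id="trace-0001",
    caller="support-agent",
    on_behalf_of="user-42",
    arguments={"name": "A. Rao", "company": "Example Pvt Ltd"},
    outcome={"status": "created"},
)
```

With a record like this, "which call did this?" becomes a query on `trace_id` rather than a guess.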
The reframe
Pin this sentence somewhere you read it weekly:
A tool is a versioned API endpoint, scoped to a purpose, called by a non-deterministic client, audited per invocation, and operated by humans across deploys.
Read it left to right.
- Versioned API endpoint — the tool has a schema, the schema has a version, and the version is sent on every call. The system on the other side can refuse calls that are too old.
- Scoped to a purpose — the credential the tool carries is bound to one capability (create_lead, not "Salesforce admin") and one tenant (acme-corp, not "all of Salesforce").
- Called by a non-deterministic client — the model will sometimes invent arguments, retry on ambiguous responses, and abandon midway. The contract has to assume this.
- Audited per invocation — every call produces a record: who, on whose behalf, with what input, with what outcome, at what version.
- Operated by humans across deploys — the contract is owned by a human team. There is a runbook. There is a rollback. There is a deprecation policy.
That sentence is the minimum shape of every tool that touches a real system. A tool that does less than this is debt — not necessarily wrong, but explicitly carrying risk that has to be tracked.
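As an illustration of "scoped to a purpose", the sketch below binds a credential to one capability and one tenant and refuses anything else. `ScopedCredential`, `ScopeError`, and `enforce_scope` are hypothetical names, and the capability string format is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    capability: str  # one capability, e.g. "crm:lead:create" (assumed naming scheme)
    tenant_id: str   # one tenant, e.g. "acme-corp"


class ScopeError(PermissionError):
    """Raised when a call asks for more than the credential allows."""


def enforce_scope(cred: ScopedCredential, capability: str, tenant_id: str) -> None:
    """Allow the call only if both capability and tenant match exactly."""
    if cred.capability != capability:
        raise ScopeError(f"credential does not grant '{capability}'")
    if cred.tenant_id != tenant_id:
        raise ScopeError(f"credential is not valid for tenant '{tenant_id}'")


cred = ScopedCredential(capability="crm:lead:create", tenant_id="acme-corp")
enforce_scope(cred, "crm:lead:create", "acme-corp")  # passes silently
```

Exact matching is deliberately blunt here: a credential that can only create leads for one tenant cannot be talked into updating a user record, whatever the prompt says.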
Why the LLM changes the boundary problem
API design has been a discipline since the 1990s. Why does this module exist at all? Because the client on the other side of the contract is meaningfully different from anything the discipline has had to design for before, in three specific ways.
The client cannot read your changelog. A normal API client is a human-authored piece of code. When you deprecate a field, you email the team, they update their client, they ship. An LLM client never reads your changelog. The only changelog it sees is the tool description you pass into context, and even that it sees only at call time. Deprecation has to be communicated through the contract itself — at the schema level, with version tags, with structured warnings the model can be trained or prompted to act on.
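One way to carry deprecation through the contract itself is to answer every call on an old version with a structured warning, and to refuse the version once its removal date has passed. This is a minimal sketch under assumptions: `VersionPolicy`, `POLICIES`, `check_version`, the version numbers, and the date are all placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VersionPolicy:
    replacement: Optional[str] = None     # None means this is the current version
    removal_date: Optional[date] = None   # first day the version is refused
    hint: str = ""                        # short instruction the model can act on


# Hypothetical registry; a real one would live with the contract owners.
POLICIES = {
    ("create_lead", "2.0.0"): VersionPolicy(),
    ("create_lead", "1.0.0"): VersionPolicy(
        replacement="2.0.0",
        removal_date=date(2025, 1, 31),   # placeholder date
        hint="Include lead_source; it is required from 2.0.0.",
    ),
}


def check_version(tool: str, version: str, today: date) -> dict:
    """Decide whether a call on this version is accepted, and with what warnings."""
    policy = POLICIES.get((tool, version))
    if policy is None:
        return {"accepted": False, "error": "unknown_version", "warnings": []}
    if policy.removal_date is not None and today >= policy.removal_date:
        return {"accepted": False, "error": "version_removed", "warnings": []}
    warnings = []
    if policy.replacement is not None:
        warnings.append({
            "type": "deprecation",
            "replacement_version": policy.replacement,
            "removal_date": policy.removal_date.isoformat() if policy.removal_date else None,
            "hint": policy.hint,
        })
    return {"accepted": True, "error": None, "warnings": warnings}
```

The warning travels in the tool response, which is the one place the model is guaranteed to look.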
The client retries semantically. A normal client retries on 500 and 429. An LLM client, when given an ambiguous response, will retry by reasoning about it: "I got an error mentioning 'invalid format', let me try a different format." That is a retry your contract did not authorise. If your endpoint is non-idempotent, you can receive two write calls whose payloads are textually different but functionally equivalent, and a deduplication check that compares raw payloads will not catch them. Your idempotency design has to be stricter than usual.
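One possible way to tighten idempotency against semantic retries is to derive the deduplication key from the normalised intent of the call rather than from the raw payload. The sketch below assumes hypothetical field names and normalisation rules (for example, that payment IDs are case-insensitive); which fields define "the same intent" is a per-tool design decision.

```python
import hashlib
import json


def normalise_refund_args(args: dict) -> dict:
    """Reduce a refund request to its canonical intent (illustrative rules only)."""
    return {
        # Assumption: IDs are case-insensitive and surrounding whitespace is noise.
        "payment_id": str(args["payment_id"]).strip().lower(),
        "amount_minor": int(args["amount_minor"]),
    }


def intent_key(tool: str, tenant_id: str, args: dict) -> str:
    """Same tool, tenant, and normalised arguments always give the same digest."""
    canonical = json.dumps(
        {"tool": tool, "tenant": tenant_id, "args": normalise_refund_args(args)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = intent_key("issue_refund", "acme-corp",
                   {"payment_id": "PAY-123", "amount_minor": 4500})
# The model "tries a different format" after an ambiguous error:
retry = intent_key("issue_refund", "acme-corp",
                   {"payment_id": " pay-123 ", "amount_minor": "4500"})
different = intent_key("issue_refund", "acme-corp",
                       {"payment_id": "PAY-123", "amount_minor": 9900})
```

Here `first` and `retry` collapse to one key while `different` does not. A key like this would normally be paired with a time window, so that a legitimate second refund of the same amount weeks later is not swallowed.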
The client's argument space is unbounded. A human-coded client passes the arguments the code computes. An LLM client passes whatever the model decided to pass. You will see field values you did not anticipate: strings where you wanted enums, plausible-looking but fictitious IDs, timestamps in three different formats, amounts in major units when you specified minor units. Validation cannot trust the client.
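A minimal sketch of validation that does not trust the client follows. It rejects unknown fields, non-enum strings, and amounts that are not integers in minor units. The `pay_` ID format, the `Reason` values, and the function name are assumptions made up for the example; the real shapes belong to the downstream contract.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Reason(str, Enum):
    DUPLICATE = "duplicate"
    DAMAGED = "damaged"
    CANCELLED = "cancelled"


# Hypothetical ID format, for illustration only.
PAYMENT_ID = re.compile(r"^pay_[0-9a-f]{8}$")
ALLOWED = {"payment_id", "amount_minor", "reason"}


@dataclass(frozen=True)
class RefundArgs:
    payment_id: str
    amount_minor: int
    reason: Reason


def validate_refund(raw: dict) -> RefundArgs:
    """Reject anything outside the schema instead of coercing it."""
    unknown = set(raw) - ALLOWED
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = ALLOWED - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    pid = raw["payment_id"]
    if not isinstance(pid, str) or not PAYMENT_ID.fullmatch(pid):
        raise ValueError("payment_id does not match the expected format")
    amount = raw["amount_minor"]
    # bool is a subclass of int, so it is excluded explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool) or amount <= 0:
        raise ValueError("amount_minor must be a positive integer in minor units")
    try:
        reason = Reason(raw["reason"])
    except ValueError:
        allowed = [r.value for r in Reason]
        raise ValueError(f"reason must be one of {allowed}") from None
    return RefundArgs(payment_id=pid, amount_minor=amount, reason=reason)
```

A format check only proves an ID is well-formed. A plausible-looking but fictitious ID still passes it, so existence has to be confirmed by the downstream system.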
Together, these three facts mean the defensive responsibility shifts to the contract. You cannot rely on a well-behaved client to handle versioning, deduplication, or input shape. The contract has to enforce all of it, because the client provably will not.
The boundary, drawn
Here is the picture every chapter of this module fills in. Hold this in your head.
```text
+---------------------------+
|       Agent runtime       |
|   (LLM + orchestration)   |
+-------------+-------------+
              |
              |  the call carries:
              |    - tool name
              |    - arguments (untrusted)
              |    - idempotency key
              |    - scoped credential
              |    - contract version tag
              |    - trace + caller identity
              v
+---------------------------+
| The contract (this module)|
|   - schema validation     |
|   - class-based gating    |
|   - dedup window check    |
|   - scope enforcement     |
|   - version negotiation   |
|   - audit emit            |
+-------------+-------------+
              |
              |  the response carries:
              |    - structured outcome
              |    - structured error (if any)
              |    - corrective hints for the model
              |    - audit ID for replay
              v
+---------------------------+
|   The downstream system   |
|  (Salesforce, payments,   |
|   internal services, ...) |
+---------------------------+
```
Everything in this module is something that lives in the middle box. Everything in module 01 — the schema the model sees, the loop, the planner — lives in the top box. Everything that owners of the bottom box already know about versioning, scoping, and operating APIs is the prior art this module imports.
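The two "carries" lists in the picture can be written down as plain data types. The sketch below is one possible shape, not the module's definition: the class names echo the `ToolCall` and `ToolResult` used in this chapter's refund example, but every field name here is an assumption.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]   # untrusted: produced by the model
    idempotency_key: str
    scope: str                  # e.g. "payments:refund:write"
    tenant_id: str
    version_tag: str            # contract version the caller was built against
    trace_id: str
    caller_identity: str


@dataclass(frozen=True)
class ToolError:
    code: str
    retryable: bool
    hints: list[str] = field(default_factory=list)  # corrective hints for the model


@dataclass(frozen=True)
class ToolResult:
    audit_id: str               # handle for replaying the call from the audit log
    outcome: Optional[dict[str, Any]] = None
    error: Optional[ToolError] = None

    @classmethod
    def ok(cls, outcome: dict[str, Any]) -> ToolResult:
        return cls(audit_id=str(uuid.uuid4()), outcome=outcome)

    @classmethod
    def fail(cls, error: ToolError) -> ToolResult:
        return cls(audit_id=str(uuid.uuid4()), error=error)

    @property
    def is_ok(self) -> bool:
        return self.error is None
```

Making both types frozen is a design choice: once a call has crossed the boundary, nothing downstream should be able to rewrite what was asked or what was answered.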
The minimum example, rebuilt
The forty-line refund wrapper from the opening scene, rewritten with the contract surfaces present. This is illustrative; the later chapters cited in the comments build each surface properly:
```python
@tool_contract(
    name="issue_refund",
    version="2.1.0",
    class_="write-non-idempotent",   # → chapter 03 will refine this
    scope="payments:refund:write",   # → chapter 06
    schema=RefundV2_1Schema,         # → chapter 02
    error_contract=RefundErrors,     # → chapter 05
)
def issue_refund(call: ToolCall) -> ToolResult:
    # contract layer (every line is a chapter):
    args = RefundV2_1Schema.validate(call.arguments)            # ch 02, 07
    enforce_class_gates(call, class_="write-non-idempotent")    # ch 03
    if dedup_cache.seen(call.idempotency_key):                  # ch 04
        return dedup_cache.replay(call.idempotency_key)
    creds = scope_resolver(call.scope, call.tenant_id)          # ch 06
    pact.check_contract_version(call.version_tag)               # ch 08, 09

    # the actual side effect:
    try:
        result = payments_client(creds).refund(
            payment_id=args.payment_id,
            amount=args.amount_minor,
            idempotency_key=call.idempotency_key,
        )
    except PaymentsError as e:
        return RefundErrors.from_payments(e)                    # ch 05

    # post-side-effect:
    audit_log.emit(call, result)                                # ch 11
    dedup_cache.record(call.idempotency_key, result)            # ch 04
    return ToolResult.ok(result)
```
Every line maps to a later chapter. The wrapper is still roughly forty lines, now sitting under a decorator and the contract layer behind it, and that is not over-engineering. The original wrapper was carrying every one of those responsibilities implicitly, and "implicitly" is the failure mode. Either the contract holds these explicitly, or the next on-call gets paged when one of them silently doesn't.
How to recognise the reframe missing in real systems
You are looking at code or a design doc. The following symptoms mean the system has not made the reframe yet. None of these is automatically wrong; each one is a flag to ask the harder question.
| Symptom | Question to ask |
|---|---|
| Tool definitions live in the same file as agent code | Who owns this contract when the underlying API changes? |
| Tool wrappers raise exceptions from the underlying SDK directly to the agent | What does the model do with a Python traceback? |
| The underlying service uses one service account for all agent calls | What is the smallest set of permissions this tool needs? |
| There is no version field on the tool definition | How will you know which contract a given call was built against? |
| There is no idempotency key on write tools | What happens if the network blips mid-call? |
| Logs of tool calls do not capture arguments | When this misbehaves, can you reconstruct what happened? |
| Tool schemas are docstrings, not validated objects | What stops the model from passing fields you didn't anticipate? |
| Adding a new field to the underlying API requires no agent-side coordination | What stops changes to the underlying API from silently breaking the agent? |
If a system has four or more of these, the reframe has not landed. The remediation is not "fix the bugs" — it is "treat this as an API platform problem and assign owners accordingly."
The wrong mental models
Three wrong models are common. Naming them helps you stop using them.
"The tool is just a Python function." It is the function only at the agent's import site. Behind that function is a network call, a credential, an upstream contract, and a downstream owner. The function is the handle, not the thing.
"The tool is whatever the SDK gives me." SDKs ship for human-written clients. They raise SDK-shaped exceptions, log SDK-shaped logs, and expect SDK-shaped retry behaviour. None of that is what an LLM client needs. The contract layer translates the SDK's world into the model's world.
"The agent platform owns the tool contract." Half-true. The agent platform owns the agent-facing schema and description. The team that owns the downstream system owns the underlying API and its evolution. The contract is the joint artefact between them. If only one side owns it, the other side will move without notice.
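To illustrate the translation step named under the second wrong model, here is a minimal sketch that turns SDK-shaped exceptions into errors the model can act on. `SdkMalformedRequest` is a stand-in defined for this example, not a class from any real SDK, and the error codes and hint wording are assumptions.

```python
from dataclasses import dataclass


class SdkMalformedRequest(Exception):
    """Stand-in for an SDK exception; not a real library class."""

    def __init__(self, code: str, field_name: str):
        super().__init__(f"{code}: {field_name}")
        self.code = code
        self.field_name = field_name


@dataclass(frozen=True)
class ModelFacingError:
    code: str
    retryable: bool
    hint: str


def translate(exc: Exception) -> ModelFacingError:
    """Map SDK failures to a small, stable vocabulary; leak no internals."""
    if isinstance(exc, SdkMalformedRequest) and exc.code == "REQUIRED_FIELD_MISSING":
        return ModelFacingError(
            code="missing_field",
            retryable=True,
            hint=f"Provide a value for '{exc.field_name}' and call the tool again.",
        )
    if isinstance(exc, TimeoutError):
        return ModelFacingError(
            code="upstream_timeout",
            retryable=True,
            hint="Retry with the same idempotency key.",
        )
    # Anything unrecognised: no traceback, no SDK class names, no retry.
    return ModelFacingError(
        code="internal_error",
        retryable=False,
        hint="Do not retry; escalate to a human operator.",
    )
```

The fallback branch matters most: an unknown failure becomes a fixed, non-retryable code instead of raw SDK text the model might repeat to a customer.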
Interview Q&A
Q1. What is wrong with the statement "the tool is just a function the model calls"? The model is the call site; the function is the handle. Behind the handle is a network call to a system owned by another team, a credential with side-effect authority, a versioned contract that can drift, and an audit responsibility. Calling it "just a function" hides every one of those responsibilities. Wrong-answer notes: answering only "it makes network calls" misses the ownership and versioning. Answering only "it has side effects" misses the LLM-specific retry and argument-space problems.
Q2. How is an LLM tool client different from a normal API client? Three concrete ways. It cannot read your changelog, so deprecation must be encoded in the contract itself, not in docs. It retries semantically, so non-idempotent endpoints can receive two functionally equivalent but textually different writes. Its argument space is unbounded, so the contract must validate rather than trust the client. Wrong-answer notes: "it is just slow" or "it sometimes hallucinates" are surface answers; the question is about boundary design consequences.
Q3. The CRM team is adding a required field to Lead. Your agent has been writing leads for six months with the old contract. What is the right design response?
The CRM team should ship the change as a new contract version. The agent platform should be notified through the contract registry, not Slack. During a dual-run window, both versions are accepted, with calls on the old version surfacing a structured deprecation warning the agent can act on. A drift monitor should already be watching the 4xx rate on lead writes and would have flagged the change if it had shipped unannounced. After the window, the old version is removed. Wrong-answer notes: "fix it when it breaks" is the failure mode that loses leads before anyone notices. "Have the agent always send all fields" defeats the contract.
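The drift monitor in that answer can be as simple as a sliding window over recent status codes. This is a sketch under assumptions: the class name, window size, threshold, and minimum sample count are placeholders, not tuned recommendations.

```python
from collections import deque


class DriftMonitor:
    """Tracks the share of 4xx responses over the most recent calls."""

    def __init__(self, window: int = 100, threshold: float = 0.05, min_calls: int = 20):
        self.statuses: deque[int] = deque(maxlen=window)  # old entries fall off
        self.threshold = threshold
        self.min_calls = min_calls

    def record(self, status_code: int) -> None:
        self.statuses.append(status_code)

    def client_error_rate(self) -> float:
        if not self.statuses:
            return 0.0
        errors = sum(1 for s in self.statuses if 400 <= s < 500)
        return errors / len(self.statuses)

    def drifting(self) -> bool:
        """True once there are enough samples and the 4xx share exceeds the threshold."""
        return (
            len(self.statuses) >= self.min_calls
            and self.client_error_rate() > self.threshold
        )


monitor = DriftMonitor()
for _ in range(20):
    monitor.record(201)      # healthy lead writes
healthy = monitor.drifting()
for _ in range(5):
    monitor.record(400)      # a required field appears without notice
alarmed = monitor.drifting()
```

In this run `healthy` is `False` and `alarmed` is `True`, since 5 of the last 25 calls failed with a 4xx. A sustained rise in client errors on a write tool is a cheap early signal that the contract has moved.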
Q4. Where should validation of tool arguments live — in the agent runtime or at the contract boundary? Both, but they enforce different things. The agent runtime validates that the model's output parses to a tool call at all. The contract boundary validates that the arguments are well-formed under the schema and authorised under the scope. Putting all validation in the agent layer is dangerous because a different caller (a debug script, a different agent) could bypass it. Putting all validation at the contract is fine for correctness but loses the ability to teach the model anything useful before it commits. The model-facing schema teaches; the contract-side validation enforces. Wrong-answer notes: picking only one of the two reveals that the answerer has not internalised the boundary picture.
What to do differently after reading this
- Stop writing tools as bare Python wrappers. Write tools through a contract layer with explicit slots for the six surfaces, even if some are no-ops at first.
- Move tool contracts into their own owned directory or service, with versioning, even if your team is the same team that owns the downstream service today.
- When you see a tool wrapper in code review, mentally check it against the eight symptoms in the recognition table above. Each missing item is a question to raise.
- Stop talking about tools as "the model's tools." They are the production system's API endpoints. The model is the client.
Bridge. Once you accept the reframe, the next question is concrete: what fields does a usable tool contract actually carry? Six. Each one prevents a specific class of incident. The next chapter walks them in order — what they are, what they prevent, and what you write down when you draft a contract for the first time. → 02-contract-anatomy.md