08. Versioning and deprecation
Validation enforces what the contract says today. Versioning is how that contract changes tomorrow without breaking every caller that was built against the old one.
A platform team in Noida ships an agent that has been running well for nine months. The team owning the CRM tool deletes a deprecated parameter — legacy_source_id — that nobody on the CRM side believed was being used. They ship the change on a Tuesday morning. By Tuesday afternoon, the agent's lead-creation tool is returning 400 Bad Request: unknown field 'legacy_source_id' on every call. The agent platform's tool wrapper has been sending the field for nine months, populated from a now-defunct upstream that the agent platform's previous owner had cobbled together. Nobody noticed. The CRM team did not consult an inventory of callers because there was no contract registry. The agent platform's tool was not versioned, so there was no signal that the contract had changed. The fix takes six hours of triage to find, ten minutes to apply, and produces a postmortem with the same conclusion every other contract-versioning postmortem reaches: the contract was changed by one team and consumed by another, and there was no protocol between them.
This chapter is the protocol. Semver applied to tools, dual-run windows for breaking changes, and the sunset comms that turn deprecation from a surprise into a process.
What versioning solves
Versioning solves the same problem in tool contracts as it does in any other API: it lets producers and consumers evolve at different speeds without breaking each other. The shift is operational rather than technical, since semver itself is well understood. The agent platform must treat tool contracts as a public API of the underlying systems, even when those systems are owned by the same company.
Every tool contract carries a version. Every call sends the version it was built against. The contract layer can refuse calls older than a documented floor. Breaking changes go through a dual-run window with explicit deprecation comms.
That is the whole policy. The rest is mechanics.
Semver, applied to tool contracts
The standard semver split, adapted:
| Bump | What changed | Caller impact |
|---|---|---|
| Patch (2.1.0 → 2.1.1) | Internal fix, no contract change visible to callers; description clarification | None |
| Minor (2.1.0 → 2.2.0) | Backwards-compatible additions: new optional field, new enum value (additive), new error code added to enum | Callers continue to work; new capability available to callers that opt in |
| Major (2.x → 3.0.0) | Breaking change: required field added, field removed, field renamed, error code repurposed, return shape changed, semantic shift in what the operation does | Existing callers must migrate before the dual-run window closes |
Two rules carry most of the value.
Rule 1 — additive minors do not break callers. Adding an optional field is a minor bump. Adding a new enum value is a minor bump only if callers will not be surprised by it. If callers branch on enum values (the model often does), adding a value is closer to a major change because old callers do not know what to do with the new value. The conservative move is to introduce new enum values as a minor with a documented "unknown-value behaviour" in the contract.
Rule 2 — error enum additions are minor; error enum changes are major. Adding a new error code is fine; existing callers will treat it as an unknown error and follow the catch-all. Removing or renaming an error code is major because some caller — including some prompt — is branching on the old name.
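As a rough illustration of the table and the two rules, the sketch below maps a set of labelled changes to the bump they require and computes the next version. The change labels (`optional_field_added`, `field_removed`, and so on) are hypothetical names chosen for this example, not part of any registry format described here.

```python
# Hypothetical change labels, mapped to the bump the table above calls for.
REQUIRED_BUMP = {
    "description_clarified": "patch",
    "optional_field_added": "minor",
    "enum_value_added": "minor",      # assumes unknown-value behaviour is documented
    "error_code_added": "minor",
    "required_field_added": "major",
    "field_removed": "major",
    "field_renamed": "major",
    "error_code_removed": "major",
    "error_code_renamed": "major",
    "return_shape_changed": "major",
}
_RANK = {"patch": 0, "minor": 1, "major": 2}


def required_bump(changes: list[str]) -> str:
    """Return the largest bump demanded by any change in the list."""
    if not changes:
        raise ValueError("no changes to classify")
    return max((REQUIRED_BUMP[c] for c in changes), key=_RANK.__getitem__)


def next_version(version: str, changes: list[str]) -> str:
    """Apply the required bump to a 'major.minor.patch' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    bump = required_bump(changes)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("2.1.0", ["description_clarified"]))              # 2.1.1
print(next_version("2.1.0", ["optional_field_added"]))               # 2.2.0
print(next_version("2.1.0", ["error_code_added", "field_removed"]))  # 3.0.0
```

One breaking change in a batch is enough to force a major bump, which is why the function takes the maximum rather than the most common bump.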
What lives in the version
The version applies to the contract, not to the underlying system. The underlying system might be on version 14.7 of Salesforce; your tool contract for create_lead is on version 3.2.1 because that is the third major contract revision you have shipped. The two version numbers are independent. The tool contract abstracts the underlying system's version away from the agent.
Practical consequence: a downstream system upgrade may be invisible to your callers (no contract change needed), or may force a contract major bump (if the upgrade breaks shape or semantics). The contract is the artefact you control; the downstream system's version is a property you record but do not expose.
Sending the version on every call
The call carries the version the caller was built against:
{
  "tool": "issue_refund",
  "version_built_against": "2.1.0",
  "arguments": { ... },
  "idempotency_key": "...",
  "tenant_id": "..."
}
The contract layer reads version_built_against and decides:
- If the version matches the current major and minor: proceed normally.
- If the version is an older minor of the same major: proceed; warn in audit and (optionally) include a deprecation hint in the response.
- If the version is an older major: proceed only if within the dual-run window for that major; reject otherwise.
- If the version is newer than what the contract layer knows about: reject; the caller is on a contract the layer cannot honour.
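The four rules above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Decision` enum, the `decide` function, and the `dual_run_ends` mapping are hypothetical names, and the sketch assumes that patch numbers are ignored and that the dual-run end date is inclusive.

```python
from datetime import date
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    PROCEED_WITH_WARNING = "proceed_with_warning"
    REJECT = "reject"


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def decide(built_against: str, current: str,
           dual_run_ends: dict[int, date], today: date) -> Decision:
    """Decide how to treat a call, given the version it was built against.

    dual_run_ends maps an older major number to the last day it is accepted
    (assumption: the end date itself is still inside the window).
    """
    built, cur = parse(built_against), parse(current)
    # Compare (major, minor) only; patch bumps carry no contract change.
    if built[:2] > cur[:2]:
        return Decision.REJECT                # newer than the layer knows about
    if built[0] == cur[0]:
        if built[1] == cur[1]:
            return Decision.PROCEED           # same major and minor
        return Decision.PROCEED_WITH_WARNING  # older minor of the same major
    ends = dual_run_ends.get(built[0])        # older major
    if ends is not None and today <= ends:
        return Decision.PROCEED_WITH_WARNING
    return Decision.REJECT


windows = {2: date(2026, 7, 1)}
print(decide("2.1.0", "3.0.0", windows, date(2026, 5, 1)))  # Decision.PROCEED_WITH_WARNING
print(decide("2.1.0", "3.0.0", windows, date(2026, 7, 2)))  # Decision.REJECT
```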
The version is not chosen by the model. It is set by the agent platform's tool-definition pipeline: when the platform loads tool definitions from the registry, it stamps the version into the wrapper, and every call carries it.
Backwards-compatible minor bumps
The common case. You are adding a field, an enum value, or a new error code, and existing callers should continue to work.
Procedure for adding an optional field:
- The contract draft adds the field with required: false and a sensible default for callers that don't provide it.
- The downstream system handles "field absent" the same way it handled it before.
- Bump minor.
- Update tool descriptions seen by the model so new agents can use the field.
Procedure for adding an enum value:
- The contract draft adds the value to the enum.
- Document an "unknown-value behaviour": what the contract layer does when a response to an old caller contains the new value, and what that caller should expect to see instead.
- Bump minor.
- Consider whether downstream consumers of the response need a parallel update; they often do.
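One possible unknown-value behaviour is for the contract layer to replace values the caller cannot know about with a documented fallback. The sketch below assumes a hypothetical refund `status` enum, a hypothetical `partially_refunded` value introduced in 2.2, and an `unknown` sentinel that was part of the contract from the start; none of these names come from a real contract.

```python
# Hypothetical enum values, each tagged with the (major, minor) that introduced it.
STATUS_INTRODUCED_IN = {
    "pending": (2, 0),
    "completed": (2, 0),
    "failed": (2, 0),
    "partially_refunded": (2, 2),  # added in a minor bump
}

# Assumed documented fallback that every caller has handled since 2.0.
UNKNOWN_VALUE = "unknown"


def status_for_caller(status: str, version_built_against: str) -> str:
    """Return the status as the caller's contract version can understand it."""
    major, minor, _patch = (int(p) for p in version_built_against.split("."))
    if STATUS_INTRODUCED_IN[status] <= (major, minor):
        return status
    return UNKNOWN_VALUE


print(status_for_caller("partially_refunded", "2.1.0"))  # unknown
print(status_for_caller("partially_refunded", "2.2.0"))  # partially_refunded
print(status_for_caller("completed", "2.1.0"))           # completed
```

The fallback only works if old callers already have a branch for it, which is why the behaviour has to be documented in the contract before the new value ships.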
Procedure for adding an error code:
- The contract draft adds the code with full structure (retriable, human_hint, model_action).
- The translator (chapter 05) maps the relevant downstream error.
- Bump minor.
- Watch for any old callers that branch on the UPSTREAM_UNCLASSIFIED catch-all: the new code may now consume cases that used to fall to the catch-all.
Breaking changes: the dual-run window
A major bump is a breaking change. Callers must migrate, but they cannot all migrate instantly. The pattern is a dual-run window: v2 and v3 of the same tool coexist for a defined period; new callers are built against v3; old callers continue to call v2 with deprecation warnings; after the window, v2 is removed.
The window:
identity:
  name: issue_refund
  version: 3.0.0
  owner: payments-platform
  purpose: ...
deprecation:
  supersedes: 2.x
  dual_run_started: 2026-04-01
  dual_run_ends: 2026-07-01
  migration_guide: https://docs.internal/tools/issue_refund/migrate-to-v3
  reason: |
    v2 returned amount as a string; v3 returns amount_minor as an integer.
    v2 accepted free-text reason; v3 requires reason_code enum.
Three months is a common dual-run window length. The right length depends on:
- How many callers exist (more callers → longer window)
- How disruptive the migration is (renamed field → shorter; semantic change → longer)
- How urgent the reason for the breaking change is (security fix → shorter; cleanup → longer)
Less than two weeks is a panic window; more than six months suggests the breaking change should be redesigned as a backwards-compatible addition.
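The two thresholds above lend themselves to a simple registry lint. A minimal sketch, assuming "six months" is approximated as 183 days and using hypothetical result labels:

```python
from datetime import date, timedelta

PANIC_BELOW = timedelta(weeks=2)
REDESIGN_ABOVE = timedelta(days=183)  # approximation of six months


def review_window(dual_run_started: date, dual_run_ends: date) -> str:
    """Flag dual-run windows that fall outside the two-week / six-month range."""
    length = dual_run_ends - dual_run_started
    if length <= timedelta(0):
        raise ValueError("dual_run_ends must be after dual_run_started")
    if length < PANIC_BELOW:
        return "too_short"  # panic window
    if length > REDESIGN_ABOVE:
        return "too_long"   # consider a backwards-compatible redesign
    return "ok"


print(review_window(date(2026, 4, 1), date(2026, 7, 1)))   # ok
print(review_window(date(2026, 4, 1), date(2026, 4, 10)))  # too_short
print(review_window(date(2026, 4, 1), date(2027, 1, 1)))   # too_long
```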
What runs during the window
During the window:
- v2 and v3 are both deployed.
- Both are documented in the registry.
- Calls with version_built_against in the v2.x range are routed to the v2 implementation.
- Calls with v3.x are routed to v3.
- Responses to v2 callers include a deprecation field with the dual-run end date and migration link.
- Audit logs the version used on every call.
- A "v2 traffic" metric is monitored; the migration is not done until v2 traffic is near zero.
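The routing, deprecation field, audit entry, and traffic metric listed above might fit together along these lines. Everything here is a sketch under assumptions: the handler functions, the in-memory `audit_log`, and the returned amounts are placeholders, and a real contract layer would also validate and translate the call.

```python
def handle_v2(arguments: dict) -> dict:
    return {"amount": "12.50"}      # v2 shape: amount as a string (placeholder value)


def handle_v3(arguments: dict) -> dict:
    return {"amount_minor": 1250}   # v3 shape: integer minor units (placeholder value)


HANDLERS = {2: handle_v2, 3: handle_v3}

# Deprecation notice attached to responses for majors in their dual-run window.
DEPRECATIONS = {
    2: {
        "dual_run_ends": "2026-07-01",
        "migration_guide": "https://docs.internal/tools/issue_refund/migrate-to-v3",
    }
}

audit_log: list[dict] = []  # stand-in for a real audit sink


def route(call: dict) -> dict:
    """Route a call to the implementation matching its major version."""
    version = call["version_built_against"]
    major = int(version.split(".")[0])
    handler = HANDLERS.get(major)
    if handler is None:
        raise LookupError(f"no implementation deployed for major {major}")
    response = dict(handler(call["arguments"]))
    if major in DEPRECATIONS:
        response["deprecation"] = DEPRECATIONS[major]
    audit_log.append({"tool": call["tool"], "version": version})
    return response


def v2_share(entries: list[dict]) -> float:
    """Fraction of audited calls built against a 2.x contract."""
    if not entries:
        return 0.0
    v2_calls = sum(1 for entry in entries if entry["version"].startswith("2."))
    return v2_calls / len(entries)
```

Computing the v2 share from the audit log rather than from a separate counter keeps the metric tied to the same records that the sunset comms use to identify remaining callers.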
Sunset comms
Communications are not optional. The producer team is responsible for:
- At dual-run start: notice to all known consumers (agent platforms, internal teams, anyone whose audit log shows v2 calls). Migration guide published.
- At one month before end: reminder to consumers still on v2.
- At one week before end: final warning with traffic numbers attached ("you sent 12,304 calls on v2 last week; v2 will be removed on $DATE").
- At sunset: v2 returns a structured DEPRECATED_VERSION_RETIRED error with a pointer to v3.
The communications channel matters. Slack messages are not contracts. The producer team should maintain a changelog and a notification mechanism (email, calendar reminder, ticket created against the consumer team) for each consumer.
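The schedule above can be derived from the two dates in the deprecation metadata, so that reminders are generated rather than remembered. A minimal sketch, assuming "one month" is approximated as 30 days; the error payload reuses the retriable, human_hint, and model_action fields from the error contract, with illustrative values and a hypothetical `superseded_by` pointer.

```python
from datetime import date, timedelta


def comms_schedule(dual_run_started: date, dual_run_ends: date) -> list[tuple[date, str]]:
    """Return (date, action) pairs for the four sunset communications."""
    return [
        (dual_run_started, "notify all known consumers; publish migration guide"),
        (dual_run_ends - timedelta(days=30), "remind consumers still on v2"),
        (dual_run_ends - timedelta(days=7), "final warning with traffic numbers"),
        (dual_run_ends, "v2 starts returning DEPRECATED_VERSION_RETIRED"),
    ]


def retired_error(migration_guide: str) -> dict:
    """Structured error returned by v2 after sunset (field values are illustrative)."""
    return {
        "code": "DEPRECATED_VERSION_RETIRED",
        "retriable": False,
        "human_hint": f"This version has been retired. See {migration_guide}",
        "model_action": "stop_and_report",  # hypothetical action name
        "superseded_by": "3.x",             # hypothetical pointer field
    }


for when, action in comms_schedule(date(2026, 4, 1), date(2026, 7, 1)):
    print(when.isoformat(), action)
```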
What "an old caller" means in an agent platform
A subtlety. In a traditional API, "old caller" means a binary or service that was built against the old version and not redeployed. In an agent platform, the model sees the tool description at runtime. If the registry serves the new description, the model is, in a sense, always "current."
But the contract layer is not. The contract layer's validators, translators, scope mappings, and idempotency logic are versioned. When you bump to v3, the platform must:
- Update the registry's tool description (so the model sees the new schema).
- Update the contract layer's validators and translators (so calls are validated against v3).
- Optionally: keep the v2 contract layer running in parallel for callers — typically other internal services, not the agent itself — that still send v2 calls.
The agent platform usually migrates atomically: when a new tool version ships, the agent is rebuilt against the new tool description. Old callers are then non-agent callers — direct API consumers, batch jobs, other internal teams.
When not to bump major
A common failure mode: a team is uncomfortable with a backwards-compatible change and bumps major anyway, "to be safe." This is expensive. Major bumps cost migration effort across every consumer. Reserve them.
Some changes that look major but should be minor:
- Adding an optional field. Minor.
- Adding a new enum value with documented unknown-value behaviour. Minor.
- Making a response field nullable that was previously always present. Minor if you can ensure no caller crashes on null; major if you cannot.
- Adding a new error code. Minor.
- Adding a new tool. Not a bump on existing tools at all; just a new contract.
Some changes that look minor but are major:
- Removing an optional field. Major (callers may have been sending it, even if they shouldn't have been).
- Tightening a regex pattern. Major (calls that used to pass will now fail).
- Changing the semantics of a field without changing its name. Major (caller's understanding of what the call does is now wrong).
- Changing a response field's type (string → int). Major.
The general test: is there any caller whose existing code will behave differently after the change? If yes, major.
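Part of that test can be mechanised by diffing the parameter schema of two contract versions. The sketch below assumes a deliberately simplified, hypothetical schema shape (field name mapped to a `type` and a `required` flag). It cannot see semantic changes or tightened patterns, so it gives a lower bound on the bump, not a verdict.

```python
def classify(old: dict, new: dict) -> str:
    """Return the minimum bump implied by a parameter-schema diff."""
    bump = "patch"
    for name, spec in old.items():
        if name not in new:
            return "major"                      # field removed
        if new[name]["type"] != spec["type"]:
            return "major"                      # type changed
        if new[name]["required"] and not spec["required"]:
            return "major"                      # optional became required
        if spec["required"] and not new[name]["required"]:
            bump = "minor"                      # requirement loosened
    for name, spec in new.items():
        if name not in old:
            if spec["required"]:
                return "major"                  # new required field
            bump = "minor"                      # new optional field
    return bump


v_old = {
    "email": {"type": "string", "required": True},
    "legacy_source_id": {"type": "string", "required": False},
}
v_added = {**v_old, "campaign": {"type": "string", "required": False}}
v_removed = {"email": {"type": "string", "required": True}}

print(classify(v_old, v_old))      # patch
print(classify(v_old, v_added))    # minor
print(classify(v_old, v_removed))  # major
```

Removing the optional `legacy_source_id` classifies as major here, which is the opening incident expressed as a diff.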
Versioning interactions with the other surfaces
- Schema (chapter 02). The schema is the most-versioned surface. Most contract changes are schema changes.
- Class (chapter 03). Class changes are usually major. Promoting a tool from write-idempotent to irreversible is a behaviour change callers depend on.
- Idempotency (chapter 04). Idempotency policy changes (e.g., shortening the dedup window) are major if callers were relying on the old window for crash recovery.
- Error contract (chapter 05). Adding codes is minor; removing or renaming is major.
- Scope (chapter 06). Tightening scope requirements is major; loosening is rare and questionable.
The contract registry
All of this assumes a place where contracts live. A contract registry is the source of truth for versioned contracts. It is not optional in a serious platform.
What lives in the registry per tool:
- Identity (name, owner, purpose)
- Current version, dual-run window if any, supported versions
- Schema (parameters, returns)
- Error enum
- Side-effects documentation
- Operational metadata
- Changelog (every version, what changed, who approved)
Implementation choice:
- A directory of YAML/JSON files in a git repo (simple, audit-friendly, slow to query)
- A small service backed by a database (queryable, ops overhead)
- An API description format like OpenAPI with custom extensions (interoperable with existing tooling)
The simplest workable option is a git repo. Every contract is a file. PRs are the change mechanism. The CI pipeline validates the schema, the version bump rules, the changelog, and the dual-run metadata. Promotion to production is a tag.
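A CI check for such a repo might look roughly like the sketch below. It assumes the pipeline has already worked out the bump the change requires and can tell whether the changelog and dual-run metadata are present; the function and parameter names are hypothetical.

```python
_RANK = {"patch": 0, "minor": 1, "major": 2}


def actual_bump(old_version: str, new_version: str):
    """Return 'patch', 'minor', or 'major' for a well-formed bump, else None."""
    o = tuple(int(p) for p in old_version.split("."))
    n = tuple(int(p) for p in new_version.split("."))
    if n == (o[0] + 1, 0, 0):
        return "major"
    if n == (o[0], o[1] + 1, 0):
        return "minor"
    if n == (o[0], o[1], o[2] + 1):
        return "patch"
    return None


def check_contract_pr(old_version: str, new_version: str, required: str,
                      changelog_versions: set, has_dual_run_metadata: bool) -> list:
    """Return a list of failure messages; an empty list means the PR passes."""
    failures = []
    actual = actual_bump(old_version, new_version)
    if actual is None:
        failures.append(f"{old_version} -> {new_version} is not a single semver bump")
    elif _RANK[actual] < _RANK[required]:
        failures.append(f"change requires a {required} bump, PR makes a {actual} bump")
    if new_version not in changelog_versions:
        failures.append(f"changelog has no entry for {new_version}")
    if actual == "major" and not has_dual_run_metadata:
        failures.append("major bump without dual-run metadata")
    return failures


print(check_contract_pr("2.1.0", "2.2.0", "major", {"2.2.0"}, False))
print(check_contract_pr("2.1.0", "3.0.0", "major", {"3.0.0"}, True))  # []
```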
How to recognise broken versioning in the wild
- A team owning a downstream system deprecates a field without notifying any consumer
- The contract layer has no version_built_against field on calls
- Versions are stuck at "1.0.0" forever; nobody bumps them
- Major changes ship without a dual-run window
- v2 still serves 30% of traffic six months after v3 launched
- The changelog is "see git log"
- The "current version" is whatever the latest deploy happened to bundle
Interview Q&A
Q1. The CRM team wants to add a required field to the create_lead tool. They argue the change is "small" because the field has a sensible default. How should this ship?
The change is breaking. A caller built against the old schema is not sending the field, and "sensible default" is a producer-side decision; consumers do not know about it. The right ship is: bump major; introduce v3 with the new required field; v2 continues to accept calls without the field (perhaps populating the default server-side, with a deprecation warning) for the dual-run window; after the window, v2 is retired. Alternative: keep the field optional in the contract and populate the default on the producer side, which makes the change minor. The choice depends on whether the field's meaning requires caller intent or whether a default is genuinely fine; that is the producer team's call to defend. Wrong-answer notes: "just add it, callers won't notice" is the chapter's opening incident.
Q2. v3 of a tool has been live for four months. v2 is still receiving 8% of traffic. The team is debating whether to sunset v2 anyway. What do you advise?
Find out who the v2 callers are before sunsetting. Eight percent is meaningful traffic. Pull the audit logs for v2 calls, group by caller identity, and contact each caller team directly with their numbers. If they cannot migrate within an extension window, the dual-run extends. Sunsetting on schedule "to enforce discipline" produces an outage for the consumers and a postmortem about why nobody was talking to them. The producer's job at sunset is to have driven traffic to near-zero through comms, not to enforce a deadline. If a consumer is genuinely unable to migrate, the producer should ask whether the change really required major in the first place. Wrong-answer notes: sunsetting on schedule without contacting consumers is "rule-following over judgment".
Q3. You're auditing the contract registry and find a tool stuck at v1.0.0 that has been changed eleven times since launch. What does this tell you?
That the team is not versioning. Either every change has been a patch (unlikely over eleven changes that touched schema and behaviour) or the version field is dead weight. The risk is that consumers cannot tell what they are calling, the changelog is inferred from git, and breaking changes have shipped without dual-runs. The remediation is to (a) audit the actual changes, classify each as patch/minor/major, and resync the version to reflect them; (b) institute the version-bump rule in CI; (c) check whether anything actually broke at the boundary during those eleven changes, because that is where to look for hidden incidents. Wrong-answer notes: "it's fine, the tool works" misses that the consumer-side risk is silent until it isn't.
Q4. How does versioning interact with prompt engineering, specifically with the tool description the model sees?
The tool description is part of the contract; changes to it are versioned. A description change that re-frames when to use the tool can change agent behaviour materially even if the schema is unchanged. The version-bump rule covers descriptions: clarifying wording is patch; expanding usage scope is minor; restricting usage scope or changing semantics is major. The model is one of the consumers; the prompt-engineering team is, effectively, a consumer team that has to know about description changes. The registry's notification flow should include the prompt-engineering team. Wrong-answer notes: "descriptions aren't really versioned" misses that the model branches on descriptions; description changes are behaviour changes.
What to do differently after reading this
- Set up a contract registry, even if it is a git directory. Make it the source of truth.
- Stamp version_built_against into every call. Audit logs should always carry the version.
- Adopt the version-bump rules (additive minor, breaking major) in CI. A PR that changes the schema must also bump the version per the rules, or fail.
- For every existing tool, write down its current "version" honestly. Most will be more than 1.0.0 in reality.
- When a producer team plans a breaking change, the platform team should be in the planning meeting with consumer counts in hand.
Bridge. Versioning is how the contract changes intentionally. Drift is how it changes unintentionally — the downstream system shifts under your feet without telling you. The next chapter builds drift detection: contract pacts that hold the downstream to its word, monitors that fire on schema deviations, and the alarm topology that catches a silent breakage in minutes instead of weeks. → 09-integration-drift-detection.md