
07. Validation: preconditions, postconditions, and dry-runs

Scopes bound what the call can do. Validation bounds what the call can say — what arguments are allowed in, what the response must look like before it returns, and how to test the effect of a write without committing it.


A backend lead at a Kolkata fintech reviews the agent's loan-disbursement tool ahead of a security audit. The tool's wrapper validates that amount_minor is a positive integer and that account_number matches the RBI account-number pattern. The wrapper then calls the disbursement service. The lead asks: "what if the account belongs to a person who is on the platform's internal sanctions list?" The team checks. There is no check in the wrapper; the disbursement service is supposed to enforce sanctions. The team checks the disbursement service. The sanctions check exists, but it runs after the funds are reserved: the disbursement is committed, then the sanctions check refuses, and a compensation job tries to reverse it overnight. Sometimes the compensation succeeds; sometimes it does not. In the past six months of records, the team finds seventeen cases where funds went to sanctioned accounts and were never recovered. The agent has been routing legitimate-looking requests to those accounts because the model has no signal that a precondition is missing until after the irreversible step.

The fix is not to teach the model about sanctions. The fix is to put the precondition where it belongs: in front of the side effect, returning a structured error the model can act on, before the funds leave the account. This chapter teaches the three validation surfaces — precondition, postcondition, and dry-run — that turn "trust the system to reject bad calls" into "the contract proves the call is safe before it commits."


The three validation surfaces

Every tool call passes through three validation moments. Each catches a different class of mistake.

Surface | When it runs | What it catches
--- | --- | ---
Precondition | Before the side effect, on the request | Bad arguments, bad state, forbidden targets, policy violations
Postcondition | After the response, on the result | A downstream response whose shape doesn't match the contract, a logically impossible value, a broken invariant
Dry-run | On demand, before commit, returning "would it succeed?" | The same set as preconditions, exposed as a way to test the call without performing the side effect

The three are complements, not alternatives. A well-designed tool has all three.
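The following sketch shows one way the three surfaces can fit into a single contract-layer wrapper. It assumes a tool is described by three parts: a list of precondition functions, a side-effect function, and a list of postcondition functions. `ContractTool`, `invoke`, and the error codes are hypothetical names for illustration, not a library API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A check returns None when it passes, or a structured error code when it fails.
PreCheck = Callable[[dict], Optional[str]]
PostCheck = Callable[[dict, dict], Optional[str]]


@dataclass
class ContractTool:
    """Hypothetical wrapper: preconditions, then side effect, then postconditions."""
    preconditions: list[PreCheck]
    side_effect: Callable[[dict], dict]
    postconditions: list[PostCheck] = field(default_factory=list)

    def invoke(self, args: dict, dry_run: bool = False) -> dict:
        # Surface 1: every precondition runs before anything irreversible.
        for check in self.preconditions:
            error = check(args)
            if error is not None:
                return {"ok": False, "error": error}
        # Surface 3: a dry-run stops after validation, before the side effect.
        if dry_run:
            return {"ok": True, "status": "would_succeed"}
        raw = self.side_effect(args)
        # Surface 2: the response is checked before it reaches the model.
        for check in self.postconditions:
            error = check(args, raw)
            if error is not None:
                return {"ok": False, "error": error}
        return {"ok": True, "result": raw}


# Usage with stand-in checks and a fake side effect that records its calls.
sent: list[dict] = []


def positive_amount(args: dict) -> Optional[str]:
    return None if args["amount_minor"] > 0 else "AMOUNT_NOT_POSITIVE"


def fake_send(args: dict) -> dict:
    sent.append(args)
    return {"disbursed_amount_minor": args["amount_minor"]}


def amount_matches(args: dict, raw: dict) -> Optional[str]:
    if raw["disbursed_amount_minor"] != args["amount_minor"]:
        return "RESPONSE_INVARIANT_VIOLATION"
    return None


tool = ContractTool([positive_amount], fake_send, [amount_matches])
```

The dry-run branch sits after the preconditions, not before them. That placement means a dry-run exercises exactly the checks the real call would.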


Surface 1 — Preconditions

A precondition is a check that must hold before the contract layer permits the side effect. The schema (chapter 02) is the first precondition layer — it refuses calls with malformed arguments. This section is about the precondition layer beyond the schema.

Examples that the schema does not catch:

  • The argument values parse but reference resources that don't exist
  • The argument values are individually valid but the combination is not (e.g., refund amount exceeds refundable balance)
  • The target of the operation is in a state that does not permit the operation (e.g., the payment is already refunded)
  • The caller's scope permits the verb but a policy refuses the call (e.g., the account is on the sanctions list)
  • The operation would violate a system invariant (e.g., the wire transfer would bring the source account below the regulatory minimum)

These checks are outside the schema because the schema cannot know the state of the world. They are inside the contract layer because they must run before the side effect commits.
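This sketch makes the first three bullets concrete as a refund tool's contract-layer preconditions. It assumes a hypothetical in-memory `PAYMENTS` table standing in for the payments system of record. Every call below would pass a schema that only requires a string `payment_id` and a positive integer `amount_minor`. Only a lookup of current state can refuse them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Payment:
    payment_id: str
    captured_minor: int
    refunded_minor: int
    status: str  # e.g. "captured" or "refunded"


# Hypothetical stand-in for the payments system of record.
PAYMENTS = {
    "pay_1": Payment("pay_1", captured_minor=10_000, refunded_minor=4_000, status="captured"),
    "pay_2": Payment("pay_2", captured_minor=5_000, refunded_minor=5_000, status="refunded"),
}


def refund_preconditions(payment_id: str, amount_minor: int) -> Optional[str]:
    """Return None if the refund may proceed, else a structured error code."""
    payment = PAYMENTS.get(payment_id)
    if payment is None:
        return "PAYMENT_NOT_FOUND"          # parses fine, references nothing
    if payment.status == "refunded":
        return "PAYMENT_ALREADY_REFUNDED"   # target is in the wrong state
    refundable = payment.captured_minor - payment.refunded_minor
    if amount_minor > refundable:
        return "REFUND_EXCEEDS_REFUNDABLE"  # valid value, invalid combination
    return None
```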

A precondition stack for the loan-disbursement tool:

from datetime import datetime, timezone

# ToolCall, ToolResult, DisburseSchema, DisburseErrors and the loans_client,
# sanctions, kyc and treasury clients are the platform's own types and services.

def disburse_loan(call: ToolCall) -> ToolResult:
    args = DisburseSchema.validate(call.arguments)         # schema precondition

    # state preconditions
    loan = loans_client.get(args.loan_id)
    if not loan:
        return DisburseErrors.LOAN_NOT_FOUND
    if loan.status != "approved":
        return DisburseErrors.LOAN_NOT_IN_DISBURSABLE_STATE.with_fields(
            {"current_status": loan.status}
        )
    if loan.amount_minor != args.amount_minor:
        return DisburseErrors.AMOUNT_MISMATCH.with_fields(
            {"loan_amount": loan.amount_minor, "requested": args.amount_minor}
        )

    # policy preconditions
    if sanctions.is_flagged(args.account_number):
        return DisburseErrors.ACCOUNT_SANCTIONED
    if not kyc.is_valid(loan.borrower_id, as_of=datetime.now(timezone.utc)):
        return DisburseErrors.KYC_EXPIRED

    # invariant preconditions
    if not treasury.has_balance(args.amount_minor, currency="INR"):
        return DisburseErrors.INSUFFICIENT_TREASURY_BALANCE

    # only now: the side effect
    return _execute_disbursement(args, idempotency_key=call.idempotency_key)

Each precondition is one named check. Each returns a structured error from the contract's enum if it fails. The order matters: cheaper checks first (existence, state), more expensive checks later (policy lookups, treasury queries). On the happy path every check passes and the side effect runs; on the unhappy path the model receives a structured error before any irreversible step.
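One way to keep the stack as "one named check, one structured error" is to declare the checks as data. A small runner then enforces the cheapest-first order. This is a sketch under assumptions: `NamedCheck`, `run_stack`, and the numeric `cost` field are hypothetical. The chapter's own example simply writes the checks in order by hand.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class NamedCheck:
    name: str
    cost: int                                 # relative cost; lower runs first
    run: Callable[[dict], Optional[str]]      # None = pass, str = error code


def run_stack(checks: list[NamedCheck], args: dict) -> Optional[dict]:
    """Run checks cheapest-first; stop at the first failure and name the check."""
    for check in sorted(checks, key=lambda c: c.cost):
        error = check.run(args)
        if error is not None:
            return {"error": error, "failed_check": check.name}
    return None  # every check passed; the side effect may run


invoked: list[str] = []


def loan_exists(args: dict) -> Optional[str]:
    invoked.append("loan_exists")
    return None if args["loan_id"] == "L1" else "LOAN_NOT_FOUND"


def not_sanctioned(args: dict) -> Optional[str]:
    invoked.append("not_sanctioned")
    return "ACCOUNT_SANCTIONED" if args["account_number"] == "999" else None


# Declared out of order on purpose; the runner still checks existence first.
stack = [NamedCheck("sanctions", 10, not_sanctioned), NamedCheck("existence", 1, loan_exists)]
```

Because `sorted` is stable, checks with equal cost keep their declared order.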

Designing the precondition stack

Procedure when drafting a new tool:

  1. List every state the underlying system must be in for this operation to make sense.
  2. List every policy the platform applies (sanctions, KYC, geofencing, content restrictions, time-of-day caps).
  3. List every invariant the operation must preserve (balances, quotas, counts).
  4. For each item, decide: does the precondition belong in the contract layer, the schema, or the downstream system?

The split:

  • Schema — anything that depends only on the argument values being parseable.
  • Contract layer — anything that depends on multiple sources of truth (loan state, sanctions, KYC) and must be verified before commit.
  • Downstream system — anything that requires transactional consistency with the side effect itself (e.g., the bank's own ledger check on transfer).

Avoid the trap of "the downstream system already checks this." It might — but if its check runs after commit, your contract layer must duplicate the check before commit. The chapter's opening incident is exactly this failure.


Surface 2 — Postconditions

A postcondition is a check that must hold on the response before the contract layer returns the result to the model. It catches downstream systems that have drifted, returned partial results, or violated their own response contract.

Postconditions in the loan-disbursement tool:

def disburse_loan(call: ToolCall) -> ToolResult:
    # ... preconditions and side effect ...
    raw = downstream.disburse(...)

    # response shape
    try:
        result = DisburseResponseSchema.validate(raw)
    except SchemaError as e:
        # downstream returned something we don't recognise — alarm
        log_drift_event(tool="disburse_loan", error=e, raw=raw)
        return DisburseErrors.UPSTREAM_UNCLASSIFIED.with_fields({"detail": str(e)})

    # logical postconditions
    if result.disbursed_amount_minor != args.amount_minor:
        # downstream succeeded but disbursed a different amount — invariant break
        alert(severity="page", reason="amount_mismatch_on_response",
              tool="disburse_loan", expected=args.amount_minor,
              actual=result.disbursed_amount_minor)
        return DisburseErrors.RESPONSE_INVARIANT_VIOLATION
    if result.status not in {"completed", "pending"}:
        return DisburseErrors.UNEXPECTED_RESPONSE_STATUS.with_fields(
            {"status": result.status}
        )

    return ToolResult.ok(result)

Three classes of postcondition:

  • Shape postcondition — the response parses against the contract's return schema. Schema mismatch is a drift event (chapter 09 picks this up).
  • Invariant postcondition — the response respects what the request asked for. If the request asked for ₹50,000 and the response says ₹500 was disbursed, the contract refuses to return success even though the downstream said success.
  • State postcondition — for paranoid tools, the contract layer re-reads the state after the side effect and verifies the change matches the request. Expensive; reserved for irreversible operations where the alternative is finding out about a mismatch hours later.
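This sketch writes the three postcondition classes as separate functions. It assumes two things, both hypothetical:

- A response dict with `disbursement_id`, `disbursed_amount_minor`, and `status` fields.
- A caller-supplied `read_balance` function that re-reads the source account after commit.

`STATE_MISMATCH_AFTER_COMMIT` is an illustrative error code; the other two codes come from the disbursement example above. Amounts are in paise, so ₹50,000 is 5,000,000 and ₹500 is 50,000.

```python
from typing import Callable, Optional

# Fields the contract promises in every response, with their expected types.
REQUIRED_FIELDS = {"disbursement_id": str, "disbursed_amount_minor": int, "status": str}


def shape_postcondition(raw: dict) -> Optional[str]:
    # Every promised field must be present and of the right type.
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(raw.get(name), expected_type):
            return "UPSTREAM_UNCLASSIFIED"
    return None


def invariant_postcondition(requested_minor: int, raw: dict) -> Optional[str]:
    # The response must reflect what the request asked for.
    if raw["disbursed_amount_minor"] != requested_minor:
        return "RESPONSE_INVARIANT_VIOLATION"
    return None


def state_postcondition(read_balance: Callable[[], int],
                        before_minor: int, requested_minor: int) -> Optional[str]:
    # Re-read the system of record and confirm the change actually happened.
    if before_minor - read_balance() != requested_minor:
        return "STATE_MISMATCH_AFTER_COMMIT"
    return None
```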

When postconditions catch real bugs

Until the first catch, teams usually call postconditions "over-engineering." A week in, one catches something. A month in, one catches a downstream rollout that silently renamed a field from disbursed_amount to amount_disbursed. The postcondition turned a silent breakage into a structured error with a drift alarm. That is the point.
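This sketch shows a shape check that turns that rename into an alarm. It assumes the contract's expected keys are listed in a hypothetical `EXPECTED_KEYS` set. Drift events go to an in-memory list that stands in for the real alarm sink.

```python
from typing import Optional

EXPECTED_KEYS = {"disbursement_id", "disbursed_amount", "status"}
drift_events: list[dict] = []  # hypothetical stand-in for the drift-alarm sink


def check_shape(tool: str, raw: dict) -> Optional[dict]:
    """Return a structured error and record a drift event if the keys don't match."""
    received = set(raw)
    missing = EXPECTED_KEYS - received
    unexpected = received - EXPECTED_KEYS
    if not missing and not unexpected:
        return None
    event = {"tool": tool, "missing": sorted(missing), "unexpected": sorted(unexpected)}
    drift_events.append(event)
    return {"error": "UPSTREAM_UNCLASSIFIED", "detail": event}
```

Flagging unexpected keys as well as missing ones is a deliberate strictness choice. It also fires on purely additive changes, which some teams would rather tolerate.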

Tools with strong postconditions are also the tools that survive downstream-system changes without rolling the agent platform back. Without postconditions, a downstream change becomes a model-behaviour bug; with them, it becomes a contract-layer alert.


Surface 3 — Dry-runs

A dry-run is a mode of the same tool that performs all the validation but none of the side effect, and returns "would this call succeed if I ran it for real, and what would the result look like?"

schema:
  parameters:
    type: object
    additionalProperties: false
    required: [loan_id, account_number, amount_minor, idempotency_key]
    properties:
      loan_id:         { type: string }
      account_number:  { type: string, pattern: "..." }
      amount_minor:    { type: integer, minimum: 1 }
      idempotency_key: { type: string }
      dry_run:
        type: boolean
        default: false
        description: |
          When true, performs all precondition checks but does not disburse
          funds. Returns the same response shape with status="would_succeed"
          or the structured error that would have been returned.

When the caller (or the model) passes dry_run: true:

  • All schema, state, policy, and invariant preconditions run.
  • The side effect is skipped.
  • If everything would have passed, the contract returns a synthetic response with status: "would_succeed" and the expected fields where possible.
  • If any precondition fails, the contract returns the same structured error it would have returned for the real call.
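This sketch implements the dry-run behaviour described above. It assumes the tool body receives its precondition functions and a real `execute` function as parameters; both are hypothetical. The point is parity: one precondition path serves both modes. A failing dry-run therefore returns exactly the error the real call would. A passing one returns the real call's response shape with a synthetic status.

```python
from typing import Callable, Optional

Check = Callable[[dict], Optional[str]]


def disburse(args: dict, checks: list[Check], execute: Callable[[dict], dict]) -> dict:
    """Hypothetical tool body: one validation path for dry-run and real calls."""
    for check in checks:
        error = check(args)
        if error is not None:
            # Same structured error whether or not dry_run is set.
            return {"status": "error", "error": error}
    if args.get("dry_run", False):  # the schema defaults dry_run to false
        # Synthetic response in the real call's shape; the id exists only after commit.
        return {"status": "would_succeed",
                "disbursement_id": None,
                "disbursed_amount_minor": args["amount_minor"]}
    return execute(args)


committed: list[str] = []


def fake_execute(args: dict) -> dict:
    committed.append(args["loan_id"])
    return {"status": "completed", "disbursement_id": "d-1",
            "disbursed_amount_minor": args["amount_minor"]}


def kyc_valid(args: dict) -> Optional[str]:
    return "KYC_EXPIRED" if args["loan_id"] == "L-expired" else None
```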

Three uses, all valuable:

  1. The model can ask "would this work?" before committing to an irreversible action. For high-value calls, the agent flow is "dry-run, surface the expected effect to the user for approval, then execute the real call." Module 01 chapter 07 covers the approval-gate side; this is the contract-side support.

  2. CI and contract tests use dry-runs to verify the contract layer's precondition stack without setting up the side-effect path. Chapter 10 (contract testing) leans on this heavily.

  3. Operators can probe the system after a downstream change to verify their contract still admits valid calls and refuses the right ones, without actually moving money.
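This sketch shows the first use from the agent's side. It assumes two hypothetical functions, neither of which is a real platform API:

- `call_tool` invokes the contract.
- `ask_user` is a callback that shows the preview and returns the user's decision.

```python
from typing import Callable


def approve_then_execute(call_tool: Callable[[dict], dict],
                         args: dict,
                         ask_user: Callable[[dict], bool]) -> dict:
    """Hypothetical agent flow: dry-run, get approval, then run for real."""
    preview = call_tool({**args, "dry_run": True})
    if preview.get("status") != "would_succeed":
        return preview  # hand the structured error back unchanged
    if not ask_user(preview):
        return {"status": "declined_by_user"}
    # The real call revalidates; the dry-run was a hint, not a hold.
    return call_tool({**args, "dry_run": False})


modes_seen: list[bool] = []


def fake_tool(args: dict) -> dict:
    modes_seen.append(args["dry_run"])
    status = "would_succeed" if args["dry_run"] else "completed"
    return {"status": status, "disbursed_amount_minor": args["amount_minor"]}
```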

What dry-runs do not solve

Dry-runs do not detect races. A call that dry-runs successfully at time T may fail at time T+5s because another call changed the state. Dry-runs are a moment-in-time statement, not a reservation. If the model uses dry-run + commit, the commit must still handle the precondition failing again — the dry-run is a hint, not a hold.
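This minimal sketch shows why the commit has to revalidate, using a hypothetical in-memory `loan_status` table. The call sequence plays two callers:

1. The first caller dry-runs successfully.
2. A second caller commits the same loan.
3. The first caller's commit then fails the same state precondition its dry-run had passed.

```python
from typing import Optional

# Hypothetical shared state; it can change between a dry-run and a commit.
loan_status = {"L1": "approved"}


def precondition(loan_id: str) -> Optional[str]:
    if loan_status.get(loan_id) != "approved":
        return "LOAN_NOT_IN_DISBURSABLE_STATE"
    return None


def call(loan_id: str, dry_run: bool) -> dict:
    # Preconditions run on every call, including a commit that follows a dry-run.
    error = precondition(loan_id)
    if error is not None:
        return {"status": "error", "error": error}
    if dry_run:
        return {"status": "would_succeed"}
    loan_status[loan_id] = "disbursed"
    return {"status": "completed"}
```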

Holds can be implemented (the downstream reserves the resource for N seconds), but they are a feature of the underlying system, not of the contract layer's dry-run mode. If you need a hold, it goes in the schema as an explicit reserve_for_seconds parameter.
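For contrast, this sketch implements a hold in the underlying system. It assumes a hypothetical `Treasury` class whose `reserve` method takes the `reserve_for_seconds` value the contract would pass through. Time is passed in explicitly so the expiry is easy to test. A real system would use its own clock and transactional storage.

```python
from dataclasses import dataclass, field


@dataclass
class Treasury:
    """Hypothetical downstream that supports explicit holds with an expiry."""
    balance_minor: int
    holds: dict[str, tuple[int, float]] = field(default_factory=dict)  # id -> (amount, expires_at)

    def available(self, now: float) -> int:
        # Unexpired holds reduce what other callers can reserve.
        held = sum(amount for amount, expires_at in self.holds.values() if expires_at > now)
        return self.balance_minor - held

    def reserve(self, hold_id: str, amount_minor: int,
                reserve_for_seconds: float, now: float) -> bool:
        if self.available(now) < amount_minor:
            return False
        self.holds[hold_id] = (amount_minor, now + reserve_for_seconds)
        return True

    def commit(self, hold_id: str, now: float) -> bool:
        amount, expires_at = self.holds.pop(hold_id, (0, 0.0))
        if expires_at <= now:
            return False  # hold missing or expired: the caller must revalidate
        self.balance_minor -= amount
        return True


treasury = Treasury(balance_minor=1_000)
```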

When dry-runs are unsafe

If the precondition check itself has cost or visible side effects, the dry-run inherits them. Examples:

  • A sanctions check that logs an audit entry. The dry-run will log it too, which may be acceptable or may be considered a probe attack.
  • A read that triggers a billing event with a third-party API. The dry-run will incur the cost.

The contract must declare these in the dry-run's documentation. The model can be warned in the tool description: "Dry-run still incurs sanctions-check fees."
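One way to honour that declaration is to attach it to each precondition and report it in the dry-run result. The model then sees which effects a probe actually incurred. `Precondition`, `dry_run`, and the `dry_run_side_effect` field are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Precondition:
    name: str
    run: Callable[[dict], Optional[str]]
    dry_run_side_effect: Optional[str] = None  # declared cost or visible effect


def dry_run(args: dict, checks: list[Precondition]) -> dict:
    """Run the checks without the side effect, reporting effects the checks caused."""
    incurred: list[str] = []
    for check in checks:
        error = check.run(args)
        if check.dry_run_side_effect:  # the check ran, so its effect happened
            incurred.append(check.dry_run_side_effect)
        if error is not None:
            return {"status": "error", "error": error, "side_effects_incurred": incurred}
    return {"status": "would_succeed", "side_effects_incurred": incurred}


def loan_exists(args: dict) -> Optional[str]:
    return None if args.get("loan_id") else "LOAN_NOT_FOUND"


def sanctions_clear(args: dict) -> Optional[str]:
    return None  # stand-in for a paid lookup that also writes an audit entry


checks = [
    Precondition("loan_exists", loan_exists),
    Precondition("sanctions", sanctions_clear, "sanctions-check fee and audit-log entry"),
]
```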


How validation interacts with the other contract surfaces

  • Class (chapter 03). Irreversible tools should always offer a dry-run mode. Write-non-idempotent tools should at least offer postcondition validation, often dry-run too. Read tools do not need a dry-run (they have no side effect to skip).
  • Idempotency (chapter 04). Dry-runs are read-only and do not interact with the dedup store. Preconditions run before the dedup check (you do not want to dedup an invalid call into a stale "success"). Postconditions run before the dedup record is written, so the record contains a validated response.
  • Error contract (chapter 05). Every precondition and postcondition failure produces a structured error from the contract's enum. The error's model_action field tells the model whether to dry-run again with corrected args, ask the user, or escalate.
  • Scope (chapter 06). Dry-runs may need a narrower scope than the real call (no side-effect capability needed). Some platforms issue a *:dry-run variant of each scope for this reason.
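This sketch follows the ordering these bullets describe, assuming a hypothetical in-memory `dedup_store` keyed by idempotency key:

1. Preconditions run first.
2. Dry-runs return before the store is consulted.
3. The outcome is recorded only after postconditions have run.

The sketch records a postcondition failure as well as a success. That is an assumption beyond the text, made so that a retry replays the error rather than repeating the side effect.

```python
from typing import Callable, Optional

dedup_store: dict[str, dict] = {}  # hypothetical: idempotency key -> recorded outcome


def handle(args: dict, key: str,
           precheck: Callable[[dict], Optional[str]],
           execute: Callable[[dict], dict],
           postcheck: Callable[[dict, dict], Optional[str]]) -> dict:
    error = precheck(args)                     # 1. preconditions, before dedup
    if error is not None:
        return {"status": "error", "error": error}
    if args.get("dry_run", False):             # 2. dry-runs never touch the store
        return {"status": "would_succeed"}
    if key in dedup_store:                     # 3. a retry replays the record
        return dedup_store[key]
    raw = execute(args)                        # 4. the side effect
    error = postcheck(args, raw)               # 5. postconditions
    outcome = raw if error is None else {"status": "error", "error": error}
    dedup_store[key] = outcome                 # 6. record the validated outcome
    return outcome


executed: list[int] = []


def positive_amount(args: dict) -> Optional[str]:
    return None if args["amount_minor"] > 0 else "AMOUNT_NOT_POSITIVE"


def fake_execute(args: dict) -> dict:
    executed.append(args["amount_minor"])
    return {"status": "completed", "disbursed_amount_minor": args["amount_minor"]}


def amount_matches(args: dict, raw: dict) -> Optional[str]:
    if raw["disbursed_amount_minor"] != args["amount_minor"]:
        return "RESPONSE_INVARIANT_VIOLATION"
    return None
```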

How to recognise validation gaps in the wild

  • Postmortems where "the downstream rejected it after we had already committed something else"
  • Reports of "the agent did the work then we found out later it was a sanctions match"
  • The agent calls the tool, the call succeeds, but the result is "obviously wrong" (e.g., disbursed amount doesn't match request)
  • The model retries the same call hoping a precondition will start passing
  • Engineers afraid to test write tools against staging because there is no dry-run
  • The contract has no dry_run flag on irreversible operations
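The last signal is mechanical enough to lint for. This sketch assumes tool contracts are available as dicts shaped like the YAML schema earlier in the chapter. It also assumes hypothetical top-level `class` and `postconditions` keys. The dict layout is an assumption, not a real registry format.

```python
def validation_gaps(tool: dict) -> list[str]:
    """Flag missing validation surfaces on one tool contract (hypothetical layout)."""
    gaps = []
    tool_class = tool.get("class", "read")
    params = tool.get("schema", {}).get("parameters", {}).get("properties", {})
    if tool_class == "irreversible" and "dry_run" not in params:
        gaps.append("irreversible tool has no dry_run flag")
    if tool_class != "read" and not tool.get("postconditions"):
        gaps.append("write tool declares no postconditions")
    return gaps


disburse_contract = {
    "name": "disburse_loan",
    "class": "irreversible",
    "schema": {"parameters": {"properties": {"amount_minor": {"type": "integer"}}}},
}
lookup_contract = {"name": "get_loan", "class": "read"}
```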

Interview Q&A

Q1. The downstream service "already validates" the request. Why duplicate the check in the contract layer? Because the downstream's validation runs at a different point in its own commit sequence than yours needs to. If the downstream validates after a partial commit (the chapter's opening incident), your contract has to validate before its own commit to avoid an irreversible mistake. Duplication also catches downstream regressions: if the downstream silently removes a check, your contract still refuses bad calls. The cost is some duplicated logic; the value is independent enforcement. Wrong-answer notes: "trust the downstream" is the line that produced the seventeen-incident pattern in the opening.

Q2. Walk through what happens on a dry_run: true call for an irreversible tool. The contract layer runs schema validation, state preconditions (resource exists, in right state), policy preconditions (sanctions, KYC, etc.), and invariant preconditions (treasury balance, quotas). If any fails, it returns the same structured error as the real call would. If all pass, it returns a synthetic success response with status: "would_succeed" and the expected result fields where computable. It does not commit the side effect; it does not write to the dedup store. The model can then surface the expected effect to the user, get approval, and call again with dry_run: false. The real call must still revalidate — dry-run is a hint, not a hold. Wrong-answer notes: "the dry-run reserves the resources" is wrong unless the contract explicitly says so; "the dry-run skips all checks" defeats the purpose.

Q3. Why are postconditions valuable on a tool whose downstream system is "well-tested"? Because downstream systems change. A field is renamed, a status code is added, a partial-success path is introduced. Without postconditions, the change surfaces days or weeks later as a model-behaviour problem. With postconditions, the contract layer detects the shape or invariant violation immediately and converts it into a structured error and a drift alarm. The discipline is independent enforcement at every boundary, not trust in any single party. Wrong-answer notes: "the response schema in the tool description is enough" misses that the schema is for the model; the postcondition is the enforced contract.

Q4. The team is building a delete_archive tool that wipes a customer's archived records. How would you design its validation surfaces? Precondition: verify the customer exists, the records are archived (not active), the customer's retention policy permits deletion, no legal hold is in place. Each as a named check with a structured error. Postcondition: re-read the count of archived records; verify it dropped by the expected amount; alarm on mismatch. Dry-run: implement it; the result returns the count that would be deleted and the policy decisions, so the user (via the model) can confirm before the real call. The class is irreversible; scope is narrow to a "data-deletion" capability; human gating is on by default. Validation is the difference between "we lost a customer's data we shouldn't have" and "we refused a delete that was missing a precondition." Wrong-answer notes: answers that skip dry-run or postconditions miss the point of the chapter.
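This compact sketch of that design assumes a hypothetical in-memory `CUSTOMERS` table standing in for the records system. It covers three parts:

- The named preconditions: the customer exists, there is no legal hold, and retention permits deletion.
- A dry-run that reports the count it would delete.
- A state postcondition that re-reads the archive after the delete.

Only the archived list is ever touched. Scope checks and human gating live elsewhere and are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    customer_id: str
    archived_records: list[str] = field(default_factory=list)
    retention_permits_deletion: bool = True
    legal_hold: bool = False


# Hypothetical stand-in for the records system of record.
CUSTOMERS = {
    "c1": Customer("c1", ["r1", "r2"]),
    "c2": Customer("c2", ["r3"], legal_hold=True),
}


def delete_archive(customer_id: str, dry_run: bool = False) -> dict:
    customer = CUSTOMERS.get(customer_id)
    if customer is None:
        return {"status": "error", "error": "CUSTOMER_NOT_FOUND"}
    if customer.legal_hold:
        return {"status": "error", "error": "LEGAL_HOLD_IN_PLACE"}
    if not customer.retention_permits_deletion:
        return {"status": "error", "error": "RETENTION_POLICY_FORBIDS_DELETION"}
    expected = len(customer.archived_records)
    if dry_run:
        return {"status": "would_succeed", "would_delete": expected}
    customer.archived_records.clear()  # the irreversible step
    # State postcondition: re-read and confirm the archive dropped by `expected`.
    deleted = expected - len(customer.archived_records)
    if deleted != expected:
        return {"status": "error", "error": "DELETE_COUNT_MISMATCH"}
    return {"status": "completed", "deleted": deleted}
```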


What to do differently after reading this

  • For every existing tool, list its preconditions. Compare the list against the procedure in the "Designing the precondition stack" section. Add the ones that are missing.
  • Add postconditions to every write tool. Start with shape postconditions; add invariant postconditions on tools that touch money or identity.
  • Implement dry_run on every irreversible and high-value write tool. Wire it into the agent platform's approval-gate flow.
  • When a downstream system changes, the first signal you should look at is the postcondition violation rate per tool.
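This is a minimal per-tool counter for that rate. It assumes the contract layer reports one event per call saying whether any postcondition failed. `PostconditionMonitor` is a hypothetical name. In practice these counts would feed whatever metrics system the platform already uses.

```python
from collections import Counter


class PostconditionMonitor:
    """Hypothetical per-tool counter of postcondition violations."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()
        self.violations: Counter = Counter()

    def record(self, tool: str, violated: bool) -> None:
        self.calls[tool] += 1
        if violated:
            self.violations[tool] += 1

    def rate(self, tool: str) -> float:
        # Counter returns 0 for unseen tools, so an unknown tool reports 0.0.
        calls = self.calls[tool]
        return self.violations[tool] / calls if calls else 0.0


monitor = PostconditionMonitor()
for violated in [False, False, True, False]:
    monitor.record("disburse_loan", violated)
```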

Bridge. Validation is what the contract enforces on each call. The next question is how the contract itself changes — because contracts that cannot change end up being lied to instead. The next chapter builds versioning and deprecation: semver for tools, dual-run windows for breaking changes, and how to retire a contract without breaking every caller. → 08-versioning-and-deprecation.md