07. API Contract Design — make the hallway explicit before traffic starts¶

~13 min read. Loose contracts feel fast, then one missing field burns a sprint.

Built on the ELI5 in 00-eli5.md. The hallway — the agreed passage between rooms — decides what shape may pass and how future changes stay safe.

1) A contract is executable agreement, not decorative documentation¶

An API contract is the written version of the hallway. It tells callers what may enter, what returns, and what breaks. See. When the hallway is vague, every team invents its own map. Suppose checkout sends amount = 499. Payments reads it as rupees. Ledger reads it as paise. One order becomes either ₹499 or ₹4.99. Same code path, completely different money. Simple, no? A useful contract names five things clearly. Request fields, response fields, required values, allowed errors, versioning rules. That is already half the design battle.

┌────────────┐     ┌──────────────────┐     ┌──────────────┐
│ Web client │───▶ │ API contract     │───▶ │ Order service│
└────────────┘     │ types + rules    │     └──────────────┘
┌────────────┐     │ errors + limits  │
│ Mobile app │───▶ │ examples         │
└────────────┘     └──────────────────┘

Worked example. Three clients call POST /orders. Web sends 12 fields, iOS sends 11, Android sends 10. If the contract marks 7 fields as required, all three teams align early. If not, production becomes the meeting room. Code-level example in TypeScript:

export type CreateOrderRequest = {
  orderId: string;
  amountPaise: number;
  currency: 'INR' | 'USD';
  customerId: string;
};

This type alone is not the full contract. But it shows the habit. Name the hallway precisely, then enforce it everywhere. So what to do? Use machine-readable contracts, not tribal memory.

2) OpenAPI, protobuf, and JSON Schema solve different boundary problems¶

People sometimes ask, "Which one is best?" Wrong question. Ask instead, "Which boundary am I controlling?" The boundary decides the tool. OpenAPI fits HTTP APIs well. It describes paths, query params, headers, status codes, and examples. Frontend teams, QA, and SDK generators all benefit quickly.

paths:
  /orders:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateOrderRequest'

Protobuf fits typed RPC nicely. It gives compact payloads and strict field numbering. Rolling deploys become safer because the wire format stays disciplined.

message CreateOrderRequest {
  string order_id = 1;
  int64 amount_paise = 2;
  string currency = 3;
  string customer_id = 4;
}

JSON Schema fits JSON payload validation. It is excellent for configs, events, forms, and shared JSON bodies. You can declare formats, ranges, patterns, and optional sections.

{
  "type": "object",
  "required": ["email", "age"],
  "properties": {
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 18 }
  }
}

See one compact comparison.

┌──────────────┬──────────────────────┬─────────────────────────┐
│ Tool         │ Strongest fit        │ Watch out for           │
├──────────────┼──────────────────────┼─────────────────────────┤
│ OpenAPI      │ Public HTTP surface  │ Spec drift from code    │
│ Protobuf     │ Internal RPC         │ Field-number mistakes   │
│ JSON Schema  │ JSON bodies/events   │ Partial validator usage │
└──────────────┴──────────────────────┴─────────────────────────┘

Worked example. Say one platform has 18 internal services and 4 public APIs. Use protobuf between services for smaller payloads and generated stubs. Use OpenAPI on the public edge for docs and SDKs. Use JSON Schema inside event topics carrying flexible JSON. Same company, different hallways. Code-level lesson. Pick one source of truth per boundary. Do not maintain five drifting copies of the same shape. That is not architecture. That is paperwork.

3) Contract-first versus code-first is really about team coordination speed¶

Contract-first means the spec arrives before the handler implementation. Code-first means code annotations or types generate the spec later. Both can work. The better choice depends on how many teams must move together. Contract-first shines when backend, frontend, QA, and partner teams move in parallel. You publish the hallway early. Mocks, stubs, and examples appear before business logic finishes. See the flow.

Spec written
   │
   ├──▶ Mock server for frontend
   ├──▶ SDK generation for partners
   └──▶ Server skeleton for backend

Worked example. Two squads share a checkout API. Frontend needs 6 days for screens. Backend needs 8 days for payment wiring. With contract-first, both start on day one. Without it, frontend waits and compresses testing dangerously. Code-first shines in smaller teams. One codebase owns both implementation and generated docs. You avoid duplicate edits when models change hourly. But discipline matters, otherwise docs lag behind the room code. Java-style code-first example:

@PostMapping("/orders")
OrderResponse create(@Valid @RequestBody CreateOrderRequest req) {
    return service.create(req);
}

TypeScript code-first example:

const CreateOrderSchema = z.object({
  orderId: z.string().uuid(),
  amountPaise: z.number().int().positive(),
  currency: z.enum(['INR', 'USD']),
});

So what to do? Choose contract-first for shared platforms, partner APIs, and long support windows. Choose code-first for tight teams with strong generation tooling. Best of all, keep the spec and code generated from one dependable floor plan.

4) Validation belongs at the edge, and compatibility belongs in every release¶

Schema validation should run as early as possible. Reject bad shapes before business logic starts improvising. The edge is cheaper than the core. See.

Request
  │
  ▼
Schema validation ──invalid──▶ 400 with field errors
  │
  ▼
Business validation ──rule fail──▶ 409 or 422
  │
  ▼
Handler + persistence

Concrete Node example using AJV:

const valid = ajv.validate(createOrderJsonSchema, payload);
if (!valid) {
  throw new BadRequestError(ajv.errors);
}

Concrete Java example using Bean Validation:

public record CreateOrderRequest(
    @NotBlank String orderId,
    @Min(1) long amountPaise,
    @NotBlank String currency
) {}

Now backward compatibility. At code level, old clients must keep working during rollout. That means changes are not equal. Some are safe, some are secretly explosive. Usually safe changes: - add an optional response field or request field with a default; - widen validation without changing meaning and reserve old protobuf numbers after removal. Usually dangerous changes: - rename amountPaise to amount or make an optional field required; - change string to number silently or reuse protobuf field number 3 for new meaning. Bad protobuf evolution:

message User {
  string email = 3;
}

Later, never do this:

message User {
  int64 loyalty_points = 3;
}

Those bytes already mean email to older consumers. The wire remembers even when developers forget. Simple, no? Reserve the removed number instead. Worked example. Suppose 40% of Android users stay one version behind for two weeks. If you suddenly require couponCode, four of ten apps may fail checkout. That is not a neat refactor. That is a revenue leak. So what to do? Deprecate first, measure adoption, support overlap, then remove carefully. Contract safety is a release habit, not a one-time slide.

Where this lives in the wild¶

A Stripe API engineer evolves payment fields carefully so old merchant SDKs still serialize requests safely.
A Swiggy mobile-platform engineer depends on stable OpenAPI specs so Android and iOS releases can move independently.
A Google infrastructure engineer uses protobuf definitions to keep many internal services compatible during rolling deploys.
A Razorpay backend engineer validates webhook payloads against JSON Schema before ledger code touches money movement.
A Postman product engineer generates mocks and examples from OpenAPI so developer onboarding stays consistent.

Pause and recall¶

When does OpenAPI fit better than protobuf?
Why is contract-first useful when multiple teams move together?
Which schema changes are usually backward compatible at code level?
Why should validation happen before business logic?

Interview Q&A¶

Why contract-first not code-first for a shared partner API?¶

Because partner teams need the hallway before your service finishes implementation. Mocks, SDKs, reviews, and negative tests can start earlier. Common wrong answer to avoid: "Contract-first is always better because it looks more formal."

Why protobuf not JSON over HTTP for internal high-volume RPC?¶

Because protobuf gives smaller payloads, generated stubs, and disciplined compatibility rules. That matters when many services roll independently all day. Common wrong answer to avoid: "Binary is automatically better for every boundary."

Why schema validation not handwritten `if` checks scattered everywhere?¶

Because one schema keeps rules central, testable, and reusable across handlers. Scattered checks drift fast and miss edge cases. Common wrong answer to avoid: "Manual checks are flexible, so structure is unnecessary."

Why add optional fields not rename required ones immediately?¶

Because old clients ignore extra optional data more safely than missing renamed fields. Compatibility is about rollout reality, not only clean code. Common wrong answer to avoid: "Clients should update quickly, so breaking changes are acceptable."

Apply now (5 min)¶

Exercise. Design POST /shipments for an ecommerce system. Write 5 request fields, 3 validation rules, 2 response fields, and 2 error codes. Then express the same request once in OpenAPI and once in protobuf. Sketch from memory. Draw the edge-validation pipeline, then list four safe and unsafe contract changes side by side. Say aloud which change breaks old clients first and why.

Bridge. A strong hallway tells services what may pass, but behavior still needs legal moves, guards, and clear transitions. → 08-state-machines-and-workflows.md