
12. API for AI Agents — menus must be built for robot waiters too

~13 min read. Agents need crisp contracts because guessing tools wastes tokens fast.

Built on the ELI5 in 00-eli5.md. The menu — API documentation — should help machine callers choose tools without human guesswork.


1) Agent-facing APIs need narrower names and stricter descriptions

A human developer can read examples, infer missing details, and survive minor quirks. An agent struggles sooner because it follows the contract literally.

So an agent-friendly menu must answer clearly:

  • what tool exists
  • what inputs are required
  • what output shape it returns
  • what errors are recoverable
  • what to do next while a long task runs

Think of robot waiters. If the menu says "chef special, maybe spicy, ask staff," the robot freezes. It needs named fields, allowed values, and predictable shapes.

Bad description:

create task
input: some details
output: success info

Better description:

name: create_task
description: create one task for a project board
input schema: title string required, priority enum optional
output schema: task_id string, status enum, created_at timestamp
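
The better description can also be written as a machine-readable declaration. This is a minimal sketch, not a fixed standard: the output_schema key and the enum values for priority and status are assumptions added for illustration, because the text above names the enums without listing their values.

```python
import json

# Hypothetical declaration for the create_task tool described above.
# The enum values are assumed; the original only says "priority enum"
# and "status enum".
CREATE_TASK_TOOL = {
    "name": "create_task",
    "description": "Create one task for a project board",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["title"],  # priority stays optional
    },
    "output_schema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string"},
            "status": {"type": "string", "enum": ["open", "in_progress", "done"]},
            "created_at": {"type": "string", "format": "date-time"},
        },
        "required": ["task_id", "status", "created_at"],
    },
}


def required_inputs(tool: dict) -> list:
    """Return the argument names a caller must always supply."""
    return list(tool["input_schema"].get("required", []))


if __name__ == "__main__":
    # Serialize the declaration the way it would be handed to an agent.
    print(json.dumps(CREATE_TASK_TOOL, indent=2))
```

Listing enum values inside the declaration means the robot waiter never has to invent a spelling for priority.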

Worked example. An agent choosing between search_customer and get_customer needs exact differences. If both descriptions overlap vaguely, it may call the wrong tool repeatedly. That wastes latency, tokens, and rate budget.

2) Function calling works best when schemas are strict and outputs are typed

Most LLM tool use depends on structured declarations. Often the declaration is written in JSON Schema. Sometimes an OpenAPI document supplies the same information. The principle stays simple: make tool arguments explicit enough for machines.

The agent usually needs these pieces.

  • tool name
  • short purpose
  • input schema
  • output schema
  • auth or scope rules

A good schema reduces misuse. Required fields stay explicit. Enums prevent random spelling. Nested objects remain predictable.

agent ──reads──→ tool schema
agent ──fills──→ JSON arguments
API   ──returns→ structured result

Worked example for a support tool.

{
  "name": "lookup_order",
  "description": "Fetch one order by public order id",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": { "type": "string" },
      "include_items": { "type": "boolean" }
    },
    "required": ["order_id"]
  }
}

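Notice what is missing. No vague prose like "send useful context." No overloaded fields. No free-form catch-all argument. One gap is still worth closing in a production schema: state what happens when include_items is omitted, so no default stays hidden. The caller should not guess how to fill the order slip.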
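
A strict schema only helps if the server enforces it. The sketch below checks agent arguments against the lookup_order input schema. It is a hand-rolled check covering only the features this schema uses (required fields, string and boolean types, unknown fields), not a full JSON Schema validator. The function name validate_arguments and the message wording are assumptions.

```python
from __future__ import annotations

from typing import Any

# The input schema from the lookup_order example above.
LOOKUP_ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "include_items": {"type": "boolean"},
    },
    "required": ["order_id"],
}

# Maps the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "boolean": bool}


def validate_arguments(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    problems = []
    properties = schema.get("properties", {})

    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"{name}: missing required field")

    # Every supplied field must be declared and have the declared type.
    for name, value in args.items():
        if name not in properties:
            problems.append(f"{name}: unknown field")
            continue
        type_name = properties[name]["type"]
        if not isinstance(value, _PY_TYPES[type_name]):
            problems.append(f"{name}: expected {type_name}")
    return problems
```

Calling validate_arguments(LOOKUP_ORDER_SCHEMA, {"include_items": "yes"}) reports both the missing order_id and the wrong type, which gives the agent two concrete things to repair in one round trip.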

Structured outputs matter too. If the tool returns free-form paragraphs, another step must parse them again. That is brittle.

Prefer this:

{ "order_id": "ord_123", "status": "shipped", "eta_days": 2 }

over this:

Order ord_123 seems shipped and may arrive in around two days.

Humans may enjoy the second sentence. Agents prefer the first object every time.
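
To see why the object wins, here is a small sketch of the consuming side. The structured result loads into a typed record in one step, with no sentence parsing. The OrderStatus class, the parse_order_status helper, and the set of allowed statuses are assumptions for illustration.

```python
import json
from dataclasses import dataclass

# Assumed status values; the original only shows "shipped".
ALLOWED_STATUSES = {"pending", "shipped", "delivered"}


@dataclass(frozen=True)
class OrderStatus:
    order_id: str
    status: str
    eta_days: int


def parse_order_status(raw: str) -> OrderStatus:
    """Turn a structured tool result into a typed record."""
    data = json.loads(raw)
    result = OrderStatus(
        order_id=data["order_id"],
        status=data["status"],
        eta_days=data["eta_days"],
    )
    # Reject values outside the agreed enum instead of passing them along.
    if result.status not in ALLOWED_STATUSES:
        raise ValueError(f"unexpected status: {result.status!r}")
    return result


order = parse_order_status('{ "order_id": "ord_123", "status": "shipped", "eta_days": 2 }')
```

The free-form sentence offers no equivalent path. Words like "seems" and "around" would need a second model call or a fragile pattern match before eta_days could be used in a calculation.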

3) Streaming helps because agent work is visible, partial, and ongoing

Chat and agent workflows often need partial output. Users want visible progress. Supervising systems want incremental events. That is why streaming matters.

For many HTTP chat experiences, Server-Sent Events is enough. It is simple, one-way from server to client, and supported natively in browsers through EventSource. WebSockets fit when you need richer bidirectional control.

client opens stream
server sends event chunks
client renders partial answer
server sends done event

A tiny SSE sketch looks like this.

┌────────┐   GET /chat/stream   ┌────────┐
│ client │ ───────────────────→ │ server │
│        │ ←─ event: token      │        │
│        │ ←─ event: tool_call  │        │
│        │ ←─ event: done       │        │
└────────┘                      └────────┘

Worked example. Suppose an agent answers a travel question and calls a pricing tool midway. Streaming lets the frontend show progress instead of a blank spinner. It also lets supervisors observe tool calls before the final answer arrives.

Useful event types stay boring and stable.

  • token or delta text
  • tool call started
  • tool call finished
  • warning
  • final summary
  • done

If a supervisor expects tool_call_started, renaming it silently breaks orchestration. Stable event contracts matter just like stable JSON fields.
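
The event contract can be sketched on the wire. SSE frames are plain text: an event line, a data line, then a blank line. The helpers below are a minimal sketch of writing and reading such frames. The payload fields (text, tool) are assumptions, and the parser handles only the event and data fields, not the full SSE grammar.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Encode one event as an SSE frame terminated by a blank line."""
    # json.dumps emits a single line by default, so one data: line is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse(stream: str) -> list:
    """Decode a string of SSE frames into (event_name, payload) pairs."""
    events = []
    for block in stream.split("\n\n"):
        name, data_lines = "message", []  # "message" is the SSE default name
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                # Simplification: strip all surrounding whitespace.
                data_lines.append(line[len("data:"):].strip())
        if data_lines:  # frames without data are not dispatched
            events.append((name, json.loads("\n".join(data_lines))))
    return events


# A short stream using the stable event names listed above.
stream = (
    format_sse("token", {"text": "Checking prices"})
    + format_sse("tool_call_started", {"tool": "pricing"})
    + format_sse("tool_call_finished", {"tool": "pricing"})
    + format_sse("done", {})
)
event_names = [name for name, _ in parse_sse(stream)]
```

A supervisor that dispatches on these names keeps working only while the names stay fixed, which is the point of treating them as part of the contract.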

4) Error messages and long-running jobs must guide the next move cleanly

Ordinary API errors often stop at "request failed." Agent APIs should teach recovery. The caller is automated. Tell it what to fix, retry, or poll next.

Bad error:

{ "error": "bad request" }

Better error:

{
  "type": "https://api.example.com/problems/invalid-tool-arguments",
  "title": "Tool arguments failed validation",
  "status": 400,
  "code": "INVALID_ARGUMENTS",
  "detail": "start_date must be before end_date",
  "retryable": false,
  "field_errors": [
    { "field": "start_date", "message": "must be before end_date" }
  ]
}

Now the agent can repair arguments instead of repeating the same mistake. Useful extra fields are retryable, field-level errors, missing scope, and next action. Bad docs create bad calls. Bad errors create bad retries.
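
On the agent side, those extra fields turn into a simple decision. The sketch below is one possible policy, not a standard. The function name next_action and the three action labels are assumptions.

```python
def next_action(problem: dict) -> str:
    """Pick the agent's next move from a structured error body."""
    # Field-level errors mean the arguments themselves must change.
    if problem.get("field_errors"):
        return "repair_arguments"
    # The server says the same call may succeed later.
    if problem.get("retryable"):
        return "retry"
    # Nothing actionable: stop and hand the problem to a human or planner.
    return "escalate"


better_error = {
    "type": "https://api.example.com/problems/invalid-tool-arguments",
    "title": "Tool arguments failed validation",
    "status": 400,
    "code": "INVALID_ARGUMENTS",
    "detail": "start_date must be before end_date",
    "retryable": False,
    "field_errors": [
        {"field": "start_date", "message": "must be before end_date"}
    ],
}
bad_error = {"error": "bad request"}

decision_for_better = next_action(better_error)  # "repair_arguments"
decision_for_bad = next_action(bad_error)        # "escalate"
```

The bad error gives the agent nothing to act on, so the only safe move is to give up or escalate. The better error points at the exact field to fix.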

Long-running operations need equally honest contracts. Do not keep one request hanging forever unless streaming truly fits the task. Often an async job API is cleaner.

POST /reports        ──→ 202 Accepted + job_id
GET  /reports/{id}   ──→ pending/running/succeeded/failed
optional webhook     ──→ callback when complete

Worked example. An agent asks for a monthly finance report. The backend returns job_id = rep_9001 and a poll URL immediately. Later, the poll response returns status = succeeded and a download URL. Now the caller knows whether to wait, poll, cancel, or continue planning elsewhere.
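
The job flow can be sketched without a network. The class below is an in-memory stand-in for the /reports endpoints, and poll_until_done is a hypothetical client helper. The download URL path, the number of polls before completion, and the polling limits are all assumptions.

```python
import itertools


class FakeReportBackend:
    """In-memory stand-in for POST /reports and GET /reports/{id}."""

    def __init__(self, polls_until_done: int = 2):
        self._polls_until_done = polls_until_done
        self._remaining = {}
        self._ids = itertools.count(9001)

    def create_report(self) -> dict:
        # Mirrors "202 Accepted + job_id": return at once, work continues later.
        job_id = f"rep_{next(self._ids)}"
        self._remaining[job_id] = self._polls_until_done
        return {"status_code": 202, "job_id": job_id, "poll_url": f"/reports/{job_id}"}

    def get_report(self, job_id: str) -> dict:
        left = self._remaining[job_id]
        if left > 0:
            self._remaining[job_id] = left - 1
            return {"job_id": job_id, "status": "running"}
        # Hypothetical download path, for illustration only.
        return {
            "job_id": job_id,
            "status": "succeeded",
            "download_url": f"/reports/{job_id}/download",
        }


def poll_until_done(backend, job_id, max_polls=10, interval_s=1.0, sleep=lambda s: None):
    """Poll until the job reaches a terminal status or the budget runs out."""
    for _ in range(max_polls):
        state = backend.get_report(job_id)
        if state["status"] in ("succeeded", "failed"):
            return state
        sleep(interval_s)  # pass time.sleep here in real use
    raise TimeoutError(f"{job_id} still not finished after {max_polls} polls")


backend = FakeReportBackend()
accepted = backend.create_report()
final = poll_until_done(backend, accepted["job_id"])
```

The bounded loop matters as much as the status field. An agent with a poll budget can stop waiting and plan elsewhere, instead of holding one request open indefinitely.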

So the bigger lesson is simple. AI agents do not need magical APIs. They need disciplined APIs. Clear menu, strict schemas, useful stream events, and honest async flows.


Where this lives in the wild

  • OpenAI platform engineer designs tool schemas and streaming responses so applications can orchestrate agent calls with minimal guesswork.
  • Anthropic API engineer shapes structured tool use and clear error signals for models that call external functions safely.
  • GitHub Copilot platform engineer exposes coding actions and streaming status so agent loops can show progress and recover from bad arguments.
  • Zapier AI orchestration engineer turns thousands of app actions into stable, schema-driven tools for automated workflows.
  • Notion AI platform engineer designs long-running knowledge actions where agents need polling, structured results, and permission-aware failures.

Pause and recall

  1. Why are vague tool descriptions worse for agents than for humans?
  2. What makes structured outputs easier for agent pipelines to use?
  3. Why is SSE often a good fit for chat-style streaming?
  4. What async pattern suits long-running work better than endless waiting?

Interview Q&A

Q: Why should agent-facing tools use strict schemas? A: Strict schemas reduce guessing, make validation explicit, and lower repeated bad tool calls. Common wrong answer to avoid: "Because LLMs cannot read prose" — they can read prose, but production systems need stronger contracts than prose alone.

Q: Why prefer structured output over free-form text for tool responses? A: Structured output removes extra parsing and makes downstream automation safer. Common wrong answer to avoid: "Because JSON is always smaller" — the real benefit is reliability, not byte count.

Q: Why use streaming for agent chat APIs? A: Streaming exposes progress, reduces blank waiting, and helps supervisors react early. Common wrong answer to avoid: "Because streaming makes models faster" — it improves experience and control, not raw model speed.

Q: Why model long-running tasks as jobs with polling or callbacks? A: That contract survives long durations, retries, and network interruptions better. Common wrong answer to avoid: "Because HTTP cannot handle long requests" — it sometimes can, but job contracts are usually clearer.


Apply now (5 min)

Exercise: Design one tool schema for lookup_invoice. List required fields, output fields, and one recoverable validation error. Then add one streaming event name and one async job status.

Sketch from memory: Draw an agent reading a menu, sending one order slip JSON payload, then receiving stream events and a final structured result.


Bridge. Agent APIs are useful, but some design questions stay messy. Next we admit the unsettled parts honestly. → 13-honest-admission.md