09. Cancellation and Timeouts — stop cooking when the table is gone
~16 min read. Async systems stay healthy only when waiting work can be cut off cleanly and on time.
Built on the ELI5 in 00-eli5.md. The cancel bell — the signal that a request should stop — keeps the kitchen lane from wasting effort on abandoned tickets.
First picture: deadlines are part of correctness
Look at the flow first. A request starts. It waits on dependencies. Maybe the client disappears. Maybe the deadline expires. The work must stop.
request starts
│
├── work continues
├── deadline reached ──→ cancel
├── client disconnect ──→ cancel
└── server shutdown ───→ cancel
See. Timeout is not just performance tuning. It is contract enforcement. If your product promises a five-second answer, a ten-second success is still failure. Simple, no?
Cancellation matters even more in AI systems. Model calls are expensive. Streaming can last long. Uploads and tool calls may branch. Without the cancel bell, work keeps burning money after the user is gone.
asyncio.timeout gives bounded waiting
A clear pattern in modern Python (3.11 and later) is asyncio.timeout. It sets a deadline for an async block.
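```python
import asyncio

async def fetch_with_deadline():
    # llm_client is assumed to be an async model client created elsewhere.
    async with asyncio.timeout(5):  # deadline of 5 seconds for this block
        return await llm_client.generate("Hello")
```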
Picture what this means. If the operation finishes in under five seconds, all good. If not, asyncio cancels the work inside the block and raises TimeoutError as the block exits. Your code should handle it cleanly.
Worked example. Suppose upstream retrieval usually takes 300 milliseconds, but sometimes hangs for 20 seconds. You wrap it in a 2-second timeout. Now slow outliers stop poisoning the whole request. You can fall back, retry elsewhere, or return a controlled failure.
Cancellation must propagate to child work
Now what is the problem? A parent request times out, but child tasks keep running. Maybe an LLM stream remains open. Maybe three shard queries continue. Maybe a billing write retries pointlessly. That is a leak.
Picture the proper behavior.
parent request cancelled
│
├── child task A cancelled
├── child task B cancelled
└── upstream stream closed
This is why structured task ownership matters. If the parent order ticket owns sub-tasks, cancellation can fan out cleanly. If work was detached carelessly, you may never stop it. See. Detached tasks make the cancel bell hard to hear.
One practical example is SSE token streaming. The browser tab closes. Your route should detect disconnect or cancellation. It should stop reading from the provider stream. It should close network resources. It should not keep relaying tokens into the void.
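A library-free sketch of such a relay loop follows. FakeProviderStream, send, and is_disconnected are hypothetical stand-ins; in Starlette the check could be await request.is_disconnected(), but here it is just an async callable that reports the tab closing after three tokens.

```python
import asyncio
from collections.abc import Awaitable, Callable


class FakeProviderStream:
    """Hypothetical stand-in for a provider's token stream."""

    def __init__(self, n_tokens: int) -> None:
        self.n_tokens = n_tokens
        self.tokens_read = 0
        self.closed = False

    def __aiter__(self) -> "FakeProviderStream":
        return self

    async def __anext__(self) -> str:
        if self.closed or self.tokens_read >= self.n_tokens:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # pretend to wait on the network
        self.tokens_read += 1
        return f"tok{self.tokens_read}"

    async def aclose(self) -> None:
        self.closed = True  # a real client would release the connection here


async def relay(upstream: FakeProviderStream,
                send: Callable[[str], Awaitable[None]],
                is_disconnected: Callable[[], Awaitable[bool]]) -> None:
    try:
        async for token in upstream:
            # Check before forwarding: never relay tokens into the void.
            if await is_disconnected():
                break
            await send(token)
    finally:
        # Runs on normal completion, on break, and on cancellation.
        await upstream.aclose()


async def main() -> tuple[list[str], FakeProviderStream]:
    upstream = FakeProviderStream(n_tokens=100)
    delivered: list[str] = []

    async def send(token: str) -> None:
        delivered.append(token)

    async def is_disconnected() -> bool:
        return len(delivered) >= 3  # simulate the tab closing after 3 tokens

    await relay(upstream, send, is_disconnected)
    return delivered, upstream


delivered, upstream = asyncio.run(main())
print(delivered, upstream.tokens_read, upstream.closed)
```

Checking once per token bounds the waste: at most one extra provider token is read after the client leaves, then the stream is closed.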
Always clean up in finally
Cancellation is not a normal branch. It can interrupt awaits at awkward points. So cleanup must be explicit. Use try/finally around resources.
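```python
async def stream_answer():
    # llm_client is assumed to be an async client whose stream supports aclose().
    upstream = await llm_client.open_stream()
    try:
        async for chunk in upstream:
            yield chunk
    finally:
        # Runs on normal completion, on errors, and on cancellation.
        await upstream.aclose()
```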
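This is a good pattern. Even if the client leaves, finally still closes the upstream stream once the generator is cancelled or closed, which is what a server framework typically does when it notices the disconnect. That saves sockets and money.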
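To see the finally block fire, here is a self-contained simulation of the stream_answer pattern. FakeUpstream is a hypothetical provider stream, and task.cancel() plays the role a server framework plays when it notices the client left. It assumes the consumer is cancelled; a generator that is merely abandoned mid-iteration only runs its finally when it is later closed or finalized.

```python
import asyncio


class FakeUpstream:
    """Hypothetical provider stream that yields tokens until closed."""

    def __init__(self) -> None:
        self.closed = False

    def __aiter__(self) -> "FakeUpstream":
        return self

    async def __anext__(self) -> str:
        await asyncio.sleep(0.01)  # simulate waiting for the next token
        return "tok"

    async def aclose(self) -> None:
        self.closed = True


upstream = FakeUpstream()


async def stream_answer():
    try:
        async for chunk in upstream:
            yield chunk
    finally:
        await upstream.aclose()  # runs even when the consumer is cancelled


async def client(received: list[str]) -> None:
    async for chunk in stream_answer():
        received.append(chunk)


async def main() -> list[str]:
    received: list[str] = []
    task = asyncio.create_task(client(received))
    await asyncio.sleep(0.05)  # let a few tokens flow
    task.cancel()  # the "client left" moment
    try:
        await task
    except asyncio.CancelledError:
        pass
    return received


received = asyncio.run(main())
print(len(received), "tokens received; upstream closed:", upstream.closed)
```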
Another example. Suppose you acquired a Redis lock, opened a temp file, or created a tracing span. Cleanup belongs in finally, not only in the happy path.
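When several resources pile up, one option is contextlib.AsyncExitStack, which acts like a single shared finally. In this sketch asyncio.Lock stands in for a Redis lock and tracing_span() is a hypothetical helper, not a real tracer API. The point is that cancellation unwinds all three, in reverse order of acquisition.

```python
import asyncio
import contextlib
import os
import tempfile

events: list[str] = []      # log of span start/end, to show cleanup ran
temp_paths: list[str] = []  # remember the temp file path for checking later


@contextlib.asynccontextmanager
async def tracing_span(name: str):
    # Hypothetical span helper; a real tracer would record timing here.
    events.append(f"span {name} start")
    try:
        yield
    finally:
        events.append(f"span {name} end")


async def do_work(lock: asyncio.Lock) -> None:
    async with contextlib.AsyncExitStack() as stack:
        # Resources are released in reverse order when the block exits,
        # whether it ends normally, raises, or is cancelled.
        await stack.enter_async_context(lock)  # stand-in for a Redis lock
        tmp = stack.enter_context(tempfile.NamedTemporaryFile())  # deleted on close
        temp_paths.append(tmp.name)
        await stack.enter_async_context(tracing_span("work"))
        await asyncio.sleep(10)  # long step that will be cancelled


async def main() -> tuple[bool, bool]:
    lock = asyncio.Lock()
    task = asyncio.create_task(do_work(lock))
    await asyncio.sleep(0.05)
    task.cancel()  # simulate a timeout or client disconnect
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return lock.locked(), os.path.exists(temp_paths[0])


lock_held, tmp_exists = asyncio.run(main())
print("lock held:", lock_held, "| temp file exists:", tmp_exists)
print(events)
```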
Time budgets should be layered
Senior services rarely use one giant timeout. They use layered budgets. Request timeout. Per-upstream timeout. Queue visibility timeout. Shutdown timeout. Each has a role.
See how useful this is. If retrieval burns all five seconds, the model call never had a chance. Budgeting forces tradeoffs into the design. It also makes latency metrics easier to interpret later.
So what to do? Define service-level goals. Then assign sub-budgets. Make retries fit inside those budgets. Make cancellation messages observable. That is adult async engineering.
Client disconnect is a real product event.
Do not treat disconnect as rare noise. Users close tabs. Mobile networks drop. Frontend code navigates away. This happens constantly.
The front desk should know when the customer left. The cancel bell should ring. The line cook should stop plating. Especially for streamed AI output, this can save large token costs.
In FastAPI and Starlette, disconnect awareness may appear through polling await request.is_disconnected(), through cancellation of a streaming response, or through a WebSocketDisconnect exception. The exact API surface varies. The design principle does not. Stop quickly. Clean up always.
Where this lives in the wild
- ChatGPT web backend — platform engineer: client disconnect must stop expensive ongoing generation and release upstream streaming resources.
- Anthropic streaming API relay — backend engineer: per-request deadlines prevent one bad provider call from occupying response capacity indefinitely.
- Perplexity retrieval service — search engineer: shard queries should cancel when the parent answer request times out.
- GitHub Copilot chat infrastructure — API engineer: layered budgets keep editor interactions responsive even when several dependencies are involved.
- Enterprise document ingestion service — platform engineer: queue and worker timeouts prevent stuck jobs from living forever after partial failures.
Pause and recall
- Why is timeout a correctness rule, not only a performance rule?
- What failure happens when parent cancellation does not reach child tasks?
- Why is finally essential in async cleanup code?
- In the analogy, what does the cancel bell save you from wasting?
Interview Q&A
Q: Why use layered deadlines instead of one giant request timeout?
A: Layered budgets keep each dependency honest, prevent one slow step from consuming the whole SLA invisibly, and make fallback decisions more deliberate.
Common wrong answer to avoid: "One large timeout is simpler, so it is always better."
Q: Why is cancellation propagation a first-class design concern in async APIs?
A: Because abandoned parent requests otherwise leave child work running, which wastes money, capacity, and resource handles without any user benefit.
Common wrong answer to avoid: "Cancelled HTTP requests naturally stop every child task automatically."
Q: Why is finally more important in async code than many beginners expect?
A: Awaits can be interrupted by cancellation mid-flow, so cleanup cannot rely on the happy path reaching the end of a function.
Common wrong answer to avoid: "finally is mostly for style, not correctness."
Q: Why should retries respect the overall request timeout?
A: Because a successful retry that arrives after the user’s deadline is still a failed product outcome and may also block unrelated work longer than budgeted.
Common wrong answer to avoid: "A request is successful as long as one retry eventually works."
Apply now (5 min)
Exercise. Choose one upstream call in your design. Set a timeout for it. Then decide what cleanup must happen if that timeout fires. List at least two resources.
Sketch from memory. Draw one parent order ticket and two child tasks. Show the cancel bell reaching all three. Add one finally cleanup box.
Bridge. SSE streams one way. But chat tools sometimes need both sides talking continuously. That is where WebSockets enter. → 10-websockets-bidirectional.md