00. Async Python & FastAPI for AI Services: The Five-Year-Old Version
After realtime systems, we now learn about the kitchen that serves AI requests without blocking the whole restaurant.
Imagine a busy restaurant at dinner time. Customers keep arriving. Some want soup. Some want a full thali. Some want a custom dessert.
Now picture a bad kitchen. One cook starts one order. Then that cook just stands still waiting for bread to bake. Meanwhile, ten other tables keep waiting. This is blocking code. The cook stands idle while other work piles up.
A better kitchen behaves differently. The front desk takes every order quickly. Each order becomes an order ticket. The kitchen lane decides which ticket moves now. A line cook chops vegetables for one ticket. Then that cook pauses and lets another ticket move while the sauce simmers. Simple, no?
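Here is a small sketch of the two kitchens using only the standard library. The names `blocking_cook`, `async_cook`, `serve_blocking`, and `serve_async` are made up for illustration, and the 0.1 second sleep is just a stand-in for "bread baking" (any slow wait, such as an LLM call).

```python
import asyncio
import time


def blocking_cook(order: str) -> str:
    # The bad kitchen: the cook stands still while the bread bakes.
    time.sleep(0.1)
    return f"{order} ready"


async def async_cook(order: str) -> str:
    # The better kitchen: while this order waits, other tickets can move.
    await asyncio.sleep(0.1)
    return f"{order} ready"


def serve_blocking(orders: list[str]) -> tuple[list[str], float]:
    # Orders are cooked one after another, so the waits add up.
    start = time.perf_counter()
    plates = [blocking_cook(order) for order in orders]
    return plates, time.perf_counter() - start


async def serve_async(orders: list[str]) -> tuple[list[str], float]:
    # gather() runs all the coroutines concurrently on one event loop.
    start = time.perf_counter()
    plates = await asyncio.gather(*(async_cook(order) for order in orders))
    return list(plates), time.perf_counter() - start


if __name__ == "__main__":
    orders = ["soup", "thali", "dessert"]
    _, blocking_seconds = serve_blocking(orders)
    _, async_seconds = asyncio.run(serve_async(orders))
    print(f"blocking: {blocking_seconds:.2f}s, async: {async_seconds:.2f}s")
```

With three orders, the blocking version should take roughly three waits in a row, while the async version should take roughly one. The plates come back in the same order either way.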
AI services work like that kitchen. One request may call an LLM. Another may read a database. Another may stream tokens to a browser. Another may queue a long document job. If we block on every slow step, users pile up fast. If we switch wisely, the system stays responsive.
The pass window is where partial output goes back. That matters for chat apps. Users want the first token early. They do not want to wait for the whole essay. So we stream. Plate by plate. Token by token.
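The pass window can be sketched with an async generator. This is only an illustration: `fake_llm_tokens` is a hypothetical stand-in for a real streaming model client, and `collect_stream` plays the customer. In a real FastAPI service, an async generator like this is the kind of object you would typically hand to a streaming response.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_llm_tokens(prompt: str) -> AsyncIterator[str]:
    # Hypothetical model: yields one word at a time with a small delay.
    for word in f"You asked about {prompt}".split():
        await asyncio.sleep(0.01)  # pretend the model is generating
        yield word + " "


async def collect_stream(prompt: str) -> list[str]:
    # The customer side: each token arrives as soon as it is ready.
    received: list[str] = []
    async for token in fake_llm_tokens(prompt):
        received.append(token)  # a real server would send this chunk now
    return received


if __name__ == "__main__":
    for token in asyncio.run(collect_stream("soup")):
        print(repr(token))
```

The point is the shape, not the fake model. The first token is available after one short wait, not after the whole sentence is finished.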
The prep shelf is where slow work continues outside the main order. Think PDF parsing. Think embedding millions of chunks. Think nightly summaries. The customer should not stand at the counter for that. We accept the request. Then background workers continue from the prep shelf.
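A minimal sketch of the prep shelf, assuming an in-process `asyncio.Queue` and one worker task. The names `prep_worker` and `accept_jobs` are invented here, and the short sleep stands in for slow work like PDF parsing. An in-memory queue like this loses its jobs if the process restarts, so treat it as a teaching toy.

```python
import asyncio


async def prep_worker(shelf: asyncio.Queue, done: list[str]) -> None:
    # The worker keeps pulling jobs off the shelf until it is cancelled.
    while True:
        job = await shelf.get()
        try:
            await asyncio.sleep(0.01)  # stand-in for slow work
            done.append(f"processed {job}")
        finally:
            shelf.task_done()  # tell the queue this job is finished


async def accept_jobs(jobs: list[str]) -> tuple[list[str], list[str]]:
    shelf: asyncio.Queue = asyncio.Queue()
    done: list[str] = []
    worker = asyncio.create_task(prep_worker(shelf, done))

    receipts: list[str] = []
    for job in jobs:
        shelf.put_nowait(job)               # put the job on the shelf
        receipts.append(f"accepted {job}")  # answer the customer right away

    await shelf.join()  # demo only: wait until the shelf is empty
    worker.cancel()     # then send the worker home
    await asyncio.gather(worker, return_exceptions=True)
    return receipts, done


if __name__ == "__main__":
    receipts, done = asyncio.run(accept_jobs(["a.pdf", "b.pdf"]))
    print(receipts)
    print(done)
```

Notice that every receipt is written before any job is processed. That is the whole idea: accept first, work later.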
And sometimes the customer leaves. Then the cancel bell rings. A smart kitchen stops plating a dish nobody wants. A smart API stops expensive work when the browser disconnects. See. Cancellation is not a side detail. It saves money.
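The cancel bell can be sketched with plain asyncio. Everything here is illustrative: `expensive_generation` is a hypothetical long model call, `customer_leaves` rings the bell by hand with `task.cancel()`, and `with_deadline` rings it automatically with `asyncio.wait_for`.

```python
import asyncio


async def expensive_generation(log: list[str]) -> str:
    try:
        log.append("started")
        await asyncio.sleep(10)  # stand-in for a long, costly model call
        log.append("finished")
        return "essay"
    except asyncio.CancelledError:
        log.append("cancelled, cleaning up")  # release resources here
        raise  # re-raise so the cancellation is not swallowed


async def customer_leaves() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(expensive_generation(log))
    await asyncio.sleep(0.01)  # give the task a moment to start
    task.cancel()              # the cancel bell rings
    try:
        await task
    except asyncio.CancelledError:
        log.append("caller saw cancellation")
    return log


async def with_deadline() -> str:
    # wait_for cancels the inner work once the timeout passes.
    try:
        return await asyncio.wait_for(expensive_generation([]), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"


if __name__ == "__main__":
    print(asyncio.run(customer_leaves()))
    print(asyncio.run(with_deadline()))
```

In both cases the ten-second job stops almost immediately, and "finished" never gets logged. The dish nobody wants is never plated.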
So this module is about building that kitchen properly. We will learn asyncio. We will learn FastAPI. We will learn streaming, retries, background jobs, and shutdown. By the end, you should know how to serve real AI traffic safely.
A tiny picture first

     customer request
             │
             ▼
    ┌─────────────────┐
    │   front desk    │
    └────────┬────────┘
             ▼
        order ticket
             │
             ▼
    ┌─────────────────┐
    │  kitchen lane   │ ◀── decides who runs now
    └──┬──────────┬───┘
       ▼          ▼
    line cook  line cook
       │          │
       ├── waits on LLM
       │
       └── another ticket moves
             │
             ▼
        pass window ──→ customer sees tokens
             │
             ▼
        prep shelf ──→ slow jobs continue later

Look. No cook needs to freeze the whole kitchen. That is the spirit of async services.
The placeholders you will see called back
| Placeholder | Meaning |
|---|---|
| Front desk | The HTTP/API entry point that receives requests. |
| Order ticket | The validated request and its coroutine task. |
| Kitchen lane | The asyncio event loop that decides which task runs while others wait. |
| Line cook | The function actively doing useful work right now. |
| Pass window | Streaming output sent back before the full job ends. |
| Prep shelf | Background jobs and queues for long-running work. |
| Cancel bell | Timeouts, disconnects, and cancellation signals. |
What's coming
- 01-sync-vs-async.md: why blocking code collapses under concurrent AI requests.
- 02-event-loop-coroutines.md: the kitchen lane, coroutines, tasks, and `await`.
- 03-fastapi-basics.md: routes, dependency injection, and Pydantic models.
- 04-async-endpoints.md: when `async def` matters, and when plain `def` is fine.
- 05-streaming-responses.md: SSE and token streaming through the pass window.
- 06-background-tasks.md: prep shelf work, queues, and long jobs.
- 07-concurrency-patterns.md: gather, semaphores, rate limits, and pooled connections.
- 08-error-handling-retries.md: failures, backoff, retries, and circuit breakers.
- 09-cancellation-timeouts.md: cancel bell logic, cleanup, and deadlines.
- 10-websockets-bidirectional.md: two-way chat channels with WebSockets.
- 11-testing-async.md: async tests with pytest and `httpx.AsyncClient`.
- 12-deployment-production.md: workers, Docker, health checks, and graceful shutdown.
- 13-honest-admission.md: what still stays hard in async Python systems.
Bridge. Before learning the kitchen lane, first see exactly why a blocking kitchen fails when many tables order together. → 01-sync-vs-async.md