04. Async Endpoints — use `async def` where waiting matters¶

~13 min read. FastAPI lets you write both sync and async routes, but the choice changes capacity and failure modes.

Built on the ELI5 in 00-eli5.md. The front desk — the route entry point — must choose whether a line cook waits cooperatively or blocks a thread.

First picture: FastAPI supports two kitchens¶

Look at the shape first. FastAPI can run async def handlers on the event loop. It can also run plain def handlers in a threadpool. Both are valid. The question is fitness.

incoming request
      │
      ▼
┌──────────────────────┐
│   FastAPI router     │
└───────┬──────────────┘
        │
        ├── async def ──→ event loop ──→ cooperative waits
        │
        └── def ────────→ threadpool ──→ blocking-safe wrapper

See. The framework is being practical. Some libraries are async-native. Some are still sync. Some workloads are CPU-heavy. We need to choose deliberately.
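The router split in the diagram can be imitated with the standard library alone. This is only a sketch of the idea, not FastAPI's actual code: `dispatch`, `async_handler`, and `sync_handler` are hypothetical names, and `asyncio.to_thread` stands in for the framework's threadpool.

```python
import asyncio
import inspect
import threading


async def async_handler() -> str:
    # Runs directly on the event loop thread.
    await asyncio.sleep(0)
    return threading.current_thread().name


def sync_handler() -> str:
    # Runs wherever the dispatcher puts it; here, a worker thread.
    return threading.current_thread().name


async def dispatch(handler) -> str:
    """Hypothetical router step: await coroutines, push plain functions to a thread."""
    if inspect.iscoroutinefunction(handler):
        return await handler()
    return await asyncio.to_thread(handler)


async def main() -> tuple[str, str, str]:
    loop_thread = threading.current_thread().name
    return loop_thread, await dispatch(async_handler), await dispatch(sync_handler)


loop_thread, async_thread, sync_thread = asyncio.run(main())
print(loop_thread, async_thread, sync_thread)
```

The async handler reports the same thread as the event loop. The plain function reports a different one. That is the whole split in miniature.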

A simple rule helps. If the route mostly waits on async-capable I/O, prefer async def. If the route must call blocking sync code and cannot be changed, a plain def route may be cleaner. If the route is CPU-heavy, neither style alone solves capacity. Offload the work.
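The rule can be written down as a tiny helper. This is a sketch under assumptions: `Dep` and `choose_route_style` are hypothetical names, and the three labels are a simplification of real dependencies.

```python
from enum import Enum


class Dep(Enum):
    ASYNC_READY = "async-ready"
    SYNC_ONLY = "sync-only"
    CPU_HEAVY = "cpu-heavy"


def choose_route_style(deps: list[Dep]) -> str:
    """Hypothetical helper encoding the rule of thumb above."""
    if Dep.CPU_HEAVY in deps:
        return "offload"      # neither route style fixes CPU-bound work
    if Dep.SYNC_ONLY in deps:
        return "def"          # let the threadpool absorb the blocking calls
    return "async def"        # everything can be awaited cooperatively


print(choose_route_style([Dep.ASYNC_READY, Dep.ASYNC_READY]))
print(choose_route_style([Dep.ASYNC_READY, Dep.SYNC_ONLY]))
print(choose_route_style([Dep.SYNC_ONLY, Dep.CPU_HEAVY]))
```

Real routes are messier. A mostly async route with one small blocking call may still be written as async def, with only that call pushed to a thread. Treat the helper as a first guess, not a verdict.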

Worked example: same logic, different behavior¶

Suppose we build a chat endpoint that does three things. Read session state from Redis. Call an LLM provider. Write usage to Postgres. All three have async clients available.

@app.post("/chat")
async def chat(body: ChatRequest) -> ChatResponse:
    # Each await hands control back to the event loop while the I/O is pending.
    session = await redis.get(body.session_id)
    answer = await llm_client.generate(body.message, session)
    await usage_repo.write(body.session_id, len(answer))
    return ChatResponse(answer=answer)

This is a good async def route. The order ticket yields at every I/O boundary. One worker can keep many chats alive.

Now compare a blocking version.

@app.post("/chat-sync")
def chat_sync(body: ChatRequest) -> ChatResponse:
    # Each call holds its worker thread until the I/O finishes.
    session = redis_sync.get(body.session_id)
    answer = llm_sync.generate(body.message, session)
    usage_repo_sync.write(body.session_id, len(answer))
    return ChatResponse(answer=answer)

This still works. FastAPI will run it in a worker thread. But now concurrency is bounded by the threadpool size (40 threads by default, through AnyIO's thread limiter) and by per-thread memory. Long waits occupy real threads. That is usually worse for chat-scale workloads.

Simple, no? Same business logic. Different serving shape.
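To see the two serving shapes side by side, here is a stdlib-only simulation. It is a sketch, not a benchmark: `PeakCounter`, `async_chat`, and `sync_chat` are hypothetical names, the sleeps stand in for the LLM call, and the pool of 4 threads is deliberately tiny so the ceiling is easy to see.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

REQUESTS = 20
POOL_SIZE = 4  # assumption for the demo; real pools are larger


class PeakCounter:
    """Tracks how many simulated requests are in flight at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self) -> None:
        with self._lock:
            self.current -= 1


async def async_chat(counter: PeakCounter) -> None:
    counter.enter()
    await asyncio.sleep(0.05)   # stands in for an awaited LLM call
    counter.leave()


def sync_chat(counter: PeakCounter) -> None:
    counter.enter()
    time.sleep(0.05)            # stands in for a blocking LLM call
    counter.leave()


async def run_async() -> int:
    counter = PeakCounter()
    await asyncio.gather(*(async_chat(counter) for _ in range(REQUESTS)))
    return counter.peak


def run_threaded() -> int:
    counter = PeakCounter()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        for _ in range(REQUESTS):
            pool.submit(sync_chat, counter)
    return counter.peak


async_peak = asyncio.run(run_async())
thread_peak = run_threaded()
print(async_peak, thread_peak)
```

On one event loop, all 20 simulated chats are waiting at the same time. With threads, the number in flight never exceeds the pool size. The rest stand in line.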

When plain `def` is the correct choice¶

Now what is the problem? People hear async preaching and convert everything blindly. Bad move. If a library is only synchronous, wrapping it inside async def does not magically make it non-blocking. It still blocks the event loop for as long as the call runs.

Picture this trap.

async route
   │
   ├── await async Redis          good
   ├── await async HTTP client    good
   └── pandas.read_csv(...)       bad, blocks event loop

If you must call a sync-only SDK, a plain def route may isolate the blocking better. FastAPI will move it to the threadpool. That protects the main kitchen lane.

Example. Suppose a vendor exposes only a blocking PDF parser. You are doing a small admin upload, not high-volume chat traffic. A plain def route can be acceptable. The tradeoff is clear. You use a thread. You keep the event loop free.

Another example is quick CPU-light transformations. If the route only validates a payload and computes a small hash, neither style matters much. Readability and surrounding dependencies may decide.

When `async def` is still wrong even though it looks modern¶

Here is the dangerous anti-pattern.

import requests

@app.get("/bad")
async def bad() -> dict:
    # Anti-pattern: requests.get is synchronous and blocks the event loop.
    response = requests.get("https://api.example.com")
    return response.json()

The handler is marked async. But requests.get blocks. So the event loop freezes during the call. That is worse than an honest sync route. At least a sync route goes to the threadpool.

So what to do? Use httpx.AsyncClient. Use async database drivers. Use async Redis clients. Or isolate the blocking work with run_in_threadpool when needed.

import requests
from fastapi.concurrency import run_in_threadpool

@app.get("/less-bad")
async def less_bad() -> dict:
    # The blocking call runs in a worker thread; the event loop stays free.
    response = await run_in_threadpool(requests.get, "https://api.example.com")
    return response.json()

This is a bridge solution. Not a final architecture. But it saves the kitchen lane from freezing.
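The freeze is easy to observe without any web framework. In this sketch, `time.sleep` plays the role of `requests.get`, and `asyncio.to_thread` plays the role of `run_in_threadpool`. The `heartbeat` and `measure` names are hypothetical. The heartbeat task can only tick when the event loop is free.

```python
import asyncio
import time


async def heartbeat(ticks: list[int]) -> None:
    # Ticks only when the event loop is free to schedule it.
    while True:
        await asyncio.sleep(0.01)
        ticks[0] += 1


async def measure(blocking: bool) -> int:
    ticks = [0]
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)                        # let the heartbeat start
    if blocking:
        time.sleep(0.3)                           # like requests.get inside async def
    else:
        await asyncio.to_thread(time.sleep, 0.3)  # like run_in_threadpool
    beat.cancel()
    return ticks[0]


frozen_ticks = asyncio.run(measure(blocking=True))
free_ticks = asyncio.run(measure(blocking=False))
print(frozen_ticks, free_ticks)
```

With the blocking call, the heartbeat never gets a turn. With the call moved to a thread, it keeps ticking the whole time. Every other request on that loop is in the position of the heartbeat.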

Decision table you should remember¶

Look at the comparison.

use async def when                 use def when
┌──────────────────────────┐       ┌──────────────────────────┐
│ async DB or HTTP client  │       │ sync-only SDK dominates  │
│ SSE or WebSocket route   │       │ legacy parser call       │
│ many concurrent waits    │       │ small admin action       │
│ streaming tokens         │       │ unavoidable thread work  │
└──────────────────────────┘       └──────────────────────────┘

And one more box.

neither is enough alone
┌──────────────────────────┐
│ CPU-heavy embedding job  │
│ OCR or video processing  │
│ large document indexing  │
└──────────────────────────┘

Those belong on the prep shelf. Not in the request path. That is the senior answer.

Also remember dependencies. A route can be async def while some dependencies are plain def. FastAPI runs those sync dependencies in the threadpool. Still, mixed stacks add overhead and surprise. Keep the mental model sharp.

Practical rule for AI teams.

If the user should wait and watch progress, prefer async def plus streaming. If the user should hand off work and return later, prefer a short route plus background queue. If a blocking step is unavoidable, contain it honestly. Do not hide it inside fake async code.

The front desk should choose the right lane early. The line cook should not discover mid-recipe that the oven locks the whole kitchen. See. Small route choices become big production behaviors.

Where this lives in the wild¶

OpenAI-compatible chat gateway — backend engineer: async def routes keep thousands of LLM waits alive while streaming partial tokens.
Enterprise admin upload tool — internal tools engineer: a sync-only malware scanner may justify a plain def route in a low-concurrency path.
Perplexity answer API — platform engineer: async routes fit multi-hop retrieval, cache lookup, and model calls that are all I/O heavy.
Document OCR service — ML platform engineer: CPU-heavy parsing should skip both route styles and move to queue workers instead.
Slack bot command service — backend engineer: short slash-command acknowledgements stay safe when unavoidable sync work is isolated from the main event loop.

Pause and recall¶

When is async def clearly the right choice for a FastAPI route?
Why can a plain def route be safer than fake async code?
What should you do with CPU-heavy work that lasts many seconds?
In the analogy, what mistake happens when the front desk sends a blocking job into the kitchen lane?

Interview Q&A¶

Q: Why use a plain def route instead of async def for a sync-only SDK call? A: Because FastAPI can isolate the blocking work in a threadpool, while a fake async wrapper may accidentally block the event loop and hurt all concurrent requests. Common wrong answer to avoid: "Because sync routes are always faster in Python."

Q: Why is async def preferred for LLM chat endpoints? A: Those endpoints mostly wait on network I/O, token streaming, and remote services, so cooperative yielding preserves concurrency and reduces queueing under load. Common wrong answer to avoid: "Because model APIs refuse sync requests."

Q: Why is run_in_threadpool only a bridge solution, not a final design? A: It protects the event loop, but it still consumes threads and hides a sync dependency that may limit scale or complicate cancellation. Common wrong answer to avoid: "Once wrapped in a threadpool, the sync library becomes fully async."

Q: Why is route style the wrong place to solve CPU-heavy indexing work? A: Because request handlers are for short-lived serving paths. Long CPU jobs belong in worker processes or queue systems, regardless of def versus async def. Common wrong answer to avoid: "Use async def and the CPU work will stop blocking."

Apply now (5 min)¶

Exercise. Take one planned endpoint. List its dependencies. Mark each as async-ready, sync-only, or CPU-heavy. Then choose async def, def, or queue handoff.

Sketch from memory. Draw the router split. Show one path going to the kitchen lane, and one path going to the threadpool. Write one sentence on why the choice matters.

Bridge. Good. We can now choose the right route shape. Next we open the pass window and stream AI output before the full answer is ready. → 05-streaming-responses.md
