02. Event Loop and Coroutines — the traffic police of waiting work¶

~14 min read. Async makes sense only when you can picture who pauses, who resumes, and why.

Builds on the ELI5 in 00-eli5.md. The kitchen lane (the scheduler path) decides which order ticket gets attention next.

First picture: one lane, many paused tickets¶

Look at the shape first. Do not start with jargon. The event loop is a coordinator. It is not doing the network call itself. It is deciding who runs now.

                 ready queue
                      │
                      ▼
┌────────────────────────────────────┐
│           kitchen lane             │
├───────────────┬────────────────────┤
│ run ticket A  │ hits await         │
│ run ticket B  │ hits await         │
│ run ticket C  │ finishes response  │
└───────┬───────┴───────────┬────────┘
        ▼                   ▼
  waiting on DB       waiting on LLM
        │                   │
        └──── event ready ──┘
                 │
                 ▼
            back to queue

See. An async def function is a coroutine function. Calling it does not run any of its body. It only creates a coroutine object. That object is like an order ticket not yet cooked. The event loop later drives it.

The keyword await is the turning point. At await, the coroutine says, "I cannot continue until this result comes back." So the line cook steps aside. The kitchen lane picks another ticket. That is cooperative multitasking.
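The stepping aside can be watched directly. Here is a minimal sketch, assuming two made-up tickets named A and B; the cook function and the log list are illustration only. asyncio.sleep(0) does no real waiting. It only hands control back to the loop.

```python
import asyncio


async def cook(ticket: str, steps: int, log: list[str]) -> None:
    """Do a little work, then step aside at each await."""
    for step in range(1, steps + 1):
        log.append(f"{ticket} step {step}")
        # sleep(0) does no real waiting; it only yields to the event loop.
        await asyncio.sleep(0)


async def main() -> list[str]:
    log: list[str] = []
    # gather schedules both coroutines on the same loop, in this order.
    await asyncio.gather(cook("A", 2, log), cook("B", 2, log))
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The log alternates between A and B. Neither ticket is interrupted by force. Each one gives up the lane only at its own await.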

Coroutines are plans; tasks are scheduled plans¶

One common beginner error matters here. Creating a coroutine is not the same as running it. See this example.

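async def fetch_profile(user_id: str) -> dict:
    return {"user_id": user_id}

coro = fetch_profile("u_123")  # creates the coroutine object; the body has not run
print(coro)                    # <coroutine object fetch_profile at 0x...>
coro.close()                   # discard it cleanly, avoiding a "never awaited" warning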

This prints a coroutine object. No work happened yet. To run it, we need an event loop. Usually we await it. Or we wrap it as a task.

# Inside another async def function:
profile = await fetch_profile("u_123")

Now look at tasks. A task is a scheduled coroutine. The loop knows about it. It can run it, pause it, and resume it.

# Needs `import asyncio` and a running loop, so call it inside an async def function.
task = asyncio.create_task(fetch_profile("u_123"))
result = await task

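Why care about this distinction? Because many production bugs come from forgotten tasks. A coroutine object created and never awaited does nothing. A task created and never tracked may fail unnoticed. The event loop keeps only a weak reference to it, so it can even be garbage collected before it finishes. Simple, no?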
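One way to keep tasks owned is a small registry. This is a sketch, not the only pattern. The names spawn and background_tasks are hypothetical helpers invented for this page, and asyncio.sleep stands in for real I/O.

```python
import asyncio


async def fetch_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a real I/O wait
    return {"user_id": user_id}


# Hypothetical registry: a strong reference keeps each task alive
# until it finishes, since the loop itself holds only weak references.
background_tasks: set[asyncio.Task] = set()


def spawn(coro) -> asyncio.Task:
    """Schedule a coroutine and keep track of the resulting task."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done.
    task.add_done_callback(background_tasks.discard)
    return task


async def main() -> dict:
    coro = fetch_profile("u_123")  # a plan only; nothing has run yet
    task = spawn(coro)             # now the loop knows about it
    assert not task.done()         # scheduled, but not finished
    return await task              # wait for it and collect the result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The ticket becomes real work only at create_task. The registry answers the ownership question: someone holds the task until it is done.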

Worked example: two chats, one slow model call¶

Let us trace concrete steps. We receive two chat requests. Request A needs a Redis session lookup and then an LLM call. Request B only needs a quick cache hit.

time →

A: parse ─→ await Redis ─→ await LLM ───────────────→ finish
B:            parse ─→ cache hit ─→ finish

loop: run A ─→ run B ─→ resume A when Redis ready ─→ resume A when LLM ready

Now step by step. At t=0, the front desk creates two order tickets. The kitchen lane starts A. A reaches await redis.get(...) quickly. So A pauses.

The loop now runs B. B checks memory cache. No waiting occurs. B returns immediately. User B gets a fast reply.

Later Redis finishes for A. The loop marks A ready again. A resumes. Then A reaches await llm_client.responses.create(...). A pauses again. Other tickets can run. Later the model response arrives. A resumes and completes.

See what happened. One request waited a long time. That long wait did not imprison the whole service. That is the event loop's real job.
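The same trace can be simulated. A rough sketch, assuming asyncio.sleep stands in for both the Redis lookup and the LLM call; the handler names and the delays are invented for illustration.

```python
import asyncio


async def handle_a(log: list[str]) -> str:
    log.append("A: parse")
    await asyncio.sleep(0.02)   # stand-in for the Redis session lookup
    log.append("A: Redis ready")
    await asyncio.sleep(0.05)   # stand-in for the slow LLM call
    log.append("A: finish")
    return "reply A"


async def handle_b(log: list[str]) -> str:
    log.append("B: parse")
    log.append("B: cache hit")  # in-memory lookup, so no await and no pause
    log.append("B: finish")
    return "reply B"


async def main() -> list[str]:
    log: list[str] = []
    # Both requests share one loop; A is scheduled first.
    await asyncio.gather(handle_a(log), handle_b(log))
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

B finishes completely while A is still parked at its first wait. A's two long waits cost B nothing.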

`await` means yield, not magic parallelism¶

Now what is the problem? Many people think await creates background parallel work automatically. Not exactly. await means, "pause here until this awaitable completes." If you await two things one after another, they happen sequentially.

first = await fetch_user()
second = await fetch_usage()

If both are independent, we may want concurrency. Then we create tasks or use gather.

user_task = asyncio.create_task(fetch_user())
usage_task = asyncio.create_task(fetch_usage())
user, usage = await asyncio.gather(user_task, usage_task)

Picture the difference.

sequential waits                    concurrent waits
┌───────────────┐                  ┌───────────────┐
│ await user    │                  │ start user    │
├───────────────┤                  ├───────────────┤
│ await usage   │                  │ start usage   │
└───────────────┘                  ├───────────────┤
                                   │ await both    │
                                   └───────────────┘

So what to do? Ask one question. Are these waits dependent? If yes, sequence them. If no, schedule both. The kitchen lane can juggle multiple waiting tickets only when we hand it multiple tickets.
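A stopwatch makes the difference visible. This sketch assumes fetch_user and fetch_usage each wait about 0.1 seconds; the bodies are placeholders, not real clients.

```python
import asyncio
import time


async def fetch_user() -> str:
    await asyncio.sleep(0.1)  # placeholder for a real network wait
    return "user"


async def fetch_usage() -> str:
    await asyncio.sleep(0.1)  # placeholder for a real network wait
    return "usage"


async def sequential() -> float:
    start = time.perf_counter()
    await fetch_user()   # the second wait cannot begin until this one ends
    await fetch_usage()
    return time.perf_counter() - start


async def concurrent() -> float:
    start = time.perf_counter()
    # Both waits are handed to the loop before we pause.
    await asyncio.gather(fetch_user(), fetch_usage())
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await sequential(), await concurrent()


if __name__ == "__main__":
    seq, conc = asyncio.run(main())
    print(f"sequential: {seq:.2f}s  concurrent: {conc:.2f}s")
```

The sequential version takes roughly the sum of the two waits. The concurrent version takes roughly the longer one. Exact numbers vary by machine.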

Common mistakes that break the model in your head¶

Mistake one. Forgetting await. Then you return a coroutine object or never execute the work. That bug is subtle. Python usually emits a RuntimeWarning saying the coroutine was never awaited, but only when the object is garbage collected.

Mistake two. Calling blocking code inside async def. The file looks async. In reality it is not. A requests.get() call still blocks. The line cook is trapped again.

Mistake three. Fire-and-forget tasks without ownership. You start work with create_task. Then the request ends. If the task crashes, where is the error handled? Who cancels it on shutdown? Who records the result? Look. Unowned tasks are kitchen fires.

Mistake four. Thinking one event loop means one request at a time. No. One loop can interleave many waiting coroutines. The limits are CPU time, fairness, and blocking behavior, not request count alone.
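Mistake two is easy to measure. A minimal sketch, assuming time.sleep stands in for a blocking call such as requests.get(); blocking_lookup and ticks_during are invented names. A heartbeat task counts how often the loop gets to run while the lookup is in progress. asyncio.to_thread (Python 3.9+) is one way to move the blocking call off the loop's thread.

```python
import asyncio
import contextlib
import time


def blocking_lookup() -> str:
    time.sleep(0.1)  # stand-in for a blocking call; never yields to the loop
    return "done"


async def ticks_during(offload: bool) -> int:
    """Count heartbeat ticks that happen while the lookup is in progress."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    if offload:
        await asyncio.to_thread(blocking_lookup)  # runs in a worker thread
    else:
        blocking_lookup()  # runs on the loop's thread; nothing else can run
    seen = ticks
    # Own the helper task: cancel it and wait for the cancellation to land.
    beat.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await beat
    return seen


async def main() -> tuple[int, int]:
    return await ticks_during(offload=False), await ticks_during(offload=True)


if __name__ == "__main__":
    inline, offloaded = asyncio.run(main())
    print(f"ticks while blocked inline: {inline}, while offloaded: {offloaded}")
```

Called inline, the lookup starves the heartbeat completely. Offloaded, the heartbeat keeps ticking. The cancel-then-await at the end is also a small answer to mistake three: the function that started the task is the one that ends it.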

The mental model to keep forever.

A coroutine is resumable work. A task is scheduled resumable work. The event loop is the dispatcher. await is the yield point. That is enough to reason well.

Keep saying it like the restaurant. The order ticket carries the plan. The line cook does a little work. The kitchen lane notices a wait and moves another ticket. When an ingredient arrives, the ticket comes back. Simple, no?

That model will support everything later. FastAPI routes. Streaming generators. Cancellation. Background jobs. WebSockets. All rest on this loop-and-yield picture.

Where this lives in the wild¶

Anthropic API gateway — backend engineer: one event loop keeps many streaming conversations alive while each waits on model tokens.
Perplexity retrieval service — search engineer: coroutines interleave web fetches, rerank calls, and cache reads inside one request.
GitHub Copilot backend — platform engineer: editor suggestions depend on many short waits that must resume cleanly without blocking sibling requests.
Scale AI data platform — infrastructure engineer: task scheduling matters when model calls, database writes, and audit logs all interleave.
Zapier AI actions service — integrations engineer: thousands of webhook waits are easier to manage with coroutine scheduling than thread-per-connection designs.

Pause and recall¶

What is the difference between a coroutine object and a scheduled task?
What does await actually signal to the event loop?
Why can await a(); await b() still be sequential and slow?
In the kitchen analogy, who corresponds to the event loop and what does it decide?

Interview Q&A¶

Q: Why use asyncio.create_task plus gather instead of plain sequential await calls? A: Because independent waits can overlap only if they are scheduled concurrently. Sequential await keeps latency additive even when the underlying resources are unrelated. Common wrong answer to avoid: "Because gather always uses multiple CPU cores."

Q: Why is a coroutine object not enough to do work? A: A coroutine object is only resumable state. It must be awaited or wrapped in a task so the event loop actually drives execution. Common wrong answer to avoid: "Calling an async def function immediately runs it in the background."

Q: Why is fire-and-forget task creation risky in API servers? A: Because lifecycle, error handling, cancellation, and result ownership become unclear, especially during disconnects and shutdown. Common wrong answer to avoid: "It is safe because the loop will clean up every task automatically."

Q: Why does await improve concurrency without promising parallel CPU execution? A: It yields control during waits, letting other coroutines progress, but all that work still runs on the same thread unless you explicitly offload compute elsewhere. Common wrong answer to avoid: "await means Python runs both functions at the same exact time."

Apply now (5 min)¶

Exercise. Write a tiny async script with two coroutines. One should await asyncio.sleep(1). The other should return immediately. Predict the print order before running it.

Sketch from memory. Draw the kitchen lane, two waiting boxes, and one ready queue. Label where an order ticket pauses and resumes.

Bridge. We now know how the scheduler thinks. Next we need the web framework that turns HTTP requests into those tickets: FastAPI. → 03-fastapi-basics.md