10. WebSockets and Bidirectional Channels — when both sides need to talk¶
~15 min read. SSE is great for one-way streams, but chat collaboration sometimes needs a live two-way socket.
Built on the ELI5 in 00-eli5.md. The pass window, once enough for one-way plates, now grows into a two-way counter where both the guest and the line cook can speak.
First picture: SSE speaks one way, WebSockets speak both ways¶
Look at the contrast first. SSE sends updates in one direction only, from server to client, over a long-lived HTTP response. WebSockets keep a persistent channel where either side can send a message at any time.
That difference changes product design. Typing indicators, live interruption, multi-user cursors, and collaborative agent control all fit better with WebSockets. Plain token streaming often does not need them. Simple, no?
A tiny FastAPI WebSocket endpoint¶
Picture the lifecycle first. Connect. Accept. Receive messages. Send messages. Handle disconnect.
client connects
│
▼
accept socket
│
├── receive message
├── send reply
├── repeat
└── disconnect cleanup
Now the code.
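from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/ws/chat")
async def chat_socket(websocket: WebSocket):
    # Complete the handshake before reading or writing.
    await websocket.accept()
    try:
        while True:
            # Each text frame from the client is echoed back on the same socket.
            message = await websocket.receive_text()
            await websocket.send_text(f"echo: {message}")
    except WebSocketDisconnect:
        # The client went away; release any per-connection state here.
        pass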
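See. This is the simplest loop. The socket stays open. Each incoming message becomes a new order ticket inside the same connection context. When the client disconnects, receive_text raises WebSocketDisconnect and the loop ends. This echo handler holds no per-connection state, so there is nothing to clean up yet; in a real handler, that except branch is where cleanup goes.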
Why chat UIs sometimes need WebSockets¶
Now what is the problem with plain request-response? Some features require live client input mid-session. The user may interrupt generation. The UI may send cursor position or tool approval. The server may push tool status back immediately.
Picture a richer chat session.
browser
├── send user message
├── send interrupt signal
├── send typing state
└── receive tokens and tool events
This fits a persistent socket better than repeated polling. The pass window becomes interactive. Not just one-way plating. Both sides can react.
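One way to keep those message kinds apart on a single socket is a small JSON envelope with a type field. The sketch below is only an illustration under stated assumptions: the type names (user_message, interrupt, typing) mirror the list above, and the helpers parse_client_message and encode_server_event are hypothetical, not part of FastAPI or any standard.

```python
import json

# Hypothetical client-to-server message types, mirroring the list above.
CLIENT_TYPES = ("user_message", "interrupt", "typing")


def parse_client_message(raw: str) -> dict:
    """Decode one text frame into an envelope, rejecting unknown shapes."""
    try:
        envelope = json.loads(raw)
    except json.JSONDecodeError:
        return {"type": "error", "reason": "invalid_json"}
    if not isinstance(envelope, dict) or envelope.get("type") not in CLIENT_TYPES:
        return {"type": "error", "reason": "unknown_type"}
    return envelope


def encode_server_event(event_type: str, **payload) -> str:
    """Encode a server-to-client event such as a token or a tool status."""
    return json.dumps({"type": event_type, **payload})
```

Inside a socket loop, each received text frame would pass through parse_client_message, and the handler would branch on the resulting type.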
Worked example.
Suppose an agent starts browsing sources.
The server sends tool_started.
Then partial notes.
Then token deltas.
The user clicks stop.
The browser sends cancel instantly.
The server triggers the cancel bell.
That is a natural WebSocket flow.
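The flow above can be sketched with plain asyncio. This is a simulation, not a FastAPI handler: two queues stand in for the socket (inbox for what the browser sends, outbox for what the server sends), and the names generate, session, and cancel_bell are hypothetical. In a real endpoint, inbox.get() would be websocket.receive_text() and outbox.put() would be websocket.send_text().

```python
import asyncio


async def generate(outbox: asyncio.Queue, cancel_bell: asyncio.Event) -> None:
    """Stand-in for the agent: emits events until finished or the bell rings."""
    await outbox.put("tool_started")
    for i in range(100):
        if cancel_bell.is_set():
            await outbox.put("cancelled")
            return
        await outbox.put(f"token_{i}")
        await asyncio.sleep(0)  # yield so the receive side gets a turn
    await outbox.put("done")


async def session(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """One connection: stream in the background while listening for control messages."""
    cancel_bell = asyncio.Event()
    worker = asyncio.create_task(generate(outbox, cancel_bell))
    while True:
        receiver = asyncio.create_task(inbox.get())
        # Wake up on whichever happens first: a client message or the end of generation.
        done, _ = await asyncio.wait(
            {receiver, worker}, return_when=asyncio.FIRST_COMPLETED
        )
        if receiver in done and receiver.result() == "cancel":
            cancel_bell.set()  # the generator checks this between tokens
        if worker in done:
            receiver.cancel()  # stop waiting for client input
            break
    await worker  # surface any exception raised by the generator
```

The point of the design is that receiving never blocks sending: generation runs as its own task, so a cancel message is seen while tokens are still flowing.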
Connection management is the real work¶
The endpoint code above is easy. Production socket management is the hard part. You need connection registries, authentication, heartbeats, backpressure handling, and per-user cleanup.
connection manager
│
├── map user_id → open sockets
├── auth on connect
├── broadcast or send-to-one
└── remove on disconnect
A common pattern is a connection manager object. It stores active sockets. It knows how to send to one client, a room, or a tenant. It removes dead sockets safely.
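A minimal sketch of such a manager follows, assuming a single process and sockets that expose an async send_text method (as FastAPI's WebSocket does). The class name ConnectionManager and its method names are illustrative choices, authentication is left out, and treating every send failure as a dead socket is a simplifying assumption.

```python
from typing import Dict, List, Protocol


class TextSocket(Protocol):
    """Anything with an async send_text, such as a FastAPI WebSocket."""

    async def send_text(self, data: str) -> None: ...


class ConnectionManager:
    """Hypothetical in-memory registry: user_id -> that user's open sockets."""

    def __init__(self) -> None:
        self._sockets: Dict[str, List[TextSocket]] = {}

    def connect(self, user_id: str, socket: TextSocket) -> None:
        self._sockets.setdefault(user_id, []).append(socket)

    def disconnect(self, user_id: str, socket: TextSocket) -> None:
        sockets = self._sockets.get(user_id, [])
        if socket in sockets:
            sockets.remove(socket)
        if not sockets:
            self._sockets.pop(user_id, None)  # do not leak empty entries

    async def send_to_user(self, user_id: str, message: str) -> int:
        """Send to every socket of one user; return how many sends succeeded."""
        delivered = 0
        # Iterate over a copy because failed sockets are removed mid-loop.
        for socket in list(self._sockets.get(user_id, [])):
            try:
                await socket.send_text(message)
                delivered += 1
            except Exception:
                # Assumption: any send failure means the socket is dead.
                self.disconnect(user_id, socket)
        return delivered

    async def broadcast(self, message: str) -> int:
        """Send to every connected user; return the total successful sends."""
        total = 0
        for user_id in list(self._sockets):
            total += await self.send_to_user(user_id, message)
        return total
```

In an endpoint, connect would run right after accept, and disconnect would run in the WebSocketDisconnect branch. A registry like this lives in one process only; sending to rooms or tenants across several workers would need a shared layer that this sketch does not cover.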
Now what is the problem? If the client stops reading, your sends can pile up. That is backpressure. If you ignore it, memory grows. So what to do? Bound outgoing queues. Drop stale updates when acceptable. Prefer small event payloads. Design for slow consumers.
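Here is one way to bound an outgoing queue, sketched with asyncio.Queue. The BoundedOutbox class, its offer and next_message methods, and the drop-oldest policy are all illustrative assumptions. Dropping the oldest item suits status-style updates such as typing state or progress; for token deltas, a dropped item leaves a gap in the text, so closing the slow connection may be the safer policy there.

```python
import asyncio


class BoundedOutbox:
    """Per-connection outgoing buffer that sheds the oldest update when full."""

    def __init__(self, max_size: int = 100) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_size)
        self.dropped = 0  # how many updates were shed for this connection

    def offer(self, message: str) -> None:
        """Enqueue without blocking the producer; evict the oldest item if full."""
        if self._queue.full():
            self._queue.get_nowait()  # drop the stalest update
            self.dropped += 1
        self._queue.put_nowait(message)

    async def next_message(self) -> str:
        """Awaited by the writer task that owns the actual socket send."""
        return await self._queue.get()
```

Producers call offer and never wait on a slow reader. A single writer task per connection loops on next_message and performs the real send, so memory per connection stays capped at max_size messages.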
When WebSockets are overkill¶
Do not use WebSockets because they sound modern. They add operational complexity. Load balancers, sticky sessions, connection limits, auth refresh, and mobile reconnect behavior all matter.
If your product only needs server-to-client tokens, SSE is usually simpler. If the client sends one message, then passively watches output, SSE wins on simplicity. Use the right tool.
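For comparison, the one-way case needs no socket protocol at all: each update is a short text frame written to a long-lived HTTP response. A minimal sketch of that framing, assuming a hypothetical format_sse helper (the event name token in the usage note is also just an example):

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Events message as text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one data: field per line.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"
```

Calling format_sse("hello", event="token") produces the frame a browser's EventSource would deliver as a token event. There is no client-to-server path in this format, which is exactly the trade-off described above.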
The front desk should not open a permanent private line for every tiny order. Only do it when the conversation truly needs two-way continuity. See. That is the senior answer.
Where this lives in the wild¶
- ChatGPT voice and multimodal sessions — realtime engineer: bidirectional channels let user interrupts and model events flow continuously.
- Figma-style AI assistant — collaboration engineer: live cursor context and assistant hints fit persistent sockets better than repeated HTTP polling.
- Customer support supervisor console — full-stack engineer: agents can interrupt drafts, send approvals, and receive live tool status through one socket.
- Gaming NPC chat backend — realtime engineer: low-latency player utterances and streaming responses benefit from two-way channels.
- Internal agent orchestration dashboard — platform engineer: long tool runs can push progress while operators send control messages back instantly.
Pause and recall¶
- What capability does WebSocket add that SSE does not?
- Why is connection management harder than the basic endpoint loop?
- When is SSE still the better choice for AI products?
- In the analogy, what changed when the pass window became two-way?
Interview Q&A¶
Q: Why choose WebSockets instead of SSE for an interruptible agent UI? A: Because the client must send control messages like cancel, approve, or typing state over the same live channel while still receiving streamed server updates. Common wrong answer to avoid: "Because WebSockets always have lower latency than SSE."
Q: Why is connection management the real production challenge for WebSockets? A: Long-lived sockets require lifecycle tracking, auth, cleanup, backpressure handling, and scaling behavior that simple HTTP endpoints largely avoid. Common wrong answer to avoid: "Once the endpoint accepts the socket, the hard part is done."
Q: Why can WebSockets be the wrong choice for plain token streaming? A: If the client only needs one-way updates, SSE gives simpler infrastructure, easier browser integration (the built-in EventSource API reconnects automatically), and fewer persistent connection concerns. Common wrong answer to avoid: "WebSockets are always more professional, so use them by default."
Q: Why does backpressure matter in socket systems? A: Slow readers can cause queued outgoing messages and memory growth unless you bound or shed work deliberately. Common wrong answer to avoid: "The network stack automatically protects your app from slow consumers."
Apply now (5 min)¶
Exercise. Pick one chat feature. Decide whether it needs SSE or WebSocket. Justify the choice with one sentence about direction of communication. Then list one new operational risk if you choose WebSocket.
Sketch from memory. Draw browser ◀──→ server. Add one cancel message from the browser and one token event from the server. Mark where the cancel bell fires.
Bridge. We now know how async APIs behave in production. Next we need proof that they behave correctly, which means testing async code, streams, and mocks. → 11-testing-async.md