01. Event loop, workers, listeners — how nginx serves 10,000 connections per worker¶
~14 min read. One nginx worker process handles tens of thousands of concurrent connections. The trick is not threads; it is epoll, non-blocking sockets, and an event loop. This chapter opens the worker and traces one connection from accept to close.
Builds on: 00-eli5.md.
The receptionist analogy explains what nginx does. To debug a stuck connection, tune worker_connections, or explain why nginx beats a thread-per-request server at scale, you need to see how. We trace one HTTPS request — TLS handshake, header parse, proxy_pass to gunicorn, response stream — through the worker, observing every yield point.
1) The process model: master and workers¶
When you start nginx, you get two kinds of processes:
$ ps -ef | grep nginx
root 1234 1 0 ... nginx: master process /usr/sbin/nginx
nginx 1235 1234 0 ... nginx: worker process
nginx 1236 1234 0 ... nginx: worker process
nginx 1237 1234 0 ... nginx: worker process
nginx 1238 1234 0 ... nginx: worker process
The master process runs as root. Its job is to read the config, open listener sockets (which requires root for ports below 1024), and fork worker processes. Master also handles signals — SIGHUP for reload, SIGUSR1 for log rotation, SIGUSR2 for binary upgrade. Master does not serve requests.
Worker processes run as the unprivileged nginx user. Each worker inherits the listener sockets from master and serves requests. The number of workers is set by worker_processes; with worker_processes auto; nginx starts one worker per CPU core.
# nginx.conf
worker_processes auto;
worker_rlimit_nofile 65536;
events {
worker_connections 10240;
use epoll;
}
worker_connections caps per-worker concurrent connections. worker_rlimit_nofile raises the OS file-descriptor ceiling (each connection is an FD; you need slack for sockets, log files, upstream connections).
The total concurrent connections nginx can handle is roughly worker_processes × worker_connections / 2 (the / 2 accounts for upstream connections — a proxy_pass request uses one client FD and one upstream FD).
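The capacity estimate above can be written out as a small calculation. This is a sketch only: `max_clients` is a hypothetical helper name, and the figures come from the sample config and the four workers in the `ps` listing.

```python
def max_clients(worker_processes: int, worker_connections: int, proxied: bool = True) -> int:
    """Rough ceiling on concurrent client connections for one nginx instance."""
    total_slots = worker_processes * worker_connections
    # A proxied request holds two slots: one client connection, one upstream connection.
    return total_slots // 2 if proxied else total_slots


# 4 workers x 10240 connections, everything proxied to an upstream
print(max_clients(4, 10240))                  # 20480
# same box serving only static files (no upstream connection per request)
print(max_clients(4, 10240, proxied=False))   # 40960
```

Treat the result as an upper bound; file-descriptor limits and memory usually bite first.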
2) Listener sockets and the accept storm¶
The master opens the listener socket once. All workers accept() on the same socket.
master forks; each worker has FD pointing to the same listener socket.
master ─┐
        ├── worker 1 ── listener_fd = 7 (shared)
        ├── worker 2 ── listener_fd = 7 (shared)
        ├── worker 3 ── listener_fd = 7 (shared)
        └── worker 4 ── listener_fd = 7 (shared)
When a new connection arrives, only one worker accepts it. Historically, multiple workers would wake up and race for the accept — the "thundering herd." Modern kernels (Linux 3.9+) support SO_REUSEPORT: each worker has its own socket bound to the same port, and the kernel load-balances accepts.
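nginx supports both modes. reuseport is opt-in, not a default: add the reuseport parameter to the listen directive (for example, listen 443 ssl reuseport;).
Without reuseport, all workers share the single listener. The accept_mutex directive can serialize their accepts; it has been off by default since nginx 1.11.3, because kernels that support EPOLLEXCLUSIVE (Linux 4.5+) no longer wake every worker for each new connection.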
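To make the SO_REUSEPORT behaviour concrete, here is a minimal Python sketch (not nginx code) in which several independent sockets bind to the same port, one per imagined worker. `bind_reuseport` is a hypothetical helper; it assumes a platform that defines `socket.SO_REUSEPORT` (Linux 3.9+, BSD, macOS).

```python
import socket


def bind_reuseport(port: int = 0, count: int = 2) -> list:
    """Bind `count` independent listening sockets to one port via SO_REUSEPORT."""
    socks = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Must be set on every socket, and before bind()
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind(("127.0.0.1", port))
        s.listen(128)
        # The first bind (port 0) lets the kernel pick a free port; reuse it afterwards
        port = s.getsockname()[1]
        socks.append(s)
    return socks


listeners = bind_reuseport(count=4)
print({s.getsockname()[1] for s in listeners})  # a single port number shared by all four
for s in listeners:
    s.close()
```

Without the socket option, the second `bind()` would fail with "address already in use"; with it, the kernel spreads incoming connections across the sockets.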
3) The event loop — one worker, thousands of connections¶
Each worker runs an event loop. The loop:
while True:
    events = epoll_wait()  # block until any monitored FD has activity
    for each ready event:
        if listener is ready:
            accept new connection, add to monitored set
        if client socket is readable:
            read whatever is available; process; if response ready, queue write
        if client socket is writable:
            write whatever is buffered; if more to send, keep monitoring
        if upstream socket is readable:
            read upstream response; queue write to client
        if upstream socket is writable:
            write request to upstream
        if timer expired:
            close idle connections, retry timeouts
epoll is the Linux mechanism that lets one syscall (epoll_wait) wait on thousands of file descriptors and return only those with activity. Other operating systems have equivalents: kqueue (BSD/macOS), event ports (Solaris). nginx auto-detects.
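The worker is single-threaded. No locks. No GIL. No thread context switches. When one connection has to wait on the network (for the client or for the upstream), the worker switches to another connection that is ready. Disk reads are the exception: unless aio or thread pools are configured, a read from a slow disk blocks the whole worker. This is cooperative concurrency at the syscall level.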
TIME  CONN A              CONN B              CONN C
────  ──────────────────  ──────────────────  ──────────────────
0ms   accept
1ms   parse headers
2ms   proxy_pass; wait    accept
3ms   waiting (upstream)  parse headers
4ms   waiting             proxy_pass; wait    accept
5ms   upstream response   waiting             parse headers
6ms   write to client     waiting             static file read
7ms   close               upstream response   write to client
8ms                       write to client     close
Three connections, served interleaved on one worker. Each pauses naturally on I/O; the worker fills the gap with another connection's work.
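The same loop shape can be reproduced in a few lines of Python with the standard `selectors` module, which picks epoll on Linux and kqueue on BSD/macOS. This is an illustrative echo server, not how nginx is implemented; `run_echo_loop` and `demo` are hypothetical names, and the write path is simplified (a real server would queue writes and watch for writability).

```python
import selectors
import socket


def run_echo_loop(listener: socket.socket, expected_closes: int) -> int:
    """Serve echo clients on one thread until `expected_closes` clients hang up."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    closed = served = 0
    while closed < expected_closes:
        for key, _mask in sel.select():      # the only place the loop blocks
            sock = key.fileobj
            if key.data == "listener":
                conn, _addr = sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
                continue
            chunk = sock.recv(4096)
            if chunk:
                sock.sendall(chunk)          # acceptable for tiny payloads only
                served += 1
            else:                            # empty read: the peer closed its side
                sel.unregister(sock)
                sock.close()
                closed += 1
    sel.close()
    return served


def demo(n_clients: int = 3):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    clients = []
    for i in range(n_clients):
        c = socket.create_connection(listener.getsockname())
        c.sendall(b"ping%d" % i)
        c.shutdown(socket.SHUT_WR)           # half-close: done sending, still reading
        clients.append(c)
    served = run_echo_loop(listener, expected_closes=n_clients)
    replies = [c.recv(64) for c in clients]
    for c in clients:
        c.close()
    listener.close()
    return served, replies


print(demo())
```

All three clients are served by one thread. The loop never waits on any individual socket, only in `sel.select()`.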
4) Non-blocking sockets¶
Every socket in nginx is non-blocking. A read() that would normally wait returns immediately with EAGAIN if no data is ready. The worker registers interest in the socket via epoll and moves on.
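// sketch; error handling omitted
int fd = accept(listener, NULL, NULL);
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);  // socket is non-blocking; keep its other flags
ssize_t n = read(fd, buf, sizeof(buf));
if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);  // wait for data
    // worker moves to next ready event
}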
The blocking call is epoll_wait — the single point where the worker waits. Everything else is non-blocking. This is the structural difference from a thread-per-request server, where each thread blocks individually.
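The EAGAIN behaviour is easy to observe from Python, where a would-block read surfaces as `BlockingIOError`. A minimal sketch, using a local socket pair in place of a real client; `read_or_defer` is a hypothetical helper name.

```python
import errno
import socket
from typing import Optional


def read_or_defer(sock: socket.socket) -> Optional[bytes]:
    """Return available bytes, or None when the read would block."""
    try:
        return sock.recv(4096)
    except BlockingIOError as exc:
        # errno is EAGAIN here (the same value as EWOULDBLOCK on Linux);
        # an event loop would now register the socket and move on
        if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
            raise
        return None


a, b = socket.socketpair()
a.setblocking(False)
first = read_or_defer(a)     # nothing sent yet, so the read would block
b.sendall(b"hello")
second = read_or_defer(a)    # data is waiting now
print(first, second)         # None b'hello'
```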
5) The threaded example — one HTTPS request, end to end¶
GET https://api.example.com/orders/?status=paid → nginx → gunicorn → response.
T+0ms Worker 2 is in epoll_wait.
T+0.1ms TCP SYN arrives on listener_fd. Kernel completes handshake.
Worker 2 wakes, accepts connection, gets client_fd=42.
T+0.2ms Worker registers EPOLLIN on client_fd, returns to epoll_wait.
T+1ms TLS ClientHello arrives on fd=42. Worker wakes, processes
ClientHello, sends ServerHello + cert + key exchange.
T+1.2ms Worker registers EPOLLIN; returns to loop.
T+8ms (Network RTT to client.) Client sends ClientKeyExchange + Finished.
Worker processes TLS handshake, derives session keys, sends Finished.
T+8.3ms Encrypted application data flows.
T+9ms Client request arrives. Worker decrypts.
"GET /orders/?status=paid HTTP/1.1\r\nHost: api.example.com\r\n..."
T+9.1ms Worker parses HTTP headers, matches the request to the
location block that proxies to the gunicorn upstream.
T+9.2ms Worker creates upstream connection to 127.0.0.1:8000.
connect() returns EINPROGRESS; worker registers EPOLLOUT on
upstream_fd. Returns to loop.
T+9.3ms Kernel completes the connect. Worker wakes on EPOLLOUT.
Writes the HTTP request to upstream_fd.
T+9.4ms Worker registers EPOLLIN on upstream_fd. Returns to loop.
T+50ms Gunicorn processes, responds. Bytes arrive on upstream_fd.
Worker wakes, reads, parses upstream headers, starts streaming
response back to client (encrypts via TLS, writes to client_fd).
T+52ms Last byte of response written and acknowledged.
T+52.1ms Worker checks Connection: keep-alive header; keeps client_fd
open for next request; closes upstream_fd; registers EPOLLIN
on client_fd; returns to loop.
T+52.2ms Worker 2 is back in epoll_wait, ready for the next event.
Worker active time, read off the timeline: roughly 3 ms of actual CPU work spread across 52 ms of wall time. During the remaining ~49 ms of waiting (TLS handshake RTT, gunicorn processing), the worker handled other connections. At a few milliseconds of CPU per request, one worker handles hundreds of such requests per second.
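Two back-of-envelope formulas connect these numbers: CPU time per request bounds a single-threaded worker's throughput, and Little's law gives the number of connections in flight at a given rate. The sketch below uses illustrative inputs (2 ms of CPU per request is an assumption, not a measurement), and both function names are hypothetical.

```python
def cpu_bound_rps(cpu_ms_per_request: float) -> float:
    """Upper bound on requests/second for one single-threaded worker."""
    return 1000.0 / cpu_ms_per_request


def in_flight(requests_per_second: float, wall_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return requests_per_second * wall_seconds


rps = cpu_bound_rps(2.0)            # assumed 2 ms of CPU work per request
print(rps)                          # 500.0
print(in_flight(rps, 0.052))        # about 26 connections open at once
```

The gap between the two results is the point: a request that occupies a connection for 52 ms costs the worker only a small fraction of that in CPU.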
6) Connection limits and the tunable knobs¶
worker_connections. Per-worker concurrent connections. 1024 is conservative; production deployments commonly use 4096-16384.
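worker_rlimit_nofile. OS file-descriptor limit per worker. worker_connections already counts upstream connections as well as client connections, so the limit must cover worker_connections plus slack for listeners, log files, and temp files; 2 × worker_connections is a common rule of thumb. Set in nginx config; depending on how nginx is started, the systemd or pam_limits ceiling may also need raising.
keepalive_timeout. How long to hold an idle client connection open. Default 75s. Tune down for memory-bound deployments; tune up for high-RTT mobile clients, which benefit most from connection reuse.
keepalive_requests. Max requests per keepalive connection. Default 1000 since nginx 1.19.10 (100 before that); raise it on older versions for high-throughput APIs.
upstream keepalive. Pool of warm connections to upstreams. Without this, every proxied request opens a new TCP connection to the upstream. Add keepalive 32; in the upstream block. The proxying location also needs proxy_http_version 1.1; and proxy_set_header Connection ""; because HTTP/1.0, the long-standing default for proxied requests, closes the upstream connection after each request.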
The 32 warm connections per worker avoid repeated three-way handshakes to upstream — a substantial latency saving.
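A quick way to sanity-check the file-descriptor budget is to compare the limit with the connection ceiling. The sketch below treats every worker_connections slot as one FD and sets aside a `reserved` allowance for listeners, logs, and temp files; the value 64 and the name `fd_headroom` are assumptions for illustration.

```python
import resource


def fd_headroom(worker_connections: int, nofile_limit: int, reserved: int = 64) -> int:
    """FDs left over once every connection slot and `reserved` extras are in use."""
    return nofile_limit - worker_connections - reserved


# Values from the sample config at the top of the chapter
print(fd_headroom(10240, 65536))    # 55232: plenty of slack

# A low limit such as 1024 against 4096 connection slots goes negative
print(fd_headroom(4096, 1024))      # -3136: the worker runs out of FDs first

# The limit of the current process, for comparison (Unix only)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)
```

A negative result means the worker would hit "Too many open files" before it reached worker_connections.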
7) Buffering — the slow-client defence¶
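Imagine a mobile client on a flaky network uploading a 100 MB file at 50 KB/s. Without nginx in front, the gunicorn worker reads the body directly from the client.
One slow client then occupies one worker for about 33 minutes (100 MB ÷ 50 KB/s ≈ 2,000 s). A handful of slow clients exhausts the worker pool. This is the same resource-exhaustion mechanism that slowloris-style attacks exploit.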
nginx mitigates by buffering:
client_body_buffer_size 8k; # memory buffer per request
client_max_body_size 100m; # cap per request
client_body_temp_path /var/cache/nginx/body 1 2; # disk buffer
proxy_request_buffering on; # buffer entire request before sending
With proxy_request_buffering on, nginx receives the entire 100 MB upload (storing in memory up to client_body_buffer_size, then on disk) before opening the upstream connection. The upstream gunicorn worker sees a fast request from localhost — milliseconds, not minutes.
The cost: nginx uses disk and memory; the upstream sees no progress until upload completes (problematic for chunked-upload APIs). The benefit: gunicorn workers are protected from slow clients.
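The same idea applies on the response path, fast upstream → slow client: with proxy_buffering on (the default), nginx reads the response as fast as the upstream produces it, releases the upstream worker, and drains the buffer to the client at the client's pace.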
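The cost of a slow upload is simple arithmetic, sketched here with the figures from the example. `upload_seconds` is a hypothetical helper, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
def upload_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time a backend worker stays occupied if it reads the body directly."""
    return size_bytes / bytes_per_second


seconds = upload_seconds(100e6, 50e3)      # 100 MB at 50 KB/s
print(seconds, round(seconds / 60, 1))     # 2000.0 33.3

# With request buffering, the backend reads from nginx over loopback instead.
# The 1 GB/s local rate below is an illustrative assumption.
print(upload_seconds(100e6, 1e9))          # 0.1
```

The upload still takes 33 minutes from the client's point of view; buffering only changes which process waits for it.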
8) The reload — graceful config changes without dropping connections¶
What happens on nginx -s reload (which sends SIGHUP to the master):
1. Master receives SIGHUP.
2. Master parses new config; validates.
3. Master forks new worker processes.
4. New workers accept new connections.
5. Old workers stop accepting new connections but continue serving in-flight ones.
6. Old workers exit when their in-flight connections close (or when killed after timeout).
Zero downtime. Connections in flight at reload time finish on the old config; new connections see the new config. This is the operational property that makes nginx pleasant to operate.
There is one risk: the new config has a bug. Master validates syntax but not all semantics. If a new worker fails to start, master logs the error and keeps the old workers running. Always have nginx -t in CI to catch config syntax errors before deploy.
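The "test first, then reload" rule can be enforced with a small wrapper. The sketch below assumes `nginx` is on the PATH and that the caller is allowed to signal the master; `safe_reload` and the injectable `run` parameter are hypothetical names, added so the logic can be exercised without a real nginx.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd, check=False).returncode


def safe_reload(run: Runner = _run) -> bool:
    """Reload only when `nginx -t` accepts the config."""
    if run(["nginx", "-t"]) != 0:
        return False                 # leave the running config untouched
    return run(["nginx", "-s", "reload"]) == 0


# Dry run with a fake runner that rejects the config: no reload is attempted
attempted = []


def rejecting_runner(cmd: Sequence[str]) -> int:
    attempted.append(list(cmd))
    return 1


print(safe_reload(rejecting_runner), attempted)   # False [['nginx', '-t']]
```

In a deployment script, calling `safe_reload()` with no arguments runs the real commands.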
9) Logs — access and error¶
Two log files per server:
- Access log (/var/log/nginx/access.log). One line per request. Format is configurable:
log_format main '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent '
                '"$http_referer" "$http_user_agent" '
                '$request_time $upstream_response_time';
access_log /var/log/nginx/access.log main;
$request_time is total nginx-perceived time; $upstream_response_time is the time spent waiting on the upstream. Subtracting one from the other tells you nginx-side latency.
- Error log (/var/log/nginx/error.log). Errors, warnings, debug if configured.
For production, log to stdout/stderr and let the orchestrator (Docker, Kubernetes, journald) aggregate. JSON log format is convenient for log aggregators (Elastic, Loki):
log_format json escape=json
'{"time":"$time_iso8601","method":"$request_method",'
'"path":"$uri","status":$status,"rt":$request_time,'
'"urt":"$upstream_response_time","client":"$remote_addr"}';
tail and grep are still the fastest tools when something is on fire.
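Once the log is JSON, the subtraction of upstream time from request time is a few lines of Python. The sketch below assumes the field names from the json log_format above (`rt`, `urt`); the sample lines are invented for illustration, and `nginx_side_seconds` is a hypothetical helper. It also allows for `urt` being empty or "-" when no upstream was contacted, and for several values when more than one upstream attempt was made.

```python
import json
from typing import Optional


def nginx_side_seconds(line: str) -> Optional[float]:
    """request_time minus total upstream time, or None if no upstream was used."""
    entry = json.loads(line)
    # Several upstream attempts are logged separated by commas and colons
    parts = [p.strip() for p in entry["urt"].replace(":", ",").split(",")]
    times = [float(p) for p in parts if p not in ("", "-")]
    if not times:
        return None
    return round(entry["rt"] - sum(times), 3)


# Invented sample lines in the shape produced by the log_format above
proxied = '{"time":"2024-01-01T00:00:00+00:00","method":"GET","path":"/orders/","status":200,"rt":0.052,"urt":"0.041","client":"203.0.113.7"}'
static = '{"time":"2024-01-01T00:00:01+00:00","method":"GET","path":"/logo.png","status":200,"rt":0.001,"urt":"-","client":"203.0.113.7"}'

print(nginx_side_seconds(proxied))   # 0.011
print(nginx_side_seconds(static))    # None
```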
10) When the model breaks — patterns that don't fit nginx well¶
Three workloads where the event-loop model strains:
Synchronous CPU work. nginx is single-threaded per worker. CPU-bound logic (heavy regex, complex Lua) blocks the entire worker's event loop. Mitigation: offload CPU work to an upstream service. Don't write business logic in nginx.
Very large file serving. Even with sendfile, serving a 10 GB file holds resources. For huge files, use a CDN or object storage with signed URLs.
Long-lived streaming with backpressure. Server-sent events and WebSockets work, but the connection holds a slot. With thousands of long-lived WebSockets per worker, you can saturate worker_connections before saturating CPU. Tune worker_connections up; consider a dedicated upstream (daphne, uWSGI with WebSocket support, or Go services) and proxy.
Operational signals¶
Healthy. nginx -s reload succeeds; workers handle target RPS at < 30% CPU; worker_connections utilisation < 50%; access log shows expected status distribution.
First degrading metric. worker_connections utilisation climbing past 70%. Either traffic has grown or upstream is slow (connections held longer waiting for upstream).
Misleading metric. Per-request CPU time. Most of nginx's work is I/O wait; CPU is rarely the constraint.
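Expert graph. $request_time - $upstream_response_time distribution: the slice of time spent outside the upstream, meaning nginx's own work plus transfer to and from the client. Both variables have millisecond resolution, so for fast clients the difference should sit at or near zero under healthy conditions.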
Where this appears in production¶
- Cloudflare: nginx (with custom modules) as the edge front-end for millions of sites.
- Netflix: nginx for many edge use cases; their tuning patterns on worker_connections and keepalive are public.
- Discord: nginx terminating TLS for WebSocket fan-out at scale.
- GitHub: nginx in front of Rails; the patterns around proxy_buffering for large uploads are well-documented.
- Wikipedia / Wikimedia: nginx as the cache layer (Varnish elsewhere, nginx + Lua in some paths).
- Dropbox: nginx at the front of file transfer paths; tuned for high-throughput.
- A Bengaluru SaaS: nginx + uvicorn for async Django; proxy_buffering off for streaming endpoints.
- A Mumbai fintech: nginx as the TLS and rate-limit edge for the payments API.
Recall / checkpoint¶
- What is the role of the nginx master vs. worker processes?
- How does one nginx worker handle thousands of concurrent connections?
- What is reuseport and what problem does it solve?
- Why is proxy_buffering a defence against slow clients?
- What does nginx -s reload actually do?
- What is the difference between $request_time and $upstream_response_time?
- Why is CPU-heavy work in nginx an anti-pattern?
Interview Q&A¶
Q1. nginx is handling 10K concurrent connections per worker on a 4-CPU box. Walk through how this is possible.
Each worker runs an event loop on top of epoll (Linux) or kqueue (BSD). Sockets are non-blocking; the worker waits on epoll for any monitored FD to become ready, then handles whatever is ready (accept, read, write, timer) and returns to epoll. One thread, one process, thousands of connections: no thread stacks, no context switches, just FD-level cooperation. Compare to thread-per-request: 10K threads × 8 MB default stack = 80 GB of reserved virtual memory, plus context-switch storms. nginx's model trades that for one worker, ~10 MB resident, doing the same work via cooperative scheduling. Common wrong answer to avoid: "nginx uses async threads". It uses event-loop concurrency in a single thread per worker.
Q2. The team's gunicorn workers are getting tied up by slow uploads. How does nginx help?
Enable proxy_request_buffering on (default) so nginx receives the entire request body into memory or disk before opening the upstream connection. The upstream gunicorn worker then sees a fast request from localhost and processes it in milliseconds. Without this, a slow client on mobile holds a gunicorn worker for the duration of the upload. nginx's buffering layer absorbs the slow-network cost; gunicorn workers stay free. Tune client_body_buffer_size and client_body_temp_path to size memory and disk usage. Common wrong answer to avoid: "scale gunicorn workers" — band-aid; nginx buffering is structural.
Q3. After a reload, some requests return 502s for ~30 seconds. What is the diagnosis?
Likely the old workers were killed before in-flight connections completed; clients on those connections see the upstream connection drop, nginx returns 502. Fix: nginx -s reload should let old workers finish in-flight requests; if they are killed (e.g., orchestrator kills the container immediately), in-flight requests fail. Configure the orchestrator's terminationGracePeriodSeconds to be longer than the longest expected in-flight request. Verify nginx -s quit (not nginx -s stop) is used for shutdown to drain connections. Common wrong answer to avoid: "nginx reload always loses requests" — properly handled, it does not.
Q4. You see worker_connections utilisation at 90%. What do you investigate first?
The upstream's response time. High worker_connections utilisation often means upstream is slow — nginx workers are holding connections open waiting for gunicorn (or whatever upstream) to respond. Check $upstream_response_time distribution; if elevated, the upstream is the constraint, not nginx. If $upstream_response_time is fine, then traffic has actually grown — raise worker_connections and worker_rlimit_nofile, or add more nginx pods. Common wrong answer to avoid: "raise worker_connections immediately" — without diagnosis, you raise a ceiling that wasn't the real constraint.
Q5. The team's nginx config has a bug. After nginx -s reload, nginx serves errors. What is the safest deployment pattern?
Always run nginx -t in CI before deploying. The -t flag does a syntax check without applying. In CI, validate that every commit's config passes nginx -t against the runtime nginx version. For deployment: blue/green nginx instances behind a load balancer, so a bad config affects only the new instance and traffic can shift back. For single-instance deployments: a wrapper script that runs nginx -t first, only then nginx -s reload. Common wrong answer to avoid: "trust the reload" — reload is graceful for valid configs; for invalid ones, you need a pre-check.
Q6. How do you scale a long-lived WebSocket workload through nginx?
Three patterns. First, tune worker_connections up: each WebSocket is one connection slot, and the compiled-in default of 512 (often 1024 in packaged configs) is too small. Second, use a dedicated upstream (daphne, uvicorn) for WebSockets; HTTP and WebSocket workloads have different sizing. Third, consider whether nginx is the right edge for WebSockets at scale; some teams use Envoy or HAProxy for very-high-connection WebSocket workloads where they need finer control. For typical scales (thousands to tens of thousands of WebSockets), nginx with tuned worker_connections is sufficient. Common wrong answer to avoid: "WebSockets work the same as HTTP". Connection lifetime is the difference; sizing is the consequence.
Operational memory¶
This chapter explained the nginx process model and event loop: one master, N workers (one per CPU), each worker running an epoll-based event loop that serves thousands of concurrent connections. The important idea is that nginx's scale comes from cooperative concurrency on non-blocking sockets, not from threads — and this changes how you size, tune, and debug it.
You learned to recognise the master/worker split, follow a request through the event loop, size worker_connections and worker_rlimit_nofile, deploy buffering as the slow-client defence, and read $request_time vs. $upstream_response_time to isolate the constraint. That solves the opening problem because nginx's behaviour at scale is now legible.
Carry this diagnostic forward: when nginx is the suspect in production, ask which layer is constrained — workers, connections, upstream wait, or CPU. Each has a structural fix.
Remember:
- One master, N workers; each worker is a single-threaded event loop.
- epoll/kqueue is the syscall; EAGAIN is the pattern.
- worker_connections × worker_processes is the concurrency ceiling.
- Buffering protects upstreams from slow clients in both directions: request buffering on the way in, response buffering on the way out.
- nginx -s reload is graceful for valid configs; always nginx -t first.
- Single-thread per worker means CPU-heavy work is an anti-pattern.
Bridge. The engine is understood. Day-to-day, you write configs — server blocks, location blocks, proxy_pass directives, headers. The next chapter is the configuration surface and the patterns that survive. → 02-configs-locations-day-to-day.md