07. Load Balancing at Network Level: one smart sorting center, many backend branches

~16 min read. If traffic enters randomly, one hot server spoils the whole system.

Builds on the ELI5 in 00-eli5.md. The post office (the sorting center that reads destinations and distributes letters) now becomes the network gatekeeper that keeps traffic spread evenly across healthy servers.

1) Why the post office exists in front of servers

Users should not pick backend machines directly. That would leak topology and create hot spots. A load balancer sits in front. It accepts traffic on one address, often called a virtual IP (VIP). Then it chooses one backend target.

Think of it as a post office sorting center. Letters arrive at one counter. The sorter sends them to the proper branches. Same idea here.

┌──────────┐      one VIP       ┌────────────────┐
│ Clients  │ ─────────────────→ │ Load balancer  │
└──────────┘                    └──────┬─────────┘
                                       │
                       ┌───────────────┼───────────────┐
                       ▼               ▼               ▼
                   ┌────────┐      ┌────────┐      ┌────────┐
                   │ App-1  │      │ App-2  │      │ App-3  │
                   └────────┘      └────────┘      └────────┘

The client sees one stable entry point. The backend fleet can grow or shrink behind it. Health checks can remove broken machines quietly. Draining can stop new connections before shutdown. See how much the indirection buys you.
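The indirection can be sketched in a few lines of Python. This is a minimal illustration, not how any real balancer is built: the `BackendPool` class, its method names, and the VIP value are all assumptions made up for the example. The point is that callers only ever use `pick()`, while the list behind it can change.

```python
from itertools import count


class BackendPool:
    """Hypothetical pool behind one virtual IP (VIP); all names are illustrative."""

    def __init__(self, vip, backends):
        self.vip = vip                  # the only address clients ever see
        self.backends = list(backends)  # hidden topology behind the VIP
        self._counter = count()         # ever-increasing pick counter

    def add(self, backend):
        self.backends.append(backend)

    def remove(self, backend):
        self.backends.remove(backend)

    def pick(self):
        """Return the next backend in simple rotation."""
        if not self.backends:
            raise RuntimeError("no backends available")
        return self.backends[next(self._counter) % len(self.backends)]


pool = BackendPool("203.0.113.10:443", ["App-1", "App-2", "App-3"])
first_three = [pool.pick() for _ in range(3)]

pool.remove("App-2")  # clients are unaffected; they still talk to the VIP
after_removal = {pool.pick() for _ in range(4)}
```

After `App-2` is removed, later picks only land on the two remaining backends, and nothing on the client side had to change.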

2) L4 and L7 do different kinds of sorting

Layer 4 means transport-level balancing. The balancer looks at IPs, ports, and TCP state. It does not look at the URL path or header names. Layer 7 means application-aware balancing. The balancer can inspect host, path, method, cookies, or headers. One quick picture.

┌────────────────────┬─────────────────────────────┐
│ L4 balancer        │ L7 balancer                 │
├────────────────────┼─────────────────────────────┤
│ sees IP and port   │ sees HTTP meaning           │
│ faster decisions   │ richer routing rules        │
│ protocol agnostic  │ HTTP and gRPC aware         │
│ lower overhead     │ more policy controls        │
└────────────────────┴─────────────────────────────┘

Worked example. Suppose 203.0.113.10:443 receives 60,000 new connections each second. You have 4 identical TCP backends. L4 round-robin gives 60,000 ÷ 4 = 15,000 new connections each second per node. Nice and simple.

Now change the requirement. /payments must go to PCI-isolated servers. /search must go to the search fleet. Same hostname. Same port 443. L4 cannot distinguish the path. L7 can.

So what is the rule? Use L4 when backends are functionally identical. Use L7 when request meaning changes the destination. HAProxy and NGINX can do both styles. Cloud load balancers often expose separate L4 and L7 products too.
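The difference between the two views can be sketched in Python. This is a rough illustration under stated assumptions: the pool names (`tcp-1`, `pci-1`, `search-1`, `web-1`) and both function names are hypothetical, and real balancers implement this in their own configuration languages rather than in application code.

```python
from itertools import cycle

# Hypothetical pools; the names are illustrative, not from a real deployment.
TCP_BACKENDS = ["tcp-1", "tcp-2", "tcp-3", "tcp-4"]
L7_ROUTES = {
    "/payments": ["pci-1", "pci-2"],
    "/search": ["search-1", "search-2"],
}
DEFAULT_POOL = ["web-1", "web-2"]

_l4_rotation = cycle(TCP_BACKENDS)


def l4_pick(dst_ip, dst_port):
    """L4 view: only address and port are visible, so every connection
    to the same VIP goes through the same round-robin rotation."""
    return next(_l4_rotation)


def l7_pool_for(path):
    """L7 view: the request path is visible, so it can select a pool."""
    for prefix, pool in L7_ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL


# Worked example from the text: 60,000 new connections/s over 4 backends.
per_node = 60_000 // len(TCP_BACKENDS)
```

Note that `l4_pick` never receives a path at all, which is exactly why it cannot send /payments and /search to different fleets.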

3) DNS load balancing and anycast spread traffic earlier

Sometimes balancing happens before the TCP session even starts. DNS can return different addresses for the same hostname. That is DNS load balancing. Example:

- api.example.in can return 34.10.1.8
- or 34.10.2.8
- or 34.10.3.8

Resolver choice and TTL affect who gets what.

Worked TTL example. Suppose TTL is 60 seconds. A client population of 120,000 users resolves evenly. Each address may receive about 40,000 users during that minute. Now one region fails at second 10. DNS cannot instantly recall the old answer from every cache. Some users may keep trying the dead address until their cached answer expires. That is why DNS balancing is coarse-grained. It is useful. It is not fine-grained per connection.

Anycast is different. Multiple locations advertise the same IP prefix. Internet routing sends each user toward the nearest healthy location, where "nearest" means nearest in routing terms, which usually but not always matches geography. Think of many post office counters sharing one postal code. The nearest reachable counter receives the customer. Diagram time.

┌──────────┐        same anycast IP        ┌───────────────┐
│ Delhi    │ ────────────────────────────→ │ Mumbai PoP    │
├──────────┤                               ├───────────────┤
│ Chennai  │ ────────────────────────────→ │ Chennai PoP   │
├──────────┤                               ├───────────────┤
│ Dubai    │ ────────────────────────────→ │ Dubai PoP     │
└──────────┘                               └───────────────┘

Same address. Different physical landing point. Anycast shines for global edge networks and DDoS absorption. But it still needs local balancing after traffic lands.
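The TTL arithmetic can be written out as a small Python helper. It is a simplification, and the function name and parameters are assumptions for illustration: it pretends users are spread perfectly evenly and that every cache was filled at second 0.

```python
def stale_exposure(users, addresses, ttl_s, failure_at_s):
    """Estimate how many users hold a failed address and for how long.

    Simplifying assumptions (not from any real resolver): users are split
    evenly across addresses, and every cache was filled at second 0.
    """
    users_per_address = users // addresses
    stale_window_s = max(ttl_s - failure_at_s, 0)
    return users_per_address, stale_window_s


# Numbers from the worked example: 120,000 users, 3 addresses, TTL 60 s,
# failure at second 10.
affected, stale_s = stale_exposure(
    users=120_000, addresses=3, ttl_s=60, failure_at_s=10
)
```

Under these assumptions about 40,000 users hold the dead address for up to 50 more seconds. In practice caches fill at different moments, so a client that resolved just before the failure can stay stale for nearly the full TTL.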

4) Health checks decide who stays in rotation

A balancer is only smart if it knows backend health. Active checks send probes. Passive checks observe real failures. Good checks test more than process aliveness. A backend can accept TCP and still be useless. The database may be down. The thread pool may be full. Dependency latency may be exploding.

Suppose 5 backends share 50,000 requests per second. The normal average is 10,000 requests per second each. One node starts failing health checks. Remove it. The new average becomes 50,000 ÷ 4 = 12,500 requests per second. That is 25% more load per remaining node. See the hidden question. Can the survivors absorb that jump safely?

NGINX and HAProxy both support health-aware routing. They can stop sending fresh traffic to unhealthy nodes. They can also mark nodes as backup only. Here is the operational picture.

┌──────────────┐     probe      ┌──────────┐
│ HAProxy      │ ─────────────→ │ App-2    │
└──────┬───────┘                └──────────┘
       │ health fail
       ▼
remove from pool

That one action saves many user errors.
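The redistribution math, together with the hidden headroom question, can be sketched in Python. The function name and the `capacity_rps` value of 12,000 are assumptions chosen for illustration; the source does not state a per-node capacity.

```python
def load_after_failures(total_rps, nodes, failed, capacity_rps):
    """Return (per-node load after removal, percent increase, fits?).

    capacity_rps is a hypothetical per-node limit supplied by the caller.
    """
    healthy = nodes - failed
    if healthy <= 0:
        raise ValueError("no healthy nodes left")
    before = total_rps / nodes
    after = total_rps / healthy
    increase_pct = (after - before) / before * 100
    return after, increase_pct, after <= capacity_rps


# Numbers from the text: 5 backends, 50,000 requests/s, one node removed.
after, increase_pct, fits = load_after_failures(
    50_000, nodes=5, failed=1, capacity_rps=12_000
)
```

With an assumed limit of 12,000 requests per second per node, the survivors would be pushed to 12,500 each and `fits` comes out false: the failure was detected correctly, yet the fleet still lacks the headroom to absorb it.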

5) Connection draining prevents rude shutdowns

A shutdown should be more than immediate termination and a prayer. Long-lived connections make that approach dangerous. Connection draining means this:

- stop sending new connections to server X
- keep existing connections alive for a grace period
- stop the server only after active sessions finish or the grace period times out

Worked example. Backend App-4 has 2,400 active keep-alive sessions. The average completion rate is 80 sessions per second. Expected drain time = 2,400 ÷ 80 = 30 seconds. Add a safety buffer of 15 seconds. The drain window becomes 45 seconds. If you stop it immediately, all 2,400 sessions may see resets or retries. If you drain correctly, most users feel nothing.

This matters even more for WebSocket and gRPC streams. Their sessions can live for minutes or hours. So a balancer must separate two ideas. Healthy for new traffic is one question. Alive enough to finish old traffic is another question.
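Both the drain arithmetic and the two separate questions can be sketched in Python. This is a minimal illustration: the `Backend` class, its fields, and `drain_window_s` are hypothetical names, not the API of HAProxy, NGINX, or any other balancer.

```python
import math


def drain_window_s(active_sessions, completions_per_s, buffer_s):
    """Estimated seconds to wait before stopping a draining backend."""
    if completions_per_s <= 0:
        raise ValueError("completion rate must be positive")
    return math.ceil(active_sessions / completions_per_s) + buffer_s


class Backend:
    """Hypothetical backend record that keeps the two questions apart."""

    def __init__(self, name, active_sessions=0):
        self.name = name
        self.active_sessions = active_sessions
        self.draining = False

    def accepts_new_connections(self):
        # Question 1: is it healthy for new traffic?
        return not self.draining

    def safe_to_stop(self):
        # Question 2: has the old traffic finished?
        return self.draining and self.active_sessions == 0


app4 = Backend("App-4", active_sessions=2_400)
app4.draining = True
window = drain_window_s(app4.active_sessions, completions_per_s=80, buffer_s=15)
```

Here `window` reproduces the 45-second figure from the worked example. The sketch leaves out the timeout path: a real drain would also stop the backend once the window expires, even if some sessions are still open.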

6) Choosing the right balancing layer

Keep this checklist in your head.

- Choose DNS balancing when you need region-level spread.
- Choose anycast when you need the nearest global entry point.
- Choose L4 when traffic is generic transport and the backend pool is homogeneous.
- Choose L7 when routing depends on path, host, or richer policies.
- Use HAProxy or NGINX when you need explicit routing control on your fleet.
- Use health checks so broken nodes leave fast.
- Use draining so maintenance does not create avoidable resets.

One compact summary.

┌─────────────┬──────────────────────────────┐
│ Mechanism   │ Best use                     │
├─────────────┼──────────────────────────────┤
│ DNS         │ coarse regional distribution │
│ Anycast     │ nearest global entry         │
│ L4          │ fast transport spread        │
│ L7          │ content-aware routing        │
└─────────────┴──────────────────────────────┘

Good systems often combine several. DNS may choose the region. Anycast may choose the edge city. L7 may choose the service. Internal L4 may choose the exact instance. That layered post office design is normal.
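As a memory aid, the checklist can be encoded as a tiny Python function. This is only an illustrative sketch: the function name and its three boolean flags are assumptions, and a real design decision weighs far more than three inputs.

```python
def choose_mechanisms(*, multi_region=False, global_edge=False, content_routing=False):
    """Map requirements to the mechanisms in the checklist (illustrative only)."""
    layers = []
    if multi_region:
        layers.append("DNS")      # coarse regional distribution
    if global_edge:
        layers.append("Anycast")  # nearest global entry
    # Content-aware routing needs L7; otherwise plain transport spread is enough.
    layers.append("L7" if content_routing else "L4")
    return layers


# One hostname serving /payments and /search from several regions.
stack = choose_mechanisms(multi_region=True, content_routing=True)
```

The returned list reflects the layering idea: earlier entries act before the connection lands, and the last entry decides what happens once it does.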

Where this lives in the wild

Cloudflare: a network engineer uses anycast to land traffic at nearby edge locations before local balancing happens.
AWS: a site reliability engineer chooses between Network Load Balancer and Application Load Balancer based on transport versus HTTP-aware routing needs.
HAProxy at large fintechs like Razorpay: a platform engineer manages health checks, draining, and weighted rollouts for payment APIs.
NGINX in ecommerce platforms like Flipkart: a traffic engineer routes /search and /checkout differently while keeping one public hostname.
Netflix: an edge reliability engineer layers DNS, regional routing, and local balancers so failures do not overwhelm one zone.

Pause and recall

Why is DNS load balancing useful but coarse?
What can L7 see that L4 cannot see?
Why does removing one unhealthy node change capacity math immediately?
What problem does connection draining solve during deployments?

Interview Q&A

Q: Why choose L4 load balancing instead of L7 for some services?
A: L4 is simpler and faster when all backends are functionally identical and routing depends only on transport endpoints. You avoid unnecessary application inspection overhead.
Common wrong answer to avoid: "Because L4 is always better". It is better only when you do not need content-aware routing or richer HTTP policies.

Q: Why is DNS balancing not enough for failure handling by itself?
A: DNS answers get cached, so clients may keep using dead addresses until TTL expires. That makes DNS good for coarse spread, not fine real-time steering.
Common wrong answer to avoid: "Because DNS is slow". The main issue is cached stale answers, not raw lookup speed.

Q: Why use connection draining during a rolling deploy?
A: It stops new sessions first, then lets existing ones finish within a grace window. That prevents avoidable connection resets and retry storms.
Common wrong answer to avoid: "Because users will reconnect anyway". Forced reconnects can break streams, duplicate writes, and create latency spikes.

Q: Why are health checks and capacity planning linked tightly?
A: Once a bad node leaves the pool, the remaining nodes must absorb more traffic instantly. A fleet that detects failure but cannot survive redistribution is still fragile.
Common wrong answer to avoid: "Health checks alone guarantee availability". Detection helps only if headroom exists afterward.

Apply now (5 min)

Exercise: Assume 6 servers share 72,000 requests per second. Compute the average load per server. Then remove one failed server and recompute the new average. Next, assume the removed server had 1,800 active connections. If 90 finish each second, estimate the drain time. Finally, write whether you would use DNS, L4, or L7 for /payments and /search on one hostname.

Sketch from memory: Draw one public address, one load balancer, and three backends. Label one health check, one drain step, and one routing rule. Then add a side note for DNS and anycast before the balancer.

Bridge. Traffic is now reaching the right machines. Next we ask how to move content physically closer to users so that requests do not have to travel so far. → 08-cdn-and-edge.md