
12. Rate Limiting at Network Layer — when the post office must say enough

~18 min read. Protect the sorting room before perfecting application elegance.

Built on the ELI5 in 00-eli5.md. The post office — refusing more letters than it can sort — explains network-layer rate limiting.


1) See the problem before the countermeasure

A healthy service expects bursts, but not infinite bursts. Attack traffic abuses that difference. Sometimes it is a deliberate DDoS campaign. Sometimes it is just a buggy client retry loop. At network level, both can crush capacity first. CPU is not the only bottleneck. Connection tables fill. Bandwidth saturates. Load balancer queues lengthen. Kernel backlog limits get hit. Then even good users start looking malicious by accident. That is why rate limiting begins below business logic. Imagine a sorting center. If ten thousand letters arrive each second, workers cope. If two million arrive each second, belts jam everywhere. The post office must refuse excess before the floor collapses.

Incoming traffic
┌──────────────┐
│ Edge filter  │
└──────┬───────┘
┌──────────────┐
│ Rate limiter │
└──────┬───────┘
┌──────────────┐
│ App servers  │
└──────────────┘

Rate limiting is not one technique. It is a family of techniques at multiple layers. You can limit packets per second. You can limit new TCP connections per second. You can limit requests per IP or per token. You can shape bandwidth instead of dropping immediately. Different attacks need different doors closed.

2) Connection limits and SYN flood protection

A SYN flood abuses the TCP handshake. Attackers send many SYN packets and never complete the handshake. The server reserves half-open state for each attempted connection. Eventually that table fills up. Legitimate new clients then wait or fail.

Classic protection starts with SYN cookies and backlog tuning. SYN cookies avoid storing state too early. The server encodes the connection details inside the sequence number of the SYN-ACK response itself. Only when the final ACK returns with a valid cookie does state become committed. That is wonderfully practical. Edge devices can also limit new connections per source. For example, cap one IP at 200 new connections per second. Cap one subnet at 5,000 if traffic patterns justify it.

Worked example. Your service normally sees 800 new connections per second. A sudden spike reaches 25,000 per second from 400 IPs. That averages about 62 new connections per IP each second, below the 200-per-IP cap. If one bad actor sends 5,000 alone, per-IP limits catch it fast. If thousands of bots each send 50, every bot stays under the cap, so you need broader edge defenses too. See the handshake picture.

Attacker                 Server
  │ SYN ───────────────▶ │
  │ SYN ───────────────▶ │
  │ SYN ───────────────▶ │
  │ ...no final ACK...   │

Now the protected version.

Client                   Protected server
  │ SYN ───────────────▶ │
  │ ◀──── SYN-ACK(cookie)│
  │ ACK ───────────────▶ │
  │ state allocated now  │
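The cookie idea can be sketched in a few lines. This is a simplified illustration, not the kernel's real encoding. Real SYN cookies pack a hash, a coarse timestamp, and an MSS hint into the 32-bit initial sequence number. Here the names `SECRET`, `make_cookie`, and `check_cookie` are hypothetical, and the hash choice (HMAC-SHA256) is an assumption made for readability.

```python
import hashlib
import hmac

SECRET = b"demo-secret-rotate-me"  # hypothetical server-side secret (assumption)


def make_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                time_slot: int, secret: bytes = SECRET) -> int:
    """Derive a 32-bit value from the connection 4-tuple and a coarse time slot.

    Nothing is stored on the server: the value can be recomputed later.
    """
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{time_slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_cookie(cookie: int, src_ip: str, src_port: int, dst_ip: str,
                 dst_port: int, now_slot: int, secret: bytes = SECRET) -> bool:
    """Accept the cookie if it matches the current or the previous time slot."""
    if not 0 <= cookie < 2**32:
        return False
    for slot in (now_slot, now_slot - 1):
        expected = make_cookie(src_ip, src_port, dst_ip, dst_port, slot, secret)
        if hmac.compare_digest(cookie.to_bytes(4, "big"),
                               expected.to_bytes(4, "big")):
            return True  # only now would the server allocate connection state
    return False
```

The server sends `make_cookie(...)` with the SYN-ACK and forgets the attempt. When the final ACK arrives, `check_cookie(...)` recomputes the value. A spoofed source that never answers costs the server no memory. On Linux this behaviour is controlled by the `net.ipv4.tcp_syncookies` sysctl rather than by application code.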

Remember one subtlety. Connection limits can hurt mobile users behind carrier NATs. Many real users may appear from one source IP. So limits must match traffic reality, not fear alone.
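A per-source cap can be sketched as a counter that resets every second. This is a minimal sketch under stated assumptions, not how any particular edge device implements it. The class `NewConnectionLimiter` and the helper `simulate` are hypothetical names. The fixed one-second window is a simplification chosen for clarity.

```python
from collections import defaultdict


class NewConnectionLimiter:
    """Fixed one-second window counter per source key (hypothetical helper)."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second
        self.window = None              # which whole second we are counting
        self.counts = defaultdict(int)  # attempts seen per source this second

    def allow(self, source: str, now: float) -> bool:
        window = int(now)
        if window != self.window:
            # A new second started: forget the previous counts.
            self.window = window
            self.counts.clear()
        self.counts[source] += 1
        return self.counts[source] <= self.max_per_second


def simulate(limiter, attempts_by_source, now=0.0):
    """Return accepted connection counts per source for one second of traffic."""
    return {
        src: sum(limiter.allow(src, now) for _ in range(n))
        for src, n in attempts_by_source.items()
    }


if __name__ == "__main__":
    limiter = NewConnectionLimiter(max_per_second=200)
    # One loud source and one ordinary source (documentation-range addresses).
    print(simulate(limiter, {"198.51.100.9": 5000, "203.0.113.5": 62}))
```

The loud source is cut to 200 accepted connections while the ordinary one is untouched. The same code also shows the weakness. Five hundred sources sending 50 each all pass, because nobody crosses the cap. And if the key is a carrier NAT address, many real users share one counter, so the key and the cap both need to reflect observed traffic.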

3) Traffic shaping, token buckets, and fair usage

Dropping everything above a threshold is crude. Sometimes you want smoothing instead of immediate refusal. Traffic shaping deliberately delays packets so they leave at a configured rate. This protects downstream links and keeps latency less chaotic.

Token bucket is a common mental model. Tokens refill steadily over time. Each request or packet spends tokens. Short bursts are allowed if saved tokens exist. Long abusive floods exhaust the bucket and get throttled.

Worked example with concrete numbers. Suppose a bucket holds 1,000 tokens. It refills at 200 tokens per second. A client sends 900 requests instantly. That burst passes because tokens were saved. Then the same client sends 500 each second continuously. The 100 leftover tokens are spent almost at once. After that, only 200 per second keep passing, matching the refill rate. That is graceful control, not blind panic.

Traffic shaping also matters for internal networks. Backup jobs can starve customer traffic on shared links. On a 1 Gbps link, for instance, shape backups to 200 Mbps and preserve 800 Mbps for APIs. Now everybody breathes. The envelope still moves, just more fairly. A tiny diagram helps.

Tokens: 1000 max
Refill: 200/sec
Burst 1: 900  ── allowed
Then 500/sec ── ~200/sec allowed once saved tokens run out, ~300/sec delayed/dropped
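The same numbers can be replayed in code. This is a minimal token bucket sketch, assuming the caller supplies the clock. The class name `TokenBucket` and the method `take` are hypothetical. Production limiters usually add locking and per-client storage, which are left out here.

```python
class TokenBucket:
    """Minimal token bucket: bursts spend saved tokens, refill sets the long-term rate."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # assumption: the bucket starts full
        self.last = 0.0         # time of the previous refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now

    def take(self, n: int, now: float) -> int:
        """Try to spend n tokens at time `now`; return how many were granted."""
        self._refill(now)
        granted = int(min(n, self.tokens))
        self.tokens -= granted
        return granted


if __name__ == "__main__":
    bucket = TokenBucket(capacity=1000, refill_per_sec=200)
    print(bucket.take(900, now=0.0))  # 900: the burst fits in saved tokens
    print(bucket.take(500, now=1.0))  # 300: 100 leftover + 200 refilled
    print(bucket.take(500, now=2.0))  # 200: only the refill is available
    print(bucket.take(500, now=3.0))  # 200: steady state equals the refill rate
```

Notice the second call. The first second after the burst still passes 300, because 100 tokens were left over. From then on the client is held to the refill rate. Whether the excess is dropped or queued is a separate policy choice that this sketch does not make.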

Policy must be measurable. Choose limits from observed percentiles, not random round numbers. If p99 normal traffic is 120 rps per client, start above that. Then watch error rates and user impact carefully.
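Picking a limit from observed data can be sketched like this. The helper names `percentile` and `starting_limit` are hypothetical. The 1.5x headroom factor is an assumption for illustration, not a recommendation from this lesson. The sample data is invented to reproduce the 120 rps figure.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def starting_limit(samples, pct: int = 99, headroom: float = 1.5) -> int:
    """Hypothetical policy: observed percentile times a headroom factor, rounded up."""
    return math.ceil(percentile(samples, pct) * headroom)


if __name__ == "__main__":
    # Invented per-client request rates (rps) for 100 clients.
    observed = [40] * 90 + [80] * 8 + [120, 300]
    print(percentile(observed, 99))   # 120
    print(starting_limit(observed))   # 180
```

The single client at 300 rps sits above the starting limit. That is the point of watching error rates afterwards. It may be an abuser, or it may be your biggest customer.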

4) WAFs and managed edge protection

Some attacks are volumetric. Some attacks are protocol abuse. Some attacks are application-shaped, like malicious HTTP floods. A WAF sits at the HTTP layer and applies smarter rules. It can block patterns, countries, headers, bots, and known signatures. That does not replace lower-layer protection. It complements it.

Cloudflare, AWS Shield, and similar services absorb huge bad floods upstream. They use massive anycast capacity and threat intelligence. That means bad traffic gets filtered before your origin link saturates. AWS Shield Advanced also integrates with Route 53, CloudFront, and ELB. Cloudflare can challenge browsers, rate limit paths, and hide origin IPs.

Worked example. Suppose your origin can absorb 3 Gbps safely. An attack sends 40 Gbps of mixed junk traffic. If edge protection drops 95% before origin, only 2 Gbps reaches you (5% of 40). That fits under the 3 Gbps ceiling, so your origin survives.

But remember fairness. A blunt WAF rule can block legitimate crawlers or partners. So always keep exception paths and observability. Look at the layered defense.

Internet flood
┌──────────────┐
│ CDN / Anycast│
└──────┬───────┘
┌──────────────┐
│ DDoS shield  │
└──────┬───────┘
┌──────────────┐
│ WAF rules    │
└──────┬───────┘
┌──────────────┐
│ Origin app   │
└──────────────┘
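The arithmetic behind the layered picture is worth writing down. This is a back-of-envelope sketch. The function names are hypothetical, and it ignores legitimate traffic, which also consumes origin capacity.

```python
def traffic_reaching_origin(attack_gbps: float, edge_drop_fraction: float) -> float:
    """Attack traffic (Gbps) left over after the edge drops a given fraction."""
    if not 0.0 <= edge_drop_fraction <= 1.0:
        raise ValueError("edge_drop_fraction must be between 0 and 1")
    return attack_gbps * (1.0 - edge_drop_fraction)


def required_drop_fraction(attack_gbps: float, origin_capacity_gbps: float) -> float:
    """Smallest fraction the edge must drop so the remainder fits the origin."""
    if attack_gbps <= origin_capacity_gbps:
        return 0.0
    return 1.0 - origin_capacity_gbps / attack_gbps


if __name__ == "__main__":
    # Numbers from the worked example: 40 Gbps attack, 95% dropped upstream.
    print(round(traffic_reaching_origin(40, 0.95), 2))  # 2.0
    # A 3 Gbps origin needs at least 92.5% of a 40 Gbps flood removed.
    print(round(required_drop_fraction(40, 3), 3))      # 0.925
```

The second function is the useful one during planning. It tells you how effective the upstream layers must be before your own limits even get a chance to work.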

5) Design rate limits like an adult, not a slogan

Rate limiting should protect availability without punishing normal growth. Use separate limits for separate resources. New connections deserve one threshold. Requests per API key deserve another threshold. Bandwidth per tenant may need another threshold. Login endpoints deserve stricter controls than read-only catalog pages. Admin paths deserve stricter controls than public assets.

Always measure at least these four signals.

  • Dropped connections per second.
  • Accepted connections per second.
  • HTTP 429 or challenge rates.
  • Latency and error rates for good users.

Worked example with a launch event. A match-ticket release expects 50,000 users in five minutes. That is about 167 users arriving each second on average. Real bursts may be three times higher. So design maybe for 500 good arrivals each second. Then rate limit unknown clients above that plus safe headroom. If real traffic comes from one ISP NAT block, adjust quickly. Blind limits create self-inflicted outages. Good limits are adaptive, monitored, and reviewed after every incident.
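The launch arithmetic can be turned into a small planning helper. This is a sketch under stated assumptions. The function names are hypothetical, and the 20% headroom is an invented figure, since the text only says "safe headroom".

```python
import math


def average_arrivals_per_second(users: int, window_seconds: int) -> float:
    """Mean arrival rate if users were spread evenly across the window."""
    return users / window_seconds


def design_rate(users: int, window_seconds: int, burst_factor: int) -> int:
    """Arrivals per second to design for, allowing bursts above the average."""
    return math.ceil(users * burst_factor / window_seconds)


def limit_with_headroom(rate: int, headroom_percent: int = 20) -> int:
    """Rate-limit threshold: design rate plus a percentage of headroom (assumed 20%)."""
    return math.ceil(rate * (100 + headroom_percent) / 100)


if __name__ == "__main__":
    users, window = 50_000, 5 * 60
    print(round(average_arrivals_per_second(users, window)))  # 167
    rate = design_rate(users, window, burst_factor=3)
    print(rate)                                               # 500
    print(limit_with_headroom(rate))                          # 600
```

Treat the output as a starting point for the first launch, not a fixed answer. The measured signals after the event should replace every assumed number here.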


Where this lives in the wild

  • Network security engineer at Cloudflare: tunes edge rate limits and bot challenges during volumetric attacks.
  • SRE at AWS: relies on Shield, ELB scaling, and SYN protection for public-facing workloads.
  • Platform engineer at BookMyShow: shapes ticket-launch traffic so payment and inventory systems stay responsive.
  • Infra lead at Hotstar: protects streaming origins with layered WAF rules during sports-event surges.
  • Security operations analyst at Akamai: monitors abusive connection patterns and adjusts mitigation profiles in real time.

Pause and recall

  1. Why is SYN flood protection different from API request throttling?
  2. When is traffic shaping better than immediate packet dropping?
  3. Why can per-IP limits punish legitimate users behind shared NATs?
  4. What does a WAF add that pure packet-rate limits cannot?

Interview Q&A

Q1. What problem does network-layer rate limiting solve first?
It protects connection state, bandwidth, and edge capacity before apps collapse. It keeps abusive load from exhausting infrastructure primitives.
Common wrong answer to avoid: “It mainly exists to return HTTP 429 from APIs.”

Q2. Explain SYN flood protection simply.
Protect the TCP handshake so half-open connections do not consume all state. SYN cookies and edge filtering are common defenses.
Common wrong answer to avoid: “Just increase server CPU and the SYN flood disappears.”

Q3. What is the value of a token bucket?
It allows short bursts while enforcing a steady long-term rate. That matches real traffic better than rigid hard caps.
Common wrong answer to avoid: “Token buckets always drop traffic above the first burst.”

Q4. Why combine Cloudflare or AWS Shield with a WAF?
Shielding handles large upstream floods, while WAF rules inspect HTTP behavior. Together they reduce both volume risk and application-shaped abuse.
Common wrong answer to avoid: “A WAF alone can absorb any DDoS size.”


Apply now (5 min)

Take one public endpoint and imagine its normal arrival pattern. Write safe limits for new connections, requests, and bandwidth. Then write one case where those limits might hurt real users. Finally decide where Cloudflare, AWS Shield, or your WAF would sit. Sketch from memory: draw the edge, limiter, WAF, and origin with one token-bucket example.


Bridge. Good. We can now protect networks better. But maturity also means admitting what remains messy, uncertain, or still evolving. → 13-honest-admission.md