03. TCP and UDP — reliable registered mail versus quick postcards

~15 min read. Transport choice decides reliability, latency, and system behavior.

Built on the ELI5 in 00-eli5.md. The post office — forwarding letters between endpoints — can offer tracking or just delivery attempts.


1) TCP starts with a three-way handshake

TCP builds state before sending application data. Both sides agree on initial sequence numbers and connection options such as maximum segment size. That setup costs time, but buys strong guarantees. Imagine registered mail with tracking and a signed receipt.

┌──────────┐                              ┌──────────┐
│  Client  │                              │  Server  │
└────┬─────┘                              └────┬─────┘
     │  SYN seq=1000                           │
     ├────────────────────────────────────────▶│
     │                                         │
     │  SYN-ACK seq=7000 ack=1001              │
     │◀────────────────────────────────────────┤
     │                                         │
     │  ACK ack=7001                           │
     ├────────────────────────────────────────▶│
     ▼                                         ▼
established                               established

Concrete numbers make the ACK logic easy. The client starts with sequence 1000. The SYN consumes one sequence number. The server acknowledges 1001, meaning "I saw your SYN." The server starts at 7000, and its SYN also consumes one sequence number, so the client replies with ACK 7001. Now both sides can send data reliably.

If RTT is 30 ms, the handshake costs roughly one RTT, because the client can send data as soon as it has sent its final ACK. That is about 30 ms before application bytes move.
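The sequence and ACK arithmetic above can be sketched in a few lines. This is a minimal model, not a real TCP stack: the `Segment` class and `three_way_handshake` function are hypothetical names introduced for illustration, and the only rule it encodes is that a SYN consumes one sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical, simplified view of a TCP segment header."""
    sender: str
    flags: str
    seq: Optional[int]
    ack: Optional[int]


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = Segment("client", "SYN", seq=client_isn, ack=None)
    # The client's SYN consumes one sequence number, so the server acks client_isn + 1.
    syn_ack = Segment("server", "SYN-ACK", seq=server_isn, ack=client_isn + 1)
    # The server's SYN also consumes one sequence number.
    final_ack = Segment("client", "ACK", seq=client_isn + 1, ack=server_isn + 1)
    return [syn, syn_ack, final_ack]


segments = three_way_handshake(client_isn=1000, server_isn=7000)
for segment in segments:
    print(segment)
```

With the numbers from the diagram, the second segment carries ack=1001 and the third carries ack=7001.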

2) TCP keeps bytes ordered and complete

Each byte belongs to a sequence range. Receivers acknowledge the next byte they expect, which is one past the highest contiguous byte received. If bytes 1001-2000 arrive, the ACK advances to 2001. If segment 2001-3000 is lost, progress pauses. Later data may arrive and be buffered, but the app still sees only ordered bytes. That is reliability plus in-order delivery.

A worked example helps. The client sends three 1000-byte segments:

  1. Segment A: bytes 1001-2000.
  2. Segment B: bytes 2001-3000.
  3. Segment C: bytes 3001-4000.

The network drops segment B. The server receives A and C. The server still ACKs 2001, not 4001. That repeated ACK 2001 is a duplicate ACK, and it hints that B is missing. The client retransmits B. Only then can the app read the full ordered stream. This is why one lost packet can stall many later bytes.

Flow control is another TCP promise. The receiver advertises a window, like rwnd = 64 KB. That means, "Do not send more than 64 KB ahead of what I have acknowledged." If the app reads slowly, the advertised window shrinks. If the app catches up, the window grows again. Flow control protects the receiver's memory and processing pace.

Window scaling matters on fast long-distance links. A tiny window wastes bandwidth when RTT is large. Example: a 64 KB window over 100 ms RTT. At most 64 KB can be in flight per round trip, so maximum throughput is roughly 64 KB / 0.1 s = 640 KB per second. That is only about 5.1 Mbps, no matter how fast the link is. Increase the window, and the same path carries far more data. So throughput is not only about raw link speed. It also depends on how much data may stay in flight. This is why window defaults matter on high-bandwidth, high-latency paths. Developers rarely change them directly, but systems feel the effect.
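Both calculations in this section, the cumulative ACK after a loss and the window-limited throughput ceiling, are small enough to check by hand or in code. The sketch below assumes inclusive byte ranges and treats 64 KB as 64,000 bytes to match the rounded figures in the text; the function names are hypothetical.

```python
from typing import List, Tuple


def cumulative_ack(first_byte: int, received: List[Tuple[int, int]]) -> int:
    """Return the next byte the receiver expects, given inclusive (start, end) ranges."""
    expected = first_byte
    for start, end in sorted(received):
        if start > expected:
            # Gap found: later segments are buffered but cannot be acknowledged yet.
            break
        expected = max(expected, end + 1)
    return expected


def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


seg_a, seg_b, seg_c = (1001, 2000), (2001, 3000), (3001, 4000)

ack_with_loss = cumulative_ack(1001, [seg_a, seg_c])         # B was dropped
ack_after_retx = cumulative_ack(1001, [seg_a, seg_c, seg_b])  # B retransmitted

# Assumption: 64 KB is taken as 64,000 bytes, as in the text's 640 KB/s figure.
mbps = window_limited_throughput_bps(64_000, 0.100) / 1e6

print(ack_with_loss, ack_after_retx, round(mbps, 2))
```

The ACK stays at 2001 while B is missing and jumps to 4001 once B arrives, which is the stall-then-release behavior described above.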

3) Congestion control protects the network, not just endpoints

Flow control protects the receiver. Congestion control protects the shared path.

TCP starts carefully using slow start. Example: congestion window cwnd = 10 MSS. Assume one MSS is 1460 bytes. Initial in-flight data is about 14.6 KB. If ACKs return cleanly, cwnd roughly doubles every round trip. 10 MSS becomes 20, then 40, then 80. That exponential growth finds available capacity fast.

Loss or ECN marks signal trouble. Then TCP cuts its sending rate. A classic reaction halves the congestion window. 80 MSS may drop to 40 MSS. That cut eases queue buildup inside the post office chain.

Retransmission timers matter when no duplicate ACKs appear. Suppose the timeout is 200 ms. No ACK arrives by then. The sender retransmits the missing segment. That extra wait hurts latency badly on lossy links. So good networks keep loss low and RTT stable.
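The growth and cut described above can be traced with a toy model. This is a deliberately simplified sketch: it doubles cwnd once per round trip and halves it on a congestion signal, and it ignores the slow-start threshold, congestion avoidance, and the differences between real algorithms. The names `slow_start` and `on_congestion` are hypothetical.

```python
from typing import List

MSS_BYTES = 1460  # assumed MSS, matching the example in the text


def slow_start(initial_cwnd_mss: int, round_trips: int) -> List[int]:
    """Return cwnd (in MSS) at the start and after each loss-free round trip."""
    cwnd = initial_cwnd_mss
    history = [cwnd]
    for _ in range(round_trips):
        cwnd *= 2  # every ACKed segment adds one MSS, so cwnd doubles per RTT
        history.append(cwnd)
    return history


def on_congestion(cwnd_mss: int) -> int:
    """Classic multiplicative decrease: halve cwnd, never below one MSS."""
    return max(cwnd_mss // 2, 1)


history = slow_start(initial_cwnd_mss=10, round_trips=3)
initial_bytes = history[0] * MSS_BYTES
after_loss = on_congestion(history[-1])

print(history, initial_bytes, after_loss)
```

Three clean round trips take the window from 10 to 80 MSS, and a single congestion signal brings it back to 40 MSS.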

4) UDP skips state and guarantees

UDP is tiny and simple. Send a datagram and hope it arrives. No handshake. No ordering. No retransmission by the protocol itself. No built-in flow control or congestion control. Think of a postcard dropped into the system. If one postcard vanishes, later postcards still arrive independently.

That sounds dangerous, but sometimes it is perfect. Voice and video prefer freshness over perfect recovery. Online games prefer the newest position updates over delayed old positions. DNS lookups prefer small, fast questions and retries at the application layer. Metrics and logs may choose UDP when occasional loss is acceptable.

A worked comparison keeps decisions grounded. Suppose a live call audio frame arrives 100 ms late. With TCP, waiting for lost earlier bytes delays everything behind them and may worsen the audio. With UDP, late or lost audio can simply be skipped. Now suppose a bank transfer request is lost once. With UDP, your app must rebuild reliability carefully. With TCP, retransmission and ordering already exist. So protocol choice follows product needs, not fashion.

UDP applications often rebuild selective reliability themselves. A video app may resend only key frames. A game may drop stale position packets completely. A metrics pipeline may batch and accept tiny loss. That flexibility is UDP's biggest strength. The protocol stays simple. The application chooses what matters. But that freedom pushes complexity upward. So teams must design failure behavior consciously.
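As one illustration of application-chosen reliability, here is a sketch of a receiver that keeps only the newest position update and discards anything stale. It assumes the sender stamps each datagram with an increasing sequence number; `PositionUpdate` and `LatestPositionReceiver` are hypothetical names, no real sockets are involved, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionUpdate:
    """Hypothetical payload of one UDP datagram."""
    seq: int
    x: float
    y: float


class LatestPositionReceiver:
    """Keeps the newest update; late or duplicate datagrams are dropped."""

    def __init__(self) -> None:
        self.latest: Optional[PositionUpdate] = None
        self.dropped = 0

    def on_datagram(self, update: PositionUpdate) -> bool:
        """Return True if the update was applied, False if it was stale."""
        if self.latest is not None and update.seq <= self.latest.seq:
            self.dropped += 1  # older than what we already show: ignore it
            return False
        self.latest = update
        return True


receiver = LatestPositionReceiver()
# Update 2 arrives after update 3, and update 4 is lost and never arrives.
arrivals = [
    PositionUpdate(1, 0.0, 0.0),
    PositionUpdate(3, 2.0, 2.0),
    PositionUpdate(2, 1.0, 1.0),
    PositionUpdate(5, 4.0, 4.0),
]
accepted = [receiver.on_datagram(update) for update in arrivals]
print(accepted, receiver.latest, receiver.dropped)
```

The lost update never blocks anything: the receiver simply moves on to sequence 5, which is the behavior TCP's in-order delivery would not allow.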

5) When to choose which transport

Choose TCP for web pages, APIs, database connections, and file transfer. Choose UDP for realtime media, gaming, telemetry bursts, and DNS. Choose QUIC when you want UDP underneath with reliability above it. That is how HTTP/3 works: it runs over QUIC.

Use concrete reasoning in interviews. Need strict ordering? Need built-in retransmission? Need lowest handshake delay? Need graceful behavior under loss? These questions separate TCP and UDP cleanly.

Also remember ports and multiplexing. TCP and UDP both use ports to reach the right process. IP finds the host address. The port finds the correct application on that host. The payload still rides inside an envelope at lower layers.

One concrete comparison makes interviews memorable.

  1. Payment API call: 4 KB request, exactly-once semantics matter. Choose TCP. Retransmission and ordering are worth the extra setup, though exactly-once processing still needs idempotency at the application layer.
  2. Live driver location update every 500 ms. Choose UDP or a QUIC-like transport. Old coordinates lose value quickly. Miss one packet, and the next update still helps.
  3. File upload over weak hotel Wi-Fi. Choose TCP. Losing byte order would corrupt the object.
  4. Voice chat over 4G with 2 percent loss. Choose a UDP-based media transport. A tiny crackle is better than a half-second delay.

Always connect transport choice to user pain. That is the interviewer-friendly framing.
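The interview questions above can be turned into a rough decision helper. Treat this as a conversation aid under stated assumptions rather than a rule: the function `suggest_transport` and its three boolean inputs are hypothetical, QUIC is not modeled, and real choices also weigh firewalls, libraries, and team experience.

```python
def suggest_transport(
    strict_ordering: bool,
    built_in_retransmission: bool,
    stale_data_is_useless: bool,
) -> str:
    """Very rough heuristic mapping workload needs to a transport name."""
    if stale_data_is_useless and not strict_ordering and not built_in_retransmission:
        # Freshness wins and the app can tolerate gaps: plain datagrams fit.
        return "UDP"
    # Anything that needs ordered, complete bytes leans on TCP's guarantees.
    return "TCP"


# The four scenarios from the text, encoded with assumed answers to the questions.
scenarios = {
    "payment API call": suggest_transport(True, True, False),
    "live driver location": suggest_transport(False, False, True),
    "file upload on weak Wi-Fi": suggest_transport(True, True, False),
    "voice chat with 2% loss": suggest_transport(False, False, True),
}

for workload, transport in scenarios.items():
    print(f"{workload}: {transport}")
```

The value of writing it down is that each answer is tied to an explicit requirement, which is the same framing the interview advice asks for.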


Where this lives in the wild

  1. A Zoom media engineer sends audio and video over a UDP-based transport. Fresh frames matter more than perfect retransmission for conversation quality.
  2. A PostgreSQL driver engineer relies on TCP for ordered query and result bytes. Databases cannot tolerate missing or reordered application messages.
  3. A Riot Games network engineer uses UDP for fast player state updates. Old position packets are less useful than current ones.
  4. A Cloudflare transport engineer tunes congestion control to improve web performance. A smarter sender uses bandwidth without crushing shared queues.
  5. A Swiggy platform engineer depends on TCP for payment and order APIs. Business operations need reliable delivery, not best-effort hope.

Pause and recall

  1. Why does TCP need three handshake messages before data transfer?
  2. What is the difference between flow control and congestion control?
  3. Why can one lost TCP segment stall later application bytes?
  4. Name two workloads where UDP is usually a better fit.

Interview Q&A

Q1. Explain TCP in one practical sentence.
TCP is connection-oriented, reliable, ordered byte delivery with congestion handling. Then add handshake cost and retransmission behavior.
Common wrong answer to avoid: "TCP guarantees fast delivery."

Q2. Why is UDP useful despite lacking guarantees?
It avoids connection setup and head-of-line waiting. Applications can trade perfection for freshness and lower overhead.
Common wrong answer to avoid: "UDP is only for old legacy protocols."

Q3. What does ACK 2001 mean in a TCP example?
It means bytes up to 2000 arrived contiguously. The receiver now expects byte 2001 next.
Common wrong answer to avoid: "It means segment number 2001 was received."

Q4. How do flow control and congestion control differ?
Flow control respects receiver capacity. Congestion control respects network capacity.
Common wrong answer to avoid: "They are two names for the same TCP feature."


Apply now (5 min)

Take one workload from your current project. Write whether it values reliability, ordering, or freshness most. Pick TCP or UDP and justify with one concrete failure scenario. Now estimate the cost of one extra RTT for handshake or retransmission. Would users feel that cost?

Sketch from memory

Draw the TCP three-way handshake with sequence and ACK numbers. Then draw one lost segment and the duplicate ACK pattern. Beside it, draw UDP as single datagrams with no return state.


Bridge. Reliable transport still does not hide content from observers. Next we seal the message, verify identity, and make the channel private. → 04-tls-and-https.md