06. API Gateway and Routing: The kitchen window that directs every plate

~14 min read. One disciplined front door saves many confused backends.

Built on the ELI5 in 00-eli5.md. The kitchen window — the single handoff between customers and cooks — now becomes the control point for routing, policy, and protocol translation.

First understand why the kitchen window exists

A serious backend should not expose every internal service directly. Clients would juggle many URLs, ports, credentials, and timeout rules. The kitchen window hides that sprawl behind one stable entrance. It accepts the order slip once, then forwards it correctly. That centralization brings consistency, security, and simpler client code.

Think of mobile apps, partner apps, dashboards, and internal tools. All of them want one dependable front door. They do not want five slightly different authentication stories. They do not want protocol details leaking everywhere. They want a clean contract. The gateway gives that contract.

A simple picture looks like this:

    ┌──────────┐     ┌────────────────┐     ┌───────────────┐
    │ Clients  │ ──→ │  API Gateway   │ ──→ │ User service  │
    └──────────┘     └───────┬────────┘     └───────────────┘
                             │              ┌───────────────┐
                             ├────────────→ │ Order service │
                             │              └───────────────┘
                             │              ┌───────────────┐
                             └────────────→ │ Search svc    │
                                            └───────────────┘

Without the gateway, clients know each backend by name. With the gateway, clients know one public address. See the design win. Internal services can move without breaking external callers. Teams can split or merge services later. The front door stays stable.

Now remember one caution. A gateway is not magic. It removes client complexity only by taking complexity onto itself. So the real design question is not, “Should we use one?” The real question is, “What logic deserves to live there?”

Routing rules decide which kitchen station gets the order

Routing is the first obvious job. The gateway inspects method, path, host, and sometimes headers. Then it chooses the right backend service. That sounds small, but it shapes the whole API surface.

Path-based routing is the most common pattern. /users/123 can go to the user service. /orders/456 can go to the order service. /search?q=tea can go to the search service. Host-based routing helps when products need separate domains. admin.example.com may go somewhere different from api.example.com. Header-based routing helps during experiments and partner-specific behavior.

A picture helps:

    ┌──────────────────────┐
    │ Gateway routing table│
    ├──────────────────────┤
    │ /users/*   → user    │
    │ /orders/*  → order   │
    │ /search/*  → search  │
    │ Host=admin → admin   │
    └──────────────────────┘

Good routing rules feel boring. Boring is good here. Rules should be predictable, documented, and easy to debug.

Now a worked example. Suppose a food-delivery app receives 12,000 requests each second. Forty percent hit restaurants. Thirty percent hit cart and checkout. Thirty percent hit search and recommendations. The gateway can split those flows before backends get overloaded. Restaurants get 4,800 requests each second. Checkout gets 3,600 requests each second. Search gets 3,600 requests each second. That separation lets each team scale independently. It also lets failures stay more local. If search becomes slow, checkout need not drown immediately.

Routing can also support canary releases. For example, send five percent of /checkout traffic to v2. Keep ninety-five percent on the stable path. That is much safer than flipping everything together. So, routing is not only direction. Routing is also controlled risk.
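The routing table and the five percent canary described above can be sketched in a few lines. This is a minimal illustration, not how Kong, AWS API Gateway, or Envoy implement routing. The names Route, ROUTES, CANARY_PERCENT, canary_bucket, and pick_backend are hypothetical, as are the "-v2" backend suffix and the choice to hash a sticky key (for example a user ID) into 100 buckets.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    backend: str                       # hypothetical backend name
    path_prefix: Optional[str] = None  # match when the path starts with this
    host: Optional[str] = None         # match when the Host value equals this


# First match wins, so the host rule is listed before the path rules.
ROUTES = [
    Route(backend="admin", host="admin.example.com"),
    Route(backend="user", path_prefix="/users/"),
    Route(backend="order", path_prefix="/orders/"),
    Route(backend="search", path_prefix="/search"),
    Route(backend="checkout", path_prefix="/checkout"),
]

# Assumed canary policy: percent of traffic sent to "<backend>-v2".
CANARY_PERCENT = {"checkout": 5}


def canary_bucket(sticky_key: str) -> int:
    """Map a key to a stable bucket in 0..99, so one caller always lands the same way."""
    digest = hashlib.sha256(sticky_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def pick_backend(host: str, path: str, sticky_key: str) -> Optional[str]:
    """Return the backend name for a request, or None when no rule matches."""
    for route in ROUTES:
        if route.host is not None and route.host != host:
            continue
        if route.path_prefix is not None and not path.startswith(route.path_prefix):
            continue
        percent = CANARY_PERCENT.get(route.backend, 0)
        if canary_bucket(sticky_key) < percent:
            return route.backend + "-v2"
        return route.backend
    return None  # a real gateway would answer 404 here
```

Hashing a sticky key instead of rolling a random number keeps each caller on one version for the whole rollout, which makes canary problems easier to trace back to specific users.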

Protocol translation lets clients speak simply

Modern backends rarely use one protocol everywhere. Some teams expose REST publicly, then use gRPC internally. Others accept WebSockets at the edge and publish events inside. The gateway can translate these boundaries. Clients keep a simple interface. Services keep the protocol that suits them best. That is why protocol translation matters.

Imagine a mobile app sending HTTPS JSON requests. Inside, the order service speaks gRPC for speed. The kitchen window can accept JSON, validate fields, and call gRPC. The app team avoids protobuf details. The backend team keeps efficient service-to-service communication.

Here is the flow:

    ┌────────────┐   HTTPS/JSON   ┌───────────────┐   gRPC   ┌───────────────┐
    │ Mobile app │ ─────────────→ │  API Gateway  │ ───────→ │ Order service │
    └────────────┘                └───────────────┘          └───────────────┘

That translation does add latency. So be disciplined. Do not transform data ten times without reason. Use translation where it reduces coupling clearly.

Now another useful pattern is request aggregation. A client screen may need profile, orders, and loyalty points. Calling three services directly adds more round trips. The gateway can fetch them together and return one combined result.

Example response assembly:

       ┌─────────┐
       │ Client  │
       └────┬────┘
            │ one request
            ▼
    ┌───────────────┐
    │  API Gateway  │
    └──┬────────┬───┘
       │        ├────→ profile service
       │        ├────→ order service
       │        └────→ loyalty service
       ▼
    one combined response

Worked example. Suppose each backend call takes 120 milliseconds on average. Three separate client calls made one after another can take about 360 milliseconds plus overhead. Parallel aggregation can finish near the slowest branch instead. Maybe total user-visible time becomes 150 milliseconds. That difference matters on shaky mobile networks.

Still, aggregation has a trap. The gateway should compose thinly, not become business logic soup. If aggregation rules grow complex and domain-heavy, move them elsewhere.
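The parallel aggregation in the worked example can be sketched with asyncio. This is an illustration under stated assumptions: fetch is a hypothetical stand-in that sleeps for 120 milliseconds instead of calling a real backend, and aggregate_screen and timed_run are invented names.

```python
import asyncio
import time
from typing import Tuple


async def fetch(service: str, delay_s: float) -> dict:
    """Stand-in for one backend call; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay_s)
    return {"service": service, "ok": True}


async def aggregate_screen() -> dict:
    # Start all three calls together and wait for them as a group.
    profile, orders, loyalty = await asyncio.gather(
        fetch("profile", 0.12),
        fetch("orders", 0.12),
        fetch("loyalty", 0.12),
    )
    # Thin composition only: no business rules live here.
    return {"profile": profile, "orders": orders, "loyalty": loyalty}


def timed_run() -> Tuple[dict, float]:
    """Run the aggregation once and report the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = asyncio.run(aggregate_screen())
    return result, time.perf_counter() - start
```

Because the three sleeps overlap, the elapsed time should land near one branch (about 0.12 seconds) and not near the 0.36 seconds a strictly sequential version would need. A production gateway would also need per-branch timeouts and a policy for partial failures, which this sketch leaves out.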

Auth offloading and policy checks save repeated effort

Authentication is another common gateway duty. Every service should not reimplement token parsing from scratch. The gateway can verify JWTs, API keys, OAuth scopes, and mTLS. Then services receive trusted identity context. This is similar to smart wait staff checking dining permissions early. The chef should not debate table reservations for every dish. The service should focus on domain logic.

A common sequence looks like this:

    ┌──────────┐   token    ┌───────────────┐   claims   ┌──────────────┐
    │ Client   │ ─────────→ │  API Gateway  │ ─────────→ │ Backend svc  │
    └──────────┘            └───────┬───────┘            └──────────────┘
                                    │
                                    └────→ auth provider / key store

The gateway can also enforce rate limits. It can reject oversized payloads. It can normalize headers. It can add correlation IDs for tracing. It can block clearly malicious traffic quickly. Those are valuable shared controls.

Now the design boundary matters again. Auth verification at the edge is useful. Authorization decisions are often trickier. Simple checks like “token has checkout scope” fit well. Deep rules like “user can refund this merchant’s last three orders” often belong inside the domain service. Why? Because domain services know the freshest business facts. Do not centralize nuanced authorization blindly.

Worked example. Suppose five services each spend 8 milliseconds verifying a token. A request touching all five burns 40 milliseconds on repeated checks. If the gateway verifies once and forwards claims safely, that overhead drops sharply. You save latency and reduce duplicated code.

But preserve defense in depth for sensitive actions. Never trust forwarded claims from arbitrary callers. Trust them only from the gateway boundary.
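The "verify once, forward claims" idea can be sketched with the standard library alone. This is a deliberately simplified stand-in for JWT verification, not a JWT implementation: the token is just a base64 payload plus an HMAC signature. SECRET, issue_token, verify_token, identity_headers, and the X-User-Id and X-Scopes header names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical key; a real one would come from a key store


def _sign(payload: bytes) -> bytes:
    mac = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac)


def issue_token(claims: dict) -> str:
    """Helper for the demo: build 'payload.signature' from a claims dict."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode("utf-8"))
    return (payload + b"." + _sign(payload)).decode("ascii")


def verify_token(token: str, required_scope: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is authentic, unexpired, and in scope."""
    payload, sep, signature = token.encode("utf-8").rpartition(b".")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not sep or not hmac.compare_digest(_sign(payload), signature):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        return None  # expired or missing expiry
    if required_scope not in claims.get("scopes", []):
        return None  # simple scope check that fits at the edge
    return claims


def identity_headers(claims: dict) -> dict:
    """Identity context the gateway would forward (hypothetical header names)."""
    return {
        "X-User-Id": str(claims.get("sub", "")),
        "X-Scopes": " ".join(claims.get("scopes", [])),
    }
```

The sketch covers only the coarse edge check. Backends should accept these headers only on connections that provably come from the gateway, and deep rules such as refund eligibility still belong in the domain service.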

Do not turn the gateway into the longest queue

A gateway helps only while it stays fast and simple. If every feature gets shoved there, it becomes the bottleneck. This is the classic traffic-jam failure mode. One overloaded front door can slow every product team together.

So keep a shortlist of responsibilities. Good gateway jobs are routing, light transformation, shared auth checks, and edge protections. Bad gateway jobs are deep workflow orchestration, heavy joins, and domain-specific mutation rules.

Now choose tooling by need. Kong is strong when teams want plugins and API management features. AWS API Gateway fits well for managed cloud entry points, throttling, and Lambda integration. Envoy shines when teams need high-performance proxying and service-mesh friendliness. Each tool can route, observe, and enforce policy. The choice depends on operating model, not branding alone.

A final worked example makes this concrete. Suppose peak traffic is 50,000 requests each second. One gateway node safely handles 12,000 requests each second. Minimum nodes for raw capacity equals 50,000 divided by 12,000. That gives 4.17, so you need at least five nodes.

Now apply failure thinking. If one node dies, four nodes remain. 50,000 divided by 4 equals 12,500 per node. That already exceeds safe capacity. So five is not enough. Use six nodes. Then a one-node failure leaves 10,000 per node. That gives breathing room.

See the lesson. Your kitchen window needs capacity math, not only neat diagrams.
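The capacity arithmetic above reduces to a small helper: size the fleet so that the nodes left after a failure still cover peak traffic. The function names gateway_nodes and load_per_survivor are hypothetical, and the sketch assumes load spreads evenly across nodes.

```python
def gateway_nodes(peak_rps: int, safe_rps_per_node: int,
                  tolerated_failures: int = 1) -> int:
    """Nodes needed so the survivors of `tolerated_failures` still handle peak."""
    if peak_rps <= 0 or safe_rps_per_node <= 0 or tolerated_failures < 0:
        raise ValueError("inputs must be positive (failures may be zero)")
    # Integer ceiling division: nodes needed when every node is healthy.
    healthy_minimum = (peak_rps + safe_rps_per_node - 1) // safe_rps_per_node
    # Each tolerated failure needs one spare node on top of that minimum.
    return healthy_minimum + tolerated_failures


def load_per_survivor(peak_rps: int, nodes: int, failed: int = 1) -> float:
    """Requests per second on each remaining node, assuming an even spread."""
    return peak_rps / (nodes - failed)


# The worked example: 50,000 rps peak, 12,000 rps safe per node.
raw = gateway_nodes(50_000, 12_000, tolerated_failures=0)     # 5
sized = gateway_nodes(50_000, 12_000, tolerated_failures=1)   # 6
after_failure = load_per_survivor(50_000, sized)              # 10000.0
```

Real traffic is rarely spread perfectly evenly, so teams often add headroom beyond this minimum.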

Where this lives in the wild

Kong platform engineer at Dream11 — exposes sports, payments, and profile APIs through plugin-based routing and shared auth policies.
AWS API Gateway-focused backend engineer at Razorpay — fronts public payment APIs, throttles partners, and integrates cleanly with managed cloud services.
Envoy service-mesh engineer at Lyft — uses edge and sidecar proxies for routing, retries, observability, and progressive delivery.
Platform engineer at Swiggy — aggregates restaurant, cart, and delivery data behind one mobile-friendly API boundary.
Cloud architect at Netflix — designs gateway layers that route traffic safely across many independently deployed backend domains.

Pause and recall

Why does a gateway reduce client complexity but increase edge complexity?
When is protocol translation genuinely useful, and when is it unnecessary ceremony?
Which authorization checks belong comfortably at the gateway, and which do not?
Why can request aggregation improve latency for mobile clients noticeably?

Interview Q&A

Q: Why place request routing at the API gateway instead of inside clients?
A: Because clients should depend on one stable public contract, not many moving internal endpoints. Central routing lets backend teams evolve service topology without breaking callers repeatedly.
Common wrong answer to avoid: “Because microservices require gateways.” Some systems work without one, but complexity and client burden usually rise.

Q: When should a gateway perform protocol translation between REST and gRPC?
A: When public clients need simple HTTP interfaces but internal services benefit from efficient gRPC contracts. The translation is worth it only when it clearly reduces coupling.
Common wrong answer to avoid: “Always translate for performance.” Translation usually adds overhead, so the value is abstraction, not free speed.

Q: Why is auth offloading useful, yet not a complete replacement for service checks?
A: Because shared token verification and edge filtering remove repeated work, but deep authorization still depends on domain facts inside services. Sensitive operations deserve layered trust boundaries.
Common wrong answer to avoid: “Once the gateway authenticates, services can trust everything.” They should trust identity context carefully, not skip all security reasoning.

Q: How does request aggregation help, and what is the biggest danger?
A: It can reduce client round trips by combining multiple backend fetches into one response. The danger is turning the gateway into a heavy business-logic layer that becomes hard to scale.
Common wrong answer to avoid: “Aggregation always belongs at the gateway.” Thin composition is fine, but domain-heavy orchestration often belongs elsewhere.

Apply now (5 min)

Imagine one mobile screen needs user profile, cart summary, and coupons. Sketch which calls can be aggregated at the gateway. Then mark one check that belongs at the edge. Next, mark one rule that must stay inside the checkout service.

Now assume peak traffic is 36,000 requests each second. One gateway node safely handles 9,000 requests each second. How many nodes do you need with one-node failure tolerance? Show the arithmetic clearly.

Then write one sentence choosing between Kong, AWS API Gateway, and Envoy. Base your choice on operating model, not popularity.

Sketch from memory: draw clients, one kitchen window, three backend services, and labels for routing, translation, and auth. Do not peek back.

Bridge. The request reached the right service. Now we must stop retries from creating duplicate side effects. → 07-idempotency-and-retries.md