10. VPC and Private Networking: your cloud's private postal colony

~18 min read. Build the private lanes before chasing distributed brilliance.

Built on the ELI5 in 00-eli5.md. The post office picture from there (private post offices only accept internal mail) explains why VPC boundaries feel safe.

1) Start with the mental map

A VPC is your private network inside one cloud account. Think of it as a gated colony with numbered roads. Every VM, database, and load balancer gets a private address. That address works only inside this colony unless you expose it. The cloud still owns the outer city roads and traffic lights. You own the street plan inside your colony. That is why CIDR blocks matter from day one. A /16 block gives 65,536 total addresses before reservations. A /24 block gives 256 total addresses before reservations. Cloud providers reserve a few addresses inside every subnet. AWS, for example, reserves five per subnet. So your usable count is always slightly smaller. If you choose tiny ranges early, future growth becomes painful. If you choose overlapping ranges, peering later becomes awkward. See the basic picture first.

┌──────────────────────────────────────────────┐
│ VPC: 10.0.0.0/16                             │
│                                              │
│  ┌────────────────┐   ┌────────────────┐     │
│  │ Public subnet  │   │ Private subnet │     │
│  │ 10.0.1.0/24    │   │ 10.0.2.0/24    │     │
│  │ ALB, NAT       │   │ App, DB        │     │
│  └────────────────┘   └────────────────┘     │
└──────────────────────────────────────────────┘
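The address counts above are easy to verify yourself. Here is a minimal sketch using Python's standard `ipaddress` module. The constant `RESERVED_PER_SUBNET = 5` is an assumption that matches AWS; other clouds may reserve a different number.

```python
import ipaddress

# Assumption: five reserved addresses per subnet, as on AWS.
# Other providers may reserve a different count.
RESERVED_PER_SUBNET = 5


def total_addresses(cidr: str) -> int:
    """Total addresses in a CIDR block, before any reservations."""
    return ipaddress.ip_network(cidr).num_addresses


def usable_addresses(cidr: str, reserved: int = RESERVED_PER_SUBNET) -> int:
    """Addresses left for your own resources after provider reservations."""
    return max(total_addresses(cidr) - reserved, 0)


print(total_addresses("10.0.0.0/16"))   # 65536
print(total_addresses("10.0.1.0/24"))   # 256
print(usable_addresses("10.0.1.0/24"))  # 251

# How many /24 subnets fit inside the /16 VPC from the diagram?
vpc = ipaddress.ip_network("10.0.0.0/16")
print(len(list(vpc.subnets(new_prefix=24))))  # 256
```

Run this once with your planned ranges before you create anything. Resizing a live network is much harder than resizing a number in a script.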

Subnets are slices of the VPC range. In AWS, each subnet lives inside one availability zone. That detail matters for resilience and cost control. You cannot stretch one subnet across many zones. You create multiple subnets to spread failure domains. A common first layout uses two public and two private subnets. Public means the route table has a route to an internet gateway. Private means the route table has no such route. Notice the wording carefully here. A public subnet is not automatically unsafe. A private subnet is not automatically reachable everywhere. Routing and filtering together decide the real story. The post office idea helps here. Internal post office workers sort internal letters first. Only selected counters talk to the outside world.
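The "two public, two private" layout can be sketched as a small planning script. This is only an illustration. The zone names `zone-a` and `zone-b`, the `Subnet` class, and the `plan_subnets` helper are hypothetical, and the script simply hands out the first free /24 blocks, so the ranges differ from the diagram above.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Subnet:
    name: str
    cidr: ipaddress.IPv4Network
    zone: str      # each subnet belongs to exactly one zone
    public: bool   # True if its route table will point at an internet gateway


def plan_subnets(vpc_cidr: str, zones: List[str], new_prefix: int = 24) -> List[Subnet]:
    """Carve one public and one private subnet per zone out of the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = []
    for tier, public in (("public", True), ("private", False)):
        for zone in zones:
            plan.append(Subnet(f"{tier}-{zone}", next(blocks), zone, public))
    return plan


plan = plan_subnets("10.0.0.0/16", ["zone-a", "zone-b"])
for subnet in plan:
    kind = "public" if subnet.public else "private"
    print(subnet.name, subnet.cidr, subnet.zone, kind)
```

Each zone ends up with its own public and private slice. If one zone fails, the other zone still has a complete path.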

2) Public and private are route-table decisions

A subnet becomes public when its default route points outward. Usually that outward hop is an internet gateway. Instances inside that subnet still need a public IP to be reached from the internet. Without a public IP, return traffic cannot find them directly. That is why public subnet plus public IP often travel together. Private subnets remove that direct outward route for workloads. Your app servers sit there. Your databases definitely sit there. Your caches and internal queues usually sit there too. Now ask a sharp question. How will private machines download patches or call payment APIs? This is where a NAT gateway enters the picture. The NAT gateway sits in a public subnet. Private subnet route tables send outbound internet traffic there. The NAT changes source addresses for outbound requests. Replies come back to the NAT and then inward. Outside clients still cannot start new inbound sessions. That single pattern protects many boring but critical systems.

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│ App server   │ ───▶ │ NAT gateway  │ ───▶ │ Internet API │
│ 10.0.2.15    │      │ 10.0.1.8     │      │ 34.120.8.20  │
└──────────────┘      └──────────────┘      └──────────────┘
       ▲                       │
       └──────── Replies ──────┘
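The flow in the diagram can be imitated with a toy translation table. This is a teaching sketch, not how any provider implements NAT. The public address `203.0.113.10`, the starting port `40000`, and the `NatGateway` class are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class NatGateway:
    public_ip: str
    next_port: int = 40000  # hypothetical start of the NAT's port pool
    # nat_port -> (private_ip, private_port, remote_ip, remote_port)
    table: Dict[int, Tuple[str, int, str, int]] = field(default_factory=dict)

    def outbound(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int):
        """Rewrite the source address and remember the mapping."""
        nat_port = self.next_port
        self.next_port += 1
        self.table[nat_port] = (src_ip, src_port, dst_ip, dst_port)
        return (self.public_ip, nat_port, dst_ip, dst_port)

    def inbound(self, src_ip: str, src_port: int, dst_port: int) -> Optional[Tuple[str, int]]:
        """Forward a reply only if it matches a remembered outbound request."""
        entry = self.table.get(dst_port)
        if entry is None:
            return None  # nobody inside asked for this traffic
        private_ip, private_port, remote_ip, remote_port = entry
        if (src_ip, src_port) != (remote_ip, remote_port):
            return None  # a stranger cannot reuse someone else's mapping
        return (private_ip, private_port)


nat = NatGateway(public_ip="203.0.113.10")
print(nat.outbound("10.0.2.15", 51000, "34.120.8.20", 443))
# ('203.0.113.10', 40000, '34.120.8.20', 443)
print(nat.inbound("34.120.8.20", 443, 40000))   # ('10.0.2.15', 51000)
print(nat.inbound("198.51.100.7", 443, 40000))  # None
```

The last call is the important one. An outside host that never received a request has no entry to ride on, so its packet goes nowhere.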

Worked example time. Assume the servers in two app subnets pull package updates daily. Each server downloads 1.5 GB per day. You run 20 servers. Total data through the NAT becomes 20 × 1.5 = 30 GB per day. Over a 30-day month, that is 900 GB. If your NAT charges per GB processed and per hour, plan both. People often estimate compute cost and forget the network bill. Now add availability zone awareness. One NAT per zone reduces cross-zone hairpinning. One shared NAT is cheaper but creates a bigger blast radius. So the architecture choice is cost versus failure isolation. Route tables deserve respect because they encode these choices. A public subnet route might look like 0.0.0.0/0 -> igw. A private subnet route might look like 0.0.0.0/0 -> nat. Internal routes like 10.0.0.0/16 -> local exist automatically. If the route is wrong, the packet never gets its first step.
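Both ideas in the paragraph above fit in a few lines. The sketch below picks the most specific matching route, which is how route tables are generally evaluated, and repeats the data arithmetic. The table names and helper functions are hypothetical, and `monthly_nat_cost` takes the rates as parameters because real prices vary by provider and region.

```python
import ipaddress
from typing import Dict, Optional

PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat"}
ISOLATED_ROUTES = {"10.0.0.0/16": "local"}  # no default route at all


def next_hop(route_table: Dict[str, str], destination: str) -> Optional[str]:
    """Return the target of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if dest in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, target)
    return best[1] if best is not None else None


def monthly_nat_gb(servers: int, gb_per_server_per_day: float, days: int = 30) -> float:
    """Data volume passing through the NAT in one month."""
    return servers * gb_per_server_per_day * days


def monthly_nat_cost(total_gb: float, price_per_gb: float,
                     price_per_hour: float, hours: int = 24 * 30) -> float:
    """Data charge plus hourly charge. Rates must come from your provider."""
    return total_gb * price_per_gb + hours * price_per_hour


print(next_hop(PUBLIC_ROUTES, "34.120.8.20"))    # igw
print(next_hop(PRIVATE_ROUTES, "34.120.8.20"))   # nat
print(next_hop(PRIVATE_ROUTES, "10.0.1.8"))      # local
print(next_hop(ISOLATED_ROUTES, "34.120.8.20"))  # None
print(monthly_nat_gb(20, 1.5))                   # 900.0
```

The `None` result is the "packet never gets its first step" case. No route matched, so there is nowhere to send it.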

3) Security groups and NACLs are different guards

Security groups are stateful guards attached to resources. If inbound port 443 is allowed, return traffic is remembered. You usually attach them to instances, load balancers, or databases. NACLs are stateless guards attached to subnets. They inspect traffic entering and leaving the subnet boundary. Because they are stateless, you must allow both directions explicitly. That makes NACLs sharper but easier to misconfigure. Remember this interview sentence. Security groups ask, “Who can talk to this machine?” NACLs ask, “What traffic can cross this street?” Use security groups for day-to-day access policy. Use NACLs for coarse subnet guardrails or emergency blocking. See the layering.

Internet
   │
   ▼
┌──────────────┐
│ Route table  │
└──────┬───────┘
       ▼
┌──────────────┐
│ Subnet NACL  │
└──────┬───────┘
       ▼
┌──────────────┐
│ Instance SG  │
└──────┬───────┘
       ▼
┌──────────────┐
│ Application  │
└──────────────┘
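The stateful versus stateless difference is easier to feel in code. The toy model below is an assumption-heavy sketch: rules are plain port ranges, there are no rule numbers or deny entries, and the class names are hypothetical. It only shows who remembers the connection and who does not.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

PortRange = Tuple[int, int]


def port_allowed(port: int, ranges: List[PortRange]) -> bool:
    return any(low <= port <= high for low, high in ranges)


@dataclass
class StatelessNacl:
    inbound: List[PortRange]
    outbound: List[PortRange]  # replies are checked here, with no memory


@dataclass
class StatefulSecurityGroup:
    inbound: List[PortRange]
    tracked: Set[Tuple[int, int]] = field(default_factory=set)


def round_trip(nacl: StatelessNacl, sg: StatefulSecurityGroup,
               client_port: int, server_port: int) -> str:
    """Walk one request in and one reply out through both guards."""
    if not port_allowed(server_port, nacl.inbound):
        return "dropped: NACL inbound"
    if not port_allowed(server_port, sg.inbound):
        return "dropped: SG inbound"
    sg.tracked.add((client_port, server_port))  # the SG remembers the session
    if (client_port, server_port) not in sg.tracked:
        return "dropped: SG reply"
    # The reply goes back to the client's ephemeral port.
    if not port_allowed(client_port, nacl.outbound):
        return "dropped: NACL outbound"
    return "ok"


sg = StatefulSecurityGroup(inbound=[(443, 443)])
forgetful = StatelessNacl(inbound=[(443, 443)], outbound=[])
complete = StatelessNacl(inbound=[(443, 443)], outbound=[(1024, 65535)])

print(round_trip(forgetful, sg, 50000, 443))  # dropped: NACL outbound
print(round_trip(complete, sg, 50000, 443))   # ok
```

The security group needed one rule. The NACL needed two, one for each direction.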

Now extend beyond one VPC. A site-to-site VPN connects your office or data center securely. It creates a private tunnel over public internet paths. The payload stays inside a sealed envelope while traveling. VPC peering connects two VPCs directly using private addresses. It is simple and fast, but non-transitive. If A peers with B, and B peers with C, A still misses C. That non-transitive rule surprises many engineers. PrivateLink solves a different problem. It exposes one service privately without sharing whole network ranges. Consumers reach the service through private endpoints. They do not need full peering relationships. That reduces blast radius and address overlap pain. For platform teams, PrivateLink is very elegant.
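The non-transitive rule can be shown with a small check. This sketch treats each peering as an unordered pair of VPC names. The names `A`, `B`, `C` come from the text, and the helper functions are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Peering = FrozenSet[str]


def can_reach(peerings: Set[Peering], src: str, dst: str) -> bool:
    """Peering is direct only: no hopping through a middle VPC."""
    return src == dst or frozenset((src, dst)) in peerings


def missing_peerings(vpcs: Iterable[str], peerings: Set[Peering]) -> List[Tuple[str, str]]:
    """Pairs that would need their own peering for full connectivity."""
    return [pair for pair in combinations(sorted(vpcs), 2)
            if frozenset(pair) not in peerings]


peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(can_reach(peerings, "A", "B"))                # True
print(can_reach(peerings, "A", "C"))                # False
print(missing_peerings(["A", "B", "C"], peerings))  # [('A', 'C')]
```

With many VPCs the number of pairs grows quickly, which is one reason teams reach for PrivateLink when only one service needs to be shared.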

4) Put the pieces together with real designs

Suppose you are designing an ecommerce checkout system. You have a public load balancer, private app servers, and private databases. Choose VPC range 10.20.0.0/16 for the whole environment. Create public subnets 10.20.1.0/24 and 10.20.2.0/24. Create private app subnets 10.20.11.0/24 and 10.20.12.0/24. Create private data subnets 10.20.21.0/24 and 10.20.22.0/24. The load balancer security group allows inbound 443 from the internet. The app security group allows 8080 only from the load balancer. The database security group allows 5432 only from app servers. That chain blocks accidental sideways exposure.

Now imagine the finance system lives in another VPC. Its CIDR is 10.30.0.0/16. You need app servers to call one invoice service. Do not peer entire VPCs immediately. Use PrivateLink if only one service must be consumed cleanly. Use peering if several bidirectional private paths are truly required.

Worked example with overlapping ranges now. Your old VPC uses 10.0.0.0/16. A vendor-managed VPC also uses 10.0.0.0/16. Peering fails because a route to 10.0.0.0/16 could mean either side. This is why address planning is architecture, not clerical work.

Another worked example with NACLs. You allow inbound 443 to a subnet. But you forget to allow outbound ephemeral return ports like 1024-65535. Requests reach the server, but the replies are dropped at the subnet edge, so clients hang and time out. That is a classic stateless-filter mistake.

Keep one last checklist ready.
Can the route reach the destination?
Does the security group allow the session?
Does the NACL allow both directions?
Does the name resolve to the correct private or public target?
Can the source and destination CIDRs coexist without overlap?
If these five answers are clean, most VPC designs behave.
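The checkout layout and the overlap example can be checked mechanically before anything is deployed. The sketch below uses the CIDRs and ports from the text. The helper names and the tier labels `internet`, `alb`, `app`, and `db` are hypothetical.

```python
import ipaddress
from typing import List

SUBNETS = ["10.20.1.0/24", "10.20.2.0/24",     # public
           "10.20.11.0/24", "10.20.12.0/24",   # private app
           "10.20.21.0/24", "10.20.22.0/24"]   # private data

# (source tier, destination tier) -> allowed port, as described in the text
SG_CHAIN = {("internet", "alb"): 443, ("alb", "app"): 8080, ("app", "db"): 5432}


def cidrs_overlap(a: str, b: str) -> bool:
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


def validate_layout(vpc_cidr: str, subnet_cidrs: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = [f"{n} is outside {vpc}" for n in nets if not n.subnet_of(vpc)]
    for i, first in enumerate(nets):
        for second in nets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


def sg_allows(src: str, dst: str, port: int) -> bool:
    return SG_CHAIN.get((src, dst)) == port


print(validate_layout("10.20.0.0/16", SUBNETS))       # []
print(cidrs_overlap("10.20.0.0/16", "10.30.0.0/16"))  # False
print(cidrs_overlap("10.0.0.0/16", "10.0.0.0/16"))    # True
print(sg_allows("app", "db", 5432))                   # True
print(sg_allows("internet", "db", 5432))              # False
```

The checkout and finance ranges can coexist, so peering or PrivateLink both remain possible. The old VPC and the vendor VPC cannot, which is the failure described above.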

Where this lives in the wild

AWS VPC engineer at Razorpay: separates payment APIs, databases, and bastion access using layered private subnets.
Platform SRE at Flipkart: places NAT gateways per zone to avoid cross-zone dependency during sales traffic.
Security architect at Zerodha: uses PrivateLink for internal market-data services without opening wide peering paths.
Cloud network engineer at Swiggy: connects on-prem kitchen systems through VPN tunnels into private application networks.
Infrastructure lead at Amazon: designs security-group references between service tiers for tightly scoped east-west traffic.

Pause and recall

Why does a public subnet still need careful security-group rules?
What exact problem does a NAT gateway solve for private subnets?
Why can overlapping CIDR blocks ruin later peering plans?
When is PrivateLink safer than full VPC peering?

Interview Q&A

Q1. What makes a subnet public in AWS?
A route to an internet gateway makes it public. A public IP on instances is usually also needed for direct reachability.
Common wrong answer to avoid: “Any subnet with a load balancer is public.”

Q2. Security groups versus NACLs: explain simply.
Security groups are stateful rules at the resource boundary. NACLs are stateless rules at the subnet boundary.
Common wrong answer to avoid: “Both are basically firewalls, so treat them identically.”

Q3. When would you choose PrivateLink instead of peering?
Choose PrivateLink when one private service must be consumed cleanly. Choose peering when broader private network reach is intentionally needed.
Common wrong answer to avoid: “PrivateLink is just cheaper peering for every case.”

Q4. Why do private subnets still sometimes need internet egress?
Servers need updates, third-party APIs, and outbound package downloads. NAT gives that egress without opening direct inbound internet access.
Common wrong answer to avoid: “Private means the subnet should never touch the internet at all.”

Apply now (5 min)

Take one three-tier application you know from work or practice. Assign one VPC CIDR, six subnets, and three security groups. Write one route table for public subnets and one for private. Then decide whether NAT, VPN, peering, or PrivateLink is required. Sketch from memory: draw the VPC box, subnets, IGW, NAT, SG, and one database path.

Bridge. Very good. Now you know the private roads. Next, learn how to debug traffic when those roads behave strangely. → 11-network-debugging.md