03. IAM, VPC, and security: roles, networks, and least privilege

⏱️ Estimated time: 20 min | Level: intermediate

ELI5 callback: In the dragon farm, the barn runs the work, the feeding trough holds the data, the fence limits access, the breeding ground scales the herd, and the ledger stops waste. Today we build access control and network boundaries that do not leak.

1) See the shape clearly

IAM roles, VPC networks, and security groups all matter here, but they do not relieve the same pressure. See. Start with workload shape, not vendor branding. Check startup time, runtime length, and host control. Check who patches the base layer. Check whether scale is steady or bursty. Check whether warm state must survive.

Simple, no? IAM decides who may call which cloud action. VPC design decides which network paths even exist. Security groups and policies narrow traffic to allowed flows. Least privilege works only when identity and network agree.

So what to do? Write the fit matrix before provisioning anything.

- Prioritise the slowest or costliest path.
- Measure idle time honestly.
- Record operational ownership.
- Record rollback method.
- Record debugging path.
- Record compliance limits.

Good teams choose boring defaults first. Fancy choices can wait.
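To make "IAM decides who may call which cloud action" concrete, here is a minimal sketch of default-deny policy evaluation. It is a toy model, not any vendor's actual engine: the `Statement` class, the `storage:` action names, and the `reports/` resource paths are all hypothetical, and wildcard matching is borrowed from Python's `fnmatch` purely for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Statement:
    effect: str       # "Allow" or "Deny"
    actions: tuple    # action patterns, e.g. ("storage:GetObject",)
    resources: tuple  # resource patterns, e.g. ("reports/*",)


def _matches(patterns, value):
    """True if the value matches at least one shell-style pattern."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def is_allowed(statements, action, resource):
    """Default deny. An explicit Deny beats any Allow."""
    allowed = False
    for stmt in statements:
        if not (_matches(stmt.actions, action) and _matches(stmt.resources, resource)):
            continue
        if stmt.effect == "Deny":
            return False
        if stmt.effect == "Allow":
            allowed = True
    return allowed


# Hypothetical role for a reporting job: read reports, never touch the secret prefix.
REPORT_READER = [
    Statement("Allow", ("storage:GetObject",), ("reports/*",)),
    Statement("Deny", ("storage:*",), ("reports/secret/*",)),
]

if __name__ == "__main__":
    print(is_allowed(REPORT_READER, "storage:GetObject", "reports/q1.csv"))
    print(is_allowed(REPORT_READER, "storage:PutObject", "reports/q1.csv"))
    print(is_allowed(REPORT_READER, "storage:GetObject", "reports/secret/keys.txt"))
```

Note what the sketch leaves out: it says nothing about whether a network path to the storage service exists. That half belongs to VPC design and security groups.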

2) Read the decision signals

Use roles or managed identities instead of long-lived keys whenever possible. Place sensitive services in private subnets behind controlled entry points. Keep security groups narrow by port, source, and destination. Separate human admin access from machine-to-machine access. Encrypt data at rest and in transit before calling the system safe. Review defaults because many clouds start more open than teams assume.

Now use thresholds, not feelings. If latency is sacred, keep readiness. If cost is sacred, chase utilisation carefully. If control is sacred, reduce abstraction. If delivery speed is sacred, buy managed pieces.

Quick decision prompts:

- Which principals need read, write, or admin rights?
- Which workloads need public internet exposure?
- Which traffic can stay private inside the VPC?
- How are secrets delivered without hardcoding?
- Which logs prove access decisions later?
- What is the break-glass path?

See. One clear 'no' can eliminate a whole option. Trade-offs are normal. Document the fallback path. Now watch.
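"Narrow by port, source, and destination" can be sketched as a simple rule check. This is an illustrative model only: the `IngressRule` shape and the example subnet `10.0.1.0/24` are assumptions, not any cloud's real API. The CIDR matching uses Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str     # e.g. "tcp"
    port_from: int    # inclusive
    port_to: int      # inclusive
    source_cidr: str  # e.g. "10.0.1.0/24"


def rule_permits(rule, protocol, port, source_ip):
    """True only if protocol, port range, and source CIDR all match."""
    in_port_range = rule.port_from <= port <= rule.port_to
    in_cidr = ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source_cidr)
    return rule.protocol == protocol and in_port_range and in_cidr


def group_permits(rules, protocol, port, source_ip):
    """A group with no matching rule denies the traffic (default deny)."""
    return any(rule_permits(rule, protocol, port, source_ip) for rule in rules)


# Hypothetical database group: PostgreSQL port, reachable from the app subnet only.
DB_GROUP = [IngressRule("tcp", 5432, 5432, "10.0.1.0/24")]

if __name__ == "__main__":
    print(group_permits(DB_GROUP, "tcp", 5432, "10.0.1.17"))   # app subnet
    print(group_permits(DB_GROUP, "tcp", 5432, "10.0.2.17"))   # other subnet
    print(group_permits(DB_GROUP, "tcp", 22, "10.0.1.17"))     # wrong port
```

Each field a rule leaves wide (a large port range, a `/0` source) is one less "no" the rule can say.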

3) Map the working path

A secure cloud path combines identity and network checks. One without the other leaves holes. Requests should pass through explicit trust boundaries. Now watch the clean mental model.

┌────────────┐   ┌────────────┐   ┌────────────┐
│  User/App  │──→│  IAM Role  │──→│ PrivateNet │
└────────────┘   └─────┬──────┘   └─────┬──────┘
                       │                │
                       ▼                ▼
                 ┌────────────┐   ┌────────────┐
                 │  DataSvc   │   │   Audit    │
                 └────────────┘   └────────────┘

The principal receives a role or identity first. Then network rules decide whether the path is reachable at all. Private services should avoid public IPs unless truly needed. Audit logs should capture both API calls and network changes. Encryption keys deserve separate ownership and rotation policies. Good security reviews ask what should be impossible, not only what should work.

At every arrow, ask who retries. At every box, ask who pays. At every store, ask what expires. Now watch. One metric should sit beside each box. That is how operations stays sane.
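The path in the diagram can be sketched as two independent checks plus an audit record. Everything named here is hypothetical (`Request`, `authorize`, the `inference-app` principal, the subnet labels); the point is only that a request must pass both gates, and that the audit entry records which gate said no.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    principal: str   # who is calling
    action: str      # what they want to do
    source_net: str  # where the call comes from
    target: str      # which service it is aimed at


def authorize(req, role_actions, network_paths, audit_log):
    """Allow only when the identity check AND the network check both pass."""
    identity_ok = req.action in role_actions.get(req.principal, frozenset())
    network_ok = (req.source_net, req.target) in network_paths
    allowed = identity_ok and network_ok
    # Record every decision, including denials, so reviews can see both gates.
    audit_log.append({
        "principal": req.principal,
        "action": req.action,
        "target": req.target,
        "identity_ok": identity_ok,
        "network_ok": network_ok,
        "allowed": allowed,
    })
    return allowed


# Assumed example data: one role, one permitted private path.
ROLE_ACTIONS = {"inference-app": frozenset({"db:Read"})}
NETWORK_PATHS = {("private-app-subnet", "DataSvc")}

if __name__ == "__main__":
    log = []
    ok = Request("inference-app", "db:Read", "private-app-subnet", "DataSvc")
    wrong_net = Request("inference-app", "db:Read", "public-subnet", "DataSvc")
    print(authorize(ok, ROLE_ACTIONS, NETWORK_PATHS, log))
    print(authorize(wrong_net, ROLE_ACTIONS, NETWORK_PATHS, log))
    print(log[-1])
```

In a real deployment the two checks live in different systems, which is exactly why the audit trail has to cover both.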

4) Notice the common traps

- Giving wildcard permissions because the policy syntax feels annoying.
- Putting databases in public subnets for convenience.
- Using one shared role for every workload in a platform.
- Forgetting egress controls and focusing only on ingress.
- Treating encryption as a checkbox without key ownership.
- Skipping log review until an incident appears.

See. Most outages start as silent assumptions. Review these traps before launch:

- Overbroad IAM can turn one bug into total compromise.
- Public endpoints can expand the attack surface dramatically.
- Flat networks make lateral movement easier.
- Stale security groups can expose forgotten services.
- Unrotated secrets can survive far too long.
- Missing audit trails can block incident response.

Simple, no? Write failure drills for the top three risks. Decide what degrades first. Decide what must never degrade. Review quotas before launch day. Prefer explicit limits over wishful thinking. Now watch.
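The wildcard trap is easy to catch mechanically. Below is a rough sketch of a lint pass over a policy document written in the AWS-style JSON shape (`Statement`, `Effect`, `Action`, `Resource`). The `find_wildcards` helper, the draft policy, and the bucket name `example-bucket` are made up for illustration, and the check is deliberately simple: it flags `*` and service-wide `svc:*` actions, and bare `*` resources.

```python
def _as_list(value):
    """Policy fields may hold one value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy):
    """Return (statement index, field, value) for each overbroad Allow entry."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements with wildcards are not the risk here
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append((index, "Action", action))
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append((index, "Resource", resource))
    return findings


# Hypothetical draft policy: the first statement is the trap, the second is scoped.
DRAFT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/reports/*"},
    ],
}

if __name__ == "__main__":
    for finding in find_wildcards(DRAFT_POLICY):
        print(finding)
```

A check like this fits in a pre-merge step, so the annoying syntax gets fixed before it ships rather than after the incident.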

5) Lock the operating routine

Write roles by workload, not by team mood. Keep databases, caches, and internal APIs on private paths. Restrict ports, CIDRs, and egress rules deliberately. Use short-lived credentials and secret rotation. Turn on audit logging for IAM, keys, and network changes. Review least privilege after every major feature launch. Lock the language across the team. Use the same terms in code, dashboards, and reviews.

Review this quick operating list:

- Prefer roles over static keys.
- Prefer private endpoints over public ones.
- Test denied paths, not only allowed paths.
- Rotate secrets with automation.
- Review security groups monthly.
- Separate duty for key management.

Good platform design keeps the barn, feeding trough, fence, breeding ground, and ledger aligned. So what to do? Create a one-page runbook. Create a one-page cost note. Create a one-page rollback note. Teach the team the same words. That alignment saves real money.

See. Consistency beats cleverness. Benchmark first; opinions come second. Name the owner of every limit. Prefer reversible choices whenever the future is foggy. Document what changes during incidents. Keep one small default path for newcomers. Automate the boring thing as soon as it stabilises. Vendor docs help, but workload data matters more. Good naming prevents bad tickets. Observe p95, not only averages. Small runbooks beat heroic memory. Teach cost with the same seriousness as latency. Now watch how much confusion disappears.
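"Rotate secrets with automation" usually starts with knowing which secrets are overdue. Here is a small sketch of that check, assuming you can export a mapping of secret name to last rotation time; the `overdue_secrets` helper, the inventory contents, and the 30-day window are illustrative assumptions, not a recommendation for any specific system.

```python
from datetime import datetime, timedelta, timezone


def overdue_secrets(last_rotated, max_age, now=None):
    """Return the sorted names of secrets whose age exceeds max_age."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, rotated_at in last_rotated.items()
        if now - rotated_at > max_age
    )


# Hypothetical inventory: secret name -> last rotation time (timezone-aware).
INVENTORY = {
    "db-password": datetime(2024, 1, 10, tzinfo=timezone.utc),
    "api-token": datetime(2024, 3, 1, tzinfo=timezone.utc),
}

if __name__ == "__main__":
    review_date = datetime(2024, 3, 15, tzinfo=timezone.utc)
    print(overdue_secrets(INVENTORY, timedelta(days=30), now=review_date))
```

Passing `now` explicitly keeps the check testable, which matters once it runs on a schedule and pages a named owner.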

Where this lives in the wild

AWS IAM roles attached to EC2, ECS, and SageMaker workloads. Classic pattern for removing static credentials from application code.
Google Cloud service accounts with VPC Service Controls. Shows identity and perimeter controls working together.
Azure managed identities and Key Vault backed apps. Useful for enterprise systems that want central secrets and strong audit trails.
Kubernetes NetworkPolicy plus cloud security groups. Common way to combine cluster-level and cloud-level boundaries.
PrivateLink or Private Service Connect style private endpoints. Very useful when managed services should not cross the open internet.

Pause and recall

Why is IAM alone not enough? Say it without looking up vendor names.
What does least privilege mean in a practical design? Give one concrete example.
Why are private subnets preferred for data services? State the trade-off in one line.
What evidence helps after a security incident? Mention one failure mode too.

Interview Q&A

Q. How would you secure an AI inference service?
A. Use workload identities, private networking, narrow security groups, and rotated secrets with audit logs.
Common wrong answer to avoid: Put it behind HTTPS and call it done.
Better direction: Cover identity, network, encryption, and audit together.

Q. What is least privilege?
A. Grant only the minimum actions, resources, and time window needed for a job.
Common wrong answer to avoid: Give read-only to everyone and admin to one shared role.
Better direction: Mention scoping by action, resource, and duration.
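Scoping by action, resource, and duration can be shown in a few lines. This is a sketch under assumed names (`Grant`, `issue_grant`, `grant_covers`, the `orders-replica` resource); real clouds express the same idea through short-lived role sessions or tokens rather than a class like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    action: str           # exactly one action, no wildcards
    resource: str         # exactly one resource
    expires_at: datetime  # the grant is useless after this instant


def issue_grant(action, resource, issued_at, ttl):
    """Create a grant that expires ttl after issued_at."""
    return Grant(action, resource, issued_at + ttl)


def grant_covers(grant, action, resource, at):
    """All three scopes must match: action, resource, and time window."""
    return (
        grant.action == action
        and grant.resource == resource
        and at < grant.expires_at
    )


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    grant = issue_grant("db:Read", "orders-replica", issued, timedelta(minutes=15))
    print(grant_covers(grant, "db:Read", "orders-replica", issued + timedelta(minutes=5)))
    print(grant_covers(grant, "db:Read", "orders-replica", issued + timedelta(minutes=20)))
    print(grant_covers(grant, "db:Write", "orders-replica", issued + timedelta(minutes=5)))
```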

Q. Why keep databases in private subnets?
A. Because most clients do not need direct public reachability, and private paths shrink exposure.
Common wrong answer to avoid: Public subnets are fine if the password is strong.
Better direction: Explain attack surface and control points.

Q. What should security reviews examine first?
A. Review identities, trust boundaries, exposed paths, and logging coverage.
Common wrong answer to avoid: Check the firewall screen and move on.
Better direction: Look for assumptions that made broad access seem acceptable.

Apply now (5 min)

Pick one service in your stack.
List which humans and machines touch it.
Write the exact cloud actions each one needs.
Mark whether the service needs public exposure.
List one path that should be impossible.
List one secret that must rotate.
List one audit log you would inspect first.
Tighten one permission in your draft design.

Bridge. Fences built. But what about managed databases and caches? → 04