13. Honest Admission: where networking still humbles confident engineers
~16 min read. Strong engineers say clearly what remains uncertain and why.
Built on the ELI5 in 00-eli5.md. The envelope — some envelopes still get lost and we do not always know why — keeps us honest.
1) BGP security is still a trust exercise with sharp edges
BGP is how networks tell each other which prefixes they can reach and over which paths. That sounds administrative, but it shapes huge traffic flows. The problem is trust. Networks largely trust route announcements from neighboring networks. When that trust is wrong, traffic can wander strangely. A bad route leak can blackhole traffic. A hijack can attract traffic somewhere unintended. RPKI lets networks validate the origin of an announcement, but deployment is not universal, and origin validation does not check the rest of the path. Policies differ. Operational maturity differs. Human mistakes still happen. So honest engineers avoid saying, “The internet routing layer is solved.” It is not solved cleanly enough. See the oversimplified picture.
When users complain, packets may be healthy inside your system. The path outside may still be unstable or misannounced. That uncertainty matters during incident communication. Do not promise certainty you do not possess. The address can be correct while the global road map is wrong.
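To make the RPKI point concrete, here is a minimal sketch of route origin validation in the spirit of RFC 6811, which classifies an announcement as valid, invalid, or not-found. The `Roa` class, the `validate_origin` function, and the sample prefixes and AS numbers (taken from documentation ranges) are illustrative assumptions, not a real validator's API.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable


@dataclass(frozen=True)
class Roa:
    """Hypothetical, simplified Route Origin Authorization record."""
    prefix: str      # prefix the holder authorized, e.g. "203.0.113.0/24"
    max_length: int  # longest prefix length the holder allows
    asn: int         # AS number allowed to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int, roas: Iterable[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True  # at least one ROA covers this route
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered but no matching ROA means invalid; no covering ROA means not-found.
    return "invalid" if covered else "not-found"


roas = [Roa("203.0.113.0/24", 24, 64500)]
print(validate_origin("203.0.113.0/24", 64500, roas))   # authorized origin
print(validate_origin("203.0.113.0/24", 64501, roas))   # wrong origin AS
print(validate_origin("198.51.100.0/24", 64500, roas))  # no ROA covers this prefix
```

Note what the sketch does not do. It checks only the origin AS and prefix length, so a leaked or manipulated path with a legitimate origin still passes. A "not-found" result is common wherever no ROA has been published, which is one reason coverage matters as much as the algorithm.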
2) IPv6 adoption is real, but stubbornly uneven
We know IPv6 solves address exhaustion better than endless patches. We know dual-stack designs reduce future pain. Still, adoption remains uneven across providers, enterprises, and tools. Some middleboxes behave differently under IPv6. Some teams barely test it. Some observability dashboards still privilege IPv4 assumptions. Some vendor integrations quietly lag behind product marketing. That creates awkward half-modern environments.
Worked example.
Your CDN serves both A and AAAA records. Thirty percent of users reach you over IPv6. A new firewall policy forgets IPv6 egress rules entirely. Those users see failures while IPv4 users stay happy. If you only test IPv4 dashboards, you miss the incident.
The lesson is simple. Support claims are not operational proof. Dual stack doubles some surface area. It also doubles some blind spots unless you measure deliberately. So we should speak carefully here. IPv6 is necessary. IPv6 rollout is still messy in practice.
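The firewall slip in the worked example is the kind of gap a small pre-deploy check can catch. The sketch below assumes a hypothetical rule format (dicts with `direction`, `action`, and `cidr` keys); real firewall policies look different, so treat the field names as placeholders.

```python
from ipaddress import ip_network


def families_with_egress(rules):
    """Return the IP versions (4 and/or 6) that have at least one egress allow rule."""
    versions = set()
    for rule in rules:
        if rule["direction"] == "egress" and rule["action"] == "allow":
            versions.add(ip_network(rule["cidr"]).version)
    return versions


def missing_families(rules, required=(4, 6)):
    """List required IP versions that no egress allow rule covers."""
    return sorted(set(required) - families_with_egress(rules))


# Hypothetical policy: IPv4 egress is allowed, IPv6 only has an ingress rule.
policy = [
    {"direction": "egress", "action": "allow", "cidr": "0.0.0.0/0"},
    {"direction": "ingress", "action": "allow", "cidr": "::/0"},
]

gaps = missing_families(policy)
if gaps:
    print("No egress allow rule for IP versions:", gaps)
```

A check like this only proves a rule exists for each family. It does not prove IPv6 traffic actually flows, so it complements per-family dashboards and does not replace them.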
3) End-to-end encryption versus inspection is a real tension
We love encryption because privacy and integrity matter deeply. The sealed envelope protects users from eavesdropping and tampering. But operations teams also need visibility for abuse, malware, and outages. These goals do not always align perfectly. If traffic is fully end-to-end encrypted, middleboxes see less. That is good for privacy. That is also harder for inspection and policy enforcement. Enterprises sometimes terminate TLS at proxies for inspection. That helps visibility, but reduces pure end-to-end guarantees. There is no cheap slogan that resolves this neatly.
Worked example.
A company wants to inspect file uploads for malware. Traffic is TLS from employee browser to SaaS provider. If the company inserts a proxy, it can inspect uploads. But now employees must trust the corporate proxy certificate. Some apps break. Some privacy expectations change. Some compliance teams become happier. Some security teams become more worried. That is a real tradeoff, not a classroom trick.
So be mature in interviews and real work. Say what is gained. Say what is lost. Then explain who accepts that tradeoff and why.
4) CDN cache poisoning remains subtle and unpleasant
Caching feels simple until keys and headers disagree subtly.
A CDN might cache based on path, host, and selected headers.
If that cache key is incomplete, trouble begins.
Attackers may smuggle one response variant into shared cache.
Then innocent users receive the poisoned response.
Sometimes it is stale auth state.
Sometimes it is bad content.
Sometimes it is cross-tenant confusion.
The scary part is invisibility.
Origin logs may look fine.
Only some edges may be affected.
Only certain headers may trigger the poison.
Worked example with concrete numbers.
Suppose the cache key includes host and path but ignores X-Tenant-ID.
Tenant A requests /dashboard with X-Tenant-ID: 41.
Origin returns tenant-specific HTML in 180 ms.
CDN caches it for 300 seconds.
Tenant B requests the same path with X-Tenant-ID: 52.
Because the header is not part of the key, Tenant B may receive Tenant A's cached content.
That is catastrophic.
Request key used: Host + Path
Missing key piece: X-Tenant-ID
Result: shared cache serves wrong variant
This is why cache design deserves paranoia. When someone says, “CDN just makes things faster,” correct gently. CDN also changes correctness risk.
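The tenant mix-up above can be reproduced with a toy in-memory cache. This is a minimal sketch, not how any particular CDN works: `EdgeCache`, the `origin` function, and the host name are hypothetical, and TTL handling is left out to keep the key logic visible.

```python
class EdgeCache:
    """Toy shared cache keyed on host, path, and an optional list of headers."""

    def __init__(self, key_headers=()):
        # Header names that participate in the cache key (case-insensitive).
        self.key_headers = tuple(h.lower() for h in key_headers)
        self.store = {}
        self.origin_hits = 0

    def cache_key(self, host, path, headers):
        normalized = {name.lower(): value for name, value in headers.items()}
        varying = tuple((h, normalized.get(h, "")) for h in self.key_headers)
        return (host, path, varying)

    def get(self, host, path, headers, origin):
        key = self.cache_key(host, path, headers)
        if key not in self.store:
            self.origin_hits += 1
            self.store[key] = origin(host, path, headers)
        return self.store[key]


def origin(host, path, headers):
    """Hypothetical origin that renders tenant-specific content."""
    return f"dashboard for tenant {headers['X-Tenant-ID']}"


# Key is Host + Path only: Tenant 52 is served Tenant 41's cached page.
weak = EdgeCache()
weak.get("app.example.com", "/dashboard", {"X-Tenant-ID": "41"}, origin)
print(weak.get("app.example.com", "/dashboard", {"X-Tenant-ID": "52"}, origin))

# Key also includes X-Tenant-ID: each tenant gets its own cache entry.
strict = EdgeCache(key_headers=["X-Tenant-ID"])
strict.get("app.example.com", "/dashboard", {"X-Tenant-ID": "41"}, origin)
print(strict.get("app.example.com", "/dashboard", {"X-Tenant-ID": "52"}, origin))
```

In the weak configuration the origin is hit only once, so origin logs show one healthy request while the second tenant silently receives the wrong page. That is the invisibility described earlier in this section.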
5) QUIC and HTTP/3 are promising, but still evolving operationally
QUIC reduces handshake cost and handles loss better in many cases.
That is excellent.
It also moves transport logic into user space over UDP.
That changes tooling and operational habits.
Old debugging instincts do not always transfer cleanly.
Middleboxes sometimes interfere oddly with UDP-heavy traffic.
Metrics may look different.
Connection migration adds new behavior.
Some libraries mature quickly.
Some still surprise teams under real load.
Worked example.
Your median page load drops from 900 ms to 760 ms on HTTP/3.
Wonderful result.
Then one enterprise customer reports periodic stalls behind a proxy.
IPv4 path looks fine.
HTTP/2 fallback also looks fine.
QUIC path alone misbehaves intermittently.
Now you need protocol-specific observability and patience.
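One simple form of protocol-specific observability is to split latency by negotiated protocol and look at the tail, not only the median. The sketch below uses a nearest-rank percentile. The protocol labels and the 5000 ms stall samples are made-up illustration data, not measurements.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_by_protocol(samples):
    """Group (protocol, load_ms) samples and report p50 and p99 per protocol."""
    grouped = defaultdict(list)
    for protocol, load_ms in samples:
        grouped[protocol].append(load_ms)
    summary = {}
    for protocol, values in grouped.items():
        values.sort()
        summary[protocol] = {
            "count": len(values),
            "p50": percentile(values, 50),
            "p99": percentile(values, 99),
        }
    return summary


# Hypothetical data: HTTP/3 has the better median, but 5% of loads stall.
samples = [("h2", 900)] * 100 + [("h3", 760)] * 95 + [("h3", 5000)] * 5
for protocol, stats in summarize_by_protocol(samples).items():
    print(protocol, stats)
```

With this data the HTTP/3 median looks better than HTTP/2 while its p99 is far worse, which is the shape of the enterprise-proxy complaint. A blended dashboard that reports one median across all protocols would hide it.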
This does not mean QUIC is bad. It means the ecosystem is still settling. Mature engineers can celebrate progress and admit friction simultaneously. That balance matters.
One final admission ties everything together. Networking combines standards, vendors, kernels, policies, and human behavior. So complete certainty is rare. Careful reasoning is still possible. We just owe people honest confidence levels.
Where this lives in the wild
- Internet routing engineer at Cloudflare: monitors route leaks and RPKI coverage during global reachability incidents.
- Network architect at Meta: plans dual-stack rollouts while handling uneven IPv6 behavior across partners and devices.
- Enterprise security lead at JPMorgan: balances TLS inspection needs against privacy and application breakage risk.
- CDN engineer at Fastly: designs cache keys carefully to prevent tenant mixing and edge poisoning issues.
- Protocol engineer at Google Chrome: improves QUIC behavior while still tracking corner-case proxy and path failures.
Pause and recall
- Why is BGP security still partly a trust and policy problem?
- What makes IPv6 adoption harder than simply enabling AAAA records?
- Why can TLS inspection improve safety and weaken another safety property simultaneously?
- How can an incomplete CDN cache key become a security problem?
Interview Q&A
Q1. What is one networking area you would describe as still imperfect?
BGP security remains imperfect because route trust and adoption are uneven. RPKI helps, but operational reality is still incomplete.
Common wrong answer to avoid: “Internet routing is solved now because major providers are careful.”
Q2. Why does IPv6 still create delivery risk during migrations?
Because dual-stack paths, tooling, policies, and vendors behave unevenly. Testing only IPv4 can hide failures affecting real IPv6 users.
Common wrong answer to avoid: “IPv6 is just bigger addresses, so rollout is trivial.”
Q3. How would you explain encryption versus inspection tension?
Encryption hides content from attackers and also from middlebox inspectors. Visibility gains from interception come with trust and privacy tradeoffs.
Common wrong answer to avoid: “Just decrypt everywhere internally; there is no serious downside.”
Q4. Why is cache poisoning tricky to catch?
Because wrong variants may appear only at some edges or key combinations. Origin behavior can look correct while cached behavior is corrupted.
Common wrong answer to avoid: “If origin responses are fine, the CDN layer is automatically safe.”
Apply now (5 min)
Choose one networking topic above that still feels uncomfortable. Write three sentences beginning with, “We know”, “We suspect”, and “We still need”. Then list one metric or experiment that would reduce uncertainty. Finally note which users would feel the risk first. Sketch from memory: draw one uncertain path and label the exact place confidence drops.
Bridge. Excellent. Networking humility prepares you for the next layer. Backend API design inherits these constraints, failures, and tradeoffs immediately. → ../04_backend_api_design/00-eli5.md