09. Deployment Strategies¶
⏱️ Estimated time: 21 min | Level: intermediate
ELI5 callback: In the hospital analogy, the monitor alarm should catch early regression, the thermometer should show trend during rollout, and the playbook should define widening or stopping steps.
1) Choose strategy by risk shape, not fashion¶
Deployment strategy is a risk management choice. A thermometer must compare old and new versions side by side.
Different systems need different blast-radius controls.
A stateless API may tolerate rolling deployment easily.
A payment engine may demand slower, more isolated rollout.
See. Strategy follows failure cost and reversibility.
Traffic volume, statefulness, and dependency sensitivity all matter.
Also check whether hidden side effects appear only under partial load.
That question often separates good rollout design from naive rollout design.
┌──────────────┬───────────────┬─────────────────────┐
│ Strategy     │ Main strength │ Main caution        │
├──────────────┼───────────────┼─────────────────────┤
│ Rolling      │ simple        │ mixed old and new   │
│ Blue-green   │ fast switch   │ double capacity     │
│ Canary       │ low blast     │ slower coordination │
│ Shadow       │ safe compare  │ side-effect control │
└──────────────┴───────────────┴─────────────────────┘
Use an X-ray when canary traffic slows only on certain paths.
- Evaluate rollback speed before picking a rollout style.
- Match the method to data compatibility constraints.
- Consider operator complexity, not only infrastructure cost.
- Define what success and failure mean before traffic moves.
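The trade-offs in the table can be written down as a small decision helper. This is a toy sketch only: `ServiceProfile`, its fields, and the order of the checks are assumptions made for illustration, not a rule from this lesson. A real choice also weighs traffic volume, dependency sensitivity, and operator complexity.

```python
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Hypothetical risk attributes; the field names are illustrative only."""
    versions_can_overlap: bool  # old and new code can safely run side by side
    high_failure_cost: bool     # e.g. a payment engine
    subtle_correctness: bool    # output quality matters, e.g. ranking
    can_double_capacity: bool   # a second full environment is affordable


def suggest_strategy(profile: ServiceProfile) -> str:
    """Return a starting point for discussion, not a verdict."""
    if profile.subtle_correctness:
        return "shadow"          # compare behavior before serving any user
    if profile.high_failure_cost:
        return "canary"          # keep the blast radius small
    if not profile.versions_can_overlap and profile.can_double_capacity:
        return "blue-green"      # avoid mixed versions, keep a fast switch back
    if profile.versions_can_overlap:
        return "rolling"         # simple and efficient when overlap is safe
    return "revisit-design"      # no safe overlap and no spare capacity


stateless_api = ServiceProfile(
    versions_can_overlap=True,
    high_failure_cost=False,
    subtle_correctness=False,
    can_double_capacity=False,
)
payment_engine = ServiceProfile(
    versions_can_overlap=False,
    high_failure_cost=True,
    subtle_correctness=False,
    can_double_capacity=True,
)

print(suggest_strategy(stateless_api))
print(suggest_strategy(payment_engine))
```

The value of writing it down is that the team has to state its risk assumptions before traffic moves, even if the final decision is made by people rather than by the function.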
2) Rolling and blue-green solve different problems¶
Rolling updates replace instances gradually within one environment.
They are efficient and common.
But old and new versions coexist for some time.
That can be fine for compatible stateless code.
It can be ugly for incompatible behavior.
Another X-ray can reveal proxy or database shifts hidden by averages.
Blue-green keeps two full environments.
Traffic switches from blue to green once confidence in the new version is high enough.
Simple, no? More isolation, more cost.
- Rolling works well when versions can safely overlap.
- Blue-green is strong when fast global rollback is required.
- Both still need health checks and traffic verification.
- Database compatibility can still break either strategy.
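The difference between the two can be seen in a tiny simulation. This is a minimal sketch, assuming a fleet is just a list of version strings and the router is a plain dictionary; `rolling_update` and `switch_live` are hypothetical helpers, not calls from any real orchestrator.

```python
def rolling_update(fleet, new_version, batch_size):
    """Replace instances batch by batch and yield the fleet after each batch."""
    fleet = list(fleet)
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version
        yield list(fleet)


def switch_live(router, target):
    """Blue-green: point all traffic at one environment in a single step."""
    router["live"] = target
    return router


# Rolling: old and new versions coexist until the last batch finishes.
steps = list(rolling_update(["v1"] * 4, "v2", batch_size=2))
mixed = [len(set(step)) > 1 for step in steps]

# Blue-green: two full environments, one pointer. Rollback is the same switch.
environments = {"blue": "v1", "green": "v2"}
router = {"live": "blue"}
switch_live(router, "green")
version_after_switch = environments[router["live"]]
switch_live(router, "blue")
version_after_rollback = environments[router["live"]]
```

In the rolling case `mixed` is true for every intermediate step, which is exactly the window where incompatible behavior can bite. In the blue-green case there is no mixed window, but both environments have to exist at once.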
3) Canary and progressive delivery reduce blast radius¶
Canary release sends a small slice of real traffic to the new version.
That makes user impact measurable before full exposure.
The medical chart should record version, cohort, and flag state together.
Progressive delivery expands that idea with staged automation.
Move from one percent to five, then twenty, then full.
So what to do between stages?
Compare key metrics, logs, and traces by version.
Stop widening if error rate, latency, or business conversion degrades.
Canary is slow on purpose. That is the feature.
- Keep comparison windows long enough to avoid random noise.
- Segment by route or tenant if one path is riskier than others.
- Automate promotion gates only after the metrics are trusted.
- Remember that low-traffic canaries may miss rare heavy-path failures.
Another medical chart query should separate rollout noise from baseline noise.
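A promotion gate between stages might look like the sketch below. The stage list mirrors the one, five, twenty, full sequence above; the thresholds (`min_requests`, `max_error_delta`, `max_latency_ratio`) and the `VersionStats` shape are assumptions chosen for illustration and would need tuning against real baseline noise.

```python
from dataclasses import dataclass

STAGES = [1, 5, 20, 100]  # percent of traffic sent to the new version


@dataclass
class VersionStats:
    """Metrics for one version over the same comparison window."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def gate(baseline, canary, min_requests=500,
         max_error_delta=0.005, max_latency_ratio=1.2):
    """Decide whether to wait, stop, or promote. All thresholds are assumed."""
    if canary.requests < min_requests:
        return "wait"     # window too small, result would be noise
    if canary.error_rate - baseline.error_rate > max_error_delta:
        return "stop"     # error rate regressed against the old version
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        return "stop"     # latency regressed against the old version
    return "promote"


def next_stage(current_percent):
    """Return the next traffic percentage, staying at 100 once fully rolled out."""
    i = STAGES.index(current_percent)
    return STAGES[min(i + 1, len(STAGES) - 1)]


baseline = VersionStats(requests=10_000, errors=20, p95_latency_ms=120.0)
healthy_canary = VersionStats(requests=1_000, errors=3, p95_latency_ms=130.0)
decision = gate(baseline, healthy_canary)
```

Note that the gate compares the canary against the old version over the same window rather than against a fixed number. That is what separates rollout noise from baseline noise. A business metric such as conversion could be added as a third check in the same style.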
4) Shadow traffic and dark launches reveal hidden behavior¶
Shadow traffic copies requests to the new system without serving responses.
That helps compare behavior under real production load.
It is valuable when correctness is subtle.
Recommendation, search, and ranking systems benefit a lot here.
Now watch. Shadowing is not risk-free.
Side effects must be disabled or redirected safely.
Data writes, emails, and external calls need strict isolation.
Otherwise your safe test becomes a duplicate-action incident.
A monitor alarm should watch canary regression and rollback windows.
- Use read-only or sandboxed dependencies for shadow paths.
- Compare latency and output shape, not just success rate.
- Keep request identifiers so shadow mismatches are debuggable.
- Drop or mask sensitive data when policy requires it.
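The mirroring idea can be sketched in a few lines. Everything here is hypothetical: `mirror`, the `shadow` flag on the request, and the two handlers are illustrative names. A production mirror would usually run the shadow call asynchronously at the proxy layer, whereas this sketch calls it inline for readability.

```python
import uuid

sent_emails = []  # stands in for a real side effect


def handler_v1(request):
    if not request.get("shadow"):
        sent_emails.append(request["user"])      # side effect, live path only
    return {"rank": sorted(request["items"])}


def handler_v2(request):
    if not request.get("shadow"):
        sent_emails.append(request["user"])      # suppressed when shadowed
    return {"rank": sorted(request["items"], reverse=True)}


def mirror(request, primary, shadow, mismatches):
    """Serve from primary, copy the request to shadow, and log differences."""
    request_id = request.get("request_id") or str(uuid.uuid4())
    request = {**request, "request_id": request_id}
    response = primary(request)
    try:
        shadow_response = shadow({**request, "shadow": True})
        if shadow_response != response:
            mismatches.append({
                "request_id": request_id,        # keeps the mismatch debuggable
                "primary": response,
                "shadow": shadow_response,
            })
    except Exception as exc:
        # A shadow failure must never affect the user-facing response.
        mismatches.append({"request_id": request_id, "error": repr(exc)})
    return response                              # the shadow response is dropped


mismatches = []
response = mirror(
    {"request_id": "r-1", "user": "u1", "items": [2, 1]},
    handler_v1, handler_v2, mismatches,
)
```

After this call the user still receives the old version's response, exactly one email has been recorded, and the difference in ranking is captured under request id `r-1`.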
5) Release engineering needs feedback loops¶
Deployment is complete only after verification.
Health checks, versioned metrics, and rollback rules close the loop.
Good pipelines know when to stop automatically.
Great pipelines also make manual judgment easy.
See. Automation and human review should support each other.
Put version labels on metrics, logs, and traces.
Keep change annotations visible on dashboards.
Review failed rollouts for detection speed and rollback clarity.
- Define automated promotion gates for core user journeys.
- Keep manual stop authority during risky launches.
- Record deploy start, widen steps, and rollback times.
- Rehearse high-risk releases during staffed windows.
The playbook should define promotion, pause, and rollback thresholds.
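Recording deploy start, widen steps, and rollback times makes detection speed measurable after the fact. The sketch below assumes a simple in-memory log; `RolloutLog` and the event names are hypothetical, and the timestamps are made-up sample values, not data from a real rollout.

```python
from datetime import datetime, timedelta


class RolloutLog:
    """Minimal timeline of one rollout, kept in memory for illustration."""

    def __init__(self, version):
        self.version = version
        self.events = []  # list of (timestamp, kind, detail)

    def record(self, kind, at, detail=""):
        self.events.append((at, kind, detail))

    def first(self, kind):
        """Return the timestamp of the first event of this kind, if any."""
        for at, event_kind, _ in self.events:
            if event_kind == kind:
                return at
        return None

    def duration(self, start_kind, end_kind):
        """Time between the first occurrences of two event kinds."""
        start, end = self.first(start_kind), self.first(end_kind)
        if start is None or end is None:
            return None
        return end - start


t0 = datetime(2024, 1, 1, 12, 0)  # sample timestamp, not real data
log = RolloutLog("v2")
log.record("deploy_start", t0, "1% canary")
log.record("widen", t0 + timedelta(minutes=10), "5%")
log.record("alarm", t0 + timedelta(minutes=14), "error rate up on v2")
log.record("rollback_start", t0 + timedelta(minutes=15))
log.record("rollback_done", t0 + timedelta(minutes=18))

time_to_detect = log.duration("deploy_start", "alarm")
time_to_rollback = log.duration("rollback_start", "rollback_done")
```

Two numbers fall out of the timeline: how long the regression ran before the alarm fired, and how long rollback took once started. Those are the figures a failed-rollout review would look at first.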
Where this lives in the wild¶
- Web product teams often use rolling deploys for routine stateless services.
- High-risk gateways and auth layers prefer blue-green or guarded canaries.
- ML serving platforms use shadow traffic to compare model behavior safely.
- Platform engineering teams build progressive delivery pipelines with automatic metric checks.
- Fintech and healthcare systems rely on stronger release gating because rollback cost is high.
Pause and recall¶
- Why should deployment strategy follow risk shape instead of habit?
- When does blue-green beat rolling deployment?
- Why is canary rollout intentionally slow?
- What hidden danger makes shadow traffic less safe than it first sounds?
Interview Q&A¶
Q: Why not use rolling deploys for every service?
A: Because mixed-version overlap, state changes, and high blast radius can make rolling risky for some systems.
Common wrong answer to avoid: "Because rolling is outdated" - rolling is excellent in the right compatibility and risk context.
Q: When is blue-green especially attractive?
A: When you want a fast environment-level switch and can afford duplicate capacity plus compatible data handling.
Common wrong answer to avoid: "Always for zero downtime" - it is powerful, but cost and data complexity can outweigh the benefits.
Q: Why is canary release more than just sending less traffic?
A: Because the goal is evidence-based widening, using version comparisons and staged promotion decisions.
Common wrong answer to avoid: "Because small traffic means no risk" - even one percent can hurt critical users if you ignore segmentation.
Q: Why can shadow traffic still create incidents?
A: If side effects are not isolated, the shadow path can duplicate writes, notifications, or external calls.
Common wrong answer to avoid: "Because shadow traffic is fake traffic" - it uses real requests, so real consequences can leak out.
Apply now (5 min)¶
Pick one service you own. Choose rolling, blue-green, canary, or shadow as the primary release method and justify it in three bullets. Then list the exact metrics that would stop promotion within ten minutes. If rollback is unclear, improve release design before the next launch.
Bridge. Deployments are safe now. But what if the whole region goes down? → 10