01. Week 17 — MLOps & Production
Key concepts to master
- Experiment tracking and run lineage.
- Model registry stages and approval evidence.
- Reproducibility across code, data, environment, and evaluation.
- Artifact storage and immutable versioning.
- Training pipelines and eval gates.
- Automated retraining: safe use and failure modes.
- Feature stores and train-serve skew.
- Data versioning with DVC.
- Serving stacks: vLLM, TGI, Triton, managed endpoints.
- Autoscaling, batching, caching, and GPU scheduling.
- Drift types: data, concept, model, vendor.
- Rollout strategies: shadow, canary, blue-green, percentage rollout.
- Incident response and rollback targets.
- Cost optimization through routing, caching, batching, and right-sizing.
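To make the tracking, reproducibility, and immutable-versioning items concrete, here is a minimal sketch of a run manifest with a content-addressed ID. The `RunManifest` class, its field names, and the sample values are hypothetical illustrations, not the schema of any particular tracking tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record of what a tracked run needs for later reproduction."""
    code_commit: str    # Git SHA of the training code
    data_snapshot: str  # content hash or version tag of the dataset
    environment: str    # e.g. container image digest or lockfile hash
    config: dict        # hyperparameters, prompt template, seeds
    eval_suite: str     # version of the evaluation set and metrics


def manifest_id(manifest: RunManifest) -> str:
    """Derive a content-addressed ID: identical inputs always give the same ID."""
    canonical = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Placeholder values for illustration only; the two runs differ in data snapshot.
run_a = RunManifest("3f2c1ab", "data-v12", "img-9d41", {"lr": 3e-4, "seed": 7}, "eval-v3")
run_b = RunManifest("3f2c1ab", "data-v13", "img-9d41", {"lr": 3e-4, "seed": 7}, "eval-v3")

print(manifest_id(run_a) == manifest_id(run_a))  # same inputs, same ID
print(manifest_id(run_a) == manifest_id(run_b))  # changed data snapshot, different ID
```

Hashing the whole manifest means a change to any ingredient (code, data, environment, config, or eval suite) yields a new ID, which is the property a registry entry can rely on when it points back to a run.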
🧠 Mental models
- Experiment tracking: "a lab notebook with the exact ingredients and timestamps"
- Model registry: "passport control for promoting models between environments"
- Feature store: "a shared pantry so training and serving eat the same ingredients"
- Drift monitoring: "a smoke alarm for a changing world or changing data"
- Rollouts: "test pilots before you hand the whole fleet to a new model"
- Serving stack: "a kitchen balancing queues, burners, and prep stations under load"
⚠️ Common traps
- Recording a model version without the code commit, prompt/config, data snapshot, and eval context needed to reproduce it.
- Promoting models without approval evidence, rollback targets, or clear ownership.
- Confusing data drift, concept drift, and vendor/model behavior drift during incidents.
- Ignoring train-serve skew until online metrics collapse after launch.
- Autoscaling on raw QPS when sequence length and token throughput actually drive GPU pressure.
- Automating retraining without human gates for label quality, regression checks, or business review.
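The autoscaling trap above can be sketched with a toy capacity calculation. The per-replica capacities (`qps_per_replica=10`, `tokens_per_sec_per_replica=5000`) and the `Request` type are assumptions chosen for illustration. Summing prompt and generated tokens is also a simplification, since prefill and decode cost differently on real hardware.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int   # tokens to prefill
    max_new_tokens: int  # upper bound on tokens to decode


def replicas_for_qps(requests: list[Request], qps_per_replica: float) -> int:
    """Size the fleet from request count alone (the trap)."""
    return max(1, math.ceil(len(requests) / qps_per_replica))


def replicas_for_tokens(requests: list[Request], tokens_per_sec_per_replica: float) -> int:
    """Size the fleet from the token work arriving in a one-second window."""
    token_demand = sum(r.prompt_tokens + r.max_new_tokens for r in requests)
    return max(1, math.ceil(token_demand / tokens_per_sec_per_replica))


# Two one-second windows with the same QPS but very different token load.
short_chat = [Request(50, 50) for _ in range(10)]
long_docs = [Request(3000, 1000) for _ in range(10)]

for name, window in [("short_chat", short_chat), ("long_docs", long_docs)]:
    print(
        name,
        replicas_for_qps(window, qps_per_replica=10),
        replicas_for_tokens(window, tokens_per_sec_per_replica=5000),
    )
```

With these assumed numbers, both windows look like one replica by QPS, while the token view asks for eight replicas for the long-document window. That gap is what a QPS-only autoscaler would miss.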
🔗 Prerequisites & connections
Builds on: Module 16 engineering discipline around reproducibility, decision records, testing layers, and versioned change management.
Feeds into: Module 18 voice and realtime systems, where serving, monitoring, rollback, and latency discipline must operate under much tighter SLAs.
💬 Interview phrasing
- What has to be captured so you can reproduce an ML run six months later?
- Why is a model registry more than a folder full of artifacts?
- How would you detect train-serve skew or drift before users notice?
- When would you choose shadow deployment, canary rollout, or blue-green release?
- In an AI incident, what can you actually roll back?
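For the skew-and-drift question, one possible answer is to compare the training-time distribution of a feature against what serving sees. Below is a minimal sketch using the Population Stability Index (PSI) in plain Python. The bin count, the `eps` floor, and the commonly quoted rule-of-thumb thresholds (around 0.1 for "watch" and 0.25 for "investigate") are assumptions, not values taken from this course.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate reference sample: everything lands in one bin

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # Floor empty bins at eps so the log ratio stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic example: serving values are shifted toward the upper half of the range.
training = [i / 100 for i in range(100)]
serving = [0.5 + i / 200 for i in range(100)]

print(psi(training, training))  # identical samples
print(psi(training, serving))   # shifted sample scores much higher
```

A check like this only flags data drift on one feature. Concept drift and vendor or model behavior drift need outcome-based or eval-based signals instead.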
⏱️ Difficulty markers
- 🟢 experiment tracking basics
- 🟢 model registry stages
- 🟡 artifact and data versioning
- 🟡 feature stores and train-serve skew
- 🔴 serving-stack capacity tuning
- 🔴 drift taxonomy and incident response
- 🔴 safe automated retraining
Self-check questions
- Why did the opening failure remain invisible for weeks? See 02_explainer.md §1.3-§1.4.
- What must every tracked run contain? See 02_explainer.md §2.2-§2.3.
- Why is a model registry more than a folder? See 02_explainer.md §2.5-§2.6.
- What makes a reproducible ML system different from plain Git history? See 02_explainer.md §2.7-§2.10.
- What exactly is the quality gate? See 02_explainer.md §3.5-§3.6.
- When is automated retraining wise, and when is it reckless? See 02_explainer.md §3.7.
- When would you choose vLLM over TGI or Triton? See 02_explainer.md §4.3.
- Why is token-level work often better than QPS for autoscaling? See 02_explainer.md §4.4-§4.5.
- What is the difference between data drift and model drift? See 02_explainer.md §5.3-§5.5.
- What exactly can you roll back in an AI system? See 02_explainer.md §5.8-§5.10.
Health check
By the end of Week 17, you should be able to say all of this honestly:
- [ ] I can explain the factory analogy without notes.
- [ ] I can describe a run-tracking + registry workflow clearly.
- [ ] I can compare vLLM, TGI, and Triton in interview language.
- [ ] I can define drift, rollback, and incident response precisely.
- [ ] I have completed the hands-on lab in 05_hands_on_lab.md.
- [ ] I feel ready for the latency-heavy world of ../00_realtime_voice_agents/.