00. MLOps & Production: The Five-Year-Old Version
Module 16 built the compass. This module builds the machinery.
In the R&D lab, you make one shiny prototype. It works once, on a clean table, with patient people. Everyone smiles. Look, that is useful, but it is not enough.
A real product lives on the factory floor. There, inputs arrive dirty, late, or half-missing. Machines fail. Customers will not wait politely. See, the job is no longer making one good answer.
Now the job is making good answers every day. The answer must stay good on Monday, Friday, and peak rush. If one machine breaks, work should continue safely. If data changes slowly, someone must notice quickly. Simple, no?
So what to do? We build a factory, not a magic demo. We need steps that repeat. We need records that survive memory loss. We need gates that stop bad output. We need eyes on the live floor. Yes?
Picture first
Think like this.

    ┌──────────────┐
    │ R&D lab      │
    │ works once   │
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐
    │ Factory line │
    │ works daily  │
    └──────┬───────┘
           │
           ├── noisy inputs
           ├── broken machines
           ├── impatient customers
           └── changing reality

The model is only one machine in that factory. The surrounding system decides whether value reaches users. That surrounding system is MLOps and production practice.
What changes now
In a notebook, you ask, "Could this work at all?" In production, you ask, "Can this keep working safely?"
In a notebook, memory lives in your head. In production, memory must live in tools and process.
In a notebook, failure is visible to one builder. In production, failure may hide from the whole team.
In a notebook, speed of learning matters most. In production, repeatability, monitoring, and rollback matter too.
So this module teaches the factory habits. Each later file maps one habit to one common pain.
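To make "memory must live in tools and process" concrete, here is a minimal sketch of a run record written with the Python standard library only. The class name `RunRecord` and its fields are hypothetical, chosen for illustration; real trackers such as MLflow or Weights & Biases keep richer records through their own APIs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one training run (illustrative field names)."""
    params: dict
    metrics: dict
    data_version: str
    code_version: str

    def run_id(self) -> str:
        # Hash a canonical JSON form of the inputs (not the metrics), so the
        # same params, data, and code always produce the same identifier.
        payload = json.dumps(
            {
                "params": self.params,
                "data": self.data_version,
                "code": self.code_version,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def to_json(self) -> str:
        # Serialize everything needed to find this run again later.
        body = asdict(self)
        body["run_id"] = self.run_id()
        return json.dumps(body, sort_keys=True, indent=2)
```

Writing `to_json()` to a file or a database after every run is the smallest version of the habit: the winning run can be found again even after the builder has forgotten it.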
Placeholder map
| Placeholder | What it means on the factory floor |
|---|---|
| the assembly line | CI/CD pipeline that trains, tests, packages, and promotes models |
| the quality gate | Automated evaluation before anything moves forward |
| the warehouse | Model registry holding approved, versioned, deployable models |
| the production monitor | Observability, dashboards, alerts, and drift checks in live traffic |
| the upgrade without downtime | Blue-green or canary release that changes models safely |
Keep these five names in your head. We will call them back again and again.
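As a taste of one of these names, the quality gate can be sketched as a plain function that refuses promotion unless every check passes. This is a sketch under assumptions, not a standard API: the function name, the `accuracy` metric key, and both thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    reasons: list = field(default_factory=list)


def quality_gate(candidate, baseline, min_accuracy=0.90, max_regression=0.01):
    """Return a GateResult; the gate passes only if no check fails.

    `candidate` and `baseline` are dicts of metrics. The 'accuracy' key and
    the default thresholds are illustrative assumptions.
    """
    reasons = []
    accuracy = candidate.get("accuracy")
    if accuracy is None:
        # A missing metric blocks promotion instead of passing silently.
        reasons.append("candidate has no 'accuracy' metric")
    else:
        if accuracy < min_accuracy:
            reasons.append(
                f"accuracy {accuracy:.3f} is below floor {min_accuracy:.3f}"
            )
        baseline_accuracy = baseline.get("accuracy")
        if (
            baseline_accuracy is not None
            and baseline_accuracy - accuracy > max_regression
        ):
            drop = baseline_accuracy - accuracy
            reasons.append(f"accuracy dropped {drop:.3f} versus baseline")
    return GateResult(passed=not reasons, reasons=reasons)
```

The design choice worth noticing is that the gate returns its reasons. A gate that only says "no" teaches nobody anything; one that says why can be logged by the assembly line and read by a human later.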
What you should notice
A model can be smart and still be operationally useless. A team can be talented and still forget the exact winning run.
- A dashboard without alerting is not the production monitor.
- A folder full of model files is not the warehouse.
- A manual checklist is not the quality gate.
- A one-click deploy without rollback is not the upgrade without downtime.
See, names are small, but habits are the point. So what to do? Learn the habits, not only the tools.
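The difference between a dashboard and the production monitor can be shown in a few lines: a number nobody is told about versus a check that calls someone. The sketch below uses a deliberately crude drift score, the shift in the mean measured in reference standard deviations. The function names, the `notify` callback, and the threshold are assumptions for illustration; dedicated tools such as Evidently use more careful statistics.

```python
import statistics


def drift_score(reference, live):
    """Shift of the live mean from the reference mean, in reference std units."""
    ref_std = statistics.stdev(reference)
    if ref_std == 0:
        raise ValueError("reference values have zero spread")
    return abs(statistics.fmean(live) - statistics.fmean(reference)) / ref_std


def check_and_alert(reference, live, notify, threshold=0.5):
    """Call `notify` with a message when the drift score exceeds the threshold.

    `notify` is any callable that accepts a string, for example a function
    that pages the on-call engineer. Returns True if an alert was sent.
    """
    score = drift_score(reference, live)
    if score > threshold:
        notify(f"drift score {score:.2f} exceeds threshold {threshold:.2f}")
        return True
    return False
```

A possible usage, with a list standing in for a real alert channel:

```python
alerts = []
training_values = [1.0, 2.0, 3.0, 4.0, 5.0]
check_and_alert(training_values, [5.0, 6.0, 7.0], alerts.append)
```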
Top resources
- MLflow docs: Good for run tracking, registry, and practical workflow design.
- Weights & Biases docs: Strong for experiments, dashboards, collaboration, and model reporting.
- vLLM docs: Useful when you study serving systems for large language models.
- Evidently AI docs: Clear material for monitoring quality, drift, and production checks.
- DVC docs: Helpful for dataset versioning, pipelines, and artifact discipline.
- Google MLOps whitepaper: A strong high-level view of automation, governance, and team flow.
What's coming
- 01-opening-failure.md: The notebook that worked once.
- 02-experiment-tracking.md: Memory for your training runs.
- 03-model-registry.md: The warehouse that holds approved models.
- 04-reproducibility-lineage.md: Full chain of evidence.
- 05-cicd-for-ml.md: The assembly line for models.
- 06-quality-gates.md: Stopping bad models automatically.
- 07-feature-stores.md: Same features in train and serve.
- 08-serving-infrastructure.md: Model servers, batching, caching.
- 09-deployment-strategies.md: The upgrade without downtime.
- 10-monitoring-drift.md: Watching the factory floor.
- 11-incident-response.md: Runbooks and rollback.
- 12-cost-optimization-serving.md: GPU economics and architecture.
- 13-honest-admission.md: What MLOps tooling cannot yet solve.
If Module 16 taught engineering judgment, this module teaches operational discipline. One shows what good decisions look like. The other shows how good decisions survive contact with reality.
Bridge. First, see the failure that creates the need for all this machinery. → 01-opening-failure.md