
00. MLOps & Production — The Five-Year-Old Version

Module 16 built the compass. This module builds the machinery.

In the R&D lab, you make one shiny prototype. It works once, on a clean table, with patient people. Everyone smiles. Look, that is useful, but it is not enough.

A real product lives on the factory floor. There, inputs arrive dirty, late, or half-missing. Machines fail. Customers will not wait politely. See, the job is no longer making one good answer.

Now the job is making good answers every day. The answer must stay good on Monday, Friday, and peak rush. If one machine breaks, work should continue safely. If data changes slowly, someone must notice quickly. Simple, no?

So what to do? We build a factory, not a magic demo. We need steps that repeat. We need records that survive memory loss. We need gates that stop bad output. We need eyes on the live floor. Yes?

Picture first

Think like this.

┌──────────────┐
│   R&D lab    │
│  works once  │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Factory line │
│ works daily  │
└──────┬───────┘
       │
       ├── noisy inputs
       ├── broken machines
       ├── impatient customers
       └── changing reality

The model is only one machine in that factory. The surrounding system decides whether value reaches users. That surrounding system is MLOps and production practice.

What changes now

In a notebook, you ask: could this work at all? In production, you ask: can this keep working safely?

In a notebook, memory lives in your head. In production, memory must live in tools and process.

In a notebook, failure is visible to one builder. In production, failure may hide from the whole team.

In a notebook, speed of learning matters most. In production, repeatability, monitoring, and rollback matter too.
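To make "memory must live in tools" concrete, here is a minimal sketch using only the Python standard library. It writes each run's parameters, a hash of the training data, and the resulting metrics to a JSON file, and it seeds randomness so the same inputs give the same score. The names `fingerprint`, `train`, and `record_run` are hypothetical, and `train` is a fake stand-in for real training. Tools like MLflow or DVC do this job far more completely; this only shows the habit.

```python
import hashlib
import json
import random
import tempfile
import time
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """Short content hash, so the exact training data can be identified later."""
    return hashlib.sha256(data).hexdigest()[:12]


def train(params: dict, data: bytes) -> dict:
    """Hypothetical stand-in for training (ignores data). Seeded, so equal inputs give equal scores."""
    rng = random.Random(params["seed"])
    return {"accuracy": round(0.80 + 0.10 * rng.random(), 4)}


def record_run(params: dict, data: bytes, metrics: dict, out_dir: Path) -> Path:
    """Write one run's evidence to disk instead of leaving it in someone's head."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "params": params,
        "data_hash": fingerprint(data),
        "metrics": metrics,
    }
    # The run id is derived from the record itself, so two different runs get different files.
    run_id = fingerprint(json.dumps(record, sort_keys=True).encode())
    path = out_dir / f"run-{run_id}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    data = b"label,feature\n1,0.3\n0,0.7\n"
    params = {"seed": 42, "learning_rate": 0.01}
    with tempfile.TemporaryDirectory() as tmp:
        path = record_run(params, data, train(params, data), Path(tmp))
        print(path.read_text())
```

The point is not the JSON. The point is that "which run won, on which data, with which settings?" now has an answer that survives a holiday or a team change.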

So this module teaches the factory habits. Each later file maps one habit to one common pain.

Placeholder map

| Placeholder | What it means on the factory floor |
|---|---|
| the assembly line | CI/CD pipeline that trains, tests, packages, and promotes models |
| the quality gate | Automated evaluation before anything moves forward |
| the warehouse | Model registry holding approved, versioned, deployable models |
| the production monitor | Observability, dashboards, alerts, and drift checks in live traffic |
| the upgrade without downtime | Blue-green or canary release that changes models safely |
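The quality gate, the warehouse, and rollback fit together in a few lines. Below is a toy Python sketch, not any real registry's API: `Warehouse`, `quality_gate`, and the thresholds (accuracy at least 0.85, latency at most 200 ms, and no worse accuracy than current production) are made-up assumptions for illustration. Every version is kept, only a version that passes the gate becomes production, and rollback just points back at the previous production version.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical gate thresholds, chosen only for illustration.
MIN_ACCURACY = 0.85
MAX_LATENCY_MS = 200.0


@dataclass
class ModelVersion:
    version: int
    metrics: dict
    stage: str = "candidate"  # candidate, production, rejected, or archived


def quality_gate(candidate: dict, baseline: Optional[dict]) -> tuple[bool, list[str]]:
    """Return (passed, reasons). Any reason at all blocks promotion."""
    reasons = []
    if candidate["accuracy"] < MIN_ACCURACY:
        reasons.append(f"accuracy {candidate['accuracy']} is below {MIN_ACCURACY}")
    if candidate["latency_ms"] > MAX_LATENCY_MS:
        reasons.append(f"latency {candidate['latency_ms']} ms is above {MAX_LATENCY_MS}")
    if baseline is not None and candidate["accuracy"] < baseline["accuracy"]:
        reasons.append(
            f"accuracy {candidate['accuracy']} is below production {baseline['accuracy']}"
        )
    return (not reasons, reasons)


@dataclass
class Warehouse:
    """Toy model registry: keeps every version, marks at most one as production."""
    versions: list[ModelVersion] = field(default_factory=list)
    production: Optional[int] = None
    history: list[int] = field(default_factory=list)  # earlier production versions

    def get(self, version: int) -> ModelVersion:
        return self.versions[version - 1]

    def register(self, metrics: dict) -> ModelVersion:
        mv = ModelVersion(version=len(self.versions) + 1, metrics=metrics)
        self.versions.append(mv)
        return mv

    def promote_if_passing(self, mv: ModelVersion) -> tuple[bool, list[str]]:
        baseline = self.get(self.production).metrics if self.production is not None else None
        passed, reasons = quality_gate(mv.metrics, baseline)
        if not passed:
            mv.stage = "rejected"
            return passed, reasons
        if self.production is not None:
            self.get(self.production).stage = "archived"
            self.history.append(self.production)
        mv.stage = "production"
        self.production = mv.version
        return passed, reasons

    def rollback(self) -> int:
        """Point production back at the previous version; nothing is retrained."""
        if self.production is None or not self.history:
            raise RuntimeError("no earlier production version to roll back to")
        self.get(self.production).stage = "archived"
        self.production = self.history.pop()
        self.get(self.production).stage = "production"
        return self.production


if __name__ == "__main__":
    wh = Warehouse()
    v1 = wh.register({"accuracy": 0.88, "latency_ms": 120})
    print(wh.promote_if_passing(v1))  # (True, [])
    v2 = wh.register({"accuracy": 0.86, "latency_ms": 90})
    print(wh.promote_if_passing(v2))  # (False, ['accuracy 0.86 is below production 0.88'])
    v3 = wh.register({"accuracy": 0.91, "latency_ms": 110})
    wh.promote_if_passing(v3)
    print(wh.rollback())  # 1
```

Design choice: because old versions stay in the warehouse, rollback is a pointer change, not a retrain. A blue-green release leans on roughly the same idea when it switches traffic back to the old environment.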

Keep these five names in your head. We will come back to them again and again.

What you should notice

A model can be smart and still be operationally useless. A team can be talented and still forget the exact winning run. A dashboard without alerting is not the production monitor. A folder full of model files is not the warehouse. A manual checklist is not the quality gate. A one-click deploy without rollback is not the upgrade without downtime. See, names are small, but habits are the point. So what to do? Learn the habits, not only the tools.
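A dashboard shows a chart; the production monitor also decides when to shout. This minimal Python sketch uses the population stability index (PSI), one common drift score, picked here only as an example; the module does not prescribe it. It compares the live distribution of one feature against a reference sample using decile bins from the reference, and calls an `alert` hook when PSI goes above 0.25, a widely quoted rule of thumb rather than a law. `check_drift` and the `alert` hook are hypothetical; in a real system the hook would page a person, not print.

```python
import math
import random
import statistics
from bisect import bisect_right
from typing import Callable, Sequence


def proportions(values: Sequence[float], cuts: Sequence[float]) -> list[float]:
    """Share of values in each bin; values beyond the outer cuts fall in the end bins."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    # Floor tiny shares so an empty bin does not cause log(0) or division by zero.
    return [max(c / len(values), 1e-6) for c in counts]


def psi(reference: Sequence[float], live: Sequence[float], cuts: Sequence[float]) -> float:
    """Population stability index: sum over bins of (live - ref) * ln(live / ref)."""
    ref_p = proportions(reference, cuts)
    live_p = proportions(live, cuts)
    return sum((a - e) * math.log(a / e) for a, e in zip(live_p, ref_p))


def check_drift(
    reference: Sequence[float],
    live: Sequence[float],
    cuts: Sequence[float],
    threshold: float = 0.25,
    alert: Callable[[str], None] = print,
) -> float:
    """Compute PSI and fire the (hypothetical) alert hook when it crosses the threshold."""
    score = psi(reference, live, cuts)
    if score > threshold:
        alert(f"DRIFT ALERT: PSI={score:.3f} is above {threshold}")
    return score


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(0, 1) for _ in range(5000)]    # what training saw
    cuts = statistics.quantiles(reference, n=10)           # decile edges from the reference
    calm_day = [rng.gauss(0, 1) for _ in range(5000)]     # same distribution
    shifted_day = [rng.gauss(1, 1) for _ in range(5000)]  # mean moved by one standard deviation
    print(f"calm PSI: {check_drift(reference, calm_day, cuts):.3f}")
    print(f"shifted PSI: {check_drift(reference, shifted_day, cuts):.3f}")
```

The calm day stays quiet; the shifted day triggers the alert. That difference, a rule that fires without anyone staring at a chart, is what separates the production monitor from a dashboard.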

Top resources

  • MLflow docs: Good for run tracking, registry, and practical workflow design.

  • Weights & Biases docs: Strong for experiments, dashboards, collaboration, and model reporting.

  • vLLM docs: Useful when you study serving systems for large language models.

  • Evidently AI docs: Clear material for monitoring quality, drift, and production checks.

  • DVC docs: Helpful for dataset versioning, pipelines, and artifact discipline.

  • Google MLOps whitepaper: A strong high-level view of automation, governance, and team flow.

What's coming

  1. 01-opening-failure.md: The notebook that worked once.

  2. 02-experiment-tracking.md: Memory for your training runs.

  3. 03-model-registry.md: The warehouse that holds approved models.

  4. 04-reproducibility-lineage.md: Full chain of evidence.

  5. 05-cicd-for-ml.md: The assembly line for models.

  6. 06-quality-gates.md: Stopping bad models automatically.

  7. 07-feature-stores.md: Same features in train and serve.

  8. 08-serving-infrastructure.md: Model servers, batching, caching.

  9. 09-deployment-strategies.md: The upgrade without downtime.

  10. 10-monitoring-drift.md: Watching the factory floor.

  11. 11-incident-response.md: Runbooks and rollback.

  12. 12-cost-optimization-serving.md: GPU economics and architecture.

  13. 13-honest-admission.md: What MLOps tooling cannot yet solve.

If Module 16 taught engineering judgment, this module teaches operational discipline. One shows what good decisions look like. The other shows how good decisions survive contact with reality.

Bridge. First, see the failure that creates the need for all this machinery. → 01-opening-failure.md