04. Model registry and versioning

⏱️ Estimated time: 20 min | Level: intermediate

ELI5 callback: In our chain, the kitchen trains, the prep station prepares, the recipe book stores, the serving counter serves, and the quality inspector checks. Same restaurant chain, different platform layer.

A registry turns loose artifacts into governed assets

A model file alone tells you almost nothing useful. You need version, lineage, owner, task, and approval state. That bundle is what a registry provides. See. The recipe book is not storage alone; it is trusted memory. The kitchen publishes trained artifacts into this memory. The prep station should link the feature definitions that shaped the run. The serving counter later reads approved versions from here. The quality inspector uses lineage to explain regressions fast. So what to do? Define the registry as part of the release flow. Do not leave it as an optional afterthought. Without a registry, teams pass model files around like mystery attachments, and nobody can say which file is safe to go back to. That kills rollback confidence. Simple, no? Promotion needs a source of truth.

train run → artifact
    │         │
    ├── metrics
    ├── lineage
    ├── owner
    └── registry entry
            ↓

Store models with context, not as lonely binaries.
Make registration automatic after validated runs.
Keep ownership visible beside every version.
Registry entries should outlive individual team members.
Source of truth beats tribal memory every time.
Now watch. Versioning is more nuanced than v1, v2, v3.
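
To make "store models with context" concrete, here is a minimal sketch of a registry entry and a registration step that only accepts validated runs. It is a toy in-memory illustration, not any real registry's API: `RegistryEntry`, `InMemoryRegistry`, the field names, and the example URI and team names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RegistryEntry:
    """One governed model version: the artifact plus its context (hypothetical shape)."""
    name: str
    version: str
    artifact_uri: str   # where the trained artifact lives
    owner: str          # team accountable for this version
    task: str
    metrics: dict       # validation metrics from the training run
    lineage: dict       # e.g. run id and feature definitions used
    approval_state: str = "staging"
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class InMemoryRegistry:
    """Toy registry keyed by (name, version); a real one would be a shared service."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: RegistryEntry, validated: bool) -> RegistryEntry:
        # Registration is tied to validation so unvetted runs never appear.
        if not validated:
            raise ValueError("refusing to register an unvalidated run")
        key = (entry.name, entry.version)
        if key in self._entries:
            # Versions are immutable: re-registering would rewrite history.
            raise ValueError(f"{entry.name} {entry.version} already registered")
        self._entries[key] = entry
        return entry

    def get(self, name: str, version: str) -> RegistryEntry:
        return self._entries[(name, version)]


registry = InMemoryRegistry()
entry = registry.register(
    RegistryEntry(
        name="churn-classifier",
        version="1.0.0",
        artifact_uri="s3://example-bucket/churn/1.0.0/model.pkl",  # made-up URI
        owner="growth-ml",
        task="binary-classification",
        metrics={"auc": 0.91},
        lineage={"run_id": "run-001", "feature_views": ["customer_activity_v3"]},
    ),
    validated=True,
)
print(entry.owner, entry.approval_state)
```

The owner and lineage travel with the version, so the entry stays readable after the person who trained it has moved on.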

Versioning must cover code, data, and contract

A model version is more than weights or a pickle file. It also implies input schema, output schema, and runtime expectations. If any of those shift, compatibility may break. See. Semantic versioning ideas still help here. Major changes break consumers or invalidate comparison. Minor changes preserve contract but improve behavior. Patch changes fix packaging or metadata without changing predictions. So what to do? Define versioning rules in plain language. Tie each version to code commit, data snapshot, and feature view set. That linkage matters during audits and rollbacks. Also record environment images and dependency hashes. Tiny library changes can alter outputs unexpectedly. Now watch. Version naming is really contract naming.

model version
   ├── weights
   ├── schema
   ├── code commit
   ├── data snapshot
   └── runtime image
          ↓

Version the full reproducible unit, not only weights.
State compatibility rules before incidents force them.
Preserve old contracts until consumers migrate.
Attach dependency information for exact reruns.
Ambiguous versioning multiplies rollback pain.
Simple, no? Version names should answer practical questions.
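
The major/minor/patch rules above can be written down as code so nobody has to argue about them during an incident. This is a sketch under assumptions: `ModelVersion`, `next_version`, and the two boolean flags are hypothetical names, and deciding whether a contract or prediction change happened is left to your own checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """A version number tied to the things needed to reproduce it."""
    major: int
    minor: int
    patch: int
    code_commit: str
    data_snapshot: str
    runtime_image: str

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def next_version(current, *, contract_changed, predictions_changed,
                 code_commit, data_snapshot, runtime_image):
    """Pick the next version number from what actually changed."""
    if contract_changed:
        # Input/output schema or runtime expectations shifted: consumers may break.
        numbers = (current.major + 1, 0, 0)
    elif predictions_changed:
        # Same contract, different behavior (for example a retrain).
        numbers = (current.major, current.minor + 1, 0)
    else:
        # Packaging or metadata fix; predictions are unchanged.
        numbers = (current.major, current.minor, current.patch + 1)
    return ModelVersion(*numbers, code_commit, data_snapshot, runtime_image)


v1 = ModelVersion(1, 2, 3, "abc123", "snapshot-2024-01", "serving-image:1")
retrained = next_version(
    v1,
    contract_changed=False,
    predictions_changed=True,
    code_commit="def456",
    data_snapshot="snapshot-2024-02",
    runtime_image="serving-image:1",
)
print(v1, "->", retrained)  # 1.2.3 -> 1.3.0
```

Because each version carries its commit, snapshot, and image, the version name answers "what exactly would I be rolling back to?" without a search through chat history.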

Metadata and lineage make the registry explainable

Metadata is the registry feature engineers notice during trouble. Store task name, owner, training window, labels, and known caveats. Store metrics by slice, not only global averages. Lineage links model versions back to experiments and datasets. See. Without lineage, root-cause analysis becomes archaeology. Approval comments also belong in metadata. Why was this model promoted? Who signed off? Those questions appear every time performance changes suddenly. So what to do? Make metadata fields structured and queryable. Free-text notes help, but searchable fields help more. Use tags for task, region, risk tier, and serving mode. Then dashboards and policy engines can act automatically. Now the registry starts behaving like an operational system.

registry entry
   │
   ├── metrics by slice
   ├── dataset / run links
   ├── approvals / comments
   └── risk / owner tags
            ↓

Structured metadata beats clever manual naming.
Lineage should support both humans and automation.
Capture caveats before production forgets them.
Approval trails matter for trust and audits.
The registry should answer who, what, when, and why.
See. Searchability is a feature, not a bonus.
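
Here is a small sketch of why structured fields beat free text: once tags and per-slice metrics are plain data, both a human and a policy engine can query them. The entry layout, the helper names `find_entries` and `weak_slices`, and all the sample values are invented for illustration.

```python
# Hypothetical registry entries with structured tags and per-slice metrics.
entries = [
    {
        "name": "fraud-scorer",
        "version": "2.1.0",
        "tags": {"task": "fraud", "region": "eu", "risk_tier": "high",
                 "serving_mode": "online"},
        "metrics_by_slice": {
            "global": {"recall": 0.88},
            "new_customers": {"recall": 0.71},
        },
        "approval": {"approved_by": "risk-review",
                     "comment": "new_customers slice is weak; monitor closely"},
    },
    {
        "name": "homepage-ranker",
        "version": "5.0.2",
        "tags": {"task": "ranking", "region": "us", "risk_tier": "low",
                 "serving_mode": "batch"},
        "metrics_by_slice": {"global": {"ndcg": 0.64}},
        "approval": {"approved_by": "auto", "comment": "passed thresholds"},
    },
]


def find_entries(entries, **required_tags):
    """Return entries whose tags match every requested key/value pair."""
    return [
        e for e in entries
        if all(e["tags"].get(k) == v for k, v in required_tags.items())
    ]


def weak_slices(entry, metric, floor):
    """Name the slices where the given metric is reported and falls below the floor."""
    return sorted(
        slice_name
        for slice_name, values in entry["metrics_by_slice"].items()
        if metric in values and values[metric] < floor
    )


high_risk = find_entries(entries, risk_tier="high")
print([e["name"] for e in high_risk])
print(weak_slices(high_risk[0], "recall", 0.80))
```

The global recall looks fine here; only the slice view shows the caveat that the approval comment warns about.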

Approval gates protect production from eager promotion

Not every trained model deserves traffic. Approval gates force evidence before release. Typical gates include metric thresholds, fairness checks, and runtime validation. Some teams add security scanning and PII review too. See. Gates should match the risk tier, not someone's taste for bureaucracy. A low-risk ranking tweak and a credit decision model are different. So what to do? Define stage transitions clearly. Common stages are staging, candidate, approved, deployed, and archived. Automate transitions where evidence is objective. Keep human review where business or regulatory judgment matters. Also require serving smoke tests before final approval. A model can be accurate and still fail at runtime. Now watch. Good gates shorten incidents more than they slow launches.

staging → candidate → approved → deployed
   │         │          │         │
   ├─ tests   ├─ review  ├─ smoke  ├─ traffic
   │         │          │         │
   └─────────┴──────────┴─────────┘
               ↓
            rollback

Approval stages should be visible and boring.
Match evidence depth to model risk.
Automate objective checks aggressively.
Keep final approval auditable and reversible.
Registry gates should protect speed, not sabotage it.
Simple, no? Release confidence comes from clear criteria.
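
The stage flow above can be sketched as a tiny state machine that refuses a promotion when evidence is missing. Treat it as an illustration only: the evidence names, the extra fairness check for high-risk models, and the `promote` helper are assumptions, and your own gates should follow your own risk tiers.

```python
class GateError(Exception):
    """Raised when a promotion skips a stage or lacks required evidence."""


# Which stage may follow which (mirrors staging -> candidate -> approved -> deployed).
ALLOWED_TRANSITIONS = {
    "staging": {"candidate"},
    "candidate": {"approved", "archived"},
    "approved": {"deployed", "archived"},
    "deployed": {"archived"},
    "archived": set(),
}

# Evidence needed to ENTER each stage (assumed names, adjust to your process).
REQUIRED_EVIDENCE = {
    "candidate": {"metric_thresholds"},
    "approved": {"human_review", "smoke_test"},
    "deployed": set(),
    "archived": set(),
}

# Extra evidence for high-risk models: gate depth follows risk tier.
HIGH_RISK_EXTRA = {"approved": {"fairness_check"}}


def promote(entry, target, evidence):
    """Move an entry to the target stage if the transition and evidence are valid."""
    current = entry["stage"]
    if target not in ALLOWED_TRANSITIONS[current]:
        raise GateError(f"cannot move from {current} to {target}")
    required = set(REQUIRED_EVIDENCE[target])
    if entry["risk_tier"] == "high":
        required |= HIGH_RISK_EXTRA.get(target, set())
    missing = required - set(evidence)
    if missing:
        raise GateError(f"missing evidence for {target}: {sorted(missing)}")
    # Keep an audit trail so the approval stays explainable later.
    entry["history"].append((current, target, sorted(evidence)))
    entry["stage"] = target
    return entry


model = {"name": "credit-decision", "risk_tier": "high",
         "stage": "staging", "history": []}
promote(model, "candidate", {"metric_thresholds"})
print(model["stage"])
```

A failed gate leaves the entry where it was, so a rejected promotion never half-applies.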

Rollback works only when the previous good state is clear

Rollback sounds easy until you find that dependencies have changed around the model. Maybe the feature schema changed or the server image changed. Maybe business thresholds changed with the rollout. See. True rollback means restoring a known-good bundle. That bundle includes model, config, schema, and serving contract. So what to do? Mark champion versions explicitly in the registry. Keep warm deployment paths for recent stable versions. Test rollback regularly during non-emergency hours. Also record reason codes when rollbacks happen. Those notes improve future approval policy. Archive old versions carefully rather than deleting them blindly. Storage is usually cheaper than confusion. Now watch. Governance becomes resilience when incidents hit.

issue detected
    │
    ├── find champion version
    ├── restore config bundle
    ├── shift traffic back
    └── annotate incident
            ↓

Rollback needs ready artifacts, not hopeful promises.
Keep one blessed baseline per production endpoint.
Test rollback like any other release path.
Preserve incident reasons as registry metadata.
Archival policy should protect future traceability.
See. Old models become valuable exactly on bad days.
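
The rollback steps in the diagram can be sketched as one function: find the champion, restore its whole bundle, and annotate the incident. This is a simplified stand-in. Traffic shifting is reduced to updating a dictionary, and `find_champion`, `rollback`, the bundle fields, and the reason code are hypothetical.

```python
# Hypothetical version bundles for one endpoint; exactly one is the champion.
versions = {
    "3.1.0": {"model_uri": "models/ranker/3.1.0", "config": {"threshold": 0.5},
              "schema": "features-v2", "champion": True, "incidents": []},
    "3.2.0": {"model_uri": "models/ranker/3.2.0", "config": {"threshold": 0.4},
              "schema": "features-v3", "champion": False, "incidents": []},
}

endpoint = {"name": "ranker-prod", "live_version": "3.2.0",
            "config": {"threshold": 0.4}, "schema": "features-v3"}


def find_champion(versions):
    """Return the single blessed baseline, or fail loudly if it is ambiguous."""
    champions = [v for v, bundle in versions.items() if bundle["champion"]]
    if len(champions) != 1:
        raise RuntimeError(f"expected exactly one champion, found {champions}")
    return champions[0]


def rollback(endpoint, versions, reason_code):
    """Restore the champion bundle on the endpoint and record why."""
    target = find_champion(versions)
    failed = endpoint["live_version"]
    if failed == target:
        raise RuntimeError("endpoint is already serving the champion")
    bundle = versions[target]
    # Restore model, config, and schema together, not the model file alone.
    endpoint.update(live_version=target,
                    config=dict(bundle["config"]),
                    schema=bundle["schema"])
    # Annotate the failed version so approval policy can learn from it.
    versions[failed]["incidents"].append(
        {"reason_code": reason_code, "rolled_back_to": target}
    )
    return target


restored = rollback(endpoint, versions, reason_code="latency_regression")
print(restored, endpoint["schema"])
```

Running a function like this on a schedule, outside emergencies, is one way to test rollback like any other release path.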

Where this lives in the wild

A regulated lending team stores approval comments because model release decisions are auditable events.
A recommendation platform tags champion and challenger versions for quick rollback during regressions.
A fraud team links registry entries to feature sets and experiment runs for incident analysis.
A platform group uses MLflow Model Registry stages (newer MLflow releases favor aliases and tags for the same purpose) to automate promotion for low-risk internal models.
A search team preserves old runtimes because output changes can come from dependency drift too.

Pause and recall

Why is a registry more than just model artifact storage?
What should a model version include beyond weights?
Why do structured metadata and lineage matter during incidents?
What makes rollback harder than simply loading the previous file?

Interview Q&A

Q: What does a model registry actually solve? A: It centralizes versions, metadata, lineage, approval state, and deployment readiness so promotion and rollback become trustworthy. Common wrong answer to avoid: It gives data scientists a nicer folder structure.

Q: What should be versioned with a model? A: Weights, input-output contract, code commit, data snapshot, feature definitions, and runtime environment should all be linked. Common wrong answer to avoid: Only the model binary matters because everything else is implied.

Q: Why add approval gates to the registry? A: Because release quality includes metrics, risk checks, runtime validation, and human accountability, not just training completion. Common wrong answer to avoid: Because management likes extra buttons.

Q: How do you make rollback reliable? A: Keep champion versions, full lineage, compatible runtime bundles, and tested traffic-shift procedures ready ahead of incidents. Common wrong answer to avoid: Just press restore previous version and trust the universe.

Apply now (5 min)

Take one production model and write the exact fields you would store in its registry entry. Add version, owner, feature set, training data window, metrics, approval state, and rollback target. Then ask which of those fields your current process cannot answer quickly. That missing field is your first registry improvement. Write it down and assign an owner today.
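
If it helps, the exercise can be run as a quick check. The sketch below uses the fields listed above and reports which ones a draft entry cannot answer; `REQUIRED_FIELDS`, `missing_fields`, and the sample draft are made up for illustration.

```python
# Fields from the exercise; rename them to match your own registry.
REQUIRED_FIELDS = [
    "version", "owner", "feature_set", "training_data_window",
    "metrics", "approval_state", "rollback_target",
]


def missing_fields(entry):
    """List required fields that are absent or empty in a draft registry entry."""
    empty_values = (None, "", [], {})
    return [f for f in REQUIRED_FIELDS if entry.get(f) in empty_values]


# A draft written from memory: two answers are not known yet.
draft = {
    "version": "1.4.0",
    "owner": "search-ml",
    "feature_set": "query_features_v7",
    "training_data_window": "",
    "metrics": {"ndcg": 0.61},
    "approval_state": "deployed",
}

print(missing_fields(draft))
```

Whatever the function prints is your list of first registry improvements.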

Bridge. Models registered. Now serve them to customers — fast. → 05