05. Assignment 8 — Production-Ready ML System¶

Week 17. Build and document a small but credible production-style ML or LLM service with lineage, safe promotion, monitoring, and rollback thinking.

Goal¶

Move from “I trained a model” to “I can operate a model-backed system responsibly.” Your output should show lifecycle management, serving awareness, and operational judgment.

Recommended project shape¶

Choose one of these tracks:

Track A — Open-weight LLM serving¶

Serve a small instruct model with vLLM, TGI, or an equivalent stack.
Expose one simple chat or completion endpoint.
Add request metrics, latency breakdowns, and cost estimates.

Track B — Classical ML service¶

Serve a tabular or retrieval-assisted model behind a small API.
Add run tracking, artifact storage, and a rollback-ready registry story.
Monitor one business-facing quality metric plus system metrics.

Either track is fine. The point is operational shape, not hype.

Required design elements¶

1. Lineage and lifecycle¶

Your project must show: - tracked training runs, - versioned artifacts, - a clear promotion decision, - a rollback-ready previous version.

Cross-ref: review 02_explainer.md §2.1-§2.12.

2. CI/CD thinking¶

Document your release flow: - what triggers retraining or rebuild, - what checks form the quality gate, - what evidence is required before promotion.

You do not need a giant platform. You do need a believable release process.

Cross-ref: review 02_explainer.md §3.1-§3.12.

3. Serving design¶

Explain: - serving stack choice, - latency-sensitive path, - autoscaling or concurrency assumptions, - batching or caching choices, - cost constraints.

Cross-ref: review 02_explainer.md §4.1-§4.12.

4. Monitoring and rollback¶

You must include: - system metrics, - at least one quality metric, - an alert or health-check rule, - a rollback procedure.

Cross-ref: review 02_explainer.md §5.1-§5.10.

Suggested deliverables¶

README.md — architecture diagram, stack choice, results, lessons.
Run tracking evidence — screenshots or logs from MLflow/W&B/custom metadata.
Registry / version evidence — model version table or equivalent.
Service code — API endpoint and deployment instructions.
Monitoring evidence — dashboard screenshot or metrics sample.
ROLLBACK.md or README section — exact rollback steps.
COST.md or README section — rough monthly economics and assumptions.

Evaluation rubric¶

Category	What “good” looks like
Lineage	Clear link from run to artifact to deployed version
Promotion logic	Quality gate is explicit and sensible
Serving judgment	Stack choice and latency/cost tradeoff are explained
Monitoring	Both infra and quality signals are present
Rollback	Fast, concrete, and believable
Write-up honesty	Limits and missing pieces are stated plainly

Common failure modes¶

only showing a notebook,
no evidence of versioning,
pretending “latest model” is a rollback plan,
monitoring only latency,
giving cost numbers without assumptions,
choosing a giant stack you cannot explain.

What to write in the README¶

Answer these directly: 1. What was the champion version before this release? 2. What had to pass before promotion? 3. What would trigger rollback? 4. Where are latency, quality, and cost visible? 5. What would you harden next with one extra week?

Stretch goals¶

add shadow or canary language to the release plan,
compare managed versus self-hosted serving,
add slice-level monitoring,
simulate one drift scenario and show detection.

Submission standard¶

By the end, you should be able to say: “I can walk you through the run lineage, the release gate, the serving stack, the dashboard, and the rollback plan.” If you cannot say that yet, the hands_on_lab is not done.