03. Week 16: Engineering Principles
Companion to 02_explainer.md. Read this after the narrative if you want the compressed version.
1. The Lead AI Engineer lens
A senior engineer solves difficult local problems. A Lead AI Engineer improves the decision system around those problems.
Use this captain vocabulary throughout the week:
| Term | Meaning |
|---|---|
| the course | Technical direction |
| the compass | Decision framework |
| the crew | Team and stakeholders |
| the weather check | Risk assessment |
| the ship's log | Documentation, ADRs, runbooks |
2. Decision framework basics
Start with the decision question, not the tool. Use these filters in order:
- What job must the system do?
- What constraints are non-negotiable?
- What is the cheapest reversible path?
- What evidence would make us change later?
- What new operational burden comes with this choice?
Build vs buy vs fine-tune
| Option | Best when | Main downside | Default posture |
|---|---|---|---|
| Buy API | Uncertain use case, need speed | Less control, vendor dependence | Start here |
| Prompt + workflow | Model is capable, task framing is weak | Can grow messy without discipline | Usually next |
| RAG | Domain knowledge is missing | Retrieval quality becomes critical | Before fine-tune |
| Fine-tune | Stable pattern, strong labeled data, repeated volume | Slow, expensive, extra ops | Later, with evidence |
| Self-host | Privacy, cost curve, or platform leverage matter | Highest ops burden | Only when justified |
Reversibility
| Decision type | Example | Process level |
|---|---|---|
| Two-way door | Prompt wording, small routing tweak | Decide fast, monitor |
| One-way-ish door | Vendor contract, retention policy | ADR + review |
| One-way door | Self-hosting stack, org-wide platform commitment | RFC + ADR + staged rollout |
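The table above can be read as a lookup from decision type to required process. A minimal sketch, assuming hypothetical names (`Door`, `PROCESS`, `required_process`) that are not part of the course material:

```python
from enum import Enum


class Door(Enum):
    TWO_WAY = "two-way door"
    ONE_WAY_ISH = "one-way-ish door"
    ONE_WAY = "one-way door"


# Process level per decision type, copied from the reversibility table.
PROCESS = {
    Door.TWO_WAY: ["decide fast", "monitor"],
    Door.ONE_WAY_ISH: ["ADR", "review"],
    Door.ONE_WAY: ["RFC", "ADR", "staged rollout"],
}


def required_process(door: Door) -> list:
    """Return the process steps a decision of this type should go through."""
    return PROCESS[door]


print(required_process(Door.ONE_WAY_ISH))  # ['ADR', 'review']
```

Classifying a real decision into one of the three types remains a judgment call; the lookup only makes the consequence of that call explicit.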
3. Decision records and documentation habits
Use ADRs for high-coupling choices. Keep them short and useful.
Minimum ADR template
- Title
- Status
- Context
- Options considered
- Decision
- Consequences
- Trigger to revisit
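One way to keep ADRs short and consistent is to hold the template fields in a small data structure and render them to text. The sketch below is illustrative only: the `ADR` class, its field names, and the example decision are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    title: str
    status: str            # e.g. "proposed", "accepted", "superseded"
    context: str
    options: list
    decision: str
    consequences: str
    revisit_trigger: str

    def to_markdown(self) -> str:
        """Render the record with one section per template field."""
        options = "\n".join(f"- {option}" for option in self.options)
        sections = [
            f"# {self.title}",
            f"Status: {self.status}",
            f"## Context\n{self.context}",
            f"## Options considered\n{options}",
            f"## Decision\n{self.decision}",
            f"## Consequences\n{self.consequences}",
            f"## Trigger to revisit\n{self.revisit_trigger}",
        ]
        return "\n\n".join(sections) + "\n"


# Invented example content, not a real decision.
adr = ADR(
    title="Use a hosted API for ticket summarization",
    status="accepted",
    context="Use case is unproven and we need a prototype quickly.",
    options=["Buy API", "RAG over past tickets", "Fine-tune a small model"],
    decision="Start with a hosted API behind our own interface.",
    consequences="Vendor dependence; switching cost limited by the interface.",
    revisit_trigger="Monthly API spend or data-handling rules change materially.",
)
print(adr.to_markdown())
```

Making the revisit trigger a required field means a record cannot be created without one.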
Minimum system docs
| Document | Why it exists |
|---|---|
| README | Understand the system fast |
| ADR | Preserve technical reasoning |
| Eval spec | Show how quality is measured |
| Runbook | Make incidents survivable |
| Prompt/model card | Track versions, limits, risks |
4. Code and system quality for AI
AI quality has three layers.
| Layer | What it answers | Examples |
|---|---|---|
| Unit tests | Does deterministic glue behave correctly? | Prompt builder, parser, tool router, fallback logic |
| Evals | Is model behavior good enough for the task? | Accuracy, faithfulness, safety, usefulness |
| Monitoring | Is the live system healthy right now? | Latency, cost, errors, drift, user feedback |
What to unit test
- Prompt templates and required variables.
- Retrieval filters and tenant isolation.
- Structured output parsing and schema validation.
- Tool permission checks.
- Timeout and fallback logic.
- Cost accounting functions.
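Two of the items above, required prompt variables and structured output validation, can be tested without calling a model. The sketch below uses only the standard library; `TEMPLATE`, `build_prompt`, `parse_answer`, and the `answer`/`sources` schema are hypothetical names chosen for illustration.

```python
import json
from string import Formatter

# Hypothetical template; the variable names are illustrative.
TEMPLATE = "Answer using only this context:\n{context}\n\nQuestion: {question}"


def required_variables(template: str) -> set:
    """Return the placeholder names a template expects."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def build_prompt(template: str, **values: str) -> str:
    """Fill the template, failing loudly if a required variable is missing."""
    missing = required_variables(template) - set(values)
    if missing:
        raise ValueError(f"missing prompt variables: {sorted(missing)}")
    return template.format(**values)


def parse_answer(raw: str) -> dict:
    """Parse model output as JSON and check a minimal, assumed schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("answer"), str) or not isinstance(data.get("sources"), list):
        raise ValueError("output does not match the expected schema")
    return data


def test_prompt_requires_all_variables():
    try:
        build_prompt(TEMPLATE, question="What is our refund window?")
    except ValueError as err:
        assert "context" in str(err)
    else:
        raise AssertionError("expected ValueError for missing 'context'")


def test_parser_rejects_wrong_schema():
    try:
        parse_answer('{"answer": 42, "sources": []}')
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-string answer")


def test_parser_accepts_valid_output():
    parsed = parse_answer('{"answer": "30 days", "sources": ["policy.md"]}')
    assert parsed["sources"] == ["policy.md"]


# Run the tests directly; a runner such as pytest would also collect them.
for test in (test_prompt_requires_all_variables,
             test_parser_rejects_wrong_schema,
             test_parser_accepts_valid_output):
    test()
print("all unit tests passed")
```

These tests cover the deterministic glue only. Whether the model's answer is any good is a question for evals.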
What to evaluate
- Task accuracy.
- Faithfulness to retrieved context.
- Safety and policy compliance.
- Refusal correctness.
- Latency-quality tradeoff.
- Cost-quality tradeoff.
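As a rough illustration of two of these measures, task accuracy and refusal correctness, here is a tiny eval loop over a fixed case set. It is a sketch under loud assumptions: exact-match scoring stands in for a real grader, `REFUSE` is an invented sentinel, and `stub_system` replaces a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REFUSAL = "REFUSE"  # assumed sentinel the system returns when it declines


@dataclass
class EvalCase:
    prompt: str
    expected: Optional[str]  # None means the correct behavior is to refuse


def run_eval(system: Callable[[str], str], cases: list) -> dict:
    """Score answerable cases by exact match and the rest by correct refusal."""
    answerable = [c for c in cases if c.expected is not None]
    should_refuse = [c for c in cases if c.expected is None]
    correct = sum(
        system(c.prompt).strip().lower() == c.expected.strip().lower()
        for c in answerable
    )
    refused = sum(system(c.prompt) == REFUSAL for c in should_refuse)
    return {
        "task_accuracy": correct / len(answerable) if answerable else float("nan"),
        "refusal_correctness": refused / len(should_refuse) if should_refuse else float("nan"),
    }


# Stand-in for a model: canned answers, refusal for anything unknown.
CANNED = {"Capital of France?": "Paris", "2 + 2?": "5", "Largest planet?": "jupiter"}


def stub_system(prompt: str) -> str:
    return CANNED.get(prompt, REFUSAL)


cases = [
    EvalCase("Capital of France?", "Paris"),
    EvalCase("2 + 2?", "4"),
    EvalCase("Largest planet?", "Jupiter"),
    EvalCase("What is the CEO's home address?", None),
]

scores = run_eval(stub_system, cases)
print(scores)
```

The stub gets two of three answerable cases right and refuses the one it should. A real eval would need many more cases and a grader suited to the task.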
Observability baseline
Log, at minimum:
- Request and trace IDs.
- Model version.
- Prompt or workflow version.
- Retrieval sources.
- Latency buckets.
- Token usage and cost.
- Error class.
- Online quality signal if available.
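The baseline fields map naturally onto one structured log record per request. The following is a minimal sketch using the standard `logging` and `json` modules; the field names, bucket boundaries, logger name, and example values are assumptions for illustration.

```python
import json
import logging
import uuid
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("llm_requests")  # hypothetical logger name


def latency_bucket(latency_ms: float) -> str:
    """Map a latency to a coarse bucket; the boundaries are assumed."""
    for limit_ms, label in ((500, "<0.5s"), (1000, "<1s"), (3000, "<3s")):
        if latency_ms < limit_ms:
            return label
    return ">=3s"


def build_log_record(
    *,
    trace_id: str,
    model_version: str,
    prompt_version: str,
    retrieval_sources: list,
    latency_ms: float,
    input_tokens: int,
    output_tokens: int,
    cost_usd: float,
    error_class: Optional[str] = None,
    quality_signal: Optional[str] = None,
) -> dict:
    """Assemble one record covering every field in the baseline list."""
    return {
        "request_id": str(uuid.uuid4()),
        "trace_id": trace_id,
        "model_version": model_version,
        "prompt_version": prompt_version,
        "retrieval_sources": retrieval_sources,
        "latency_bucket": latency_bucket(latency_ms),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "error_class": error_class,
        "quality_signal": quality_signal,
    }


# Invented example values.
record = build_log_record(
    trace_id="trace-123",
    model_version="model-2024-05",
    prompt_version="support-answer-v7",
    retrieval_sources=["policy.md", "faq.md"],
    latency_ms=1240.0,
    input_tokens=812,
    output_tokens=96,
    cost_usd=0.0041,
    quality_signal="thumbs_up",
)
logger.info(json.dumps(record))
```

Keyword-only arguments force each call site to name every field, which makes a forgotten version tag easier to spot in review.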
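Error budgets
An error budget is the amount of unreliability a reliability target permits over a window; a 99.5% success target, for example, leaves 0.5% of requests as budget. Error budgets turn “we should improve reliability” into a forcing function: if the budget burns too quickly, risky launches pause. That principle matters even before full MLOps.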
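A common way to quantify "burns too quickly" is the burn rate: the observed error rate divided by the error rate the target allows. The sketch below assumes a 99.5% success target and a pause threshold of 1.0; both numbers, the function names, and the request counts are illustrative.

```python
def burn_rate(slo: float, total_requests: int, failed_requests: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value above 1.0 means the budget would run out before the window ends
    if the current error rate continued.
    """
    allowed_error_rate = 1.0 - slo
    observed_error_rate = failed_requests / total_requests
    return observed_error_rate / allowed_error_rate


def launches_allowed(rate: float, pause_threshold: float = 1.0) -> bool:
    """Gate risky launches on the burn rate; the threshold is an assumption."""
    return rate <= pause_threshold


# Invented traffic numbers: 1,500 failures out of 200,000 requests.
rate = burn_rate(slo=0.995, total_requests=200_000, failed_requests=1_500)
print(f"burn rate: {rate:.2f}")                      # burn rate: 1.50
print("launches allowed:", launches_allowed(rate))   # launches allowed: False
```

Here the observed error rate is 0.75% against an allowed 0.5%, so the budget is burning 1.5 times faster than planned and the gate closes.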
5. Technical debt in AI systems
| Debt type | Example | Cost later |
|---|---|---|
| Prompt debt | Giant patched prompt | Hidden regressions |
| Eval debt | Tiny stale benchmark | False confidence |
| Data debt | Weak labels, unclear provenance | Misleading fine-tuning |
| Workflow debt | Agent loop without stop rules | Cost spikes and unsafe behavior |
| Ops debt | No rollback or emergency disable | Long incidents |
| Knowledge debt | Context trapped in one engineer | Slow onboarding |
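The workflow-debt row ("agent loop without stop rules") has a cheap countermeasure: bound the loop by both step count and spend. A minimal sketch, assuming a hypothetical `step` callable that reports its own cost; the limits and names are illustrative, not prescribed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    done: bool
    cost_usd: float
    output: str


def run_agent(
    step: Callable[[int], StepResult],
    *,
    max_steps: int = 5,
    max_cost_usd: float = 0.50,
) -> tuple:
    """Run steps until the task finishes or a stop rule fires.

    Returns (reason, last_output) so callers can log why the loop ended.
    """
    spent = 0.0
    output = ""
    for i in range(max_steps):
        result = step(i)
        spent += result.cost_usd
        output = result.output
        if result.done:
            return "finished", output
        if spent >= max_cost_usd:
            return "stopped: cost budget", output
    return "stopped: step limit", output


def never_done(i: int) -> StepResult:
    """Stand-in for a stuck agent: never finishes, costs $0.20 per step."""
    return StepResult(done=False, cost_usd=0.20, output=f"partial result {i}")


print(run_agent(never_done))               # ('stopped: cost budget', 'partial result 2')
print(run_agent(never_done, max_steps=2))  # ('stopped: step limit', 'partial result 1')
```

Returning the stop reason alongside the output gives monitoring a direct signal for how often the loop is cut off rather than finishing.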
6. Team and process
Code review for AI code
Review behavior, not only syntax. Ask:
- What changed in prompts, retrieval, tools, or models?
- Which evals cover this change?
- What is the fallback path?
- What new risks appear?
- What is the latency or cost impact?
Sprint planning for research-heavy work
Plan around questions and exit criteria. Examples:
- “Does retrieval improve answer quality by 10 points?”
- “Can we keep p95 latency under 3 seconds with citations enabled?”
Do not plan research like deterministic CRUD work. Also do not hide behind ambiguity forever.
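Exit criteria like the two examples above are easiest to hold to when they are written as checks before the sprint starts. The sketch below assumes quality is scored on a 0 to 100 scale and uses a nearest-rank p95; the function names and all numbers are invented for illustration.

```python
def p95(samples: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def exit_criteria_met(baseline_score: float, retrieval_score: float,
                      latencies_s: list) -> dict:
    """Evaluate both example criteria; thresholds come from the questions above."""
    return {
        "quality_gain_at_least_10_points": retrieval_score - baseline_score >= 10,
        "p95_latency_under_3s": p95(latencies_s) < 3.0,
    }


# Invented measurements: 20 latency samples in seconds, scores out of 100.
latencies = [1.2] * 10 + [1.9] * 8 + [2.8, 3.4]
print(p95(latencies))                         # 2.8
print(exit_criteria_met(62.0, 74.0, latencies))
```

With these numbers both criteria pass: the gain is 12 points and only the slowest of the 20 requests exceeds 3 seconds.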
When to automate vs manual
| Signal | Manual-first | Automate sooner |
|---|---|---|
| Volume | Low | High |
| Risk | High, unclear edge cases | Lower, well-understood |
| Observability | Weak | Strong |
| Response-time pressure | Low | High |
| Task stability | Unstable | Stable and repeated |
7. Communication and influence
Translate the same technical decision differently for each audience.
| Audience | Emphasis |
|---|---|
| Product | User value, scope, confidence band |
| Legal/compliance | Data handling, controls, auditability |
| Finance | Cost curve, break-even, budget risk |
| Executives | Strategic leverage and downside |
| Ops/support | Failure handling and ownership |
RFC skeleton
- Problem.
- Constraints.
- Options.
- Recommendation.
- Tradeoffs.
- Risks.
- Rollout and revisit plan.
8. Foundation-gap audit for Module 17
Module 17 assumes you already know:
- Decision framework basics — you can defend a technical choice with criteria.
- When to automate vs manual — you can stage automation instead of forcing it.
- Risk assessment — you think in blast radius, rollback, and user harm.
- Documentation habits — you keep ADRs, eval notes, and runbooks current enough.
If these are missing, MLOps feels like tooling trivia. If these are solid, MLOps feels like principled infrastructure.
9. Bridge forward
Next module — 04_ml_platform_operations — operationalizes these principles into concrete infrastructure: CI/CD for ML, model registries, monitoring, and the platform that makes good engineering automatic.
10. Study order
- Read 02_explainer.md for intuition.
- Revisit this file for compression.
- Use 04_daily_recall.md daily.
- Finish 05_hands_on_lab.md.
- Close with 06_revision.md.