03. Week 15 — Capstone Project Study Material¶
How to use this file¶
Use this document with the narrative in 02_explainer.md. The explainer gives the mental picture. This file gives the operational checklist, matrices, and references.
Cross-reference map¶
| Need | Start here | Then go here |
|---|---|---|
| Understand why capstones fail | 02_explainer.md, Chapter 1 | Section 2 below |
| Choose architecture | 02_explainer.md, Chapter 2 | Sections 3-4 below |
| Plan implementation | 02_explainer.md, Chapter 3 | Section 5 below |
| Build evals and monitoring | 02_explainer.md, Chapter 4 | Sections 6-7 below |
| Package and present the work | 02_explainer.md, Chapter 5 | Sections 8-9 below |
| Self-test at the end | 04_daily_recall.md | 06_revision.md |
Section 1 — What a strong capstone demonstrates¶
A strong capstone is not a bigger hands-on lab. It is proof of judgment: it shows you can balance user value, engineering complexity, quality, latency, and cost.
A hiring manager should be able to point at your project and say:
- This person can ship an AI feature.
- This person can explain trade-offs.
- This person knows what to instrument.
- This person knows where the risks are.
Section 2 — Capstone idea filter¶
| Filter | Green light | Red flag |
|---|---|---|
| User | Specific persona with a real workflow | "Anyone who uses AI" |
| Scope | One narrow job-to-be-done | Full product platform |
| Data | Reachable data source or realistic mock | Undefined future dataset |
| Evaluation | Can create a gold set in days | Needs months of labeling |
| Demo | Easy to explain in two minutes | Needs long setup and context |
| Portfolio value | Maps to target companies | Interesting but irrelevant |
Section 3 — Architecture choice matrix¶
| Pattern | When to choose it | Benefits | Risks |
|---|---|---|---|
| Single request pipeline | One user action, low branching | Simple, debuggable, fast MVP | Can become rigid |
| RAG pipeline | User needs grounded answers | Strong factuality, inspectable context | Retrieval quality becomes the bottleneck |
| Tool-using agent | User action needs external systems | Can act, not just answer | Higher latency, more failure modes |
| Event-driven async stage | Slow background enrichment | Keeps user flow responsive | Harder observability |
| Multi-agent split | Different specialists are genuinely needed | Clear division of labor | Coordination overhead explodes |
Default rule: start with the simplest pipeline that can satisfy the user story. Add sophistication only after measuring need.
Section 4 — Contracts between components¶
Write contracts before wiring components together. If the contracts are vague, integration pain is guaranteed.
Minimum contracts to define:
1. User request contract: input fields, auth context, session identifiers.
2. Retrieval contract: query, filters, top-k, returned chunk schema.
3. Tool contract: arguments, timeout, retry policy, safe fallback.
4. Response contract: answer, citations, confidence, refusal reason.
5. Telemetry contract: latency, tokens, cost, error code, trace id.
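As an illustration of the tool contract (timeout, retry policy, safe fallback), here is a minimal Python sketch. The names `ToolContract` and `call_tool` are hypothetical and not part of any library; the timeout is enforced with a worker thread from the standard library, which is one possible approach among several.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract: limits agreed before a tool is wired in."""
    name: str
    timeout_s: float    # maximum wait per attempt, in seconds
    max_attempts: int   # total attempts, including the first
    fallback: Any       # safe value returned when every attempt fails


def call_tool(contract: ToolContract, fn: Callable[..., Any], **kwargs: Any) -> dict:
    """Run fn under the contract and always return a structured result."""
    last_error = "unknown"
    for attempt in range(1, contract.max_attempts + 1):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            value = pool.submit(fn, **kwargs).result(timeout=contract.timeout_s)
            return {"ok": True, "value": value, "attempts": attempt, "error": None}
        except concurrent.futures.TimeoutError:
            # The worker thread is not killed; it finishes in the background.
            last_error = "timeout"
        except Exception as exc:
            last_error = type(exc).__name__
        finally:
            pool.shutdown(wait=False)
    # Every attempt failed: return the safe fallback instead of raising.
    return {
        "ok": False,
        "value": contract.fallback,
        "attempts": contract.max_attempts,
        "error": last_error,
    }
```

The caller never sees an exception from the tool, only a result with `ok`, `value`, `attempts`, and `error`, so the response and telemetry contracts can record what happened.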
Suggested request envelope:
    {
      "request_id": "uuid",
      "user_id": "string",
      "task_type": "ask|act|summarize",
      "input": "user message",
      "context": {
        "session_id": "string",
        "locale": "en-IN"
      }
    }
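One way to enforce this envelope at the system boundary is to parse it into typed objects and reject anything malformed. The sketch below is illustrative only: `RequestEnvelope`, `RequestContext`, and `parse_envelope` are hypothetical names, and defaulting `locale` to `en-IN` when it is absent is an assumption, not a requirement stated above.

```python
import json
import uuid
from dataclasses import dataclass

ALLOWED_TASK_TYPES = {"ask", "act", "summarize"}
REQUIRED_FIELDS = ("request_id", "user_id", "task_type", "input", "context")


@dataclass(frozen=True)
class RequestContext:
    session_id: str
    locale: str = "en-IN"  # assumed default when the client omits it


@dataclass(frozen=True)
class RequestEnvelope:
    request_id: str
    user_id: str
    task_type: str
    input: str
    context: RequestContext


def parse_envelope(raw: str) -> RequestEnvelope:
    """Parse a JSON request and raise ValueError on any contract violation."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    uuid.UUID(str(data["request_id"]))  # raises ValueError if not a UUID
    if data["task_type"] not in ALLOWED_TASK_TYPES:
        raise ValueError(f"unknown task_type: {data['task_type']!r}")
    if not isinstance(data["input"], str) or not data["input"].strip():
        raise ValueError("input must be a non-empty string")
    ctx = data["context"]
    if not isinstance(ctx, dict) or "session_id" not in ctx:
        raise ValueError("context.session_id is required")
    return RequestEnvelope(
        request_id=str(data["request_id"]),
        user_id=str(data["user_id"]),
        task_type=data["task_type"],
        input=data["input"],
        context=RequestContext(
            session_id=str(ctx["session_id"]),
            locale=str(ctx.get("locale", "en-IN")),
        ),
    )
```

Validating once at the edge means downstream components (retrieval, tools, response building) can trust the fields without re-checking them.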
Section 5 — Implementation sequencing¶
Phase 1: prove the user path¶
- Hard-code weak points if needed.
- Use the best available model first.
- Get a visible output quickly.
Phase 2: replace critical stubs with real components¶
- Swap mock retrieval for real retrieval.
- Add tool safety checks.
- Move prompts into versioned files.
Phase 3: add inspection¶
- Replay tests.
- Gold queries.
- Latency logging.
- Cost logging.
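Latency and cost logging can start as a small wrapper around each stage. The following is a minimal sketch under stated assumptions: `traced` and `estimate_cost` are hypothetical helpers, and the per-token prices are placeholders, not real provider rates.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("capstone.telemetry")

# Placeholder prices in dollars per 1,000 tokens (assumption, not real rates).
PRICE_PER_1K = {"input": 0.001, "output": 0.002}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one call from its token counts."""
    total = (input_tokens * PRICE_PER_1K["input"]
             + output_tokens * PRICE_PER_1K["output"])
    return total / 1000


@contextmanager
def traced(stage: str, trace_id: str):
    """Time one stage and emit a single JSON log line when it ends."""
    record = {
        "trace_id": trace_id,
        "stage": stage,
        "error_code": None,
        "input_tokens": 0,   # the caller fills these in inside the block
        "output_tokens": 0,
    }
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["error_code"] = type(exc).__name__
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        record["cost_usd"] = estimate_cost(
            record["input_tokens"], record["output_tokens"]
        )
        logger.info(json.dumps(record))
```

A stage would use it as `with traced("generate", trace_id) as rec:` and set `rec["input_tokens"]` and `rec["output_tokens"]` from the model response before the block exits.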
Phase 4: package the system¶
- Containerize.
- Add startup scripts.
- Write README and architecture notes.
- Record the demo.
Section 6 — System-level evaluation¶
Component evals are not enough. Capstones tend to fail at the handoffs between components, so measure the full chain from user input to final response.
| Eval type | What it catches | Example |
|---|---|---|
| End-to-end gold set | Broken handoffs, wrong final answer | Retrieval okay, answer still wrong |
| Latency budget test | Slow composite workflows | Tool retry makes response unusable |
| Cost test | Hidden expensive paths | Agent loop burns tokens |
| Failure injection | Missing fallbacks | Retriever outage causes crash |
| Human review sample | User trust issues | Tone or action confidence is wrong |
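Two of the eval types in the table, the end-to-end gold set and failure injection, can share one harness. This is a toy sketch: the gold set, the documents, and the functions `retrieve`, `broken_retrieve`, `answer`, and `run_gold_set` are all hypothetical stand-ins for your real pipeline.

```python
from typing import Callable, Dict, List

# Hypothetical gold set: each case names a string the final answer must contain.
GOLD_SET = [
    {"question": "What is the refund window?", "must_contain": "30 days"},
    {"question": "Which plan includes SSO?", "must_contain": "Enterprise"},
]

# Toy corpus standing in for a real document store.
DOCS = {
    "refund": "Refunds are accepted within 30 days of purchase.",
    "sso": "SSO is available on the Enterprise plan.",
}


def retrieve(question: str) -> List[str]:
    """Toy keyword retriever: return documents whose key appears in the question."""
    q = question.lower()
    return [text for key, text in DOCS.items() if key in q]


def broken_retrieve(question: str) -> List[str]:
    """Injected failure: simulates a retriever outage."""
    raise ConnectionError("retriever outage (injected)")


def answer(question: str, retriever: Callable[[str], List[str]] = retrieve) -> Dict:
    """Full chain: retrieve, then answer, with a refusal instead of a crash."""
    try:
        chunks = retriever(question)
    except Exception:
        return {"answer": "", "refusal_reason": "retrieval_unavailable"}
    if not chunks:
        return {"answer": "", "refusal_reason": "no_context"}
    return {"answer": chunks[0], "refusal_reason": None}


def run_gold_set(system: Callable[[str], Dict]) -> float:
    """Return the fraction of gold cases whose final answer passes the check."""
    passed = sum(
        1 for case in GOLD_SET
        if case["must_contain"] in system(case["question"])["answer"]
    )
    return passed / len(GOLD_SET)
```

Running `run_gold_set(answer)` scores the healthy path, while `run_gold_set(lambda q: answer(q, retriever=broken_retrieve))` checks that an outage lowers the score without raising an exception.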
Section 7 — Metrics to track from week one¶
| Area | Metric | Why it matters |
|---|---|---|
| Quality | task success rate | Tells you if the system solves the job |
| Quality | citation faithfulness | Important for grounded systems |
| Reliability | error rate | Users feel this immediately |
| Reliability | fallback rate | Reveals brittle dependencies |
| Latency | p50 / p95 end-to-end | One slow hop ruins the experience |
| Cost | dollars per successful task | Honest portfolio metric |
| Operations | tokens per request | Explains cost swings |
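The metrics in the table can be computed from per-request telemetry records. The sketch below assumes a hypothetical record shape (`success`, `error_code`, `used_fallback`, `latency_ms`, `cost_usd`, `tokens`), uses the nearest-rank method for percentiles, and defines dollars per successful task as total spend, including failed requests, divided by the number of successes. Each of these is one reasonable choice, not the only one.

```python
import math
from typing import Dict, List, Optional


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records: List[Dict]) -> Dict[str, Optional[float]]:
    """Aggregate per-request records into the week-one metrics."""
    total = len(records)
    if total == 0:
        raise ValueError("no records")
    successes = sum(1 for r in records if r["success"])
    latencies = [r["latency_ms"] for r in records]
    total_cost = sum(r["cost_usd"] for r in records)
    return {
        "task_success_rate": successes / total,
        "error_rate": sum(1 for r in records if r["error_code"]) / total,
        "fallback_rate": sum(1 for r in records if r["used_fallback"]) / total,
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        # Failed requests still cost money, so they stay in the numerator.
        "cost_per_successful_task_usd": total_cost / successes if successes else None,
        "avg_tokens_per_request": sum(r["tokens"] for r in records) / total,
    }
```

With small samples, p95 is often just the slowest request, which is still worth reporting because that is the request a reviewer will notice in a live demo.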
Section 8 — Deployment basics¶
You do not need perfect infrastructure this week. You do need a credible path.
Minimal deployment story:
- Application packaged with a reproducible environment.
- Config separated from code.
- Secrets stored outside the repo.
- One command to run locally.
- One command or workflow to deploy.
- Health check route and basic logs.
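Three items from this story (config separated from code, secrets outside the repo, and a health check route) fit in a short standard-library sketch. The environment variable names (`CAPSTONE_API_KEY`, `CAPSTONE_MODEL`, `CAPSTONE_PORT`), the `/healthz` path, and the default values are assumptions chosen for illustration; a real project would likely use its web framework's own routing.

```python
import json
import os
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model_name: str
    api_key: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read config from the environment so nothing secret lives in the repo."""
    api_key = env.get("CAPSTONE_API_KEY", "")
    if not api_key:
        raise RuntimeError("CAPSTONE_API_KEY is not set")
    return Settings(
        model_name=env.get("CAPSTONE_MODEL", "example-model"),
        api_key=api_key,
        port=int(env.get("CAPSTONE_PORT", "8000")),
    )


class HealthHandler(BaseHTTPRequestHandler):
    """Serves only the health check route; everything else returns 404."""

    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def make_server(settings: Settings) -> HTTPServer:
    """Build the server without starting it; the caller decides when to serve."""
    return HTTPServer(("127.0.0.1", settings.port), HealthHandler)
```

Calling `make_server(load_settings()).serve_forever()` starts it locally. Failing fast on a missing key at startup is a deliberate choice: a clear startup error is easier to debug than a failed model call in the middle of a demo.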
Section 9 — Demo and portfolio packaging¶
The demo is part of the engineering work. If users cannot understand the value quickly, the project underperforms.
Use this order in the demo:
1. Problem statement.
2. Input from the user.
3. Visible system action.
4. Result with evidence.
5. One failure mode and your mitigation.
6. One number on quality.
7. One number on latency or cost.
Section 10 — Reference material¶
YouTube¶
- Building and Evaluating AI Agents - Useful for planning system-level evals and failure analysis.
- How I use LLMs - Practical workflow for fast iteration, documentation, and shipping.
Blogs¶
- Emerging Architectures for LLM Applications - Good reference stack for choosing capstone architecture.
- Building LLM Applications for Production - Clear explanation of latency, evaluation, and cost trade-offs.
- Rules of Machine Learning - Durable heuristics for sequencing work and avoiding premature complexity.
Section 11 — What Module 16 will assume¶
Module 16 assumes you have already felt the pain of:
- Integration challenges.
- Cost and latency trade-offs.
- Deployment basics.
- Explaining system decisions to other engineers.
That is why this module matters. Next week turns your lived decisions into reusable principles.