03. Week 15 — Capstone Project Study Material

How to use this file

Use this document with the narrative in 02_explainer.md. The explainer gives the mental picture. This file gives the operational checklist, matrices, and references.

Cross-reference map

Need	Start here	Then go here
Understand why capstones fail	02_explainer.md, Chapter 1	Section 2 below
Choose architecture	02_explainer.md, Chapter 2	Sections 3-4 below
Plan implementation	02_explainer.md, Chapter 3	Section 5 below
Build evals and monitoring	02_explainer.md, Chapter 4	Sections 6-7 below
Package and present the work	02_explainer.md, Chapter 5	Sections 8-9 below
Self-test at the end	04_daily_recall.md	06_revision.md

Section 1 — What a strong capstone demonstrates

A strong capstone is not a bigger hands-on lab. It is a proof of judgment. It shows you can balance user value, engineering complexity, quality, latency, and cost.

A hiring manager should be able to point at your project and say:
- This person can ship an AI feature.
- This person can explain trade-offs.
- This person knows what to instrument.
- This person knows where the risks are.

Section 2 — Capstone idea filter

Filter	Green light	Red flag
User	Specific persona with a real workflow	"Anyone who uses AI"
Scope	One narrow job-to-be-done	Full product platform
Data	Reachable data source or realistic mock	Undefined future dataset
Evaluation	Can create a gold set in days	Needs months of labeling
Demo	Easy to explain in two minutes	Needs long setup and context
Portfolio value	Maps to target companies	Interesting but irrelevant

Section 3 — Architecture choice matrix

Pattern	When to choose it	Benefits	Risks
Single request pipeline	One user action, low branching	Simple, debuggable, fast MVP	Can become rigid
RAG pipeline	User needs grounded answers	Strong factuality, inspectable context	Retrieval quality becomes the bottleneck
Tool-using agent	User action needs external systems	Can act, not just answer	Higher latency, more failure modes
Event-driven async stage	Slow background enrichment	Keeps user flow responsive	Harder observability
Multi-agent split	Different specialists are genuinely needed	Clear division of labor	Coordination overhead explodes

Default rule: start with the simplest pipeline that can satisfy the user story. Add sophistication only after measuring need.

Section 4 — Contracts between components

Write contracts before wiring components together. If the contracts are vague, integration pain is guaranteed.

Minimum contracts to define:
1. User request contract: input fields, auth context, session identifiers.
2. Retrieval contract: query, filters, top-k, returned chunk schema.
3. Tool contract: arguments, timeout, retry policy, safe fallback.
4. Response contract: answer, citations, confidence, refusal reason.
5. Telemetry contract: latency, tokens, cost, error code, trace id.
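As an illustration of the tool contract (arguments, timeout, retry policy, safe fallback), here is a minimal Python sketch. The names `ToolContract` and `call_tool` are hypothetical, not part of any library. The timeout is an assumption-laden simplification: it is checked after the call returns, so it does not interrupt a hung call. A real system would more likely enforce it at the HTTP client or in a worker thread.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract for one tool: limits plus a safe fallback."""
    name: str
    timeout_s: float   # per-attempt time budget, checked after the call returns
    max_retries: int   # extra attempts allowed after the first failure
    fallback: Any      # value returned when every attempt fails


def call_tool(contract: ToolContract, fn: Callable[..., Any], **kwargs: Any) -> dict:
    """Run fn under the contract and always return the same result shape."""
    attempts = 0
    last_error = ""
    for _ in range(contract.max_retries + 1):
        attempts += 1
        start = time.perf_counter()
        try:
            value = fn(**kwargs)
        except Exception as exc:
            # Any exception counts as a failed attempt and triggers a retry.
            last_error = f"{type(exc).__name__}: {exc}"
            continue
        if time.perf_counter() - start > contract.timeout_s:
            # Too slow to be useful: treat as a failure and retry.
            last_error = "timeout"
            continue
        return {"ok": True, "value": value, "attempts": attempts, "error": None}
    # Every attempt failed: return the safe fallback instead of raising.
    return {"ok": False, "value": contract.fallback,
            "attempts": attempts, "error": last_error}
```

Because the caller always gets the same dictionary shape, the response and telemetry layers can record `attempts` and `error` without special-casing failures.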

Suggested request envelope:

{
  "request_id": "uuid",
  "user_id": "string",
  "task_type": "ask|act|summarize",
  "input": "user message",
  "context": {
    "session_id": "string",
    "locale": "en-IN"
  }
}
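One way to make the envelope enforceable is to parse it into typed objects at the system boundary. The sketch below mirrors the fields of the envelope above using Python dataclasses. The class names, the `parse_envelope` helper, and the rule that `input` must be non-empty are assumptions added for illustration.

```python
import json
import uuid
from dataclasses import dataclass, field

# Taken from the "ask|act|summarize" values in the suggested envelope.
ALLOWED_TASK_TYPES = {"ask", "act", "summarize"}


@dataclass(frozen=True)
class RequestContext:
    session_id: str
    locale: str = "en-IN"


@dataclass(frozen=True)
class RequestEnvelope:
    user_id: str
    task_type: str
    input: str
    context: RequestContext
    # Generate an id when the caller does not supply one.
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        if self.task_type not in ALLOWED_TASK_TYPES:
            raise ValueError(f"unknown task_type: {self.task_type!r}")
        if not self.input.strip():
            # Assumed rule: reject empty user messages early.
            raise ValueError("input must not be empty")


def parse_envelope(raw: str) -> RequestEnvelope:
    """Parse a JSON string into a validated envelope; raises on bad input."""
    data = json.loads(raw)
    return RequestEnvelope(
        request_id=data["request_id"],
        user_id=data["user_id"],
        task_type=data["task_type"],
        input=data["input"],
        context=RequestContext(**data["context"]),
    )
```

Rejecting a malformed request here means every downstream component can trust the fields it receives.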

Section 5 — Implementation sequencing

Phase 1: prove the user path

Hard-code weak points if needed.
Use the best available model first.
Get a visible output quickly.

Phase 2: replace critical stubs with real components

Swap mock retrieval for real retrieval.
Add tool safety checks.
Move prompts into versioned files.

Phase 3: add inspection

Add replay tests.
Build a set of gold queries.
Log latency per request.
Log cost per request.
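Latency and cost logging can start as a thin wrapper around each pipeline step. The sketch below is one possible shape, not a prescribed one. The function names, the record fields, and the prices in `PRICE_PER_1K` are placeholders. It assumes each step returns a dict that reports its own `input_tokens` and `output_tokens`.

```python
import json
import time
from typing import Callable

# Assumed prices in dollars per 1,000 tokens; replace with your provider's rates.
PRICE_PER_1K = {"input": 0.001, "output": 0.002}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Convert token counts into an estimated dollar cost."""
    return (input_tokens * PRICE_PER_1K["input"]
            + output_tokens * PRICE_PER_1K["output"]) / 1000


def run_logged(trace_id: str, step: Callable[[str], dict], query: str, sink: list) -> dict:
    """Call one pipeline step and append a JSON telemetry record to sink."""
    start = time.perf_counter()
    error_code = None
    result: dict = {}
    try:
        result = step(query)
    except Exception as exc:
        # Record the failure instead of losing it; the caller sees an empty result.
        error_code = type(exc).__name__
    record = {
        "trace_id": trace_id,
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "input_tokens": result.get("input_tokens", 0),
        "output_tokens": result.get("output_tokens", 0),
        "error_code": error_code,
    }
    record["cost_usd"] = estimate_cost(record["input_tokens"], record["output_tokens"])
    sink.append(json.dumps(record))
    return result
```

Here `sink` is a plain list so the sketch stays self-contained. In a project it could be a file handle or a logger, as long as each record stays one JSON object per line so replay tests can read it back.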

Phase 4: package the system

Containerize.
Add startup scripts.
Write README and architecture notes.
Record the demo.

Section 6 — System-level evaluation

Component evals are not enough. Capstones usually fail at the handoffs between components, so measure the full chain.

Eval type	What it catches	Example
End-to-end gold set	Broken handoffs, wrong final answer	Retrieval okay, answer still wrong
Latency budget test	Slow composite workflows	Tool retry makes response unusable
Cost test	Hidden expensive paths	Agent loop burns tokens
Failure injection	Missing fallbacks	Retriever outage causes crash
Human review sample	User trust issues	Tone or action confidence is wrong
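Two rows of the table, the end-to-end gold set and failure injection, can share one small harness. The sketch below uses a toy in-memory retriever and a substring check as the pass criterion. Both are stand-ins chosen for brevity, and every name and data value in it is hypothetical.

```python
from typing import Callable

# Hypothetical gold set: each case pairs a query with text the answer must contain.
GOLD_SET = [
    {"query": "refund window", "must_contain": "30 days"},
    {"query": "support email", "must_contain": "support@example.com"},
]

# Toy knowledge base standing in for a real retriever's index.
DOCS = {
    "refund window": "Refunds are accepted within 30 days.",
    "support email": "Write to support@example.com.",
}


def retrieve(query: str) -> str:
    return DOCS[query]


def broken_retrieve(query: str) -> str:
    # Injected failure: simulates a retriever outage.
    raise ConnectionError("retriever outage")


def answer(query: str, retriever: Callable[[str], str]) -> dict:
    """Full chain: retrieve, then respond. Refuses instead of crashing."""
    try:
        context = retriever(query)
    except Exception:
        return {"answer": "", "refusal_reason": "retrieval_unavailable"}
    return {"answer": context, "refusal_reason": None}


def run_gold_set(retriever: Callable[[str], str]) -> float:
    """Return the fraction of gold cases whose final answer passes."""
    passed = sum(
        case["must_contain"] in answer(case["query"], retriever)["answer"]
        for case in GOLD_SET
    )
    return passed / len(GOLD_SET)
```

Running `run_gold_set(retrieve)` scores the healthy path, while `run_gold_set(broken_retrieve)` confirms the chain degrades to a refusal rather than raising. A real gold set would need a stricter pass criterion than substring matching.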

Section 7 — Metrics to track from week one

Area	Metric	Why it matters
Quality	task success rate	Tells you if the system solves the job
Quality	citation faithfulness	Important for grounded systems
Reliability	error rate	Users feel this immediately
Reliability	fallback rate	Reveals brittle dependencies
Latency	p50 / p95 end-to-end	One slow hop ruins the experience
Cost	dollars per successful task	Honest portfolio metric
Operations	tokens per request	Explains cost swings
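Several of these metrics fall out of a single pass over per-request records. The sketch below assumes each record is a dict with `success`, `used_fallback`, `latency_ms`, and `cost_usd` fields. Those field names are illustrative. Percentiles use the nearest-rank method, which is one of several common definitions.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: list[dict]) -> dict:
    """Aggregate per-request records into the headline metrics."""
    successes = [r for r in records if r["success"]]
    latencies = [r["latency_ms"] for r in records]
    # Failed requests still cost money, so total cost covers every record.
    total_cost = sum(r["cost_usd"] for r in records)
    return {
        "task_success_rate": len(successes) / len(records),
        "fallback_rate": sum(r["used_fallback"] for r in records) / len(records),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "usd_per_successful_task": (
            total_cost / len(successes) if successes else float("inf")
        ),
    }
```

Dividing total spend by successful tasks only is what makes the cost number honest: a system that fails often looks expensive even when each call is cheap.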

Section 8 — Deployment basics

You do not need perfect infrastructure this week. You do need a credible path.

Minimal deployment story:
- Application packaged with a reproducible environment.
- Config separated from code.
- Secrets stored outside the repo.
- One command to run locally.
- One command or workflow to deploy.
- Health check route and basic logs.
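Three of these items (config separated from code, secrets outside the repo, a health check route) fit in a few lines using only the Python standard library. This is a sketch under assumptions: the environment variable names, the `/healthz` path, and the default model name are all placeholders, and a real project would likely use its web framework's routing instead of `http.server`.

```python
import json
import os
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer


@dataclass(frozen=True)
class Settings:
    model_name: str
    api_key: str
    port: int


def load_settings(env=os.environ) -> Settings:
    """Read config from the environment; variable names are illustrative."""
    api_key = env.get("CAPSTONE_API_KEY", "")
    if not api_key:
        # Fail at startup rather than on the first user request.
        raise RuntimeError("CAPSTONE_API_KEY is not set")
    return Settings(
        model_name=env.get("CAPSTONE_MODEL", "placeholder-model"),
        api_key=api_key,
        port=int(env.get("CAPSTONE_PORT", "8000")),
    )


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def make_server(settings: Settings) -> HTTPServer:
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", settings.port), HealthHandler)
```

To run it locally, export the key in your shell and call `make_server(load_settings()).serve_forever()`. Passing `env` as a parameter keeps `load_settings` testable without touching the real environment.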

Section 9 — Demo and portfolio packaging

The demo is part of the engineering work. If users cannot understand the value quickly, the project underperforms.

Use this order in the demo:
1. Problem statement.
2. Input from the user.
3. Visible system action.
4. Result with evidence.
5. One failure mode and your mitigation.
6. One number on quality.
7. One number on latency or cost.

Section 10 — Reference material

YouTube

Building and Evaluating AI Agents - Useful for planning system-level evals and failure analysis.
How I use LLMs - Practical workflow for fast iteration, documentation, and shipping.

Blogs

Emerging Architectures for LLM Applications - Good reference stack for choosing capstone architecture.
Building LLM Applications for Production - Clear explanation of latency, evaluation, and cost trade-offs.
Rules of Machine Learning - Durable heuristics for sequencing work and avoiding premature complexity.

Section 11 — What Module 16 will assume

Module 16 assumes you have already felt the pain of:
- Integration challenges.
- Cost and latency trade-offs.
- Deployment basics.
- Explaining system decisions to other engineers.

That is why this module matters. Next week turns your lived decisions into reusable principles.