02. System Design Blueprint: Start With the User, Not the Model

~11 min read. The most common capstone mistake: picking a model first and designing a use case around it.

Built on the ELI5 in 00-eli5.md. The blueprint — the system design document capturing what you are building and why — is the first deliverable. Everything else depends on it.

Why user-first thinking changes everything

See. Most engineers open a new project by asking: "Which model should I use?" Wrong question. That is like asking "which hammer?" before you know the shape of the nail.

The right first question is: "What job does the user need done?"

Clayton Christensen called this "jobs to be done." The user does not want an LLM. The user wants to finish something faster. They want to find an answer, draft a document, or avoid a manual lookup.

The blueprint forces you to write the user job before touching code. Here is the format we use. One sentence, three parts.

When [situation], the user wants to [action], so they can [outcome].

Example: When a customer service agent gets a complaint email, they want to see the top three policy clauses that apply, so they can respond in under two minutes without reading the whole manual.

This one sentence drives every downstream decision. Architecture, retrieval, prompt design, evaluation — all of it.
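The three-part format can be captured as a small data structure so that an incomplete statement fails early. This is only a sketch: the `UserJob` class and its field names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserJob:
    """One user job statement: situation, action, outcome (hypothetical helper)."""
    situation: str
    action: str
    outcome: str

    def __post_init__(self) -> None:
        # Reject blank parts so a half-written statement is caught immediately.
        for name in ("situation", "action", "outcome"):
            if not getattr(self, name).strip():
                raise ValueError(f"user job is missing its {name}")

    def render(self) -> str:
        # Fill the one-sentence template from the three parts.
        return (f"When {self.situation}, the user wants to {self.action}, "
                f"so they can {self.outcome}.")


job = UserJob(
    situation="a customer service agent gets a complaint email",
    action="see the top three policy clauses that apply",
    outcome="respond in under two minutes without reading the whole manual",
)
print(job.render())
```

If any of the three parts is empty, construction raises `ValueError`, which mirrors the rule that a job statement without a situation, action, or outcome is not actionable.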

The blueprint document: five sections

A complete blueprint has exactly five sections. No more. Do not over-document. Do not under-document.

┌─────────────────────────────────────────────────────┐
│  1. User Job Statement (1-2 sentences)              │
├─────────────────────────────────────────────────────┤
│  2. Success Criteria (measurable, not vague)        │
├─────────────────────────────────────────────────────┤
│  3. Data Available (what exists vs. what is needed) │
├─────────────────────────────────────────────────────┤
│  4. Constraints (latency, cost, privacy, accuracy)  │
├─────────────────────────────────────────────────────┤
│  5. Architecture Hypothesis (one paragraph only)    │
└─────────────────────────────────────────────────────┘

Section 5 is a hypothesis, not a decision. You commit to an architecture only after section 3 reveals what data you actually have and section 4 pins down the constraints.
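One way to keep the five sections honest is to hold them in a single record and ask which ones are still empty. A minimal sketch, assuming Python 3.9+; the `Blueprint` class and its field names are hypothetical and simply mirror the five boxes above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Blueprint:
    """The five blueprint sections (hypothetical field names)."""
    user_job: str = ""
    success_criteria: list[str] = field(default_factory=list)
    data_available: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    architecture_hypothesis: str = ""

    def missing_sections(self) -> list[str]:
        # A section counts as missing while it is an empty string or empty list.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = Blueprint(
    user_job="When a support agent reads a ticket, they want the top three "
             "relevant knowledge-base articles, so they can close tickets 40% faster.",
    success_criteria=["Precision@3 >= 0.75"],
)
print(draft.missing_sections())
# ['data_available', 'constraints', 'architecture_hypothesis']
```

The architecture hypothesis is deliberately the last field, so it shows up as missing until the data and constraints sections are filled in.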

Worked example: support assistant blueprint

Let us fill in the blueprint for a real scenario.

User Job Statement
When a support agent reads a ticket, they want the three most relevant knowledge-base articles, ranked, so they can close tickets 40% faster.

Success Criteria
- Precision@3 ≥ 0.75 (three-quarters of suggested articles are actually useful)
- Response latency ≤ 800 ms at p95
- Agent satisfaction score ≥ 4.0 / 5.0 in weekly survey

Look. These are measurable. "Helpful" is not measurable. "Precision@3 ≥ 0.75" is.
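To make "Precision@3 ≥ 0.75" concrete, here is a minimal sketch of how it could be computed from ticket history. The article IDs are made up, and dividing by k even when fewer than k articles are suggested is one common convention, not the only one.

```python
def precision_at_k(suggested: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of the top-k suggested articles that the agent actually used."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for article in suggested[:k] if article in relevant)
    return hits / k


def mean_precision_at_k(examples: list[tuple[list[str], set[str]]], k: int = 3) -> float:
    """Average precision@k over (suggested, relevant) pairs, one pair per ticket."""
    scores = [precision_at_k(suggested, relevant, k) for suggested, relevant in examples]
    return sum(scores) / len(scores)


# Hypothetical eval set: system suggestions vs. agent-selected articles.
examples = [
    (["kb-1", "kb-2", "kb-3"], {"kb-1", "kb-3"}),          # 2 of 3 useful
    (["kb-4", "kb-5", "kb-6"], {"kb-4", "kb-5", "kb-6"}),  # 3 of 3 useful
]
score = mean_precision_at_k(examples)
print(f"{score:.3f}", score >= 0.75)  # 0.833 True
```

The number either clears the bar or it does not. That is what makes it a success criterion.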

Data Available
- 12 000 knowledge-base articles (HTML, updated weekly)
- 6 months of ticket history with agent-selected articles (ground truth!)
- No PII in KB; tickets contain customer names (privacy constraint)

Constraints
- Latency: 800 ms p95. Tight. Rules out multi-hop agent architectures.
- Cost: ≤ $0.002 per ticket lookup.
- Privacy: ticket text must not leave our VPC.

Architecture Hypothesis
RAG with a local embedding model + vector database. No agent loop needed: retrieval + ranking is enough. Fine-tuning not warranted yet; try few-shot first.

Simple, no? Notice how constraints ruled out one architecture before we wrote any code.
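The latency constraint is just as checkable as the precision target. A minimal sketch, assuming the nearest-rank definition of a percentile (other definitions interpolate and can give slightly different values); the latency samples are invented for illustration.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds.
latencies_ms = [310, 420, 390, 510, 640, 450, 700, 380, 560, 920,
                430, 470, 520, 610, 350, 480, 590, 660, 400, 540]

p95 = percentile_nearest_rank(latencies_ms, 95)
print(p95, p95 <= 800)  # 700 True
```

Note that the single 920 ms outlier does not break the 800 ms p95 budget. A max-latency constraint would read differently, so write down which statistic you mean.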

From blueprint to constraint-driven design

Constraints are the most useful part of the blueprint. They eliminate bad architectures immediately.

Here is a decision matrix based on common constraint combinations.

Latency < 500ms  +  cost-sensitive  →  No agent loop, cache embeddings, small model
Latency < 2s     +  high accuracy   →  RAG + reranker, medium model, retrieval evals
Latency < 10s    +  complex tasks   →  Agent with tool calls, larger model
No latency SLA   +  rare queries    →  Async pipeline, batch embeddings, fine-tuning OK

The user job statement tells you the right row. Pick the row, then design within it.
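The matrix can be read as an ordered lookup: walk the rows from tightest latency to loosest and stop at the first one that fits. The sketch below is a simplification. It selects on the latency SLA alone and treats the second column as a description of the typical workload, and the names `MatrixRow` and `pick_row` are hypothetical.

```python
from typing import NamedTuple, Optional


class MatrixRow(NamedTuple):
    max_latency_s: Optional[float]  # None means "no latency SLA"
    profile: str
    design: str


# Rows copied from the matrix above, tightest latency first.
DECISION_MATRIX = [
    MatrixRow(0.5, "cost-sensitive", "No agent loop, cache embeddings, small model"),
    MatrixRow(2.0, "high accuracy", "RAG + reranker, medium model, retrieval evals"),
    MatrixRow(10.0, "complex tasks", "Agent with tool calls, larger model"),
    MatrixRow(None, "rare queries", "Async pipeline, batch embeddings, fine-tuning OK"),
]


def pick_row(latency_sla_s: Optional[float]) -> MatrixRow:
    """Return the first row whose latency bound covers the SLA (None = no SLA)."""
    for row in DECISION_MATRIX:
        if row.max_latency_s is None:
            return row  # fallback row: no SLA, or an SLA looser than every bound
        if latency_sla_s is not None and latency_sla_s <= row.max_latency_s:
            return row
    raise AssertionError("matrix must end with a no-SLA row")


print(pick_row(0.8).design)   # RAG + reranker, medium model, retrieval evals
print(pick_row(None).design)  # Async pipeline, batch embeddings, fine-tuning OK
```

With the support assistant's 800 ms budget, the lookup lands on the second row, which has no agent loop.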

The foundation (your infrastructure choices) is also constrained here. A privacy constraint ("must not leave VPC") eliminates any external API call that would carry the protected data. A cost constraint limits model size. Write both in the blueprint. Then you will not backtrack later.
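Privacy and cost constraints work as filters over candidate infrastructure. A minimal sketch: the candidate names are illustrative and the per-lookup cost figures are placeholder assumptions, not measured prices. Only the $0.002 budget and the VPC rule come from the worked example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    sends_data_outside_vpc: bool
    est_cost_per_lookup_usd: float  # placeholder estimate, not a real price


def surviving(candidates: list[Candidate], *, data_must_stay_in_vpc: bool,
              max_cost_usd: float) -> list[str]:
    """Names of the candidates that pass both hard constraints."""
    kept = []
    for c in candidates:
        if data_must_stay_in_vpc and c.sends_data_outside_vpc:
            continue  # privacy constraint eliminates it
        if c.est_cost_per_lookup_usd > max_cost_usd:
            continue  # cost constraint eliminates it
        kept.append(c.name)
    return kept


# Hypothetical options with made-up cost estimates.
candidates = [
    Candidate("hosted API model", True, 0.0015),
    Candidate("large self-hosted model", False, 0.0060),
    Candidate("local embedding model + vector database", False, 0.0004),
]

print(surviving(candidates, data_must_stay_in_vpc=True, max_cost_usd=0.002))
# ['local embedding model + vector database']
```

Each eliminated option fails for a reason you can point to in section 4 of the blueprint.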

What bad blueprints look like

Bad blueprint section 1: "We will build an AI assistant that helps users." Problem: no situation, no action, no outcome. Not actionable.

Bad blueprint section 2: "The system will be accurate and fast." Problem: neither is measurable. You cannot know when you have met this goal.

Bad blueprint section 4: "We will use GPT-4." Problem: that is an architecture choice, not a constraint. It goes in section 5.

See. If you cannot measure it, it is not a success criterion. If you are committing to technology before knowing the constraints, you are guessing.
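A crude first check for "is this measurable?" is whether the criterion contains a comparison against a number. The sketch below is a heuristic only: the `is_measurable` helper is hypothetical, and passing it does not prove the criterion is a good one.

```python
import re

# A comparison operator followed by a number, e.g. "≥ 0.75" or "<= $0.002".
MEASURABLE = re.compile(r"(≥|≤|>=|<=|<|>|=)\s*\$?\d")


def is_measurable(criterion: str) -> bool:
    """Heuristic: True if the criterion compares something against a number."""
    return bool(MEASURABLE.search(criterion))


criteria = [
    "Precision@3 ≥ 0.75",
    "Response latency ≤ 800 ms at p95",
    "The system will be accurate and fast.",
]
for text in criteria:
    print(f"{is_measurable(text)!s:5}  {text}")
```

The first two criteria pass and the third fails, which matches the bad section 2 example above.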

Where this lives in the wild

Notion AI — product requirement doc drives the "what should AI autocomplete do" decision; measurable criteria (acceptance rate) gate model upgrades.
Linear AI — the job is "suggest issue priority"; the blueprint constrained latency to < 200 ms, ruling out large models immediately.
Stripe Radar ML — fraud detection blueprint started with precision/recall trade-off, not model selection.
Duolingo Max — user job ("explain why my answer was wrong") drove the choice of conversational architecture over simple feedback labels.
Glean — enterprise search blueprint identified "must not index PII" as a hard constraint; drove on-premise embedding model choice.

Pause and recall

What is the "user job statement" formula? Write it from memory.
Name the five sections of a complete blueprint document.
Why are constraints in section 4 evaluated before the architecture hypothesis in section 5?
In the support assistant example, which constraint ruled out a multi-hop agent?

Interview Q&A

Q: "Walk me through how you would start designing an AI system for a new use case."

A: I start with the user job statement — situation, action, outcome. Then I write measurable success criteria. Then I audit available data. Then I list hard constraints — latency, cost, privacy. Only then do I write an architecture hypothesis. Architecture last, not first.

Common wrong answer to avoid: "I would start by choosing a model and then build a prompt." Model-first design produces systems that solve the wrong problem confidently.

Q: "How do you turn a vague product requirement into a testable AI spec?"

A: I push back on vague words. "Accurate" becomes "precision@3 ≥ 0.75 on our held-out eval set." "Fast" becomes "p95 latency ≤ 800 ms." I write these before writing code. If the team cannot agree on a number, the feature is not ready to build.

Common wrong answer to avoid: "I define success after I see the demo." Post-hoc success criteria are always met because you move the goalposts.

Q: "What is the biggest mistake teams make when starting a new AI feature?"

A: Picking the model before defining the problem. You end up with a solution looking for a problem. The right order is: user job → success criteria → data audit → constraints → architecture → model.

Common wrong answer to avoid: "Using a model that is too small." Model size is almost never the first mistake. Problem definition is.

Q: "How do constraints shape architecture choices in practice?"

A: Hard constraints eliminate architectures. A 500 ms latency SLA eliminates multi-step agent loops. A privacy constraint that says no data leaves the VPC eliminates external API models. I treat constraints as the first filter, not the last consideration.

Common wrong answer to avoid: "I try all architectures and pick the fastest one." That wastes weeks of engineering time.

Apply now (5 min)

Choose a product you use daily that has an AI feature. Write a blueprint for that feature: user job, success criteria, data available, constraints, and architecture hypothesis. Be specific: use real numbers for latency and accuracy thresholds. Identify which constraint would most restrict the architecture choice.

Sketch from memory: Without looking, draw the five-section blueprint table. Fill in the support assistant example from memory.

Bridge. With the blueprint written, the next decision is architecture. RAG, agent, or something simpler? The foundation depends on which path you choose. → 03-architecture-choices.md