06. Requirements to architecture brief — The handoff that prevents rework¶

~18 min read. Engineering built RAG because the brief said "AI search." The actual requirement was "answer policy questions with citations." RAG was one of three valid approaches — and nobody found that out until six weeks of infrastructure were already sunk.

Built on the full requirements chain in 00-first-principles.md. This chapter packages user jobs, AI-fit routing, success metrics, acceptance tests, and risk boundaries into a single document that constrains engineering without choosing their tools. The architecture brief is where requirements stop being your problem and start being engineering's problem — but only if the brief specifies the right things.

What user jobs, AI-fit, metrics, tests, and risk boundaries built — and what engineering still needs¶

Five chapters produced five artifacts. User jobs (ch01) told us what people actually need done. AI-fit routing (ch02) told us which of those jobs belong to a model and which belong to deterministic code. Success metrics (ch03) told us what "good" looks like in numbers. Acceptance tests (ch04) gave us a binary pass/fail gate. Risk boundaries (ch05) told us where the system must never go.

That is everything the product side owns. But engineering cannot start building from five separate documents. They need one page that draws the box: here is the space of valid solutions, here are the walls you cannot cross, here is what "done" means. The architecture brief is that page.

Without it, engineering fills ambiguity with assumptions. And assumptions compound. The brief said "AI search." Engineering assumed vector DB plus embeddings. Six weeks later the PM sees the demo and says: "Wait, users need cited sources with page numbers. This doesn't do that." The RAG pipeline they built retrieves chunks but strips citation metadata. Rework: six weeks.

1) What this file solves¶

A team shipped a vector-search pipeline because the brief said "AI-powered search for our internal wiki." The requirement was actually "answer employee policy questions with cited source documents, under three seconds." That requirement permits at least three architectures: RAG with citation tracking, a fine-tuned model on the policy corpus backed by retrieval for citations, or hybrid retrieval with extractive QA. The brief chose the architecture by accident because it named a solution instead of a constraint.

This chapter teaches you to write a brief that constrains the solution space without choosing the solution — so engineering picks the architecture that actually satisfies the requirements, not the one that matches a buzzword.

2) The rework that cost six weeks because the brief was ambiguous¶

The fintech wiki assistant. Timeline:

Weeks 1-2: Engineering reads the brief. "Build an AI-powered search for the internal wiki. Must be fast and accurate." They pick a standard RAG stack: chunking, embeddings, vector DB, retrieval, generation.

Weeks 3-4: They build the chunking pipeline. 200 wiki pages processed. Embeddings stored. Retrieval works. Generation produces fluent answers.

Week 5: PM reviews the demo. "Where are the citations? Users need to see which document and which section the answer comes from." The chunking pipeline stripped document boundaries. Metadata is gone. The generation step has no way to attribute answers to sources.

Week 6: Engineering retrofits citation tracking. But the chunk boundaries don't align with document sections. They need to re-architect the ingestion pipeline.

Total rework: six weeks for a three-person team. Cost: ~$180,000 in fully-loaded engineer time. Root cause: the brief said "AI search" instead of "answer with cited sources."

Teacher voice. Not a communication problem. A constraint-specification problem. The PM said "AI search" and engineering heard "vector DB + embeddings." The actual constraint was "answer with cited sources under 3 seconds" — which could be RAG, fine-tuned model, or hybrid. The brief's job is to state the constraint, not name the approach.

3) A one-page brief that engineering actually reads vs a ten-page PRD they don't¶

Two documents. Same project.

The ten-page PRD:

Section 1: Executive Summary (1.5 pages)
Section 2: Market Context (1 page)
Section 3: User Personas (2 pages)
Section 4: Feature Requirements (2 pages)
Section 5: Success Metrics (1 page)
Section 6: Timeline and Milestones (1 page)
Section 7: Risks and Mitigations (1 page)
Section 8: Appendix (0.5 pages)

Engineering skims it, picks out the feature list, and starts building from those bullet points. The constraints buried in Sections 5 and 7 never make it into the architecture.

The one-page brief:

Eight sections. Each one is 2-5 lines. Every line is a constraint or a boundary. Engineering reads the whole thing in three minutes because every sentence matters to their decisions.

Which one prevents rework? The brief. Because it is short enough to read completely and dense enough that skipping any section means missing a constraint.

Mini-FAQ. "Doesn't the PRD have value?" Yes — for stakeholder alignment, business context, timeline negotiation. But it is not the document that engineering uses to make architecture decisions. The brief is.

4) Rule: the brief constrains the solution space without choosing the solution¶

Good brief: "Answers must cite source document and section. p95 latency under 3 seconds. 500 queries/day scaling to 5000/day in 6 months."

Bad brief: "Use RAG with a vector database. Chunk documents into 512-token segments. Use OpenAI embeddings."

The first tells engineering the box they must stay inside. The second tells them what to build. When you tell engineering what to build, two things happen: (1) they resent it because you are doing their job, and (2) they cannot pivot when the chosen approach hits a wall — because the brief locked them in.

The architecture brief is a constraint document, not a solution document. It answers "what must be true about the system?" not "how should the system work?"

Teacher voice. Every line in the brief should be testable as a constraint. "Must cite sources" is testable. "Use RAG" is not a constraint — it is an implementation choice masquerading as one. If you cannot write an acceptance test for a line in the brief, that line does not belong.
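One way to enforce the "testable constraint" rule is to refuse to add a line to the brief unless it comes with a check. The sketch below is a minimal Python illustration under stated assumptions, not a prescribed tool: the `Constraint` type, the measurement keys `citation_rate` and `p95_latency_s`, and the 95% citation threshold (borrowed from the wiki assistant's quality bar later in this chapter) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical measurement record, e.g. {"citation_rate": 0.97, "p95_latency_s": 2.4}.
Measurements = Dict[str, float]


@dataclass(frozen=True)
class Constraint:
    """A brief line paired with the pass/fail check that makes it testable."""
    text: str
    check: Callable[[Measurements], bool]


def evaluate(constraints: List[Constraint], measured: Measurements) -> Dict[str, bool]:
    """Run every constraint's check against one set of measurements."""
    return {c.text: c.check(measured) for c in constraints}


brief = [
    Constraint("Answers must cite source document and section",
               lambda m: m["citation_rate"] >= 0.95),  # threshold assumed from the quality bar
    Constraint("p95 latency under 3 seconds",
               lambda m: m["p95_latency_s"] < 3.0),
]
# "Use RAG" cannot be added here: no measurement could make it pass or fail.

print(evaluate(brief, {"citation_rate": 0.97, "p95_latency_s": 2.4}))
```

The design choice is that a line without a `check` simply has nowhere to live, which is the same filter the teacher note applies by hand.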

5) Anatomy of the architecture brief¶

Eight sections. Each one traces back to a specific requirements artifact:

┌─────────────────────────────────────────────────────────────────────┐
│                     ARCHITECTURE BRIEF                               │
│                                                                     │
│  ┌──────────────────┐                                               │
│  │ User Jobs        │◄── from ch01: what people need done           │
│  │ Summary          │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ AI-Fit Routing   │◄── from ch02: which jobs are model-driven     │
│  │ Decisions        │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ Success Metrics  │◄── from ch03: what "good" looks like          │
│  │ & Thresholds     │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ Acceptance Tests │◄── from ch04: binary pass/fail gate           │
│  │                  │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ Risk Class &     │◄── from ch05: where system must never go      │
│  │ Constraints      │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ Latency / Cost / │◄── operational envelope                       │
│  │ Scale Envelope   │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ Data Access &    │◄── what the system can read, privacy rules    │
│  │ Privacy          │                                               │
│  └────────┬─────────┘                                               │
│           │                                                         │
│  ┌────────▼─────────┐                                               │
│  │ Out of Scope     │◄── explicit boundaries                        │
│  │                  │                                               │
│  └──────────────────┘                                               │
└─────────────────────────────────────────────────────────────────────┘

Each section is 2-5 lines. The whole brief fits on one page. Engineering's decision tree starts at the top and each section eliminates solution candidates.
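Teams that keep briefs in a repository can encode the eight sections as a typed record so a missing section is caught mechanically. The sketch below assumes that practice; `ArchitectureBrief` and `section_problems` are hypothetical names, and the five-line ceiling simply mirrors the 2-5 line guideline above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ArchitectureBrief:
    """The eight sections, each commented with the artifact it traces to."""
    user_jobs: List[str]         # ch01: what people need done
    ai_fit_routing: List[str]    # ch02: which jobs are model-driven
    success_metrics: List[str]   # ch03: what "good" looks like
    acceptance_tests: List[str]  # ch04: binary pass/fail gate
    risk_constraints: List[str]  # ch05: where the system must never go
    scale_envelope: List[str]    # latency / cost / scale limits
    data_access: List[str]       # what the system can read, privacy rules
    out_of_scope: List[str]      # explicit boundaries


def section_problems(brief: ArchitectureBrief, max_lines: int = 5) -> List[str]:
    """Flag sections that are empty or exceed the line ceiling."""
    problems = []
    for f in fields(brief):  # fields() preserves declaration order
        lines = getattr(brief, f.name)
        if not lines:
            problems.append(f"{f.name}: missing")
        elif len(lines) > max_lines:
            problems.append(f"{f.name}: {len(lines)} lines (limit {max_lines})")
    return problems


draft = ArchitectureBrief(
    user_jobs=["Answer internal policy questions with source citations"],
    ai_fit_routing=["J1, J2, J5 model-driven; J3, J4 deterministic"],
    success_metrics=["90% correct on golden set", "95% cited", "< 3s p95"],
    acceptance_tests=[],
    risk_constraints=["Class 3: human review", "Class 4 (PII): hard block"],
    scale_envelope=["500 -> 5000 queries/day in 6 months", "$2000/month compute"],
    data_access=["200 Confluence pages, 15 policy PDFs, directory (read-only)"],
    out_of_scope=[],
)
print(section_problems(draft))  # ['acceptance_tests: missing', 'out_of_scope: missing']
```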

6) Writing the wiki-assistant brief — the actual document¶

Here is the brief for the fintech wiki assistant. Every line traces to a requirements artifact from chapters 01-05.

Wiki Assistant — Architecture Brief v1¶

User jobs: Answer internal policy questions with source citations. 5 job types identified (see appendix).
- J1: "What is the policy on X?" (lookup + summarize)
- J2: "Does policy X apply to situation Y?" (reasoning + cite)
- J3: "What changed in policy X since date?" (diff + cite)
- J4: "Who owns policy X?" (deterministic lookup)
- J5: "Summarize onboarding policies for role Y" (multi-doc synthesis)

AI-fit tasks: 3/5 jobs model-driven (J1, J2, J5), 2/5 deterministic lookup (J3 date-range query, J4 directory lookup). See routing matrix.

Quality bar: 90% answer correctness on golden set, source cited in 95% of responses, < 3s p95 latency.

Risk constraints:
- Class 3 (financial advice): human review gate required before delivery
- Class 4 (PII): hard block via auth check, no model involvement
- All answers must cite source document and version
- No answer may contradict the cited source (faithfulness constraint)

Scale envelope: 500 queries/day initially, 5000/day in 6 months. Budget: $2000/month compute.

Data access: 200 wiki pages (Confluence), 15 policy PDFs, employee directory (read-only, no PII in responses).

Integration points: Confluence API (read), Slack bot interface (input/output), SSO for auth, audit log for compliance.

Out of scope: Writing/editing wiki pages, external customer queries, multi-turn conversation (v2), real-time policy alerts.

Acceptance gate: Meet the 90% correctness bar on the 50 golden-set tests (at least 45/50 passing), zero Class 4 violations in a 1000-query stress test, p95 latency < 3s under 50 concurrent users.

This fits in under thirty lines. An engineer reads it in three minutes. After reading, they know:
- What the system does (answer questions with citations)
- What quality means (90% correct, 95% cited, < 3s)
- What is forbidden (PII exposure, uncited answers, financial advice without review)
- How big it needs to be (500→5000 queries/day, $2000/month)
- What it cannot touch (writes, external users, multi-turn)

They do not know whether to use RAG, fine-tuning, or hybrid. That is the point.

Teacher voice. The brief's job is to make most candidate architectures obviously wrong and leave two or three obviously viable. Engineering picks among the viable ones based on their expertise. If the brief leaves ten architectures viable, it is too loose. If it leaves one, it is too tight: you accidentally chose the implementation.
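The acceptance gate line in the wiki-assistant brief is meant to be automatable. A minimal sketch, assuming the latency samples come from the 50-concurrent-user load test and the Class 4 count from the 1000-query stress test: `acceptance_gate` and `p95` are illustrative names, `p95` uses the nearest-rank definition (other percentile definitions interpolate and give slightly different numbers), and `min_pass_rate` defaults to the quality bar's 90%. Set it to 1.0 if your team reads the gate as "every golden-set test passes."

```python
from typing import Dict, List


def p95(latencies_s: List[float]) -> float:
    """95th percentile by the nearest-rank method (other definitions interpolate)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic; 1-based
    return ordered[rank - 1]


def acceptance_gate(golden_passed: int, golden_total: int, class4_violations: int,
                    latencies_s: List[float], min_pass_rate: float = 0.90) -> Dict[str, bool]:
    """One boolean per clause of the gate; ship only if every value is True."""
    return {
        "golden-set pass rate": golden_passed / golden_total >= min_pass_rate,
        "zero Class 4 violations": class4_violations == 0,
        "p95 latency < 3s": p95(latencies_s) < 3.0,
    }


# Hypothetical load-test latencies in seconds (100 samples), not real measurements.
latencies = [1.5] * 90 + [2.4] * 8 + [4.0] * 2
gate = acceptance_gate(golden_passed=47, golden_total=50, class4_violations=0,
                       latencies_s=latencies)
print(gate)
print(all(gate.values()))  # True
```

Returning one boolean per clause, rather than a single pass/fail, tells engineering which wall they hit when the gate fails.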

7) Brief vs PRD vs technical spec — three documents, three audiences¶

Dimension	Architecture Brief	PRD	Technical Spec
Audience	Engineering leads making arch decisions	Stakeholders, PM, design	Engineers implementing
Length	1 page (~30 lines)	5-15 pages	10-50 pages
Contains	Constraints, boundaries, envelopes	Business context, user stories, timelines	APIs, schemas, algorithms, configs
Answers	"What must be true?"	"Why are we building this?"	"How exactly does it work?"
Written by	PM + requirements engineer	PM	Engineering lead
Read by	Arch decision maker	Everyone	Implementing engineers
Timing	After requirements, before architecture	Before requirements	After architecture
Locks in	The problem space	The business case	The solution
Changes when	User needs or constraints change	Strategy changes	Implementation hits a wall

The brief sits between PRD and tech spec. It is the translation layer: business intent (PRD) → engineering constraints (brief) → implementation choices (tech spec).

Common failure: teams skip the brief. They go straight from PRD to tech spec. The tech spec author fills the constraint gap with assumptions. Those assumptions become architecture. Nobody questions them because they look like deliberate decisions.

PRD ("why")              Brief ("what must be true")     Tech Spec ("how")
─────────────           ────────────────────────────    ──────────────────
stakeholder need   →    constraint envelope        →    implementation
business context   →    quality bar                →    algorithms
user stories       →    risk boundaries            →    schemas
timeline           →    scale envelope             →    APIs
                        acceptance gate            →    test harness

8) How the brief shapes architecture without dictating it¶

The wiki assistant brief constrains. Watch how constraints eliminate options without choosing one:

Constraint: "< 3s p95 latency"
- Eliminates: approaches that need many sequential LLM calls (a prompt chain or agent loop with 4+ steps would blow the budget)
- Permits: single-call RAG, pre-computed summaries, hybrid retrieval, fine-tuned model

Constraint: "source cited in 95% of responses"
- Eliminates: pure generative approaches without retrieval (fine-tuned model alone cannot guarantee citations)
- Permits: RAG with citation tracking, extractive QA, retrieval + generation with source attribution

Constraint: "$2000/month compute at 5000 queries/day"
- That is ~$67 per day, or ~$0.013 per query ($2000 / 30 days / 5000 queries per day)
- Eliminates: GPT-4 at $0.03 per 1K input tokens. Even a 1K-token prompt costs $0.03 per query, ~$4500/month at 150,000 queries/month, and long retrieved contexts multiply that
- Permits: smaller models, cached responses, hybrid with cheap retrieval + expensive generation only when needed
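The cost arithmetic is worth writing down so anyone can rerun it when prices or volumes change. A sketch, assuming a 30-day month and counting input tokens only; the $0.03 per 1K input-token price is the figure quoted in this chapter, not a current price list, and the 4000-token "long context" is an illustrative RAG prompt size.

```python
DAYS_PER_MONTH = 30  # simplifying assumption


def budget_per_query(monthly_budget_usd: float, queries_per_day: int) -> float:
    """Compute budget available for each query."""
    return monthly_budget_usd / (queries_per_day * DAYS_PER_MONTH)


def monthly_input_cost(tokens_per_query: int, usd_per_1k_tokens: float,
                       queries_per_day: int) -> float:
    """Input-token spend only; output tokens, retries, and retrieval infra add more."""
    per_query = tokens_per_query / 1000 * usd_per_1k_tokens
    return per_query * queries_per_day * DAYS_PER_MONTH


print(round(budget_per_query(2000, 5000), 4))       # 0.0133
print(round(monthly_input_cost(1000, 0.03, 5000)))  # 4500
print(round(monthly_input_cost(4000, 0.03, 5000)))  # 18000
```

Even the 1K-token case spends more than twice the $2000 monthly budget on input tokens alone, which is why this constraint pushes toward smaller models or caching.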

Constraint: "Class 3 human review gate"
- Eliminates: fully autonomous responses for financial advice queries
- Permits: async review queue, confidence-threshold routing, flag-and-hold patterns

Each constraint removes solution candidates. The intersection of all constraints leaves a small viable space. Engineering explores that space — not the entire universe of possible AI architectures.
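The intersection of constraints can be sketched as a filter over candidate architectures. Every number here is hypothetical: the per-query cost estimates are placeholders engineering would supply, the two-call latency ceiling comes from the engineer's calculation in the section 9 diagram, and "more than three viable means too loose" is a judgment cut-off loosely based on the two-or-three rule in section 6. The Class 3 review gate is left out because any of these candidates could add one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    sequential_llm_calls: int      # LLM calls on the critical path of one request
    guarantees_citations: bool     # can every answer point to a retrieved source?
    est_cost_per_query_usd: float  # placeholder estimate from engineering


MAX_SEQUENTIAL_CALLS = 2                   # assumed ceiling for "< 3s p95"
BUDGET_PER_QUERY_USD = 2000 / (5000 * 30)  # ~$0.0133 at 5000 queries/day


def viable(c: Candidate) -> bool:
    """True only if the candidate survives every constraint."""
    return (c.sequential_llm_calls <= MAX_SEQUENTIAL_CALLS
            and c.guarantees_citations
            and c.est_cost_per_query_usd <= BUDGET_PER_QUERY_USD)


def brief_tightness(n_viable: int) -> str:
    if n_viable == 0:
        return "contradictory: relax a constraint or cut scope"
    if n_viable == 1:
        return "too tight: the brief picked the implementation"
    if n_viable <= 3:
        return "about right"
    return "too loose"


candidates: List[Candidate] = [
    Candidate("RAG + citation tracking", 1, True, 0.008),
    Candidate("Fine-tuned model alone", 1, False, 0.004),
    Candidate("Fine-tuned + retrieval", 1, True, 0.006),
    Candidate("Hybrid: cache common, generate rare", 1, True, 0.005),
    Candidate("4-step agent chain", 4, True, 0.050),
    Candidate("Large model, long context", 1, True, 0.120),
]

survivors = [c.name for c in candidates if viable(c)]
print(survivors)
print(brief_tightness(len(survivors)))  # about right
```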

Mini-FAQ. "What if the constraints are contradictory?" Then the requirements are wrong, and you find that out now — before engineering builds anything. A brief that cannot be satisfied is a gift: it forces the PM to relax a constraint or change the scope. That conversation at week 1 costs nothing. At week 8 it costs a rewrite.

9) Ambiguous brief vs constrained brief — what architecture each produces¶

AMBIGUOUS BRIEF                           CONSTRAINED BRIEF
────────────────                          ─────────────────

"Build AI search for                      "Answer policy questions with
 the internal wiki.                        cited sources. 90% correct.
 Must be fast and accurate."               < 3s p95. $2000/mo at 5K/day."

        │                                          │
        ▼                                          ▼
┌─────────────────────┐                   ┌─────────────────────────┐
│ Engineer assumes:   │                   │ Engineer calculates:    │
│ - "AI search" =     │                   │ - citation requirement  │
│    vector search    │                   │   → needs retrieval     │
│ - "fast" = < 1s?    │                   │ - $0.013/query budget   │
│    < 5s? unclear    │                   │   → small model or cache│
│ - "accurate" = ??   │                   │ - 3s p95 → max 2 LLM    │
│                     │                   │   calls per request     │
└────────┬────────────┘                   └────────┬────────────────┘
         │                                         │
         ▼                                         ▼
┌─────────────────────┐                   ┌─────────────────────────┐
│ Builds:             │                   │ Evaluates:              │
│ - OpenAI embeddings │                   │ - RAG + citation track  │
│ - Pinecone          │                   │ - Fine-tuned + retrieval│
│ - GPT-4 generation  │                   │ - Hybrid: cache common  │
│ - No citations      │                   │   + generate rare       │
│ - No cost tracking  │                   │ Picks cheapest that     │
│ - No latency budget │                   │ passes acceptance gate  │
└────────┬────────────┘                   └────────┬────────────────┘
         │                                         │
         ▼                                         ▼
┌─────────────────────┐                   ┌─────────────────────────┐
│ Week 5: PM review   │                   │ Week 5: acceptance test │
│ "Where are the      │                   │ 47/50 golden-set pass   │
│  citations?"        │                   │ 0 Class 4 violations    │
│ → 6 weeks rework    │                   │ p95 = 2.4s              │
└─────────────────────┘                   │ → ship to staging       │
                                          └─────────────────────────┘

The constrained brief does not mention RAG, vector databases, or any specific technology. It states what must be true. Engineering figures out how to make it true.

10) Signals that the brief is good enough vs needs iteration¶

Good-enough signals:
- An engineer can read it in under 5 minutes and tell you which architectures are eliminated
- Every line is testable (you can write a pass/fail check for it)
- At least two viable architectures remain after applying all constraints
- No line names a specific technology, vendor, or implementation pattern
- The "out of scope" section exists and engineers agree it matches their understanding
- Cost and scale numbers are concrete, not "scalable" or "cost-effective"

Needs-iteration signals:
- Engineer asks "what do you mean by accurate?" after reading it
- Two engineers read it and propose radically different interpretations of a constraint
- The brief names a solution ("use RAG," "deploy on Lambda," "use GPT-4")
- No latency, cost, or scale numbers appear
- The acceptance gate is not specific enough to automate
- Risk constraints are missing or vague ("must be safe" instead of "Class 3: human review for financial advice")
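Some of the needs-iteration signals are mechanical enough to lint for. A rough sketch, assuming plain keyword matching: the word lists and the latency regex are illustrative and would need tuning, and no linter can catch the signals that depend on people, such as two engineers reading a constraint differently.

```python
import re
from typing import List

# Illustrative word lists; a real team would maintain and tune its own.
SOLUTION_TERMS = ["rag", "vector database", "pinecone", "embeddings", "lambda", "gpt-4"]
VAGUE_TERMS = ["fast", "accurate", "scalable", "cost-effective", "safe"]
LATENCY_PATTERN = re.compile(r"\d+(\.\d+)?\s*(ms|seconds?|sec|s)\b", re.IGNORECASE)


def mentions(text: str, term: str) -> bool:
    """Case-insensitive whole-word match, so 'rag' does not fire on 'storage'."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def lint_brief(lines: List[str]) -> List[str]:
    findings = []
    for n, line in enumerate(lines, start=1):
        findings += [f"line {n}: names a solution ({t})" for t in SOLUTION_TERMS if mentions(line, t)]
        findings += [f"line {n}: vague term ({t})" for t in VAGUE_TERMS if mentions(line, t)]
    text = " ".join(lines)
    if "$" not in text:
        findings.append("brief: no cost number")
    if not LATENCY_PATTERN.search(text):
        findings.append("brief: no latency number")
    return findings


ambiguous = ["Build AI search for the internal wiki.", "Must be fast and accurate.",
             "Use RAG with a vector database."]
constrained = ["Answer policy questions with cited sources.",
               "90% correct on golden set, < 3s p95.", "$2000/month at 5000 queries/day."]
print(lint_brief(ambiguous))
print(lint_brief(constrained))  # []
```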

The acid test: hand the brief to two senior engineers independently. If they come back with architectures that satisfy all constraints but differ in approach — the brief is working. If they come back with the same architecture — the brief might be too prescriptive. If they come back with contradictory interpretations of the constraints — the brief is ambiguous.

Teacher voice. A brief is a specification, not a suggestion. Treat ambiguity in a brief the same way you treat ambiguity in an API contract: it will be interpreted differently by different consumers, and every divergent interpretation is a future bug.
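The ambiguity half of the acid test can be made concrete: write each engineer's reading of a constraint as a check, then look for inputs where the two checks disagree. The sketch reuses the "fast enough" example from section 12 (one side reads < 500ms, the other < 5s); the latency samples and the `divergent_samples` helper are hypothetical.

```python
from typing import Callable, List

Reading = Callable[[float], bool]  # one engineer's reading of a constraint, as a pass/fail check


def divergent_samples(a: Reading, b: Reading, samples: List[float]) -> List[float]:
    """Inputs on which two readings of the same constraint give different verdicts."""
    return [x for x in samples if a(x) != b(x)]


p95_samples_s = [0.4, 0.9, 2.5, 4.0, 6.0]  # hypothetical p95 latencies, in seconds

# "Must be fast": one reader assumes < 500ms, the other < 5s.
fast_gap = divergent_samples(lambda s: s < 0.5, lambda s: s < 5.0, p95_samples_s)
# "< 3s p95": both readers land on the same threshold.
p95_gap = divergent_samples(lambda s: s < 3.0, lambda s: s < 3.0, p95_samples_s)

print(fast_gap)  # [0.9, 2.5, 4.0]
print(p95_gap)   # []
```

Each value in `fast_gap` is a system that one consumer of the brief would ship and the other would reject: the "future bug" the teacher note describes.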

11) Where the brief cannot help — unknown unknowns and emergent requirements¶

The brief assumes you know your constraints before building. Three categories of requirements resist this:

Emergent requirements — Users discover new jobs only after seeing the system. The wiki assistant ships. Users start asking "compare policy X to policy Y" — a job nobody identified in ch01. The brief has no constraint for comparison quality because nobody asked for comparison.

Shifting baselines — The wiki grows from 200 to 800 pages. Retrieval quality degrades. The "90% correctness" constraint was calibrated on 200 pages. At 800 pages, the same architecture drops to 72%. The constraint did not change; the difficulty did.

Adversarial inputs — Users discover they can get the system to reveal source documents they should not access by asking carefully crafted questions. The risk boundary (ch05) said "no PII in responses" but did not anticipate prompt injection as an access-control bypass.

The brief cannot anticipate these. What it can do: include an explicit "review triggers" section that tells engineering when to escalate back to the PM.

Review triggers (re-open the brief when):
- A new job type appears in usage logs that is not in the routing matrix
- Correctness on golden set drops below 85% for two consecutive weeks
- A risk-class violation is detected that was not in the original risk inventory
- Scale exceeds 2x the stated envelope before the planned timeline

This is not a failure of the brief. It is the brief acknowledging its own shelf life.
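Review triggers work best when they are checked automatically rather than remembered. A minimal sketch: `UsageSnapshot` and `review_triggers` are hypothetical, job-type labels are assumed to come from some classifier over usage logs, and the planned envelope is modeled as a simple step (500/day until week 26, then 5000/day) rather than a ramp.

```python
from dataclasses import dataclass
from typing import List, Set

ROUTING_MATRIX_JOBS = {"J1", "J2", "J3", "J4", "J5"}


@dataclass
class UsageSnapshot:
    observed_job_types: Set[str]            # labels assigned to recent queries
    weekly_golden_correctness: List[float]  # oldest first, most recent last
    new_risk_violations: int                # violations outside the original risk inventory
    queries_per_day: int
    weeks_since_launch: int


def planned_envelope(weeks_since_launch: int) -> int:
    """Step model of the stated envelope: 500/day at launch, 5000/day from about six months."""
    return 500 if weeks_since_launch < 26 else 5000


def review_triggers(s: UsageSnapshot) -> List[str]:
    fired = []
    unknown = s.observed_job_types - ROUTING_MATRIX_JOBS
    if unknown:
        fired.append(f"new job types not in routing matrix: {sorted(unknown)}")
    last_two = s.weekly_golden_correctness[-2:]
    if len(last_two) == 2 and all(c < 0.85 for c in last_two):
        fired.append("golden-set correctness below 85% for two consecutive weeks")
    if s.new_risk_violations > 0:
        fired.append("risk-class violation outside the original inventory")
    if s.queries_per_day > 2 * planned_envelope(s.weeks_since_launch):
        fired.append("scale above 2x the planned envelope")
    return fired


# "compare" stands in for the emergent "compare policy X to policy Y" job.
snapshot = UsageSnapshot({"J1", "J2", "J4", "compare"}, [0.91, 0.84, 0.83], 0, 1200, 10)
for trigger in review_triggers(snapshot):
    print(trigger)
```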

12) Wrong assumption: "engineering will figure out the constraints themselves"¶

The most common failure mode is not a bad brief. It is no brief at all.

PM ships a PRD. Engineering reads the feature list. Engineering makes architecture decisions. Those decisions embed assumptions about quality bars, latency budgets, cost limits, and risk tolerances that the PM never stated — and may not agree with.

Example: Engineering assumes "fast enough" means < 500ms because that is what they would want. PM meant < 5s because the alternative is a 2-minute manual search. Engineering over-optimizes for latency, blows the cost budget, and ships a system that is fast but too expensive to scale.

Example: Engineering assumes correctness means "the answer is relevant." PM meant "the answer is correct and attributed to a specific source with page number." Engineering ships a system with no citation capability because "correctness" was never defined in a testable way.

Example: Engineering assumes everything is in scope. They build multi-turn conversation, wiki editing, and real-time alerts because those seemed natural extensions. PM meant v1 is read-only Q&A only. Engineering spent three months building features nobody will use in the first release.

The fix is not better engineers. It is a one-page document that takes thirty minutes to write and saves months of rework.

Mini-FAQ. "But good engineers ask clarifying questions." Some do. At some companies. On some days. A brief removes the dependency on individual initiative. It makes the constraints visible to everyone — the senior who would ask and the junior who would not.

13) Pattern transfer — how the brief feeds downstream modules¶

The architecture brief is not the end of the requirements process. It is the beginning of the architecture process. Specifically:

→ Architecture decision tree (Module 01, ch01): The brief's constraints become the inputs to the first architecture decision. "< 3s p95 latency" and "$0.013/query budget" tell the architect whether a multi-agent approach is even feasible (it probably is not at that budget).

→ Eval design (Module 04, ai_product_evals): The acceptance gate in the brief becomes the first eval set. "50 golden-set tests" is the minimum. The eval module teaches how to expand that into regression suites, adversarial sets, and drift detection.

→ Risk and safety (Module 05): Risk constraints from the brief flow directly into safety guardrails. "Class 3: human review gate" becomes an approval-gate pattern (Module 01, ch13). "Class 4: hard block" becomes a pre-call filter in the agent architecture.

→ Cost and latency budgets (Module 01, ch14): The scale envelope and cost numbers from the brief become the budget constraints that architecture must satisfy. "$2000/month at 5000 queries/day" is the input to the cost model.

The brief is a contract. Downstream modules honor it. If downstream work reveals the brief's constraints are unsatisfiable, the feedback loop goes back to the PM — not around them.

Where this pattern appears in production¶

Stripe — internal "API brief" documents constrain new services before architecture review
Google — design docs start with "constraints and non-goals" that function as the brief
Notion AI — explicit latency and cost-per-query budgets before engineering chose the model
GitHub Copilot — acceptance criteria defined as completion acceptance rate and latency percentiles, not "use GPT-4"
Anthropic — Claude's system card defines behavioral constraints without prescribing implementation
Spotify — ML briefs specify offline/online split, latency SLA, and freshness requirements
Netflix — recommendation briefs state engagement metrics and latency budgets, not algorithms
Airbnb — search ranking briefs define relevance metrics and diversity constraints
Duolingo — AI tutor requirements specify pedagogical constraints (error correction rate, encouragement ratio) not model choice
Intercom — Fin chatbot brief defined resolution rate and escalation triggers, not architecture
Linear — "must work in < 2s on a 100-issue project" as the constraint
Notion — Q&A feature: "cite the specific block, not just the page" eliminated naive RAG
Slack — "must respect channel permissions" shaped the entire retrieval architecture
Figma — "must not modify user's design without explicit confirmation" — a risk constraint
Vercel — v0: "generate deployable code, not pseudocode" as quality constraint
Cursor — "< 300ms perceived latency" eliminated all multi-call approaches
Replit — "must run in user's environment" ruled out cloud-only generation
Khan Academy — Khanmigo: "never give the answer directly" as pedagogical guardrail
Mercado Libre — "results must match category taxonomy" as structural constraint
Nubank — "escalate to human on any investment advice" as risk boundary

Recall¶

What is the difference between a constraint and an implementation choice in a brief?
Name the eight sections of the architecture brief and which requirements artifact each traces to.
Why does the brief need an "out of scope" section?
What is the acid test for whether a brief is sufficiently constrained?
How do you detect that a brief is too prescriptive (names solutions instead of constraints)?
What three categories of requirements does the brief fail to capture, and what mechanism compensates?
What happens when engineering receives no brief and fills constraint gaps with assumptions?
How does the cost/scale envelope in the brief eliminate architecture candidates?

Interview Q&A¶

Q1: You have gathered all requirements for an AI feature. What document do you produce for engineering, and what does it contain?

An architecture brief — one page, eight sections: user jobs summary, AI-fit routing decisions, success metrics with thresholds, acceptance test set, risk class and constraints, latency/cost/scale envelope, data access and privacy requirements, and explicit out-of-scope boundaries. Every line is a testable constraint. No line names a technology or implementation approach.

Common wrong answer to avoid: "A PRD with user stories and acceptance criteria." The PRD serves stakeholders; the brief serves the architecture decision. They are different documents for different audiences at different times.

Q2: How do you ensure the brief does not accidentally prescribe the architecture?

Every line passes the "testable constraint" check: can you write an automated acceptance test for it? "< 3s p95 latency" is testable. "Use RAG" is not — it is a design choice. Additionally, the acid test: two engineers reading the brief independently should be able to propose different architectures that both satisfy all constraints.

Common wrong answer to avoid: "Keep it high-level and vague." Vague is not the same as non-prescriptive. "Must be fast" is vague and useless. "< 3s p95" is precise and non-prescriptive.

Q3: What signals tell you the brief needs another iteration before handing to engineering?

Two engineers read it and propose radically different interpretations of the same constraint. A line names a technology. No concrete numbers appear for latency, cost, or scale. The acceptance gate cannot be automated. An engineer asks "what do you mean by X?" for any core term.

Common wrong answer to avoid: "Engineering says it's too short." Brevity is a feature. The brief's value comes from density, not length. A longer brief is usually a worse brief.

Q4: Requirements change after the brief is written and engineering has started. What do you do?

Re-open the brief, update the changed constraint, and assess impact. If the change invalidates the architecture (e.g., latency budget halves), engineering needs to re-evaluate. If it tightens within the existing viable space (e.g., correctness threshold rises from 90% to 93%), engineering adjusts without re-architecture. The brief has "review triggers" that define when this re-opening happens proactively.

Common wrong answer to avoid: "File a change request and let the process handle it." Process without judgment is theater. The question is whether the change invalidates architecture — that requires a human assessment, not a form.

Q5: How does the brief relate to a PRD and a technical spec?

Three documents, three audiences, three questions. PRD answers "why are we building this?" for stakeholders. Brief answers "what must be true about the system?" for architecture decision-makers. Tech spec answers "how exactly does it work?" for implementing engineers. The brief sits between PRD and tech spec as the translation layer from business intent to engineering constraints.

Common wrong answer to avoid: "The brief is just the requirements section of the PRD." No. The brief has a different audience (engineers, not stakeholders), different content (constraints, not stories), and different timing (after requirements, before architecture).

Q6: An engineer says the brief's constraints are contradictory — latency and cost cannot both be satisfied. What does that mean?

It means the requirements are wrong, and you found out at week 1 instead of week 8. Either relax a constraint (accept higher latency), change the scope (remove a job type), or increase the budget. This is the brief working as intended: it surfaces impossible combinations before engineering commits to an architecture.

Common wrong answer to avoid: "Push back on engineering — they need to be creative." If the math does not work (budget per query cannot cover model costs at required latency), creativity will not change arithmetic. The PM owns the constraints; if they conflict, the PM resolves them.

Q7: What does "out of scope" in the brief actually prevent?

It prevents engineering from building features nobody asked for in this version. Without it, a thoughtful engineer sees natural extensions ("if we answer questions, we should also let users edit the wiki") and builds them — adding months of work that delays the core value. "Out of scope" is a permission to not build something, which engineers often need explicitly.

Common wrong answer to avoid: "It's just a nice-to-have section for clarity." It is a load-bearing boundary. Remove it and scope creep is guaranteed — not from PMs adding features, but from engineers extending naturally.

Design/debug exercise (10 min)¶

Step 1 — Modeled example:

Given these requirements for a customer support AI:
- Users need: instant answers to shipping questions, order status, return policy
- AI-fit: FAQ answers (model-driven), order status (deterministic API), returns (model + human review for exceptions)
- Quality: 85% resolution without human escalation, < 5s response time
- Risk: cannot process refunds autonomously, cannot access payment details

Write the brief's constraint section:

Quality bar: 85% resolution rate without escalation, < 5s p95 response.
Risk constraints:
- Refund processing: human approval required (no autonomous financial actions)
- Payment data: hard block, system cannot access or display card numbers
Scale: 2000 tickets/day, $3000/month compute budget.
Out of scope: proactive outreach, sentiment analysis, agent performance scoring.

Step 2 — Your turn:

Take a product you are building or have built. Write the eight-section architecture brief in under 30 lines. Then check: does every line pass the "testable constraint" check? Does any line accidentally name a technology?

Step 3 — Reproduce from memory:

Close this file. Write the eight section headings of the architecture brief from memory. For each heading, write one example constraint from the wiki assistant. Check against the brief in section 6.

Operational memory¶

The architecture brief exists because ambiguity in requirements becomes assumptions in architecture, and assumptions become rework. The six-week rework on the wiki assistant happened not because engineers were bad, but because "AI search" is not a constraint — it is a solution label that different people interpret differently.

The mechanism is constraint specification: state what must be true about the system (quality bars, risk boundaries, scale envelopes, acceptance gates) without stating how to achieve it. This gives engineering a bounded space to explore instead of either an empty canvas (where they guess) or a prescription (where they cannot adapt).

The brief works when two engineers read it independently and propose different architectures that both satisfy all constraints. It fails when either engineer cannot tell which architectures are eliminated. The diagnostic question to carry forward: "Can I write an acceptance test for every line in this brief?"

Remember:
- The brief constrains the solution space without choosing the solution
- Every line must be testable; if you cannot write a pass/fail check, remove the line
- Eight sections: jobs, AI-fit, metrics, acceptance tests, risk, scale envelope, data access, out of scope
- Ambiguity in the brief becomes assumptions in architecture, which become rework in production
- The acid test: two engineers, same brief, different valid architectures
- "Out of scope" is load-bearing: it prevents well-intentioned scope creep from engineering
- Review triggers tell engineering when to re-open the brief, not when to guess

Bridge. The brief is complete. Requirements flow into architecture. But requirements engineering for AI products still has gaps — emergent behavior, shifting baselines, adversarial inputs, and requirements that change the moment users see the system. Next: what this discipline honestly cannot guarantee yet, and where operational judgment must fill the space that specification cannot reach. → 07-honest-admission.md