00. AI Product Requirements — First-Principles Overview

Before you pick a model, wire a tool, or write a prompt — you need to know what "working" means. This module teaches you to define it.


Six months ago a fintech team at a mid-size Indian payments company received a mandate from their VP of Product: "add an AI assistant to our internal support workflow." The team was sharp. They picked a strong model, built a conversational interface, connected it to their ticket database, and shipped within eight weeks. The assistant could summarise tickets, suggest resolutions, and draft replies. Demo day was a success.

Two weeks after launch the incident channel lit up. The assistant had confidently told three support agents that a specific merchant's payout was "already processed" when it was stuck in a compliance hold. The agents trusted it. They closed the tickets. The merchant escalated to the banking partner. The compliance team discovered the AI had hallucinated a status field that didn't exist in the schema it was reading. Total blast radius: one merchant relationship damaged, ₹12 lakh in delayed payouts, a regulatory conversation nobody wanted to have, and a feature flag flipped to off within four hours of discovery.

The root cause was not the model. The model did what language models do — it generated plausible text from partial context. The root cause was that nobody had written down what "correct" meant for this feature before engineering started. Nobody asked: what is the worst thing that happens when the assistant is wrong? Nobody defined: which fields must come from a deterministic database lookup and which can be model-generated? Nobody specified: what accuracy on status-field extraction is acceptable before launch? Nobody measured: how will we know this feature is succeeding versus slowly poisoning agent trust? The team had skipped the cheapest failure-finding step in the entire stack — requirements — and paid for it in production.

Requirements are the cheapest place to find failures. A requirements conversation costs hours. A production incident costs days, money, and trust. Every decision you make later — model selection, prompt design, tool architecture, eval strategy, rollout plan — flows from constraints you set (or failed to set) at the requirements layer. This module exists because most AI features fail not from bad engineering but from undefined success. You cannot evaluate what you have not specified. You cannot monitor what you have not named. You cannot roll back what you never scoped.

Why do AI features need a different requirements discipline than traditional software? Because traditional features are largely deterministic. A payment API either transfers the correct amount or throws an error, so its failure modes are enumerable and testable. An AI feature is probabilistic. It can be wrong in ways nobody anticipated, confident in ways that erode trust silently, and correct 95% of the time while catastrophically wrong the other 5%. Traditional requirements ask "what should the system do?" AI requirements must also ask: what should the system never do? How wrong is acceptable? How will we detect wrongness in production? What is the human fallback when the model fails? A standard PRD rarely asks these questions. They require a new vocabulary, the eight concepts below, and a new process for eliciting answers before engineering starts.

Consider what a thirty-minute requirements session would have surfaced for that fintech team. The user job was not "chat with an AI"; it was "confirm the current status of a merchant payout so I can update the merchant accurately." That single reframing changes everything downstream. It tells you the failure cost is high: a wrong status leads to a wrong message to the merchant, which leads to regulatory risk. It tells you the quality bar must be 100% on status fields, with no tolerance for hallucination on structured data. It tells you the evidence need is a real-time database lookup, not a model inference. It tells you that payout status belongs behind a deterministic boundary: the model can summarise context and draft a reply, but the status field must come from a SQL query, not from generation.

It tells you the success signal is "agents trust the assistant enough to use it daily AND zero status-field errors in production." It tells you the risk class is financial-to-legal, which demands logging, audit trails, and a human-in-the-loop for edge cases. All of this, from one conversation that never happened.
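
The deterministic boundary described above can be sketched in a few lines. This is a minimal illustration, not the team's actual system: the `payouts` table, its columns, the merchant ID, and the `stub_model` function are all hypothetical stand-ins. The point it shows is that the status value is read from the database and attached to the reply, while the model-generated part never gets to state a status.

```python
import sqlite3

def get_payout_status(conn, merchant_id):
    """Deterministic side: the status comes only from a database lookup."""
    row = conn.execute(
        "SELECT status FROM payouts WHERE merchant_id = ?", (merchant_id,)
    ).fetchone()
    return row[0] if row else None

def draft_reply(merchant_id, status, draft_with_model):
    """Model side: the model drafts wording, never the status value."""
    if status is None:
        # Missing evidence: fall back to a human instead of letting the model guess.
        return f"No payout record found for {merchant_id}; escalating to a human agent."
    body = draft_with_model(merchant_id)
    return f"{body}\nPayout status (from database): {status}"

def stub_model(merchant_id):
    """Hypothetical stand-in for a model call that drafts the reply text."""
    return f"Hello, here is an update on the payout for merchant {merchant_id}."

# Hypothetical schema and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payouts (merchant_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO payouts VALUES ('M-1001', 'compliance_hold')")

status = get_payout_status(conn, "M-1001")
print(draft_reply("M-1001", status, stub_model))
```

If the lookup returns nothing, the sketch escalates rather than generating an answer, which is one way to express "human fallback when the evidence is missing" in code.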

This module walks the full arc from a vague product ask to a concrete architecture brief that engineering can build against. We use one scenario throughout — that fintech support assistant — and show how each concept would have changed the outcome if applied before the first line of code. By the end you will have a repeatable process: receive a product request, decompose it into user jobs, classify the failure costs, set a quality bar, identify evidence needs, draw deterministic boundaries, define success signals, assign risk classes, and produce an architecture brief that the next module (agentic system design) can consume directly.

The output of this module is not code. It is a document, the architecture brief, that makes every subsequent engineering decision cheaper, faster, and more defensible. When you finish file 06, you will have a single artifact you can hand to an engineering team with the words: "build to these constraints." That handoff is what separates teams that ship AI features that stick from teams that ship demos that get turned off.


The recurring pressures and concepts

These names appear in every chapter of this module. They form the vocabulary for turning "add AI" into something an engineer can build, test, and monitor. Learn them once here; every topic file will pressure-test them from a different angle. When you sit in a product review and someone says "we need AI here," these eight concepts are the questions you ask before anyone opens an IDE.

| Name | What it is |
| --- | --- |
| the user job | the specific task the user is trying to accomplish: not "use the AI feature" but "confirm whether a payout has cleared" |
| the failure cost | what happens when the AI gives a wrong, harmful, or slow answer, measured in user time lost, money lost, trust lost, or legal exposure |
| the quality bar | minimum acceptable accuracy, latency, and safety for launch; the line below which the feature is worse than the status quo |
| the evidence need | what data, context, or system access the model must have to answer correctly, and what happens when that evidence is stale or missing |
| the deterministic boundary | the part of the workflow that should NOT be model-driven: lookups, calculations, compliance checks, anything where "creative" is dangerous |
| the success signal | the measurable product outcome that proves the feature works: not "users clicked it" but "ticket resolution time dropped 20% with no accuracy regression" |
| the risk class | severity classification of failure: cosmetic → financial → safety → legal; each class demands different safeguards, approval gates, and monitoring |
| the architecture brief | the constraints document that flows from requirements to engineering: model choice, tool access, eval criteria, rollout gates, kill conditions |

Notice the ordering is not arbitrary. You start with the user job because everything else depends on knowing what task you're augmenting. You end with the architecture brief because it is the synthesis — the single document that carries every upstream decision into the engineering phase. Skip any concept in the middle and the brief will have a hole. That hole will become an incident.
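
One way to make the eight concepts concrete is to treat them as fields of a single record, in the order described above. The sketch below is illustrative only: the `RequirementsRecord` and `RiskClass` names are hypothetical, the field values paraphrase the fintech scenario from this overview, and mapping its "financial-to-legal" risk to the `LEGAL` level is an assumption made for the example.

```python
from dataclasses import dataclass, fields
from enum import IntEnum

class RiskClass(IntEnum):
    """Severity ladder: cosmetic -> financial -> safety -> legal."""
    COSMETIC = 1
    FINANCIAL = 2
    SAFETY = 3
    LEGAL = 4

@dataclass
class RequirementsRecord:
    """Hypothetical container holding the eight concepts, in order."""
    user_job: str
    failure_cost: str
    quality_bar: str
    evidence_need: str
    deterministic_boundary: str
    success_signal: str
    risk_class: RiskClass
    architecture_brief: str = ""  # written last, as the synthesis of the rest

# Values paraphrased from the fintech scenario; risk level is an assumption.
payout_assistant = RequirementsRecord(
    user_job="Confirm the current status of a merchant payout",
    failure_cost="Wrong status reaches the merchant; regulatory risk",
    quality_bar="100% on status fields",
    evidence_need="Real-time database lookup",
    deterministic_boundary="Payout status comes from a SQL query",
    success_signal="Daily agent use and zero status-field errors",
    risk_class=RiskClass.LEGAL,
)

field_order = [f.name for f in fields(payout_assistant)]
print(field_order[0], "->", field_order[-1])
```

Using an `IntEnum` lets a review compare severities directly (for example, `RiskClass.LEGAL > RiskClass.FINANCIAL`), which may be handy when deciding how much safeguarding a feature needs.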

Each concept also maps to a specific question you can ask in a product review:

  • User job → "What task does the user complete faster/better with this feature than without it?"
  • Failure cost → "If the AI is wrong here, what is the worst thing that happens to the user, the business, or a third party?"
  • Quality bar → "What accuracy/latency would make this feature worse than the current workflow?"
  • Evidence need → "What data must the system access to answer correctly, and is that data available in real time?"
  • Deterministic boundary → "Which parts of this answer must be factually correct with zero tolerance for generation?"
  • Success signal → "Six weeks after launch, what metric tells us this feature is working?"
  • Risk class → "If this fails, is the damage cosmetic, financial, safety-critical, or legally actionable?"
  • Architecture brief → "What document do we hand engineering so they can build without guessing at constraints?"

Memory map

How the chapters chain. The prerequisite column tells you what to read first. The "recurs later as" column shows where each concept resurfaces when you move into agentic system design and beyond. This table is your navigation aid — when a later module references "the quality bar" or "the architecture brief," trace back here to find where that concept was first defined and pressure-tested.

| Concept | Prerequisite | Pressure family | Recurs later as | Layer touched |
| --- | --- | --- | --- | --- |
| Problem framing / user job (01) | this overview | specificity vs ambiguity | tool selection, prompt scope | product definition |
| Failure cost classification (02) | user job defined | severity vs effort | blast radius, approval gates | risk management |
| Quality bar setting (03) | failure costs known | precision vs recall vs latency | eval gates, launch criteria | measurement |
| Evidence mapping (04) | quality bar set | data availability vs freshness | retrieval design, context window | data architecture |
| Deterministic boundaries (05) | evidence mapped | correctness vs flexibility | tool schemas, hard-coded paths | system design |
| Success signals (06) | quality bar + user job | attribution vs proxy metrics | observability, A/B tests | analytics |
| Risk class assignment (07) | failure costs + deterministic boundaries | safeguard depth vs speed | kill switch thresholds, compliance | governance |
| Architecture brief (08) | all concepts composed | completeness vs over-specification | design review, sprint planning | handoff document |
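
The Prerequisite column is effectively a dependency graph. As a rough sketch, assuming the short concept names below as hypothetical labels for the rows, the standard-library `graphlib` module (Python 3.9+) can turn those prerequisites into a valid working order:

```python
from graphlib import TopologicalSorter

# Each concept maps to the set of concepts it depends on,
# transcribed from the Prerequisite column of the memory map.
prerequisites = {
    "user job": set(),
    "failure cost": {"user job"},
    "quality bar": {"failure cost"},
    "evidence need": {"quality bar"},
    "deterministic boundary": {"evidence need"},
    "success signal": {"quality bar", "user job"},
    "risk class": {"failure cost", "deterministic boundary"},
    "architecture brief": {
        "user job", "failure cost", "quality bar", "evidence need",
        "deterministic boundary", "success signal", "risk class",
    },
}

# static_order() yields every concept after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
print(order[0], "...", order[-1])
```

More than one order can satisfy the graph (success signals, for instance, only need the quality bar and the user job), but the user job always comes first and the architecture brief always comes last.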

Three ways to use this map:

Forward path — read top to bottom, each chapter builds on the last. This is the recommended first pass.

Incident path — start from a production failure, trace backward through risk class and failure cost to find which requirement was missing or under-specified. Use this when running a post-mortem on an AI feature that broke.

Checklist path — before any AI feature ships, walk each row and confirm you have a written answer for every concept. Any blank cell is a gap that production will eventually expose.

The fintech team's incident maps cleanly onto this table. They had no explicit user job (row 1 was blank). Because the user job was vague, the failure cost was never classified (row 2 blank). Because failure cost was unclassified, no quality bar was set for status fields (row 3 blank). Because no quality bar existed, nobody mapped which evidence the system needed for correctness (row 4 blank). The cascade continued all the way to the architecture brief — which didn't exist. Every row left empty became a production surprise. The memory map is also a post-mortem template: for any AI incident, find the first blank row and you have found the root cause.
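
The post-mortem rule above (find the first blank row) is simple enough to sketch as a function. The row names and the `answers` dictionary shape are assumptions for illustration; an answer counts as blank when it is missing or only whitespace.

```python
# Rows of the memory map, in order. Names are hypothetical short labels.
ROWS = [
    "user job",
    "failure cost",
    "quality bar",
    "evidence need",
    "deterministic boundary",
    "success signal",
    "risk class",
    "architecture brief",
]

def first_blank_row(answers):
    """Return (row_number, concept) for the first concept with no written
    answer, or None when every row has one."""
    for position, concept in enumerate(ROWS, start=1):
        if not answers.get(concept, "").strip():
            return position, concept
    return None

# The fintech team wrote nothing down, so the search stops at row 1.
print(first_blank_row({}))

# A team that framed the job and classified failure cost, but went no further.
partial = {
    "user job": "Confirm the current status of a merchant payout",
    "failure cost": "Wrong status reaches the merchant; regulatory risk",
}
print(first_blank_row(partial))
```

The same function doubles as the checklist path: run it before launch and ship only when it returns `None`.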


Top resources

These are not "further reading." They are working references you will reach for during requirements conversations. The first two cover the product discipline that predates AI but applies directly. The rest address AI-specific risk, failure taxonomy, and governance.

  • Marty Cagan — Inspired / Empowered — the non-AI version of product discovery; translates directly once you add model uncertainty to the validation loop
  • Google PAIR — People + AI Guidebook — https://pair.withgoogle.com/guidebook — design patterns for human-AI interaction with emphasis on failure modes and user mental models
  • Anthropic — Responsible scaling policy — https://www.anthropic.com/research/responsible-scaling-policy — how one lab classifies risk at the model level; useful as a framework for feature-level risk
  • NIST AI Risk Management Framework — https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence — governance structure for AI risk in regulated environments
  • Shreya Shankar — Rethinking LLM-based applications — https://arxiv.org/abs/2404.10857 — practical failure taxonomy from production ML systems; maps onto our failure cost and quality bar concepts
  • Hamilton Helmer — 7 Powers — strategy framework for deciding which AI features create durable value versus demos that decay after the novelty wears off

If you read only one resource before starting, make it the Google PAIR guidebook — it grounds the abstract concepts in this module with concrete interaction patterns and failure examples from shipped products.


What's coming

The module is structured as a pipeline. Each file takes one concept from the recurring-pressures table, defines it precisely, shows how to apply it to the fintech scenario, and produces a concrete artifact (a filled template, a classification table, a measurement spec). The artifacts accumulate. By file 06 you will have assembled them into a complete architecture brief, the document that crosses the boundary from product to engineering.

The ordering mirrors the order you would work through in a real requirements session. You start broad (what is the user actually trying to do?) and narrow progressively (what risk class does this fall into? what constraints flow to the engineering team?). Skipping ahead is possible but expensive — each file assumes you have the output of the previous one.

If you are in a hurry, read files 01 and 06: the first to understand user jobs, the second to see what a complete architecture brief looks like. Then fill in the middle as your feature demands.

  1. 01-problem-framing.md — Converting a vague "add AI" request into concrete user jobs with named actors, triggers, and outcomes. The most common failure: defining the job too broadly.
  2. 02-ai-fit-decision.md — When the model earns its cost and when it doesn't. A four-question decision tree that routes each task to model, rule engine, or deterministic lookup.
  3. 03-success-metrics.md — Separating product outcomes from model scores. Why 92% BLEU can coexist with angry users, and how to measure what actually matters.
  4. 04-acceptance-tests.md — Defining "done" before choosing architecture. Concrete pass/fail criteria that gate deployment.
  5. 05-risk-boundaries.md — What the system must never do, even when the model wants to. Risk class determines architectural constraints — not the other way around.
  6. 06-requirements-to-architecture-brief.md — The handoff that prevents rework. Packaging all requirements into the single document engineering builds against.
  7. 07-honest-admission.md — What requirements still cannot predict about AI products. The five gaps that manifest in every shipped AI feature.

Bridge. The fintech team's first mistake was not picking the wrong model — it was never asking "what specific task is the user trying to accomplish with this assistant?" Until you decompose a product request into named user jobs, every downstream decision (what context to retrieve, what accuracy to demand, what to hard-code) floats without an anchor.

Problem framing is where requirements begin. It is also where most teams skip ahead — they hear "AI assistant" and jump to model selection, prompt engineering, or tool wiring. File 01 teaches you to stay in the problem space longer than feels comfortable, because the thirty minutes you spend here save weeks of rework after launch.

01-problem-framing.md