00. GenAI for the SDLC — First-principles overview

You spent twenty-two modules learning to build AI systems. This module turns the lens around: how do you use GenAI to build software faster — without quietly trading away quality, security, and the ability to know if it even helped?

A 200-engineer SaaS company buys GitHub Copilot for everyone. Sixty days later the VP of Engineering walks into the leadership review with one slide: "Copilot acceptance rate is 31%, developers love it, NPS is +48." The CFO asks the only question that matters: "Did we ship more, and did it break less?" Silence. Nobody set a baseline. Pull-request throughput looks up 18%, but the rate of changes reverted within two weeks is up as well. Incident count is flat. The team that adopted hardest also has the most rework. The honest answer is: we have no idea whether this helped, because we measured the tool instead of the outcome.

That gap is the whole module. AI coding tools are real leverage — the 2025 DORA report found, for the first time, that AI adoption correlates with higher delivery throughput. But the same report found AI adoption still correlates with lower delivery stability, and a controlled trial by METR found experienced developers on their own large repos were actually 19% slower with early-2025 AI tools while believing they were 20% faster. Both things are true at once. AI does not fix an engineering org; it amplifies whatever is already there. A team with strong tests, small batches, and tight review gets faster and stays stable. A team without those gets faster at producing rework.

So the dominant pressure across this module is not "how do I use Copilot." It is: how do you capture real leverage across the software development lifecycle while keeping a measurement loop honest enough to tell leverage from illusion. Every chapter is one place in the SDLC where AI helps, one specific way it silently costs you, and one way to instrument the boundary so the org can see the truth instead of the vanity metric.

The reason this is hard is structural. The acceptance rate, the lines generated, the "developers love it" survey — these are easy to collect and they all go up. The thing you actually care about — durable, secure, maintainable software delivered at a sustainable pace — is slow to measure and easy to fake. The first instinct of every org is to declare victory on the easy metric. This module trains the opposite reflex: treat every AI win as a hypothesis, find the subsystem that quietly absorbs the new cost, and instrument it before you celebrate.

We thread one running example through all nine files: Meridian, a 200-engineer B2B SaaS company rolling out AI dev tooling. We set Meridian's baseline in the first chapter, pick where to apply AI in each chapter after that, and by the end we can say, with numbers rather than vibes, whether throughput and quality actually moved.

The recurring pressures and concepts

These names appear in every chapter. They are the module's shorthand for the forces that decide whether an AI-for-SDLC rollout is leverage or theater.

Pressure / concept	Meaning
the leverage-rework tradeoff	AI produces code faster, but accepted code that needs reverting or rewriting later is negative leverage; net = output minus rework.
the review tax	every line AI writes is a line a human must review; if generation outpaces review capacity, the bottleneck moves to review, not coding.
the vanity metric	a number that always goes up and proves nothing — acceptance rate, lines generated, "devs love it" — easy to collect, easy to fake.
the guardrail metric	a paired metric that catches the cost of optimizing the headline number — pair throughput with change-fail rate, speed with rework.
the source of truth	the human-owned artifact (a spec, a test, a schema) that AI generates toward but never gets to silently redefine.
the grounding gap	the distance between what AI asserts and what is actually true in your codebase, telemetry, or policy — the root of confident wrong answers.
the amplifier rule	AI multiplies the practices already in place; strong tests and small batches get amplified, and so does their absence.
the blast radius	the worst thing a wrong AI suggestion can cause — a typo, a leaked secret, a license-contaminated file, a bad migration on prod.
the honest baseline	the before-state measured before rollout, so after-state change is attributable instead of imagined.
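
The leverage-rework tradeoff and its guardrail pairing can be worked through with a few lines of arithmetic. The following is a minimal sketch, assuming two equal-length measurement windows and a simple count of merged and reworked changes. The `Window` type, the `pct_change` helper, and all numbers are hypothetical illustrations, not Meridian's data or any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Counts for one measurement window (hypothetical fields)."""
    merged_changes: int    # headline output: changes merged in the window
    reworked_changes: int  # changes reverted or rewritten within two weeks

    @property
    def net_output(self) -> int:
        # net = output minus rework (the leverage-rework tradeoff)
        return self.merged_changes - self.reworked_changes

    @property
    def rework_rate(self) -> float:
        # guardrail metric paired with the throughput headline
        if self.merged_changes == 0:
            return 0.0
        return self.reworked_changes / self.merged_changes


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


# Illustrative numbers only: the headline rises 18%, rework rises faster.
baseline = Window(merged_changes=1000, reworked_changes=50)
after_rollout = Window(merged_changes=1180, reworked_changes=130)

headline = pct_change(baseline.merged_changes, after_rollout.merged_changes)
net = pct_change(baseline.net_output, after_rollout.net_output)

print(f"headline throughput change: {headline:+.1f}%")
print(f"net output change: {net:+.1f}%")
print(f"rework rate: {baseline.rework_rate:.1%} -> {after_rollout.rework_rate:.1%}")
```

With these made-up inputs the headline moves +18.0% while net output moves only +10.5%, and the rework rate goes from 5.0% to 11.0%. Neither number means anything without the baseline window, which is the point of the honest baseline: the comparison has to be recorded before rollout.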

Top resources

DORA — 2025 Accelerate State of DevOps Report — https://dora.dev/research/2025/dora-report/ (AI as amplifier; throughput up, stability still at risk)
DORA — 2024 Accelerate State of DevOps Report — https://dora.dev/research/2024/dora-report/ (the earlier finding: an estimated −1.5% throughput / −7.2% stability per 25% increase in AI adoption)
METR — Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity — https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ (the 19%-slower RCT)
SPACE framework — ACM Queue — https://queue.acm.org/detail.cfm?id=3454124 (multidimensional productivity)
GitClear — AI Code Quality 2025 — https://www.gitclear.com/ai_assistant_code_quality_2025_research (code clone / churn growth)
GitHub — Copilot features & coding agent docs — https://docs.github.com/en/copilot
Anthropic — Claude Code docs — https://docs.anthropic.com/en/docs/claude-code
Microsoft — DevEx / developer productivity research — https://developer.microsoft.com/en-us/developer-experience

What's coming

01-coding-assistants-in-the-loop.md — where assistants genuinely help vs where they shift the bottleneck to review. The felt failure: velocity feels up while rework rises.
02-spec-to-code-and-scaffolding.md — generating scaffolds, migrations, boilerplate, and IaC while keeping a human spec as the source of truth.
03-ai-code-review-and-quality-gates.md — AI reviewers: what they catch, what they miss, false-positive fatigue, and how to gate in CI without eroding trust.
04-test-and-doc-generation.md — generated tests and the coverage trap, generated docs and onboarding knowledge; the danger of tests that pass but assert nothing.
05-ops-and-incident-copilots.md — log/trace summarization, runbook copilots, on-call assist, and why grounding in real telemetry is the whole game.
06-measuring-developer-productivity.md — DORA + SPACE, guardrail metrics, why "lines accepted" is vanity, and how to run an honest before/after.
07-governance-ip-and-security.md — IP/license contamination, secret leakage, data boundaries, model-in-the-loop policy, and supply-chain risk.
08-boundary-tradeoff-review.md — contested evidence, hype vs reality, and what to revisit as tools and studies evolve.

Memory map

#	File	Pressure family	Source of truth	Guardrail metric	Recurs as
01	coding-assistants	leverage vs review tax	the diff a human approves	rework / revert rate	every chapter's "who reviews it"
02	spec-to-code	ambiguity vs drift	the human spec	spec-to-code conformance	tests assert the spec (04)
03	ai-code-review	catch rate vs false-positive fatigue	the team's review standards	comment-action rate	gate design (06), security gate (07)
04	test/doc-gen	coverage vs assertion strength	the behavior under test	mutation score	grounding for review (03)
05	ops copilots	speed vs grounding	real telemetry	grounded-citation rate	the grounding gap (08)
06	measuring productivity	signal vs vanity	delivered outcomes	change-fail + rework	the honest baseline everywhere
07	governance/IP/security	velocity vs liability	license + data boundary	secret/license incidents	blast radius (01, 02)
08	boundary review	hype vs evidence	the controlled comparison	—	synthesis into the capstone

Three traversal paths use this map.

Prerequisite path — read 00 → 08 in order; each chapter measures what the previous one shipped.
Failure path — when a rollout "isn't working," find which subsystem is absorbing the cost: review capacity (01, 03), spec drift (02), hollow tests (04), ungrounded answers (05), unmeasured outcomes (06), or legal/security exposure (07).
Synthesis path — combine any AI win with its guardrail metric to decide whether to expand, hold, or roll back.
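
The synthesis path can be sketched as a small decision function. This is one possible policy under stated assumptions, not a rule the module prescribes: the function name, the 5% tolerance, and the choice to hold rather than roll back on a mixed result are all hypothetical. It assumes the headline metric is "higher is better" and the guardrail metric is "higher is worse" (rework or change-fail rate, for example), both expressed as percent change against the baseline.

```python
from enum import Enum


class Decision(Enum):
    EXPAND = "expand"
    HOLD = "hold"
    ROLL_BACK = "roll back"


def synthesis_decision(
    headline_delta_pct: float,
    guardrail_delta_pct: float,
    guardrail_tolerance_pct: float = 5.0,  # assumed threshold, tune per team
) -> Decision:
    """Combine a headline win with its guardrail metric.

    headline_delta_pct: change vs. baseline, higher is better.
    guardrail_delta_pct: change vs. baseline, higher is worse.
    """
    headline_improved = headline_delta_pct > 0
    guardrail_held = guardrail_delta_pct <= guardrail_tolerance_pct

    if headline_improved and guardrail_held:
        # A win that did not move its cost elsewhere.
        return Decision.EXPAND
    if not headline_improved and not guardrail_held:
        # No win and a measurable cost.
        return Decision.ROLL_BACK
    # Mixed signal: either a win paid for by the guardrail,
    # or no measurable change at all. Investigate before scaling.
    return Decision.HOLD


print(synthesis_decision(18.0, 2.0).value)   # expand
print(synthesis_decision(18.0, 40.0).value)  # hold
print(synthesis_decision(-3.0, 40.0).value)  # roll back
```

The design choice worth noticing is that the headline metric alone never produces "expand"; the guardrail has to hold as well, which is what keeps a vanity metric from driving the rollout.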

Bridge. Before we measure anything, we have to start where the leverage and the pain are most felt — the inner loop, where a developer and an assistant write code together all day. That is where velocity feels obviously up. It is also where rework hides. The next file opens that loop, shows exactly where assistants earn their keep and where they shift the bottleneck onto review, and sets Meridian's baseline so every later chapter can measure against it. → 01-coding-assistants-in-the-loop.md