00. GenAI for the SDLC — First-principles overview
You spent twenty-two modules learning to build AI systems. This module turns the lens around: how do you use GenAI to build software faster — without quietly trading away quality, security, and the ability to know if it even helped?
A 200-engineer SaaS company buys GitHub Copilot for everyone. Sixty days later the VP of Engineering walks into the leadership review with one slide: "Copilot acceptance rate is 31%, developers love it, NPS is +48." The CFO asks the only question that matters: "Did we ship more, and did it break less?" Silence. Nobody set a baseline. Pull-request throughput looks up 18%, but so does the rate of changes reverted within two weeks. Incident count is flat. The team that adopted hardest also has the most rework. The honest answer is: we have no idea whether this helped, because we measured the tool instead of the outcome.
That gap is the whole module. AI coding tools are real leverage — the 2025 DORA report found, for the first time, that AI adoption correlates with higher delivery throughput. But the same report found AI adoption still correlates with lower delivery stability, and a controlled trial by METR found experienced developers on their own large repos were actually 19% slower with early-2025 AI tools while believing they were 20% faster. Both things are true at once. AI does not fix an engineering org; it amplifies whatever is already there. A team with strong tests, small batches, and tight review gets faster and stays stable. A team without those gets faster at producing rework.
So the dominant pressure across this module is not "how do I use Copilot." It is: how do you capture real leverage across the software development lifecycle while keeping a measurement loop honest enough to tell leverage from illusion. Every chapter is one place in the SDLC where AI helps, one specific way it silently costs you, and one way to instrument the boundary so the org can see the truth instead of the vanity metric.
The reason this is hard is structural. The acceptance rate, the lines generated, the "developers love it" survey — these are easy to collect and they all go up. The thing you actually care about — durable, secure, maintainable software delivered at a sustainable pace — is slow to measure and easy to fake. The first instinct of every org is to declare victory on the easy metric. This module trains the opposite reflex: treat every AI win as a hypothesis, find the subsystem that quietly absorbs the new cost, and instrument it before you celebrate.
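One way to instrument a subsystem that absorbs the cost is to track how many merged changes get reverted shortly after they land. The sketch below is a minimal illustration, not a prescribed method: the `Change` record, the `early_revert_rate` name, and the 14-day window are all assumptions made for the example, and the sample data is invented.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class Change(NamedTuple):
    """One merged change (hypothetical record shape)."""
    merged_at: datetime
    reverted_at: Optional[datetime] = None  # None means never reverted


def early_revert_rate(changes: Sequence[Change],
                      window: timedelta = timedelta(days=14)) -> float:
    """Share of merged changes that were reverted within `window` of merging."""
    if not changes:
        return 0.0
    early = sum(
        1 for c in changes
        if c.reverted_at is not None and c.reverted_at - c.merged_at <= window
    )
    return early / len(changes)


# Invented sample data: four changes merged on the same day.
t0 = datetime(2025, 1, 6)
changes = [
    Change(t0, t0 + timedelta(days=3)),   # reverted inside the window
    Change(t0, t0 + timedelta(days=30)),  # reverted, but outside the window
    Change(t0),                           # never reverted
    Change(t0),                           # never reverted
]

print(early_revert_rate(changes))  # 0.25
```

Collected before rollout as well as after, a number like this gives the easy headline metrics something to be checked against.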
We thread one running example through all nine files: Meridian, a 200-engineer B2B SaaS company rolling out AI dev tooling. We set Meridian's baseline in the first chapter, pick where to apply AI in each chapter after that, and by the end we can say, with numbers rather than vibes, whether throughput and quality actually moved.
The recurring pressures and concepts
These names appear in every chapter. They are the module's shorthand for the forces that decide whether an AI-for-SDLC rollout is leverage or theater.
| Pressure / concept | Meaning |
|---|---|
| the leverage-rework tradeoff | AI produces code faster, but accepted code that needs reverting or rewriting later is negative leverage; net = output minus rework. |
| the review tax | every line AI writes is a line a human must review; if generation outpaces review capacity, the bottleneck moves to review, not coding. |
| the vanity metric | a number that always goes up and proves nothing — acceptance rate, lines generated, "devs love it" — easy to collect, easy to fake. |
| the guardrail metric | a paired metric that catches the cost of optimizing the headline number — pair throughput with change-fail rate, speed with rework. |
| the source of truth | the human-owned artifact (a spec, a test, a schema) that AI generates toward but never gets to silently redefine. |
| the grounding gap | the distance between what AI asserts and what is actually true in your codebase, telemetry, or policy — the root of confident wrong answers. |
| the amplifier rule | AI multiplies the practices already in place; strong tests and small batches get amplified, and so does their absence. |
| the blast radius | the worst thing a wrong AI suggestion can cause — a typo, a leaked secret, a license-contaminated file, a bad migration on prod. |
| the honest baseline | the before-state measured before rollout, so after-state change is attributable instead of imagined. |
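
The leverage-rework tradeoff, the guardrail metric, and the honest baseline can be made concrete with a little arithmetic. The sketch below is illustrative only: the `PeriodStats` fields, the sample counts, and the choice to count each reverted or rewritten change as one full unit of rework are assumptions, not definitions from this module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodStats:
    """Counts for one measurement window (hypothetical fields)."""
    merged_changes: int    # headline output: changes merged in the window
    reworked_changes: int  # merged changes later reverted or rewritten
    failed_deploys: int    # deploys that caused a failure in production
    total_deploys: int


def net_leverage(stats: PeriodStats) -> int:
    """Net = output minus rework, as in the leverage-rework tradeoff."""
    return stats.merged_changes - stats.reworked_changes


def change_fail_rate(stats: PeriodStats) -> float:
    """Guardrail paired with throughput: share of deploys that failed."""
    if stats.total_deploys == 0:
        return 0.0
    return stats.failed_deploys / stats.total_deploys


# Invented numbers for illustration, not Meridian's real baseline.
baseline = PeriodStats(merged_changes=1000, reworked_changes=50,
                       failed_deploys=12, total_deploys=200)
after = PeriodStats(merged_changes=1180, reworked_changes=240,
                    failed_deploys=18, total_deploys=220)

print(net_leverage(baseline))  # 950
print(net_leverage(after))     # 940
print(change_fail_rate(after) > change_fail_rate(baseline))  # True
```

In this made-up case merged changes rise 18%, yet net leverage falls and the guardrail worsens. The headline metric alone would have reported a win.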
Top resources
- DORA — 2025 Accelerate State of DevOps Report — https://dora.dev/research/2025/dora-report/ (AI as amplifier; throughput up, stability still at risk)
- DORA — 2024 Accelerate State of DevOps Report — https://dora.dev/research/2024/dora-report/ (the −1.5% throughput / −7.2% stability baseline)
- METR — Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity — https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ (the 19%-slower RCT)
- SPACE framework — ACM Queue — https://queue.acm.org/detail.cfm?id=3454124 (multidimensional productivity)
- GitClear — AI Code Quality 2025 — https://www.gitclear.com/ai_assistant_code_quality_2025_research (code clone / churn growth)
- GitHub — Copilot features & coding agent docs — https://docs.github.com/en/copilot
- Anthropic — Claude Code docs — https://docs.anthropic.com/en/docs/claude-code
- Microsoft — DevEx / developer productivity research — https://developer.microsoft.com/en-us/developer-experience
What's coming
- 01-coding-assistants-in-the-loop.md — where assistants genuinely help vs where they shift the bottleneck to review. The felt failure: velocity feels up while rework rises.
- 02-spec-to-code-and-scaffolding.md — generating scaffolds, migrations, boilerplate, and IaC while keeping a human spec as the source of truth.
- 03-ai-code-review-and-quality-gates.md — AI reviewers: what they catch, what they miss, false-positive fatigue, and how to gate in CI without eroding trust.
- 04-test-and-doc-generation.md — generated tests and the coverage trap, generated docs and onboarding knowledge; the danger of tests that pass but assert nothing.
- 05-ops-and-incident-copilots.md — log/trace summarization, runbook copilots, on-call assist, and why grounding in real telemetry is the whole game.
- 06-measuring-developer-productivity.md — DORA + SPACE, guardrail metrics, why "lines accepted" is vanity, and how to run an honest before/after.
- 07-governance-ip-and-security.md — IP/license contamination, secret leakage, data boundaries, model-in-the-loop policy, and supply-chain risk.
- 08-boundary-tradeoff-review.md — contested evidence, hype vs reality, and what to revisit as tools and studies evolve.
Memory map
| # | File | Pressure family | Source of truth | Guardrail metric | Recurs later as |
|---|---|---|---|---|---|
| 01 | coding-assistants | leverage vs review tax | the diff a human approves | rework / revert rate | every chapter's "who reviews it" |
| 02 | spec-to-code | ambiguity vs drift | the human spec | spec-to-code conformance | tests assert the spec (04) |
| 03 | ai-code-review | catch rate vs false-positive fatigue | the team's review standards | comment-action rate | gate design (06), security gate (07) |
| 04 | test/doc-gen | coverage vs assertion strength | the behavior under test | mutation score | grounding for review (03) |
| 05 | ops copilots | speed vs grounding | real telemetry | grounded-citation rate | the grounding gap (08) |
| 06 | measuring productivity | signal vs vanity | delivered outcomes | change-fail + rework | the honest baseline everywhere |
| 07 | governance/IP/security | velocity vs liability | license + data boundary | secret/license incidents | blast radius (01, 02) |
| 08 | boundary review | hype vs evidence | the controlled comparison | — | synthesis into the capstone |
Three traversal paths use this map. Prerequisite path — read 00 → 08 in order; each chapter measures what the previous one shipped. Failure path — when a rollout "isn't working," find which subsystem is absorbing the cost: review capacity (01, 03), spec drift (02), hollow tests (04), ungrounded answers (05), unmeasured outcomes (06), or legal/security exposure (07). Synthesis path — combine any AI win with its guardrail metric to decide whether to expand, hold, or roll back.
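
The synthesis path can be sketched as a small decision rule. This is one possible rule under stated assumptions, not the module's official procedure: the function name `rollout_decision`, the 2% tolerance, and the convention that both inputs are relative changes against the baseline are all hypothetical.

```python
def rollout_decision(headline_delta: float,
                     guardrail_delta: float,
                     tolerance: float = 0.02) -> str:
    """Pair a headline win with its guardrail and pick a next step.

    headline_delta:  relative change in the headline metric (higher is better),
                     e.g. 0.18 for an 18% throughput gain.
    guardrail_delta: relative change in the guardrail cost metric (higher is
                     worse), e.g. 0.10 for a 10% rise in rework.
    tolerance:       assumed noise band; changes inside it count as "no change".
    """
    headline_improved = headline_delta > tolerance
    guardrail_worsened = guardrail_delta > tolerance

    if headline_improved and not guardrail_worsened:
        return "expand"      # real gain, no visible cost
    if guardrail_worsened and not headline_improved:
        return "roll back"   # cost with nothing to show for it
    return "hold"            # mixed or flat signal: keep measuring


print(rollout_decision(0.18, 0.00))  # expand
print(rollout_decision(0.18, 0.18))  # hold
print(rollout_decision(0.00, 0.10))  # roll back
```

The thresholds matter less than the shape of the rule: no headline number is read without its guardrail next to it.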
Bridge. Before we measure anything, we have to start where the leverage and the pain are most felt — the inner loop, where a developer and an assistant write code together all day. That is where velocity feels obviously up. It is also where rework hides. The next file opens that loop, shows exactly where assistants earn their keep and where they shift the bottleneck onto review, and sets Meridian's baseline so every later chapter can measure against it. → 01-coding-assistants-in-the-loop.md