07. Governance, IP, and security — the cost that no DORA metric shows until it's a headline¶
An engineer accepts a fluent 40-line function the assistant produced for a parsing routine. It works, tests pass, it ships. Three things happened that no dashboard caught: the snippet is a near-verbatim copy of GPL-licensed code, the prompt that produced it pasted a customer's API key into a third-party model, and one of its imports is a package name the model invented, which an attacker has already registered on PyPI. None of these show up in throughput or change-fail rate. They show up as a license claim, a breach disclosure, and a poisoned dependency. This file shows where AI coding creates legal and security blast radius the delivery metrics can't see, and how to put a model-in-the-loop policy and gates in front of each one.
Built on 00-first-principles.md. The forces here are the blast radius, the source of truth, the guardrail metric, and the amplifier rule. File 06 gave us metrics for "did it help." This file covers the costs those metrics never show — IP contamination, secret leakage, supply-chain risk — where the worst case isn't rework but a lawsuit or a breach, and the guardrail becomes secret/license/dependency incidents.
What we know so far and what still breaks¶
The module has measured AI's leverage and its cost in the currency of delivery: rework (file 01), drift (file 02), false-positive fatigue (file 03), hollow tests (file 04), incident minutes (file 05), and the DORA outcomes that tell leverage from theater (file 06). Every one of those costs is internal and reversible — you revert the change, fix the test, restore the service, re-measure. The org absorbs it and moves on.
What still breaks is the class of cost that is external and often irreversible. When AI writes code, it can reproduce licensed code you have no right to ship, your prompts can carry secrets and proprietary code out of your boundary, and the dependencies it suggests can be names that don't exist — which attackers register and poison. These don't show up in any DORA metric, because they aren't delivery problems; they're legal and security problems. They surface as a copyright claim, a breach notification, or a compromised build — long after the dashboard said the rollout was healthy.
This chapter answers three things: where AI coding creates blast radius that delivery metrics can't see (IP contamination, secret/data leakage, supply-chain risk), why each one's worst case is irreversible in a way rework never is, and how a model-in-the-loop policy plus deterministic gates put oversight in front of the blast radius before it ships.
What this file solves¶
An engineer accepts an AI-generated function that works and passes tests, and silently incurs three liabilities the dashboard can't see: a license-contaminated snippet, a secret pasted into a third-party model, and a hallucinated package name an attacker is squatting. This file gives you the concrete move: classify AI usage by blast radius (what's the worst thing a wrong suggestion can cause — rework, a leak, a lawsuit, a poisoned build), set a data boundary so prompts can't carry secrets or proprietary code to a model you don't control, gate code on license scanning, secret scanning, and dependency verification, and write a model-in-the-loop policy that scales oversight to the blast radius instead of treating all AI output the same.
Why governance is a different kind of cost than rework¶
Watch the same accept that file 01 watched, but follow what the dashboard can't. An engineer at Meridian, working on a log parser, accepts a 40-line function the assistant offered. It compiles, the tests pass, it merges. The DORA metrics see a normal change: throughput up a hair, change-fail unaffected. Everything looks healthy.
What the metrics can't see is layered underneath. The function is a near-verbatim reproduction of a snippet from a GPL-licensed project the model trained on. By GitHub's own figure, without the duplicate-detection filter roughly 1% of Copilot suggestions contain a match of about 150 characters or more against training data, and a 40-line block is well past that length. The prompt that produced it included a stack trace the engineer pasted, and the stack trace contained a live API key, now sent to a third-party model's servers, outside Meridian's boundary. And one of the function's imports, logparse_utils, is a package name the model invented; an attacker monitoring AI hallucinations registered it on PyPI last week with a credential-stealer inside.
So the real value of governance is not "more process." It is recognizing that AI coding creates a class of cost that is external and irreversible — legal exposure, leaked data, poisoned dependencies — which delivery metrics are structurally blind to, because they aren't delivery problems. Rework you revert. A license claim, a leaked key, and a compromised build you cannot revert; the secret is exfiltrated, the code is shipped, the dependency is in the lockfile.
So how do we put oversight in front of the costs the dashboard can't see, without slowing down the costs it can?
The naive read: treat AI output like any other code, let review catch it¶
Meridian's first instinct is reasonable: AI writes code, humans review code, the existing review and CI process will catch problems. No special policy — AI output goes through the same PR gate as everything else.
The break is that the existing gate was designed for human failure modes, not AI ones. Human reviewers catch logic bugs and style issues; they do not catch that a 40-line block is a verbatim GPL reproduction (they can't have memorized every licensed snippet), they do not see that a key was pasted into a prompt (the prompt isn't in the PR), and they do not verify that every imported package actually exists and is the one intended (imports look normal). The standard gate is blind to all three because none of them look wrong in a diff.
| What the human PR review catches | What it's blind to (AI-specific) |
|---|---|
| logic errors | verbatim licensed-code reproduction |
| style violations | secrets pasted into the prompt (not in PR) |
| obvious security smells | hallucinated package in imports |
| missing tests | data sent outside the trust boundary |
The naive read fails because it assumes AI's failure modes are the same as humans', so the human-shaped gate covers them. They aren't. AI reproduces training data, carries prompt context to external servers, and invents plausible-looking names — three failure shapes a human reviewer is structurally unequipped to catch by reading a diff.
So the real cause is not "reviewers aren't careful enough." It is that AI introduces failure modes the existing gate was never designed to detect — IP reproduction, boundary-crossing data, and invented dependencies — and these are invisible in a diff, so human review cannot be the control. The blast radius is legal and security, but the detection has to be deterministic and machine-driven, because the signal isn't visible to a human reading code.
So how do we build gates aimed at AI's specific failure modes, and scale oversight to what each one can actually cost?
When a working function carries three invisible liabilities¶
Here is the smallest version of the whole problem, on one accepted suggestion.
def parse_log_entry(line):
    # ... 40 lines, works, tests pass, merges clean ...
    import logparse_utils  # ← (3) hallucinated package; an attacker squats the name

# Liability 1 (IP): this block is ~verbatim from a GPL project → license contamination
# Liability 2 (secret): the prompt included a pasted stack trace with a live API key
#     → key sent to a 3rd-party model, now outside the boundary
# Liability 3 (supply): logparse_utils doesn't exist as the model "knew" it; an attacker
#     registered that exact name on PyPI with a credential stealer
# DORA metrics see: a normal, healthy change. None of the three is visible in the diff.
The function passes every delivery check. The three liabilities live in dimensions the diff doesn't show: where the code came from (training data), where the prompt went (external server), and whether the import is real (registry). Same working function — three irreversible costs — and the only way to catch any of them is a deterministic scan aimed at that specific failure mode.
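To make "a deterministic scan aimed at that specific failure mode" concrete for liability 3, here is a minimal sketch of an import check a CI step could run. It assumes a hand-maintained set of vetted top-level modules (`ALLOWED_MODULES` is hypothetical), and it compares import names only; mapping import names to registry distribution names is a separate problem this sketch does not attempt.

```python
import ast

# Hypothetical allowlist of vetted top-level modules (an assumption for this sketch).
ALLOWED_MODULES = {"json", "re", "datetime"}


def imported_modules(source: str) -> set:
    """Return the top-level module names imported anywhere in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are local code.
            found.add(node.module.split(".")[0])
    return found


def unvetted_imports(source: str, allowed=ALLOWED_MODULES) -> set:
    """Imports a CI gate would refuse because nobody has vetted them."""
    return imported_modules(source) - set(allowed)


SNIPPET = """
from json import loads

def parse_log_entry(line):
    import logparse_utils
    return logparse_utils.parse(loads(line))
"""

print(sorted(unvetted_imports(SNIPPET)))  # ['logparse_utils']
```

The import hidden inside the function body is found because the check walks the whole syntax tree instead of reading only the top of the file, which is where a human reviewer tends to look.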
Rule: oversight scales with blast radius, and the blast radius here is irreversible¶
The load-bearing truth of this chapter: for AI coding, oversight intensity must scale with the blast radius of a wrong suggestion, and the governance failure modes have a blast radius that is external and irreversible — a license claim, an exfiltrated secret, a poisoned dependency — so they need deterministic gates, not human review. Rework is cheap and reversible; you can let the inner loop run fast and catch rework downstream. A leaked key is exfiltrated the instant the prompt is sent; a GPL block is contaminating the moment it ships; a malicious package is in the build the moment it's installed. These can't be caught after the fact, so the control must be before the action: a data boundary before the prompt leaves, a scan before the code merges, a verification before the package installs.
Why human review can't be the governance control. The primitive is the blast radius from file 02: oversight should match what a wrong action can break. The constraint that breaks the naive approach is that AI's governance failure modes are invisible in a diff and irreversible once executed — a reviewer can't have memorized every licensed snippet, can't see the prompt that leaked the key, and can't tell an invented package name from a real one by reading it. So the detection must be deterministic (license scanner, secret scanner, dependency allowlist) and the prevention must be at the boundary (an enterprise model that doesn't train on your data, a proxy that strips secrets). The fix is to gate the specific failure mode with a machine that can see it, and to classify usage so the highest-blast-radius uses get the strongest gate.
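As an illustration of why a machine can see what a reviewer cannot, the sketch below fingerprints code by hashing every run of `k` consecutive tokens and measures how much of a candidate overlaps a known licensed snippet. This is a generic k-gram technique chosen for the example; it is not a description of how Copilot's filter or any commercial license scanner works, and the function names and the default `k = 8` are assumptions.

```python
import hashlib
import re


def fingerprints(code: str, k: int = 8) -> set:
    """Hash every run of k consecutive tokens, ignoring whitespace and # comments."""
    code = re.sub(r"#.*", "", code)            # naive comment strip (sketch only)
    tokens = re.findall(r"\w+|[^\w\s]", code)  # words/numbers, or one punctuation char
    windows = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {hashlib.sha256(w.encode()).hexdigest() for w in windows}


def overlap_ratio(candidate: str, licensed: str, k: int = 8) -> float:
    """Fraction of the candidate's fingerprints that also occur in the licensed code."""
    cand = fingerprints(candidate, k)
    if not cand:
        return 0.0
    return len(cand & fingerprints(licensed, k)) / len(cand)


LICENSED = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
# Same tokens, different spacing and an added comment: still a verbatim match.
CANDIDATE = "def total(xs):  # running sum\n  acc=0\n  for x in xs:\n    acc+=x\n  return acc\n"
UNRELATED = "def greet(name):\n    message = 'hi ' + name\n    print(message)\n    return message\n"

print(overlap_ratio(CANDIDATE, LICENSED))  # 1.0
print(overlap_ratio(UNRELATED, LICENSED))  # 0.0
```

Reformatting and comments do not hide the copy, because the comparison runs on tokens. A real gate would compare against an index of licensed code far larger than any reviewer could memorize, which is the point of the paragraph above.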
1) The three blast radii — IP, data, and supply chain, gated where each one happens¶
The mechanism is to recognize that AI coding has three distinct governance blast radii, each invisible to the delivery gate and each needing its own deterministic control at the point it happens.
1. IP / LICENSE CONTAMINATION: at code generation
   risk: model reproduces training-data code under a license you can't comply with
   gate: duplicate-detection filter (suppress verbatim matches) + license scanner in CI
2. SECRET / DATA LEAKAGE: at the prompt
   risk: prompt carries secrets or proprietary code to a model outside your boundary
   gate: enterprise model (no training on your data, zero-retention) + secret scanning
         on prompts and commits + a data boundary policy
3. SUPPLY-CHAIN / HALLUCINATED DEPENDENCY: at dependency resolution
   risk: model invents a package name; attacker squats it (slopsquatting)
   gate: dependency allowlist / private registry + verify-package-exists check +
         lockfile pinning + SCA scanning
Each gate sits where its failure happens, not at the PR. IP is gated at generation (filter) and at merge (scanner). Data leakage is gated at the prompt (boundary + secret scan) — before the prompt leaves, because after is too late. Supply chain is gated at dependency resolution (allowlist + existence check) — before install, because the malicious package runs on install. For Meridian's parser function, gate 1 would have flagged the GPL block, gate 2 would have stripped or blocked the key before the prompt left, and gate 3 would have rejected logparse_utils as not-on-allowlist.
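Gate 2 is the one that has to act before anything leaves, so here is a rough sketch of the check a prompt proxy could apply. The three patterns are illustrative assumptions only (production scanners ship far larger, provider-specific rule sets), and `PromptBlocked`, `scan_prompt`, and `guard_prompt` are hypothetical names.

```python
import re

# Illustrative patterns only; treat this list as an assumption, not a complete rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"""(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['"]?[A-Za-z0-9_\-]{16,}"""
    ),
}


class PromptBlocked(Exception):
    """Raised when a prompt appears to carry a secret."""


def scan_prompt(prompt: str) -> list:
    """Return the name of every pattern that matches somewhere in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]


def guard_prompt(prompt: str) -> str:
    """Return the prompt unchanged, or raise before it can be sent anywhere."""
    findings = scan_prompt(prompt)
    if findings:
        raise PromptBlocked(f"possible secrets in prompt: {findings}")
    return prompt


pasted_trace = "ConnectionError: billing call failed\nAPI_KEY=9f8e7d6c5b4a39281706abcd\n"
print(scan_prompt(pasted_trace))  # ['credential_assignment']
print(scan_prompt("Why does my parser drop the last line of the file?"))  # []
```

The sketch blocks instead of warning, matching the "block, not warn" stance: the caller only ever gets a prompt back from `guard_prompt` if nothing matched.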
Teacher voice. Notice that each gate is deterministic: a scanner, an allowlist, a boundary, not a human judgment. That's deliberate. The whole lesson of file 03 was that the trustworthy gate blocks on deterministic findings and advises on judgment ones. Governance is pure blast-radius, so it's pure deterministic gate: a license scanner doesn't get fatigued, a secret scanner doesn't miss the key in line 200, an allowlist doesn't get fooled by a plausible name. Human review stays for logic; the machine owns the three things humans can't see.
2) The blast-radius mental model — picture before the policy¶
This is the core mental model of the chapter. Keep it as the canonical ASCII image: AI usage sorted by blast radius, with oversight intensity scaling to match, and the governance failures sitting in the irreversible zone.
BLAST RADIUS of a wrong AI suggestion  →  OVERSIGHT NEEDED

low / reversible                                        high / irreversible
┌────────────────┬─────────────────┬──────────────────┬───────────────────┐
│ TYPO / BUG     │ REWORK          │ LEAKED SECRET    │ POISONED BUILD    │
│ caught by      │ caught by tests │ exfiltrated the  │ malicious pkg in  │
│ the compiler   │ + review        │ instant the      │ the lockfile;     │
│                │ (files 01,04)   │ prompt is sent   │ runs on install   │
│                │                 │                  │                   │
│ LICENSE-CLEAN  │                 │ DATA BOUNDARY    │ DEPENDENCY        │
│ inner loop     │                 │ before prompt    │ ALLOWLIST before  │
│                │                 │ leaves           │ install           │
└────────────────┴─────────────────┴──────────────────┴───────────────────┘
  reversible (catch downstream)    │  IRREVERSIBLE (must gate BEFORE the action)
  DORA metrics see this side ──────┘  DORA metrics are BLIND to this side
The whole danger is the right side: irreversible blast radius that no delivery metric shows. Rework and bugs live on the left — reversible, caught downstream, visible to DORA. Leaked secrets and poisoned builds live on the right — irreversible, must be gated before the action, invisible to DORA. The governance job is to put the strongest, earliest gates on the right side, where catching it afterward is too late. Meridian's mistake was treating the whole spectrum with one human PR gate designed for the left side.
3) Meridian writes the policy — the running example, with numbers¶
Meridian's platform and security teams write a model-in-the-loop policy and wire the gates. Watch the two approaches and what the guardrail does.
Attempt A — one policy, "review will catch it"¶
Policy: "AI-generated code goes through normal PR review." No data boundary,
        consumer-tier model, no license/secret/dependency gates.
Result (over one quarter):
  Secret-leak incidents:       2  (keys pasted into prompts → 3rd-party model)
  License flags found later:   1  (GPL block shipped in a proprietary module)
  Hallucinated-dep near-miss:  1  (engineer caught it by luck on install error)
  Detection: all found AFTER the fact: one by a customer, one by an audit.
Attempt B — blast-radius-tiered policy with deterministic gates¶
DATA BOUNDARY:
  - enterprise model with no-training + zero-retention contract
  - prompt proxy strips/blocks secrets before the prompt leaves
  - no proprietary algorithms or customer data in prompts (policy + DLP)
GATES (deterministic, in CI / at the boundary):
  - duplicate-detection filter ON (suppress verbatim training-data matches)
  - license scanner blocks copyleft-incompatible matches at merge
  - secret scanning on prompts AND commits (block, not warn)
  - dependency allowlist + "package exists & is the intended one" check
  - lockfile pinning + SCA on every dependency change
TIERED OVERSIGHT (by blast radius):
  - boilerplate/tests: green zone, light review
  - auth/crypto/payments/IaC: red zone, mandatory human + security review
  - anything touching secrets/PII: data-boundary enforced, no exceptions
Guardrail metric: secret/license/dependency incidents (target: 0 reaching prod)
Result (over one quarter):
  Secret-leak incidents:  0  (proxy blocked 14 prompts carrying keys)
  License flags:          0  reached prod (3 blocked at merge by scanner)
  Hallucinated-dep:       0  installed (allowlist rejected 2 invented names)
  Detection: all BEFORE the action, by a machine, not after by a customer.
The engineers didn't get more careful between A and B. The platform team moved detection from after-the-fact human review to before-the-action deterministic gates, set a data boundary so prompts can't carry secrets out, and tiered oversight to blast radius so the red zone (auth, crypto, IaC) gets human plus security review while boilerplate stays fast. The guardrail moved from "incidents found later" to "incidents blocked before the action," and the count reaching prod went to zero.
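The tiering step is the easiest part to mechanize. Below is a minimal sketch that maps changed file paths to a zone and picks the strictest one for the whole change. The glob patterns, the `default` middle zone, and the review labels are assumptions made for the example; the policy above only names the green and red zones.

```python
from fnmatch import fnmatchcase

# Hypothetical path conventions; a real repository would define its own.
RED_ZONE_GLOBS = ("*/auth/*", "*/crypto/*", "*/payments/*", "infra/*", "*.tf")
GREEN_ZONE_GLOBS = ("tests/*", "*/tests/*", "docs/*")

# Review required per zone ("default" is an assumed middle tier).
REVIEWS = {
    "red": {"human", "security"},
    "default": {"human"},
    "green": {"human-light"},
}


def zone_for(path: str) -> str:
    """Classify one path; red is checked first so it wins over green."""
    if any(fnmatchcase(path, g) for g in RED_ZONE_GLOBS):
        return "red"
    if any(fnmatchcase(path, g) for g in GREEN_ZONE_GLOBS):
        return "green"
    return "default"


def strictest_zone(paths) -> str:
    """A change is as risky as its riskiest file."""
    zones = {zone_for(p) for p in paths}
    for zone in ("red", "default", "green"):
        if zone in zones:
            return zone
    return "green"  # an empty change set needs nothing stricter


change = ["tests/test_parser.py", "services/auth/login.py"]
print(strictest_zone(change), sorted(REVIEWS[strictest_zone(change)]))
# red ['human', 'security']
```

Taking the strictest zone across the change keeps a red-zone edit from riding through on a mostly-boilerplate pull request.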
Teacher voice. See where the human judgment goes: not into reviewing every line for a GPL match (impossible) but into classifying blast radius and owning the red zone. The machine handles the three things it sees better than any human: licensed-code matches, secrets in text, and whether a package exists. The human owns the part that needs accountability and context: is this auth code, does this touch customer data, is this IaC that could take down prod. Same division as every chapter: the deterministic machine owns what it can verify, the human owns the judgment and the blast radius.
4) Why deterministic gates plus a data boundary, not "train developers" or "ban AI"¶
The plausible alternatives are training developers to be careful (awareness, not gates) and banning AI tools entirely (eliminate the risk by eliminating the tool). Why deterministic gates plus a data boundary under Meridian's workload?
Training developers fails for the same reason human review fails: the failure modes are invisible to a human in the moment. No amount of training lets an engineer recognize a verbatim GPL block they've never seen, remember not to paste a stack trace that happens to contain a key, or distinguish a hallucinated package name from a real one by reading it. Awareness reduces frequency; it cannot be the control for an irreversible blast radius, because one miss is a breach. Banning AI eliminates the governance risk but forfeits the leverage the whole module is about — and it doesn't even work, because developers route around bans with personal accounts and consumer tiers, which is worse (now the data boundary is gone and you can't see it).
Deterministic gates plus a data boundary take the only durable position: prevent the irreversible action mechanically (the boundary stops secrets leaving, the allowlist stops bad packages installing, the scanner stops licensed code merging) while keeping the leverage. Under a workload where the failure is irreversible and invisible-to-humans, only a machine gate at the point of action works; training is a complement, not a control, and a ban trades a manageable risk for a worse, invisible one. The cost is real — running a private registry, an enterprise model contract, scanners in CI — but it's the price of keeping the leverage without the irreversible exposure.
5) The property that changes the design: is the failure reversible or not¶
If you change one thing about how you govern AI coding, change this: the design variable is reversibility of the worst case. A reversible failure (rework, a bug) can be caught downstream cheaply, so you optimize for speed and let the gate be lighter. An irreversible failure (leaked secret, shipped license violation, installed malicious package) cannot be caught after the fact, so the gate must be before the action and cannot be a human who might miss it.
| Failure | Reversible? | Where to gate | Gate type |
|---|---|---|---|
| typo / logic bug | yes | downstream (tests) | human + tests |
| rework | yes | downstream (review) | human review |
| license contamination | mostly no | at merge, at generation | scanner + filter |
| secret leakage | NO | at the prompt (before) | boundary + secret scan |
| malicious dependency | NO | at install (before) | allowlist + verify |
Reversibility decides the gate's position (before vs after the action) and its type (machine vs human). The irreversible failures all share the property that the cost is incurred at the moment of the action — the prompt sent, the package installed, the code shipped — so the only effective control is a deterministic gate before that moment. This is why Meridian's data boundary strips secrets before the prompt leaves and the allowlist rejects packages before install: there is no after for an irreversible failure.
6) One failure walked through: the hallucinated package that became a supply-chain attack¶
Trace the slopsquatting failure end to end, because it's the canonical AI-specific supply-chain attack.
1. An engineer asks the assistant for code to parse a niche log format. The model
   confidently writes `import logparse_utils` and uses it: a plausible-sounding
   package that fits the context but does NOT exist. (In the USENIX Security 2025
   study, about 1 in 5 package references in generated code pointed to a package
   that doesn't exist, and ~58% of hallucinated names recurred across repeated
   runs, so they're predictable.)
2. An attacker, who runs the same popular models against common prompts and harvests
   the recurring hallucinated names, registered `logparse_utils` on PyPI last week
   with a post-install credential stealer. (This is slopsquatting.)
3. The engineer runs `pip install logparse_utils`. It resolves, because the package
   exists now. The install hook runs and exfiltrates the CI environment's secrets.
4. The code works (the malicious package also implements the parsing), tests pass,
   it merges. The DORA metrics see a healthy change.
5. The breach surfaces weeks later when the stolen credentials are used. The lockfile
   pins the malicious package; it's in every build until found and purged.
Where did the system fail? Not at the model being wrong about a package — that's expected; models hallucinate names. It failed at dependency resolution with no verification: the build installed an arbitrary name the model produced, trusting it the way it would trust a human-written import. The model's hallucination is predictable (the recurrence rate makes it an exploitable attack surface, not a random glitch), so the control is an allowlist or private registry that only resolves vetted packages, plus a check that the package existed before the AI suggested it. The blast radius — credential theft in CI — is fully irreversible by the time anyone notices.
The fix is the rule: gate the irreversible action (install) before it happens with a deterministic control (allowlist + existence verification), never trust an AI-produced package name as if a human vetted it.
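One way to approximate "the package existed before the AI suggested it" is to look at when the name first appeared on the registry. The sketch below reads PyPI's public JSON endpoint (`https://pypi.org/pypi/<name>/json`) and rejects names that are missing or only recently published. The 90-day threshold and the function names are assumptions, and passing this check is not evidence that a package is safe; it only filters the freshly squatted case before a human or allowlist review.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_pypi_metadata(name: str) -> Optional[dict]:
    """Fetch PyPI's JSON metadata for a project, or None if the name is unknown."""
    try:
        with urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise


def first_upload(metadata: dict) -> Optional[datetime]:
    """Earliest upload time across all release files, if any file exists."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in metadata.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def verdict(metadata: Optional[dict], now: datetime, min_age_days: int = 90) -> str:
    """Decide what the gate does with a name before anything is installed."""
    if metadata is None:
        return "reject: not on the registry"
    uploaded = first_upload(metadata)
    if uploaded is None or now - uploaded < timedelta(days=min_age_days):
        return "reject: too new to trust"
    return "needs allowlist review"


# Offline example with hand-written metadata shaped like the registry response.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
squatted = {"releases": {"0.0.1": [{"upload_time_iso_8601": "2025-05-25T10:00:00.000000Z"}]}}
print(verdict(squatted, now))  # reject: too new to trust
```

In a pipeline the call would be `verdict(fetch_pypi_metadata(name), datetime.now(timezone.utc))`, run before the installer ever sees the name. Note that even the best outcome here is "needs allowlist review", not "install".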
7) Cost movement — what governance gates buy and bill¶
| What changes | Direction | Concrete (Meridian) | Who absorbs it |
|---|---|---|---|
| Irreversible-incident risk | falls sharply | 3 incidents + 1 near-miss/qtr → 0 reaching prod | the business / legal |
| Secrets leaving the boundary | blocked at source | 14 prompts blocked before sending | security |
| License-contaminated merges | blocked at merge | 3 GPL matches caught by scanner | legal |
| Malicious-dependency installs | blocked at resolve | 2 invented names rejected | the build / CI |
| Developer friction | rises a little | red-zone code needs security review | red-zone authors |
| Infra cost | new, ongoing | private registry, enterprise model, scanners | platform + budget |
| Routing-around risk (if banned) | avoided | no shadow consumer-tier usage | the whole org |
The pressure relieved is irreversible legal and security exposure — the costs no DORA metric shows. The pressure created is a little developer friction (red-zone review, absorbed by authors of high-blast-radius code) and infra cost (private registry, enterprise model, scanners, absorbed by platform). The trade is strongly positive because one prevented breach or license claim dwarfs a quarter of scanner cost, and the gates barely touch the green-zone speed the rest of the module is about — oversight is tiered to blast radius, so boilerplate stays fast.
Mini-FAQ. "Our enterprise tier says it doesn't train on our data. Isn't the boundary handled?" Partly — a no-training, zero-retention contract closes the training-data leak, which is necessary. It does not close the prompt leak: a key pasted into a prompt still travels to the provider's servers and lives in logs and the provider's incident surface, contract or not. The data boundary needs both: an enterprise contract and a secret-scanning proxy that stops the secret entering the prompt in the first place. The contract governs what they do with it; the proxy governs whether it ever leaves you.
8) Signals — healthy, first to degrade, misleading, expert's graph¶
Healthy: zero secret/license/dependency incidents reaching prod; prompts carrying secrets blocked at the proxy (a rising block count is healthy — the gate is working); all dependencies resolving from the allowlist/private registry; red-zone changes carrying a security-review sign-off. The blocks are the gate doing its job, not a problem.
First metric to degrade: the rate of dependencies resolving from outside the allowlist (or developers requesting allowlist exceptions). It moves before any breach, because it's the behavior that makes a slopsquatting hit possible — every off-allowlist install is an unverified package, and the attack only needs one. Watch it the way file 03 watched the comment-dismiss reflex.
The misleading metric everyone watches: "number of AI policy training sessions completed" and "developers who acknowledged the policy." Pure compliance-theater vanity metrics — they rise with administration and say nothing about whether a secret left the boundary or a bad package installed. A 100%-trained org with no gates still leaks the key, because the failure is invisible to the trained human in the moment.
The graph an expert opens first: incidents by blast-radius tier over time, alongside gate-block counts (secrets blocked, licenses blocked, packages rejected). Healthy looks like high block counts and zero incidents reaching prod — the gates absorbing the risk. The danger signal is incidents reaching prod while block counts are low: the gates aren't wired where the risk is. Segment by tier to confirm the red zone (auth, crypto, IaC) carries human + security sign-off.
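The reading of that graph can be reduced to a few comparisons. The sketch below is one possible encoding, assuming the counts are already collected per quarter; the `QuarterStats` fields, the install totals, and the zero-tolerance default for off-allowlist installs are all assumptions, while the block counts reuse Meridian's Attempt B numbers.

```python
from dataclasses import dataclass


@dataclass
class QuarterStats:
    secrets_blocked: int
    licenses_blocked: int
    packages_rejected: int
    incidents_reaching_prod: int
    installs_total: int
    installs_off_allowlist: int


def off_allowlist_rate(s: QuarterStats) -> float:
    """Share of dependency installs that bypassed the allowlist."""
    return s.installs_off_allowlist / s.installs_total if s.installs_total else 0.0


def assess(s: QuarterStats, max_off_allowlist_rate: float = 0.0) -> str:
    """Incidents in prod are the danger signal; off-allowlist installs degrade first."""
    if s.incidents_reaching_prod > 0:
        return "danger"
    if off_allowlist_rate(s) > max_off_allowlist_rate:
        return "degrading"
    return "healthy"  # blocks with zero incidents means the gates are absorbing risk


# Block counts from Attempt B; install totals are made up for the example.
attempt_b = QuarterStats(14, 3, 2, 0, installs_total=120, installs_off_allowlist=0)
drifting = QuarterStats(14, 3, 2, 0, installs_total=120, installs_off_allowlist=6)
print(assess(attempt_b), assess(drifting))  # healthy degrading
```

High block counts never make the result worse here, which mirrors the point that blocks are the gate doing its job.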
9) Boundary of applicability — where governance gates are strong, where pathological¶
Strong fit: organizations shipping proprietary software with real IP and security exposure, where the failure modes are irreversible and a deterministic gate can sit at the point of action (prompt proxy, license scanner, dependency allowlist). Here the gates are close to pure upside — they barely touch green-zone speed and they prevent the costs DORA can't see.
Pathological: applying maximum oversight uniformly to all AI usage, gating boilerplate and tests as hard as auth and IaC. That recreates the friction a ban would cause, drives developers to route around the gates (shadow consumer-tier accounts, which removes the data boundary entirely), and trains the dismiss reflex on the gates that matter. Uniform heavy gating is as broken as no gating, just in the opposite direction. Also pathological: a data-boundary policy with no enforcement (DLP/proxy). A policy that nothing mechanically enforces is just a hope.
Scale/workload that breaks naive intuition: the intuition "the package the model suggested probably exists" inverts at scale. In the slopsquatting study's generated code, about one in five package references pointed to a non-existent package, and because the same hallucinated names recur across runs, they're a predictable, harvestable attack surface, not random noise. That is exactly what makes slopsquatting a viable attack rather than a curiosity. At the scale of a model used by thousands of developers, the recurring hallucinations become reliable bait. Never trust an AI-suggested package name the way you'd trust a human-vetted import.
10) Wrong assumption: "AI-generated code is the same as code I wrote, just faster"¶
The seductive belief is that AI output is just code — review it, test it, ship it like anything else, only faster. It reads like code you'd write and passes the same tests, so the same process should cover it. But AI code carries three liabilities your own code doesn't: it can reproduce someone else's licensed code, the act of producing it can leak your data, and it can reference dependencies that don't exist.
Replace the wrong belief with: AI-generated code has provenance and side-effects that human-written code doesn't — where it came from (possibly licensed training data), what its production leaked (prompt context to an external model), and whether its dependencies are real — and these are invisible in the diff, so the same review process is structurally blind to them. The provenance is the chapter's memory hook: the function looks identical to one you'd write, but it carries a license, a leak, and a phantom import that reading the code will never reveal — which is why the gate must be a machine that checks provenance, not a human who reads behavior.
11) Other failure shapes to recognize¶
- Verbatim license reproduction. A block large enough to be a near-copy of copyleft training-data code ships into proprietary software (the ~1% verbatim-match rate without a duplicate filter).
- Secret in the prompt. A pasted stack trace, config, or log carrying a live key/token travels to a third-party model and into its logs — irreversible the instant it sends.
- Slopsquatting / hallucinated dependency. A model-invented package name (≈1 in 5 package references in the slopsquatting study's generated code) gets squatted by an attacker and installed with a malicious payload.
- Proprietary code as context. Pasting your core algorithm into a consumer-tier model to "explain" or "refactor" it leaks the IP outside the boundary.
- Shadow AI / routing around a ban. A ban pushes usage to personal accounts and consumer tiers, removing the data boundary and the audit trail entirely — worse than governed usage.
- Indirect prompt injection in the agent. Hidden instructions in a fetched page or repo (CamoLeak-class, CVE-2025-59145) steer a coding agent into exfiltrating source or secrets.
- License-incompatible mix. AI suggests a snippet under a license incompatible with your distribution model (GPL into a closed-source product), creating a contamination claim.
- PII in training/feedback loops. Customer data pasted into prompts ends up in a fine-tuning or feedback dataset, a data-residency and privacy violation.
- Compliance theater. A signed policy with no enforcing gate — everyone acknowledged it, nothing blocks the violation.
12) Pattern transfer — where this pressure recurs¶
- The blast radius is the file-02 invariant taken to its extreme: oversight scales with what a wrong action breaks, and here the actions (send prompt, merge code, install package) have irreversible blast radius, so the gate moves before the action — the same logic as gating an IaC apply (file 02) and auto-remediation (file 05), now for legal and security cost.
- The deterministic gate is the file-03 trust mechanism applied to governance: block on what a machine can verify (license match, secret pattern, package existence), advise on judgment — except governance is all deterministic, because the failure modes are exactly the ones humans can't see in a diff.
- The data boundary is the source-of-truth idea inverted: instead of grounding output in a trusted source, you prevent trusted inputs (secrets, proprietary code) from leaving the boundary — the same ownership of what's yours that the spec and oracle protected.
- The amplifier rule recurs: AI amplifies a weak supply-chain and secrets posture as surely as it amplifies weak tests (file 06). An org with no allowlist and no secret scanning gets those weaknesses amplified into breaches; one with strong gates gets the leverage safely.
13) Design test — five questions before letting AI code reach prod¶
- What's the worst irreversible thing this AI usage could cause — a leak, a license claim, a poisoned build — and is there a gate before that action?
- Can a secret or proprietary code reach a model outside my boundary, and what stops it before the prompt sends?
- Are all dependencies resolved from an allowlist/private registry, with a check that each package actually existed before the AI suggested it?
- Is oversight tiered to blast radius — boilerplate light, auth/crypto/IaC/PII heavy — or applied uniformly (which gets routed around)?
- Is the governance control a deterministic machine gate, or a human review / training session that's blind to provenance and irreversible by the time it'd catch anything?
Where this appears in production¶
- GitHub Copilot duplicate-detection filter: checks each suggestion, together with about 150 characters of surrounding code, against public code on GitHub and suppresses matches; without it, about 1% of suggestions contain a match of roughly 150 characters or more against training data. This is the IP gate at generation.
- Doe v. GitHub (Copilot litigation) — the open-source license / IP class action (Saveri Law Firm, Matthew Butterick); the legal blast radius of license contamination made concrete, with breach-of-license claims surviving 2024 dismissals.
- GitHub Advanced Security / secret scanning + push protection — blocks commits containing detected secrets; the secret-leak gate at the commit boundary.
- GitGuardian / TruffleHog — secret detection across repos and (increasingly) prompts; the data-boundary enforcement layer.
- Snyk / Socket / Endor Labs — software composition analysis and dependency risk; Socket specifically detects slopsquatting and malicious newly-published packages — the supply-chain gate.
- Slopsquatting research (USENIX Security 2025): 576,000 generated code samples containing 2.23M package references, 19.7% of them to hallucinated packages, 205K unique fabricated names, ~58% recurring across runs; the data that makes hallucinated dependencies a real attack surface.
- CamoLeak (CVE-2025-59145, CVSS 9.6) — invisible-markdown prompt injection in Copilot Chat exfiltrating secrets via image URLs; the indirect-injection blast radius for coding agents.
- FOSSA / Black Duck / Snyk License Compliance — license scanning in CI; the copyleft-contamination gate at merge.
- Private package registries (Artifactory, AWS CodeArtifact, GitHub Packages, npm/PyPI org proxies) — the dependency allowlist that resolves only vetted packages, defeating slopsquatting.
- Enterprise model contracts (GitHub Copilot Business/Enterprise, Anthropic / OpenAI enterprise, Azure OpenAI) — no-training, zero-retention terms; the contractual half of the data boundary.
- Microsoft Purview / DLP — data-loss-prevention enforcing what can leave the boundary, including into AI prompts.
- EU AI Act / SOC 2 / data-residency regimes — the compliance backdrop that turns a data-boundary breach into a regulatory and audit liability.
Pause and recall¶
- Why is governance a different kind of cost than rework, and why is DORA blind to it?
- Name the three AI-coding blast radii and where each one's gate must sit.
- Why can't human PR review be the governance control for these failure modes?
- What does "reversibility of the worst case" decide about a gate's position and type?
- Walk through how a hallucinated package name becomes a real supply-chain attack.
- Why is banning AI worse than governing it, and why is uniform heavy gating also broken?
- Why does a no-training enterprise contract not fully close the data boundary?
- Which behavior degrades first and makes a slopsquatting hit possible, before any breach?
Interview Q&A¶
Q1. An AI-generated function works and passes tests. Why might shipping it still be a liability? A. Because AI code carries provenance and side-effects human code doesn't, all invisible in the diff: the block may be a near-verbatim reproduction of licensed training-data code (a license claim), the prompt that produced it may have carried a secret to an external model (an irreversible leak), and its imports may be hallucinated package names an attacker has squatted (a poisoned build). None show up in tests or DORA metrics; each needs a deterministic gate aimed at that failure mode. Common wrong answer to avoid: "If it works and passes tests and review, it's fine." Review and tests are blind to provenance — they check behavior, not where the code came from, what its creation leaked, or whether its dependencies are real.
Q2. Why not just train developers to be careful with AI tools instead of building gates? A. Because the failure modes are invisible to a human in the moment: an engineer can't recognize a verbatim GPL block they've never seen, can't reliably avoid pasting a stack trace that happens to contain a key, and can't tell a hallucinated package name from a real one by reading it. Training lowers frequency but can't be the control for an irreversible blast radius, where one miss is a breach. Use deterministic gates at the point of action; training is a complement. Common wrong answer to avoid: "Awareness training fixes it." Awareness reduces frequency, not the irreversible tail; the failures are invisible to the trained human exactly when they happen.
Q3. A developer's AI suggestion imports a package nobody recognizes. Walk me through the risk. A. In one large study, about one in five package references in AI-generated code pointed to a package that doesn't exist, and because the same hallucinated names recur across runs, attackers harvest them and register those exact names with malicious payloads. This is slopsquatting. If your build installs an arbitrary AI-suggested name, a malicious post-install hook can run and exfiltrate CI secrets, and the lockfile pins the package into every build. The gate is a private registry/allowlist that only resolves vetted packages, plus a check that the package existed before the AI suggested it. Common wrong answer to avoid: "If pip can install it, it's a real package." It's real because an attacker just registered the name the model invented; resolving successfully is the attack succeeding, not safety.
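The gate described in Q3 can be sketched as a pure function. This is an illustration under stated assumptions, not a real registry client: the package names, the `ALLOWLIST` and `REGISTRY_FIRST_PUBLISHED` tables, and the 90-day `MIN_AGE` threshold are all hypothetical stand-ins for what a private registry and its metadata would supply.

```python
from datetime import date, timedelta

# Hypothetical vetted packages; in practice this is the private registry's contents.
ALLOWLIST = {"acme-http", "acme-json"}

# Hypothetical registry metadata: package name -> date first published.
REGISTRY_FIRST_PUBLISHED = {
    "acme-http": date(2019, 3, 1),
    "acme-json": date(2020, 6, 15),
    "fastparse-utils": date(2025, 5, 30),
}

# Assumed threshold: anything younger than this is treated as suspect.
MIN_AGE = timedelta(days=90)


def check_dependency(name, today):
    """Return (allowed, reason). Only allowlisted names are ever allowed."""
    if name in ALLOWLIST:
        return True, "on allowlist"
    published = REGISTRY_FIRST_PUBLISHED.get(name)
    if published is None:
        return False, "not found in registry (possibly hallucinated)"
    if today - published < MIN_AGE:
        return False, "recently published and not vetted (possible slopsquat)"
    return False, "exists but not vetted; request allowlisting"
```

Note that resolving successfully never grants access on its own. The existence and age checks only change the reason attached to the block, so a reviewer can tell a hallucinated name from a freshly registered one.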
Q4. Leadership wants to ban AI tools to eliminate IP and security risk. Good idea? A. No — it forfeits the leverage and doesn't even work: developers route around bans with personal/consumer-tier accounts, which removes the data boundary and the audit trail, making the risk worse and invisible. The durable position is deterministic gates plus a data boundary (enterprise no-training model, secret-scanning proxy, license scanner, dependency allowlist) with oversight tiered to blast radius, keeping the leverage while preventing the irreversible failures. Common wrong answer to avoid: "Banning it removes the risk." It relocates the risk to ungoverned shadow usage with no boundary and no visibility — strictly worse than governed usage.
Q5. Your enterprise model contract says no training on your data. Are secrets safe in prompts now? A. No. The contract closes the training-data leak but not the prompt leak: a key pasted into a prompt still travels to the provider's servers and into logs, contract or not, and lives in their incident surface. You need both — the no-training/zero-retention contract and a secret-scanning proxy that blocks the secret from entering the prompt in the first place. The contract governs their handling; the proxy governs whether it ever leaves your boundary. Common wrong answer to avoid: "The contract says they won't train on it, so we're covered." Not training on it isn't the same as it never leaving your boundary; the leak happens at send, and the contract doesn't un-send it.
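The proxy half of that answer can be sketched as a check that runs before anything is sent. This is a minimal illustration, assuming three well-known secret shapes (AWS access key IDs, GitHub personal access tokens with the `ghp_` prefix, and PEM private-key headers). The `guarded_send` wrapper and the `send` callable are hypothetical; production scanners cover far more patterns and add entropy checks.

```python
import re

# A small, illustrative pattern set; real scanners ship hundreds of detectors.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


class SecretInPromptError(Exception):
    """Raised when a prompt is blocked before leaving the boundary."""


def find_secrets(prompt):
    """Return the sorted names of every pattern that matches the prompt."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)
    )


def guarded_send(prompt, send):
    """Call send(prompt) only if no secret pattern matches; otherwise raise."""
    findings = find_secrets(prompt)
    if findings:
        raise SecretInPromptError("blocked before send: " + ", ".join(findings))
    return send(prompt)
```

The design choice is that the check wraps the send call itself, so there is no code path where the prompt leaves first and is scanned afterward.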
Q6. We have a secret leak via an AI prompt — is this a file-05 grounding problem, a file-06 measurement gap, or a file-07 governance problem? (cumulative) A. It's a file-07 governance problem: an irreversible data-boundary failure, invisible to DORA (file 06's blind spot is exactly this class of cost) and unrelated to grounding (file 05 was about ungrounded output; this is leaked input). The shared shape with the rest of the module is the blast radius — oversight must scale to what a wrong action breaks — but here the action is irreversible and the gate must sit before the prompt sends, not in review afterward. Common wrong answer to avoid: "Tighten review and re-measure." Review is blind to it and the leak already happened; the fix is a before-the-action boundary gate, and no metric will show the cost until it's a breach.
Design/debug exercise (10 min)¶
Step 1 — Modeled example. Here is Meridian's model-in-the-loop policy, tiered by blast radius:
DATA BOUNDARY (always on):
  enterprise model (no-training, zero-retention) + secret-scanning prompt proxy
  no proprietary algorithms / customer PII in prompts (DLP-enforced)
DETERMINISTIC GATES (in CI / at the boundary, block not warn):
  duplicate-detection filter ON; license scanner at merge
  secret scanning on prompts AND commits
  dependency allowlist + package-exists check + lockfile pin + SCA
TIERED OVERSIGHT:
  green (boilerplate, tests): light review, gates only
  red (auth, crypto, payments, IaC): mandatory human + security sign-off
  secrets/PII-touching: data boundary enforced, no exceptions
GUARDRAIL: secret/license/dependency incidents reaching prod = 0.
Forbidden: trusting an AI-suggested package name; pasting secrets/IP into a prompt;
  uniform heavy gating that gets routed around.
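The "block not warn" line of the policy above can be sketched as a CI step that turns any finding into a failing exit code. This is a minimal illustration using the license gate as the example. The `Finding` type, the function names, and the denylist contents are assumptions; the license strings are SPDX identifiers, and which licenses belong on the list is a legal decision, not something this sketch decides.

```python
from dataclasses import dataclass

# Assumed copyleft denylist for a proprietary codebase, as SPDX identifiers.
DENIED_LICENSES = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


@dataclass
class Finding:
    gate: str    # which gate fired: "license", "secret", or "dependency"
    detail: str  # human-readable description for the CI log


def license_findings(dependency_licenses):
    """dependency_licenses maps package name -> SPDX license identifier."""
    return [
        Finding("license", f"{pkg} is {spdx}")
        for pkg, spdx in sorted(dependency_licenses.items())
        if spdx in DENIED_LICENSES
    ]


def gate_exit_code(findings):
    """Block, not warn: any finding at all yields a non-zero exit code."""
    return 1 if findings else 0
```

Findings from the secret and dependency gates could be appended to the same list, so the guardrail of zero incidents reaching prod maps to a single exit code.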
Step 2 — Your turn. Take your own AI rollout (or continue Meridian's). For one workflow, name the worst irreversible thing a wrong suggestion could cause, the gate that sits before that action, and which blast-radius tier it belongs in. Then identify one place a secret or proprietary code could currently reach a model outside your boundary, and the gate that would stop it.
Step 3 — Reproduce from memory. Redraw the blast-radius spectrum (reversible/left vs irreversible/right), mark which side DORA can see and which side needs a before-the-action gate, and place the three governance failures. Then connect it to file 02: why is "oversight scales with blast radius" the same invariant here as it was for gating an IaC apply?
Operational memory¶
This chapter explained the class of cost that no delivery metric shows until it's a headline: AI coding can reproduce licensed code into proprietary software, carry secrets and proprietary code out of your boundary through prompts, and reference hallucinated package names that attackers squat. These failures are external, often irreversible, and invisible in a diff. The important idea is not that "AI code is just code, faster." It is that oversight must scale with blast radius, and because these governance failures have an irreversible blast radius, the control is a deterministic gate before the action rather than human review, which is structurally blind to provenance.
You learned to classify AI usage by blast radius, set a data boundary (enterprise no-training model plus a secret-scanning proxy) so prompts can't carry secrets out, gate code with deterministic machine checks (duplicate-detection filter, license scanner, dependency allowlist with existence checks), and tier oversight so the red zone (auth, crypto, IaC, PII) gets human plus security sign-off while boilerplate stays fast. That solves the opening failure because the GPL block is caught at merge, the pasted key is blocked before the prompt sends, and the hallucinated package is rejected before install. Detection moves from after-the-fact-by-a-customer to before-the-action-by-a-machine, and the guardrail is zero secret, license, or dependency incidents reaching prod.
Carry this diagnostic forward: before letting AI code reach prod, ask what irreversible thing a wrong suggestion could cause and whether a gate sits before that action. If a metric won't show the cost until it's a breach or a lawsuit, the control has to be a deterministic gate at the point of action, not a human review or training session that is blind to provenance and too late by the time it could catch anything.
Remember:
- AI code carries provenance and side-effects human code doesn't — licensed origin, leaked prompt context, phantom imports — all invisible in the diff.
- Governance failures are irreversible (leaked secret, shipped license violation, installed malicious package), so the gate must sit before the action.
- Human review can't be the control; it's blind to provenance, so detection must be a deterministic machine (scanner, allowlist, boundary).
- Slopsquatting is real: in one large study, ~1 in 5 package references in AI-generated code pointed to a non-existent package, the names recur, and attackers squat them. Never trust an AI package name.
- A no-training contract closes the training leak, not the prompt leak; you need a secret-scanning proxy too.
- Tier oversight to blast radius — light on boilerplate, heavy on auth/crypto/IaC/PII — or uniform heavy gating gets routed around into ungoverned shadow usage.
Bridge. We've now covered the full lifecycle — the inner loop, scaffolding, review, tests, ops, measurement, and the legal and security blast radius — and each chapter resolved one clean pressure with one clean gate. But the real world is messier than any single chapter let on. The evidence on whether AI helps is genuinely contested, the studies contradict each other, the tools change every quarter, and practitioners do things that violate the textbook advice and still ship. The final file steps back into that ambiguity: where the evidence is contested, where hype outruns reality, what works empirically without a clean theory, and what to revisit as the tools and studies keep moving. → 08-boundary-tradeoff-review.md