06. Security, governance, and compliance — the gates that override the rubric

~18 min read. The rubric cleared the platform, the cost curve held, the escape hatch was open. Then the bank's security review asked one question — "where does the conversation data physically rest, and can you prove it?" — and a platform that won on every other axis was eliminated in a single meeting.

Built on 05-cost-and-scaling-model.md. You can now score capability, exit cost, and the cost curve. This file adds the dimension that overrides all three: the procurement gate. The pressure is safety vs speed — and unlike the weighted capability rubric, a governance gate is not a score you weigh. It is a pass/fail that kills a platform before a single agent ships, regardless of how well it scored everywhere else.


What every quantitative pass assumed away

The rubric weighted nine capabilities. The lock-in pass scored exit cost. The cost pass projected the bill to 100×. Each produced a number you weigh against other numbers. All of them quietly assumed the platform was allowed — that legal, security, and the regulator would let the agent ship at all. That assumption is where platform decisions die most abruptly, because a governance failure is not a low score you can offset with a high score elsewhere. It is a gate. The platform either passes or it is out.

This file teaches the gates: where data physically rests (residency), how personal data is handled (PII), how one customer's data stays separate from another's (tenant isolation), who can prove what happened (audit), and who actually makes the model call (the ownership question that decides liability). You will learn to run these as pass/fail filters before the rubric, not as axes inside it — because a platform that fails a gate should never reach the scoring stage.

What this file solves

A team scores a platform, picks it, builds for two months, and then a procurement, security, or legal review surfaces a hard requirement — data must rest in-region, PII must be redacted before it reaches the model, each tenant's data must be provably isolated, every automated decision must produce a signed audit trail — that the platform structurally cannot meet. The build is wasted, the timeline slips, and the team learns that the gate was knowable on day one. This file gives you the five gates as pre-filters, a worked walk of the bank's procurement review, and the rule that a gate overrides any rubric score. The first move is to stop treating security as a capability axis to weigh and start treating it as a filter that runs first, because no amount of capability buys back a residency violation.

Why folding security into the rubric is a category error

The tempting design is to add "security & compliance" as a tenth axis on the capability rubric, weight it, score it, and sum. This is a category error, and a dangerous one, because a weighted sum is compensatory — a high score on one axis offsets a low score on another. That is exactly the wrong logic for a gate. A platform that scores 5 on capability, 5 on cost, and 1 on data residency has a weighted total that looks healthy and a residency violation that is illegal. The sum launders a disqualifier into an acceptable average. Gates are non-compensatory: failing one is fatal regardless of every other score. They cannot live inside a sum.
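A minimal sketch of that laundering. The weights and 1-5 scores are made up (none come from the bank's rubric); only the shape of the arithmetic matters. The compensatory sum reports a healthy-looking total despite a residency score of 1, while the gated version has no score to report at all.

```python
# Made-up weights and 1-5 scores; only the shape of the arithmetic matters.
WEIGHTS = {"capability": 3, "cost": 2, "residency": 2}
SCORES = {"capability": 5, "cost": 5, "residency": 1}

# In the gated design, residency leaves the sum and becomes a pass/fail check.
GATED_WEIGHTS = {"capability": 3, "cost": 2}


def weighted_total(scores: dict, weights: dict) -> int:
    """Compensatory sum: any axis can offset any other axis."""
    return sum(weights[axis] * scores[axis] for axis in weights)


def max_total(weights: dict, top: int = 5) -> int:
    """Best possible total for a given set of weights."""
    return sum(w * top for w in weights.values())


def gated_total(scores: dict, weights: dict, residency_ok: bool):
    """Non-compensatory: a failed gate means there is no score to report."""
    if not residency_ok:
        return None
    return weighted_total(scores, weights)


print(weighted_total(SCORES, WEIGHTS), "/", max_total(WEIGHTS))      # 27 / 35
print(gated_total(SCORES, GATED_WEIGHTS, residency_ok=False))        # None
```

27 out of 35 reads as a solid candidate; nothing in the number says "illegal."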

So the question is not "how do we weight security against capability." It is "which requirements are gates that a platform must pass to even be scored, and which are capabilities we weigh after the gates clear." Sorting requirements into gates and axes — before scoring anything — is the skill.
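Once each requirement is labeled, the sort itself is mechanical; the hard part is the labeling. A minimal sketch, where `Requirement` and its `legal_binary` flag are hypothetical names for a field legal and security would fill in before any scoring:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    legal_binary: bool  # True: legal or contractual must-have; False: graded preference


def sort_requirements(reqs):
    """Split requirements into gates (pass/fail, run first) and rubric axes (weighed later)."""
    gates = [r.name for r in reqs if r.legal_binary]
    axes = [r.name for r in reqs if not r.legal_binary]
    return gates, axes


reqs = [
    Requirement("data rests in ap-south-1 with a single provable boundary", True),
    Requirement("PII masked before it reaches the model", True),
    Requirement("SOC2 report available", False),
    Requirement("preferred secondary region", False),
]
gates, axes = sort_requirements(reqs)
print(gates)
print(axes)
```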

When the best-scoring platform is illegal to use

Run the bank's procurement review on a candidate that aced the rubric: the Vertex-style runtime that topped the bank's weighted total at 140 in file 03. Procurement asks five questions:

PROCUREMENT GATE REVIEW — candidate that scored highest on the rubric

1. RESIDENCY: Does all conversation data, memory, and state rest in ap-south-1
   (Mumbai), provably, for the regulator?
   → State can be region-pinned, BUT the bank's core-banking data is on AWS;
     reaching it means cross-cloud egress, and the residency proof now spans
     two clouds' audit trails. PARTIAL — and the regulator wants a single
     provable boundary.                                          ⚠ GATE AT RISK

2. PII: Can PII be redacted/masked BEFORE it reaches the model, and is the
   redaction itself auditable?
   → Yes (guardrail-style PII filtering exists).                 ✓ PASS

3. TENANT ISOLATION: Is each customer's data provably isolated, with no
   cross-tenant leakage path through shared memory or guardrails?
   → Yes, with per-tenant configuration.                         ✓ PASS

4. AUDIT: Is every model call and automated decision logged immutably and
   signable with the BANK'S key for the regulator?
   → Logging yes; signing with the bank's own KMS key — depends on owning the
     model call, which the runtime allows since orchestration is owned.  ✓ PASS

5. WHO OWNS THE MODEL CALL: Does the bank control the inference call, or does
   the vendor make it on the bank's behalf?
   → Bank owns it (orchestration owned).                         ✓ PASS

Four of five pass. The fifth, residency with a single provable boundary, is at risk because the bank's data lives on AWS and this candidate runs on another cloud, so the residency proof fragments across two clouds. For a hard gate, "at risk" is a fail. The platform is eliminated not for a missing feature or a low score but because one hard gate, provable single-boundary residency, cannot be met, and a gate is pass/fail. The rubric's 140 is irrelevant. A platform that is illegal or unprovable to a regulator does not get to compete on capability.
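The review can be written down as data. A minimal sketch, assuming a three-state answer per question (`PASS`, `AT_RISK`, `FAIL` are illustrative labels) and the policy that a hard gate left at risk counts as a failure. Note that the rubric total never enters the verdict.

```python
from enum import Enum


class Result(Enum):
    PASS = "pass"
    AT_RISK = "at risk"
    FAIL = "fail"


# The five answers from the review above.
review = {
    "residency": Result.AT_RISK,  # proof spans two clouds' audit trails
    "pii": Result.PASS,
    "tenant_isolation": Result.PASS,
    "audit": Result.PASS,
    "model_call_ownership": Result.PASS,
}


def gate_verdict(review: dict):
    """Hard gates are binary: anything short of PASS, AT_RISK included, eliminates."""
    blocking = [gate for gate, result in review.items() if result is not Result.PASS]
    return ("ELIMINATED", blocking) if blocking else ("CLEARED", [])


verdict, blocking = gate_verdict(review)
print(verdict, blocking)  # ELIMINATED ['residency']
```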

So how do you avoid wasting two months building on a platform that a gate will kill? You run the gates as pre-filters before the rubric, sort every requirement into gate-or-axis up front, and eliminate any platform that fails a gate before scoring the survivors.

The gate rule. Security, residency, PII, isolation, and audit requirements are gates, not axes. Run them as pass/fail pre-filters before the capability rubric. A platform that fails any gate is eliminated regardless of capability, cost, or portability — gates are non-compensatory and override every weighted score.

Why this rule exists. The primitive is that some requirements are legal or contractual constraints, not preferences — a residency violation is illegal, not merely undesirable. The constraint is that a weighted sum is compensatory by construction: it lets a high score offset a low one, which is exactly the logic a gate forbids. Folding a gate into the rubric mathematically permits trading capability for legality, so the gate must live outside the sum, as a filter that runs first.
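The gate rule reduces to a two-stage function: filter, then rank. A minimal sketch, assuming each gate has already been answered as a provable True/False. The candidate names and the 120 score are hypothetical; the 140 echoes the rubric total above.

```python
from dataclasses import dataclass

GATES = ("residency", "pii", "tenant_isolation", "audit", "model_call_ownership")


@dataclass
class Candidate:
    name: str
    gates: dict         # gate name -> True only if passed provably
    rubric_total: int   # weighted capability total from the rubric


def passes_all_gates(c: Candidate) -> bool:
    # A missing answer counts as a failure: no evidence, no pass.
    return all(c.gates.get(g, False) for g in GATES)


def select(candidates):
    """Sieve first (non-compensatory), then rank survivors by rubric (compensatory)."""
    eliminated = [c.name for c in candidates if not passes_all_gates(c)]
    survivors = [c for c in candidates if passes_all_gates(c)]
    ranked = sorted(survivors, key=lambda c: c.rubric_total, reverse=True)
    return [c.name for c in ranked], eliminated


all_pass = {g: True for g in GATES}
cross_cloud = Candidate("cross-cloud runtime", {**all_pass, "residency": False}, 140)
same_cloud = Candidate("same-cloud runtime", all_pass, 120)  # hypothetical score
ranked, eliminated = select([cross_cloud, same_cloud])
print(ranked, eliminated)  # ['same-cloud runtime'] ['cross-cloud runtime']
```

The rubric's `sorted` call only ever sees survivors, so no score, however high, can reach it from the eliminated list.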


1) The five gates — what each one filters out

These are the pass/fail filters every regulated agentic platform must clear. For each, the question is binary: can the platform meet it, provably, or not.

1. Data residency. Where conversation data, memory, embeddings, and logs physically rest, and whether you can prove it to a regulator with a single boundary. The filter: does state stay in the required region (ap-south-1 for the bank), and is the proof a single audit trail, not a fragment spread across clouds? This is the gate that fragments when your data and your platform are on different clouds — the file-02 gravity force showing up as a compliance filter.

2. PII handling. Whether personally identifiable information can be detected, redacted, or masked before it reaches the model and any third party, and whether the redaction is itself auditable. The filter: can you guarantee a customer's PAN, account number, or health detail never reaches a model uncontrolled, and prove the guarantee? Bedrock Guardrails, for example, filters or masks PII in inputs and outputs — but the gate is whether your required PII policy is enforceable and provable, not whether a feature exists.

3. Tenant isolation. Whether one customer's (or one business unit's) data is provably separated from another's, with no leakage path through shared memory, shared guardrails, or a shared model context. The filter: can you prove customer A's conversation can never surface in customer B's, even through a shared long-term memory store? Multi-tenant agents leak exactly here — through shared state the rubric scored as a feature.

4. Audit and immutability. Whether every model call and automated decision is logged immutably, attributable to a specific agent version and policy, and signable for the regulator. The filter: can you reconstruct and prove what the agent did, when, on whose behalf, with which model — and sign that record with your own key? This is where the file-04 "own the data source of truth" hatch becomes a compliance requirement, not just a portability one.

5. Who owns the model call. Whether you make the inference call or the vendor makes it on your behalf — which decides liability, signability, and control. The filter: when the agent decides to deny a loan or issue a refund, whose system invoked the model, and can you sign that call with your key? A SaaS that owns the model call owns the liability boundary and forecloses your-key signing; an owned orchestration on a runtime keeps the call yours.
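Gate 2 is about where redaction happens and whether it leaves evidence. The sketch below does not use Bedrock Guardrails; it substitutes plain regular expressions so it stays self-contained. The patterns are illustrative assumptions, not a production PII detector, and "PAN" is assumed to mean India's Permanent Account Number. The structural point is that the model function only ever receives the redacted text, and the audit log records what was masked without storing the raw values.

```python
import re
from datetime import datetime, timezone

# Illustrative patterns only (assumptions); a production policy needs a vetted detector.
# Order matters: more specific patterns run first.
PII_PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),          # 5 letters, 4 digits, 1 letter
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),   # 12 digits, optionally 4-4-4
    "ACCOUNT": re.compile(r"\b\d{9,18}\b"),                # assumed account-number lengths
}


def redact(text: str, audit_log: list) -> str:
    """Mask PII and record what was masked (types and counts, never raw values)."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    audit_log.append({"at": datetime.now(timezone.utc).isoformat(), "masked": counts})
    return text


def call_model_with_pii_gate(prompt: str, model_fn, audit_log: list) -> str:
    """The model only ever receives the redacted prompt."""
    return model_fn(redact(prompt, audit_log))


def echo(prompt: str) -> str:
    """Stand-in for a real model call: returns what the model would have seen."""
    return prompt


log = []
print(call_model_with_pii_gate("PAN ABCDE1234F, Aadhaar 1234 5678 9012, a/c 1234567890", echo, log))
# PAN [PAN], Aadhaar [AADHAAR], a/c [ACCOUNT]
print(log[0]["masked"])
```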

GATE                 BINARY FILTER (pass/fail)                    KILLS A PLATFORM WHEN
──────────────────   ──────────────────────────────────────────  ────────────────────────────
Data residency       state rests in-region, single provable      data + platform on different
                     boundary                                      clouds → fragmented proof
PII handling         PII redacted before model, redaction         no enforceable, provable PII
                     auditable                                     policy at the model boundary
Tenant isolation     no cross-tenant leakage path through         shared memory/guardrails leak
                     shared state                                  across tenants
Audit/immutability   every call+decision logged immutably,        vendor owns logs; no your-key
                     signable with your key                        signing; logs not exportable
Who owns model call  YOU invoke inference, not the vendor          vendor makes the call → vendor
                                                                    owns liability + signability

   gates run FIRST, as a sieve. survivors get scored on the rubric. a gate FAILURE
   is fatal — no rubric score buys it back.

The table is the whole file in miniature: five sieves run before any scoring, and a platform that doesn't pass all five gates for your regulatory context never reaches the capability rubric.
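Every gate in the table asks the same compound question: can the platform meet it, and can you prove it? A minimal sketch of that rule, where `GateFinding` and its `evidence` tuple are hypothetical names for whatever artifacts a regulator could actually inspect:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateFinding:
    gate: str
    platform_can_meet: bool
    evidence: tuple = ()  # hypothetical: artifacts a regulator could inspect

    @property
    def passes(self) -> bool:
        # Binary and provable: a capability without evidence is a fail.
        return self.platform_can_meet and len(self.evidence) > 0


findings = [
    GateFinding("residency", True, ("region-pinned state config", "single-cloud audit trail")),
    GateFinding("audit", True),  # "logging exists", but nothing a regulator can inspect yet
]
print({f.gate: f.passes for f in findings})  # {'residency': True, 'audit': False}
```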


2) Picture first — the gate sieve before the weighted rubric

flowchart TD
    A[Candidate platforms] --> B{Gate 1: residency<br/>provable single boundary?}
    B -->|fail| X[ELIMINATED — no score buys it back]
    B -->|pass| C{Gate 2: PII redacted<br/>before model, auditable?}
    C -->|fail| X
    C -->|pass| D{Gate 3: tenant isolation<br/>no leakage path?}
    D -->|fail| X
    D -->|pass| E{Gate 4: audit immutable<br/>+ your-key signable?}
    E -->|fail| X
    E -->|pass| F{Gate 5: you own<br/>the model call?}
    F -->|fail| X
    F -->|pass| G[Survivors go to the rubric file 03<br/>then cost file 05, lock-in file 04]

The shape is a sieve, not a scale. Each gate is a yes/no fork, and a single "fail" sends the platform straight to elimination — there is no path where a strong downstream score rescues it. Only platforms that pass all five gates reach the weighted rubric. This is the structural inversion of the rubric: the rubric is compensatory (weigh and sum), the gates are non-compensatory (pass all or out). Running the sieve first is what stops the team from wasting two months scoring and building on a platform a gate would kill.


3) The bank's procurement review — one running example

The bank runs the sieve on its real candidates before the rubric ever runs. Walk both agents through.

Attempt A — the tempting move: score first, security-review later

The bank's first plan: run the capability rubric, pick the winner, then send it to security for sign-off near launch. This is the trap. Security review near launch finds a residency or audit gate the platform can't meet, and two months of build are wasted because the gate was a day-one fact treated as a launch-day formality.

Attempt B — the right move: run the gate sieve first, score the survivors

The bank inverts the order. Procurement and security define the five gates before the rubric runs, from the regulator's requirements and the bank's data-handling policy.

Support agent gates:

Gate 1 RESIDENCY:  All conversation data + memory in ap-south-1, single provable
                   boundary. → AgentCore: runtime + memory pin to ap-south-1, and
                   core-banking data is ALSO on AWS → single-cloud proof. PASS ✓
                   (A cross-cloud candidate fragments the proof → FAILS this gate.)
Gate 2 PII:        Mask PAN/account/Aadhaar before model; redaction logged.
                   → Bedrock Guardrails PII filter, applied pre-model, logged. PASS ✓
Gate 3 ISOLATION:  Each customer's conversation provably isolated; no leak through
                   shared memory. → per-customer memory scoping enforced. PASS ✓
Gate 4 AUDIT:      Every model call signed with bank's KMS key, logged immutably,
                   attributable to agent version. → owned orchestration invokes the
                   model and signs with bank KMS; logs dual-written (file 04). PASS ✓
Gate 5 MODEL CALL: Bank owns the inference call. → owned LangGraph orchestration
                   makes the call; vendor never calls on the bank's behalf. PASS ✓

All five pass for AgentCore because the bank's data is on AWS (residency single-boundary) and its orchestration is owned (audit + model-call ownership). The same gates fail a cross-cloud runtime on residency and fail a closed SaaS on model-call ownership and your-key signing — which is why file 01's wall was, underneath, a governance-gate failure the bank never ran.
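Gate 4 passes here because the bank's own code makes the model call and can therefore sign a record of it. A minimal sketch of such a record, with one loud assumption: an HMAC key from Python's standard library stands in for the bank's KMS key. A real deployment would ask the KMS to sign (typically with an asymmetric key, so the key never leaves the KMS and verifiers don't need it). The agent version, policy, and model identifiers are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Assumption: an HMAC key stands in for the bank's KMS key (see lead-in).
BANK_KEY = b"demo-key-held-by-the-bank"


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def _payload(record: dict) -> bytes:
    # Canonical JSON so signing and verifying hash identical bytes.
    return json.dumps(record, sort_keys=True).encode()


def signed_audit_record(agent_version, policy_id, model_id, prompt, output, key=BANK_KEY):
    """One record per model call: what, when, which agent version, policy, and model."""
    record = {
        "at": datetime.now(timezone.utc).isoformat(),
        "agent_version": agent_version,
        "policy_id": policy_id,
        "model_id": model_id,
        "prompt_sha256": _digest(prompt),
        "output_sha256": _digest(output),
    }
    record["signature"] = hmac.new(key, _payload(record), hashlib.sha256).hexdigest()
    return record


def verify(record: dict, key=BANK_KEY) -> bool:
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, _payload(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


rec = signed_audit_record("support-agent@1.4.2", "pii-policy-7", "model-x", "prompt", "refund approved")
print(verify(rec))                           # True
print(verify({**rec, "model_id": "other"}))  # False: any edit breaks the signature
```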

Internal-ops agent gates are stricter: KYC documents and compliance memos are the most sensitive data the bank holds. Residency is hard-pinned (and may demand the sovereign region from file 04), PII redaction is mandatory on every document, isolation is per-relationship-manager and per-customer, and audit must capture every document accessed and every memo drafted. The internal agent passes the same gates on AgentCore but with tighter configuration — and the residency gate is the one that later changes (file 04's sovereign-region migration), which is exactly why the audit hatch (owned, signed, exportable logs) had to exist before the change.

Verdict: the gate sieve eliminates cross-cloud and closed-SaaS candidates before the rubric runs, leaving AgentCore (and self-host) as the only candidates worth scoring. The rubric (file 03) then scored the survivors. The gates didn't pick AgentCore; they cleared the field of illegal options so the rubric, cost, and gravity could decide among the legal ones.

Teacher voice. Notice the order. The bank ran gates first, rubric second. Had it scored first, a cross-cloud runtime would have topped the rubric (140), been picked, and then died in the security review two months in. Running the non-compensatory sieve before the compensatory score is the whole discipline — it eliminates the illegal before you fall in love with the capable.


4) Why a sieve and not a weighted axis — under this workload

The plausible alternative is to add security as a heavily-weighted rubric axis — weight it 5, score it, let it dominate the sum. Why a separate sieve instead of just a high weight?

Because a high weight is still compensatory, and a regulatory gate is not. Weight security at 5 and score a cross-cloud candidate 2 on residency: against a perfect 5, it loses 5 × (5 − 2) = 15 points off its total. But if it scores 5 on the nine capability axes, it can still top the rubric. A 15-point penalty on a ~140-point scale does not eliminate it; it merely handicaps it. And a residency violation is not a handicap; it is illegal. No weight high enough to guarantee elimination exists inside a sum without distorting every other comparison, because the sum's whole job is to let scores trade off. The only structure that guarantees "fail this and you're out, period" is a filter outside the sum.
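The same arithmetic shows why no fixed weight works. For a legal rival to outscore a violator, the security weight must exceed the violator's capability lead divided by the security-score gap, so the threshold moves with every comparison. A minimal sketch; the function names and the lead values are illustrative:

```python
def security_penalty(weight: float, score: int, top: int = 5) -> float:
    """Points lost versus a perfect score on the security axis."""
    return weight * (top - score)


def weight_needed_to_eliminate(capability_lead: float, security_gap: int) -> float:
    """The security weight must EXCEED this for a legal rival to outscore the violator.

    capability_lead: weighted capability points by which the violator leads the rival.
    security_gap: rival's security score minus the violator's.
    """
    return capability_lead / security_gap


print(security_penalty(5, 2))             # 15
print(weight_needed_to_eliminate(15, 3))  # 5.0
print(weight_needed_to_eliminate(30, 3))  # 10.0: double the lead, double the weight
```

A weight tuned to eliminate the violator in one comparison fails as soon as the violator's capability lead grows, which is the compensatory leak the sieve closes.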

For the bank — a regulated lender where a residency or audit failure is an enforcement action, not a degraded experience — the gate must be uncrossable. A sieve makes it uncrossable; a weight, however high, leaves a path where enough capability buys back a violation. The sieve is the correct structure precisely because the requirement is binary and legal, not graded and preferential. If the requirements were soft preferences (nice-to-have SOC2, preferred-but-not-required region), a weighted axis would be fine — gates are for the hard, legal, binary constraints.


5) The property that decides the gates: who physically holds the data and makes the call

The single dimension that most determines whether a platform passes the gates is ownership of the data and the model call — which is the file-01 boundary and the file-04 data surface, reappearing as a compliance filter. Three of the five gates (residency, audit, model-call ownership) are decided by where the data rests and who invokes inference, not by any agent feature.

                       data rests where?        who makes the model call?
SaaS vertical          vendor's cloud           vendor (on your behalf)
  → residency: vendor's region, hard to prove single-boundary
  → audit: vendor owns logs, your-key signing usually impossible
  → model call: vendor owns it → vendor owns liability        ⚠ fails gates 1,4,5 often

Hyperscaler runtime    your cloud region        YOU (owned orchestration)
  → residency: pin to region; single-boundary if data is same cloud
  → audit: you invoke + sign with your key; logs dual-written
  → model call: you own it → you own liability + signability  ✓ passes when same-cloud

Framework self-host    your infra               YOU, everywhere
  → residency: wherever you deploy; you control it fully
  → audit: you own every log and key                          ✓ passes (you own ops too)

This is the counterintuitive landing: the same ownership boundary that file 01 framed as control, file 04 framed as portability, and file 02 framed as gravity is, at the procurement gate, the thing that decides legality. A closed SaaS that owns the model call cannot sign with your key or keep the call yours — so it fails the audit and model-call gates structurally, not by oversight. The boundary you chose for control turns out to be the boundary that passes or fails compliance.
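The three ownership-decided gates can be derived from two facts about a platform: where state rests relative to your data, and who invokes the model. A minimal sketch, with simplifying assumptions labeled: residency is treated as "state on the same cloud as the data," and audit as "you make the call and can export the logs" (you can only sign calls you make). The profiles mirror the diagram above; the cloud labels are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ownership:
    data_cloud: str         # where the core data already lives
    state_cloud: str        # where agent state, memory, and logs rest
    you_invoke_model: bool  # your orchestration makes the inference call
    logs_exportable: bool   # you can take the logs out of the vendor's console


def ownership_gates(o: Ownership) -> dict:
    return {
        "residency": o.data_cloud == o.state_cloud,        # single provable boundary
        "audit": o.you_invoke_model and o.logs_exportable,  # you can only sign calls you make
        "model_call": o.you_invoke_model,
    }


profiles = {
    "SaaS vertical": Ownership("aws", "vendor-cloud", False, False),
    "hyperscaler runtime, same cloud": Ownership("aws", "aws", True, True),
    "hyperscaler runtime, other cloud": Ownership("aws", "other-cloud", True, True),
    "framework self-host": Ownership("aws", "aws", True, True),
}
for name, o in profiles.items():
    print(name, ownership_gates(o))
```

No agent feature appears anywhere in the inputs, which is the point of this section.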


6) The gate failure walked through deeply — residency discovered at launch

Replay the failure the bank avoided, because its shape recurs across every team that scores before it sieves.

Month 0   Team runs the capability rubric. Cross-cloud runtime tops it (140).
          Picks it. Security sign-off scheduled "for closer to launch."
Month 1-2 Build proceeds. Orchestration, tools, memory all wired. Demo great.
Month 2   Security + legal review (the gate, run late):
            - residency: conversation data and memory rest in the runtime's
              cloud; core-banking data is on AWS. Residency proof spans two
              clouds' audit trails. Regulator requires a single provable
              boundary. → FAIL.
            - audit: model call ownership is fine, but the cross-cloud egress
              path is extra attack surface the security team won't sign.
Month 3   Decision: the platform CANNOT ship for this regulated use case.
          Two months of build wasted. Re-run gates → AgentCore passes →
          rebuild on AgentCore. Timeline slips a quarter.

The failure is not that the platform was bad — it scored highest. The failure is ordering: the team ran the compensatory rubric before the non-compensatory sieve, so a gate that was a knowable day-one fact (data lives on AWS; cross-cloud fragments residency proof) surfaced as a launch-blocker two months in. Not a platform-capability problem; a gate-ordering problem. The fix is structural and free: run the five gates before the rubric, eliminate gate-failures first, and only score what's legal.

Mini-FAQ. "Can't we mitigate a residency gate with encryption and a data-processing agreement?" Sometimes for a soft requirement, never for a hard one. Encryption and a DPA can satisfy a contractual preference, but a regulator's hard residency rule is about where data physically rests and whether you can prove it with a single boundary; encryption in the wrong region is still data in the wrong region. Mitigations move soft gates; they don't move a hard legal boundary. Sort the gate as hard or soft first, then decide whether a mitigation even applies.
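The hard-versus-soft sort decides whether mitigations are even consulted. A minimal sketch, assuming a hypothetical set of mitigations a soft, contractual requirement would accept; the region names are examples.

```python
from enum import Enum


class Hardness(Enum):
    HARD = "regulator's legal rule"
    SOFT = "contractual preference"


# Assumption: the mitigations a soft, contractual requirement would accept.
ACCEPTED_SOFT_MITIGATIONS = {"encryption_at_rest", "dpa"}


def residency_passes(hardness, data_region, required_region, mitigations=frozenset()):
    if data_region == required_region:
        return True
    if hardness is Hardness.HARD:
        # Encryption in the wrong region is still data in the wrong region.
        return False
    return ACCEPTED_SOFT_MITIGATIONS <= set(mitigations)


mitigations = {"encryption_at_rest", "dpa"}
print(residency_passes(Hardness.HARD, "us-east-1", "ap-south-1", mitigations))  # False
print(residency_passes(Hardness.SOFT, "us-east-1", "ap-south-1", mitigations))  # True
```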


7) Cost movement — what running gates first saves and costs

APPROACH                       WHEN GATES RUN      WHAT IT CATCHES                  WHAT IT WASTES                    WHEN IT BITES
─────────────────────────────  ──────────────────  ───────────────────────────────  ────────────────────────────────  ─────────────────────────────
Security as a launch           after build         nothing early                    2+ months of build on a           every regulated use case
formality                                                                           gate-failing platform
Security as a rubric axis      inside scoring      handicaps but doesn't            lets capability buy back a        when a high score offsets a
                                                   eliminate                        violation                         gate fail
Gate sieve before the rubric   first               illegal/unprovable platforms,    a day of upfront gate-definition  almost never bites; this is the fix
                                                   before scoring                   work
Gates + owned data/call +      first + by design   residency, audit, liability,     continuous ownership cost         only if you over-engineer a
your-key audit                                     all provable                     (file 04 hatch)                   low-risk agent

Read it as a movement. The cost of running gates first is a day of work to define the five gates from the regulator's requirements; the cost of running them last is two months of wasted build plus a quarter's slip. The subsystem that pays for doing it right is procurement and security's upfront time — and what it buys is never building on an illegal platform. For the bank, the gate-definition day was the cheapest insurance in the entire evaluation, because it eliminated the cross-cloud candidate before anyone wrote code against it.


8) Operational signals — are the gates being run correctly?

Healthy. The five gates are defined from the regulator's and legal's hard requirements before the rubric runs, every candidate is sieved to pass/fail before scoring, and security/legal sign off on the survivors at selection, not at launch. No platform reaches the build stage without clearing all gates.

First metric that degrades. The lateness of the security review relative to the decision. When security sign-off slides toward launch instead of selection, gates are being treated as formalities, and a gate-failure is being discovered after build. Track "days between platform selection and security gate review" — healthy is negative or zero (review before selection); positive-and-growing means gates are running last.
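The lag metric is simple to compute from two dates per evaluation. A minimal sketch; the "watch" label for a lag that is positive but not yet growing is an assumption, since the text only defines the healthy and the growing cases.

```python
from datetime import date


def review_lag_days(selected_on: date, gate_review_on: date) -> int:
    """Days from platform selection to the security gate review; <= 0 means gates ran first."""
    return (gate_review_on - selected_on).days


def lag_signal(lags: list) -> str:
    latest = lags[-1]
    if latest <= 0:
        return "healthy"
    if len(lags) >= 2 and latest > lags[-2]:
        return "gates running last"
    return "watch"  # assumption: positive but not yet growing


print(review_lag_days(date(2024, 3, 1), date(2024, 2, 20)))  # -10
print(lag_signal([10, 45]))                                   # gates running last
```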

Misleading metric people watch. The capability rubric total — the same number file 03 warned about. A high total feels like a green light, but it says nothing about whether the platform is legal to use; a 140 with a failed residency gate is a platform that cannot ship. People quote the score as approval; it's only capability among the already-legal.

The signal an experienced lead checks first. "Has every hard residency, PII, isolation, audit, and model-call requirement been sorted into a pass/fail gate, and has every candidate been sieved before scoring?" A lead reads the gate sieve before the rubric and refuses to discuss capability for a platform that hasn't cleared the gates — because capability is meaningless for a platform that's illegal.


9) Boundary of applicability — when gates dominate and when they barely apply

Strong fit for gate-first evaluation. Regulated domains — banking, healthcare, insurance, government — where residency, PII, isolation, and audit are legal requirements with enforcement teeth. The bank is the textbook case: a residency or audit failure is an enforcement action, so the gates must run first and override every score.

Where gate obsession becomes pathological. A low-stakes internal tool with no PII, no regulatory exposure, and no tenant boundary — an engineering-team helper agent over public docs. Defining five hard gates and demanding your-key signing for an agent that touches nothing sensitive is wasted process that slows a harmless tool. Match gate rigor to the data sensitivity and regulatory exposure; not every agent needs the bank's sieve.
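Matching rigor to exposure can be made explicit instead of argued case by case. A minimal sketch using a hypothetical mapping (which exposure triggers which gate is an assumption for illustration, not a policy from this module):

```python
def required_gates(regulated: bool, handles_pii: bool, multi_tenant: bool,
                   automated_decisions: bool) -> set:
    """Hypothetical mapping from an agent's exposure to the gates worth running."""
    gates = set()
    if regulated:
        gates |= {"residency", "audit", "model_call_ownership"}
    if handles_pii:
        gates.add("pii")
    if multi_tenant:
        gates.add("tenant_isolation")
    if automated_decisions:
        gates |= {"audit", "model_call_ownership"}
    return gates


docs_helper = required_gates(False, False, False, False)  # helper over public docs
bank_support = required_gates(True, True, True, True)     # the bank's support agent
print(docs_helper)           # set()
print(sorted(bank_support))  # all five gates
```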

Scale / regime that invalidates the intuition. "Security is a capability we can weigh" holds for soft preferences and breaks for hard legal constraints — the moment a requirement is legal-binary rather than graded-preferential, it must leave the rubric and become a gate. The intuition that breaks most often is treating the highest-scoring platform as the choice; in a regulated domain the highest-scoring platform is frequently the one that fails a gate, because capability and compliance are independent and the rubric measures only the first.


10) The wrong model to drop: "security is the most important axis on the rubric"

The seductive wrong idea is that security is just a very important capability — so weight it highest and let it dominate the sum. This is wrong because importance and structure are different. A heavily-weighted axis is still compensatory: enough capability elsewhere can offset a low security score, which mathematically permits trading legality for features. A gate forbids that trade entirely. Security's hard requirements are not the most important axis; they are not an axis at all — they are a filter that runs before scoring.

Replace "weight security highest" with "sort requirements into hard gates (sieve first, non-compensatory) and soft preferences (weigh in the rubric)." The bank's residency requirement is a gate, not a weight-5 axis; a closed SaaS's inability to sign with the bank's key is a gate failure, not a low audit score. The structure of the requirement — binary-legal vs graded-preferential — decides whether it's a gate or an axis, and getting that sort right is the chapter.


11) Other governance failure shapes

  • Gate-last ordering — running the rubric before the sieve, so a gate-failure surfaces after two months of build.
  • Compensatory laundering — folding a hard gate into the weighted sum so a high capability score offsets a legal violation.
  • Residency proof fragmentation — data and platform on different clouds, so the residency audit trail spans two clouds and can't show a single boundary.
  • Shared-memory tenant leak — a multi-tenant agent where customer A's data surfaces in customer B's through a shared long-term memory store the rubric scored as a feature.
  • Vendor-owned model call — a SaaS that invokes inference on your behalf, foreclosing your-key signing and shifting the liability boundary to the vendor.
  • Audit logs you can't export — immutable logs locked in the vendor's console, unprovable to a regulator and un-exportable (the file-04 data surface as a compliance failure).
  • PII reaching the model uncontrolled — no enforceable redaction at the model boundary, so a PAN or health detail flows to a third-party model.
  • Mitigation theater — applying encryption and a DPA to a hard residency gate that physical location, not encryption, decides.
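The shared-memory leak usually comes from a recall API that searches every stored memory. A minimal sketch of the opposite shape, a hypothetical in-process store where every read and write requires a tenant id and no method spans partitions. Real isolation also has to be enforced at the storage and authorization layers; this only shows the interface that removes the leak path.

```python
from collections import defaultdict


class TenantScopedMemory:
    """Long-term memory partitioned by tenant; no method reads across partitions."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def remember(self, tenant_id: str, item: str) -> None:
        self._partitions[tenant_id].append(item)

    def recall(self, tenant_id: str, query: str) -> list:
        # Search only the caller's partition; an unknown tenant sees nothing.
        partition = self._partitions.get(tenant_id, [])
        return [m for m in partition if query.lower() in m.lower()]


mem = TenantScopedMemory()
mem.remember("customer-A", "Refund approved for order 42")
print(mem.recall("customer-B", "refund"))  # []
print(mem.recall("customer-A", "refund"))  # ['Refund approved for order 42']
```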

12) Pattern transfer — where gate-vs-axis thinking recurs

  • Build-vs-buy (file 01) — the bank's month-eight wall (no your-key signing, no in-VPC calls) was, underneath, a governance-gate failure on a closed SaaS; the boundary that blocked features was the boundary that fails the audit and model-call gates.
  • Lock-in / own the data (file 04) — the "own your data source of truth" hatch is also the audit gate's requirement: exportable, signable, immutable logs are both a portability hatch and a compliance gate.
  • Data gravity (file 02) — the same gravity that picks a platform by where your data lives is the residency gate: data on a different cloud than the platform fragments the residency proof.
  • Model vendor strategy (module 12) — "who owns the model call" is also a model-vendor question: a vendor-hosted model API vs your-own-deployment decides signability and liability one layer down.
  • Access control / multi-tenancy in classic systems — tenant isolation is the same non-compensatory gate as row-level security in a shared database; a leak path is fatal regardless of feature richness.

13) The governance audit — five yes/no questions

  1. Have you sorted every requirement into a hard gate (pass/fail, runs first) or a soft preference (weighed in the rubric), by whether it's legal-binary or graded-preferential?
  2. Do the five gates — residency, PII, isolation, audit, model-call ownership — run before the capability rubric, eliminating failures before scoring?
  3. For residency, can you prove a single boundary, not a proof fragmented across two clouds?
  4. Do you own the model call and can you sign it with your own key, so audit and liability are yours?
  5. Has security and legal signed off on the survivors at selection, not deferred the review to launch?

If question 2 is "no," you are scoring before you sieve, and a gate-failure is waiting for you at the security review.


Where this appears in production

Gates that eliminate platforms before scoring:
  • Data residency for a regulated bank: conversation data and memory must rest in ap-south-1 with a single provable boundary, eliminating any cross-cloud runtime whose residency proof fragments.
  • Bedrock Guardrails PII redaction: masks PAN, account numbers, and other PII before it reaches the model, with the redaction logged, satisfying the PII gate.
  • Multi-tenant isolation on AgentCore: per-tenant memory scoping and pre/post-processing guardrails prevent customer A's data from surfacing in customer B's, satisfying the isolation gate.
  • Your-key audit signing: owned orchestration invokes the model and signs each call with the bank's own KMS key, with logs dual-written and exportable, satisfying the audit gate.
  • AgentCore HIPAA eligibility: lets a healthcare agent clear the regulatory gate that a non-eligible platform fails outright.

Where the model-call ownership gate decides liability:
  • A closed support SaaS: the vendor invokes inference on the bank's behalf, foreclosing your-key signing and shifting the liability boundary to the vendor (file 01's wall as a gate failure).
  • An owned orchestration on a runtime: the bank invokes and signs every call, keeping liability and signability in-house.
  • A loan-denial decision: must be attributable to a specific agent version, signed, and reconstructable for the regulator; only an owned model call makes this provable.

Where running gates late wasted the build:
  • A team that scored first and sieved last: built for two months on a cross-cloud runtime that failed the residency gate at the security review, then slipped a quarter rebuilding.
  • A healthcare provider on a vertical agent: hit a BAA and residency gate the vendor's shared cloud couldn't satisfy, discovered after integration (file 01's healthcare wall).
  • A multi-tenant SaaS agent: leaked across tenants through shared memory the rubric had scored as a capability, found in a penetration test.


Pause and recall

  1. Name the five gates and the binary filter each applies.
  2. Why is folding security into the rubric as a weighted axis a category error?
  3. What does "compensatory vs non-compensatory" mean, and why must gates be non-compensatory?
  4. Why does the bank's residency gate fail a cross-cloud runtime that tops the rubric?
  5. Which property decides three of the five gates, and where have you seen it before in this module?
  6. Why does a closed SaaS structurally fail the model-call and audit gates?
  7. When does gate-first evaluation become wasted process?
  8. What signal tells you gates are being run as formalities instead of filters?

Interview Q&A

Q1. Why not just add "security & compliance" as a heavily weighted axis on your platform rubric?
A. Because a weighted axis is compensatory: a high score elsewhere can offset a low security score, which mathematically permits trading legality for capability. A hard requirement like data residency is legal-binary, not graded; no weight is high enough to guarantee elimination inside a sum without distorting every other comparison. Hard requirements must be gates that run as a pass/fail sieve before the rubric; only soft preferences belong as weighted axes.
Common wrong answer to avoid: "Weight it 5 so it dominates." A weight handicaps but doesn't eliminate; enough capability still buys back a violation.
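
The arithmetic behind the wrong answer is easy to reproduce. The sketch below uses hypothetical scores and weights, with security weighted 5 so it "dominates." The weighted sum still ranks the residency violator first, because its capability and cost surplus buys back the low security score. Applying the residency requirement as a gate before the sum removes the violator regardless of the numbers.

```python
from dataclasses import dataclass

# Hypothetical weights: security is weighted highest so it "dominates".
WEIGHTS = {"capability": 3.0, "cost": 2.0, "security": 5.0}


@dataclass(frozen=True)
class Platform:
    name: str
    capability: float      # hypothetical 0-10 scores
    cost: float
    security: float
    meets_residency: bool  # hard, legal-binary requirement


def weighted_score(p: Platform) -> float:
    """Compensatory: a surplus on one axis offsets a deficit on another."""
    return (WEIGHTS["capability"] * p.capability
            + WEIGHTS["cost"] * p.cost
            + WEIGHTS["security"] * p.security)


def rank(platforms: list[Platform], gate_first: bool) -> list[str]:
    """Rank by weighted score, optionally sieving on the hard gate first."""
    pool = [p for p in platforms if p.meets_residency] if gate_first else platforms
    return [p.name for p in sorted(pool, key=weighted_score, reverse=True)]


compliant = Platform("compliant", capability=3, cost=3, security=8, meets_residency=True)
violator = Platform("violator", capability=10, cost=10, security=2, meets_residency=False)

print(weighted_score(compliant), weighted_score(violator))  # 55.0 60.0
print(rank([compliant, violator], gate_first=False))        # ['violator', 'compliant']
print(rank([compliant, violator], gate_first=True))         # ['compliant']
```

Raising the security weight moves the crossover point but keeps the trade on the table; a weight large enough to always win would swamp every other axis, which is the distortion the answer describes.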

Q2. A cross-cloud runtime topped your capability rubric at 140. Why might you eliminate it anyway?
A. Because the rubric measures capability among platforms that are already legal, and this platform fails a gate. Our data is on AWS; a runtime on another cloud means the residency proof fragments across two clouds' audit trails, and the regulator requires a single provable boundary. That's a gate failure (pass/fail, non-compensatory), so the 140 is irrelevant. A platform that's illegal or unprovable to a regulator doesn't compete on capability.
Common wrong answer to avoid: "It scored highest, ship it." The score is meaningless if a hard gate fails; capability and legality are independent.
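
The residency gate checks where every piece of state rests; it does not ask which country the regions are in. The sketch below assumes the bank's single provable boundary is AWS ap-south-1 and lists hypothetical component locations. The cross-cloud candidate keeps its memory in a hypothetical in-country region on another cloud and still fails, because the proof now spans two providers' audit trails.

```python
from typing import NamedTuple

# Assumption: the regulator accepts exactly one provable boundary.
REQUIRED_BOUNDARY = ("aws", "ap-south-1")


class StateLocation(NamedTuple):
    component: str  # where some piece of conversation data or memory lives
    cloud: str
    region: str


def residency_gate(locations: list[StateLocation]) -> tuple[bool, list[str]]:
    """Pass only if every piece of state rests inside the single required boundary."""
    violations = [
        f"{loc.component} rests in {loc.cloud}/{loc.region}"
        for loc in locations
        if (loc.cloud, loc.region) != REQUIRED_BOUNDARY
    ]
    return (not violations, violations)


agentcore = [
    StateLocation("conversation store", "aws", "ap-south-1"),
    StateLocation("agent memory", "aws", "ap-south-1"),
]
cross_cloud = [
    StateLocation("conversation store", "aws", "ap-south-1"),
    StateLocation("agent memory", "gcp", "asia-south1"),  # in-country, other cloud
]

print(residency_gate(agentcore))    # (True, [])
print(residency_gate(cross_cloud))  # (False, ['agent memory rests in gcp/asia-south1'])
```

Returning the violations rather than a bare boolean gives the security review the evidence it asks for: which component breaks the boundary.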

Q3. Which property decides whether a platform passes residency, audit, and model-call gates, and where did you see it earlier?
A. Ownership of the data and the model call: the same boundary that file 01 framed as control, file 02 as gravity, and file 04 as portability. If your data rests in your cloud and your orchestration makes the inference call, residency is provable on a single boundary, audit logs are yours to sign with your key, and liability is yours. A closed SaaS that owns the model call fails all three structurally: it can't sign with your key or keep the call yours. The control boundary is the compliance boundary.
Common wrong answer to avoid: "It depends on the platform's security features." Features matter less than who physically holds the data and invokes the model.

Q4. What's the right order: rubric then security review, or gates then rubric? Why?
A. Gates first, always. Running the compensatory rubric before the non-compensatory sieve means a gate failure rooted in a knowable day-one fact, such as "our data is on AWS," surfaces at the security review two months into the build, wasting the build and slipping the timeline. Defining the five gates from the regulator's hard requirements costs a day; running them last costs a quarter. Sieve to pass/fail first, then score only the survivors.
Common wrong answer to avoid: "Score first, then get security sign-off near launch." That's exactly the ordering that wastes two months on a gate-failing platform.

Q5. A teammate proposes encryption plus a data-processing agreement to satisfy a regulator's residency rule. Is that enough?
A. Only if the gate is soft. A contractual residency preference can sometimes be met with encryption and a DPA, but a regulator's hard residency rule is about where data physically rests and whether you can prove a single boundary; encrypted data in the wrong region is still data in the wrong region. So first sort the gate as hard or soft. For a hard legal boundary, mitigations don't apply, and the platform must physically keep state in-region or it's eliminated.
Common wrong answer to avoid: "Yes, encryption satisfies residency." Encryption protects data in place; it doesn't move where the data physically rests.
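
The sorting step in this answer can be written down as a rule. A requirement that is physically met passes. A hard requirement that is not met fails no matter what mitigations are attached. A soft one passes only if the requirement's owner accepts one of the mitigations in place. The sketch below works under those assumptions; the `Requirement` fields and the mitigation names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    hard: bool                                     # legal-binary (regulator) vs contractual preference
    physically_met: bool                           # does the platform, as built, satisfy it?
    mitigations: frozenset = frozenset()           # what the team has in place
    accepted_mitigations: frozenset = frozenset()  # what the owner will accept (soft only)


def gate_passes(req: Requirement) -> bool:
    if req.physically_met:
        return True
    if req.hard:
        # Encryption or a DPA does not move where data physically rests.
        return False
    return bool(req.mitigations & req.accepted_mitigations)


in_place = frozenset({"encryption", "dpa"})
regulator_rule = Requirement("regulator residency", hard=True,
                             physically_met=False, mitigations=in_place)
contract_pref = Requirement("contractual residency preference", hard=False,
                            physically_met=False, mitigations=in_place,
                            accepted_mitigations=frozenset({"encryption", "dpa"}))

print(gate_passes(regulator_rule))  # False
print(gate_passes(contract_pref))   # True
```

Note that `gate_passes` never consults a score: the only inputs are whether the requirement is hard and whether it is physically met.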

Q6. The bank's file-01 wall was "no your-key signing, no in-VPC calls." Is that a build-vs-buy problem, a lock-in problem, or a governance problem?
A. It's all three, and underneath it's a governance-gate failure. File 01 framed it as a boundary (build-vs-buy) and file 04 as exit cost (lock-in), but the requirements were uncrossable because the closed SaaS owned the model call and the data, so it structurally fails the model-call ownership and audit gates. The control boundary that blocked features is the same boundary that fails compliance. Run the governance sieve and the SaaS is eliminated before the rubric, for the same root cause that caused the wall.
Common wrong answer to avoid: "It's purely a missing-feature problem." Your-key signing isn't a feature a closed SaaS adds; it's a gate that ownership of the model call decides.


Design/debug exercise (10 min)

Step 1 — Model it. Run the gate sieve on the bank's support agent for one candidate:

Candidate: AgentCore (data + identity on AWS, owned LangGraph orchestration)
Gate 1 residency:   in ap-south-1, single boundary (data also on AWS)   PASS ✓
Gate 2 PII:         Guardrails mask PAN/account pre-model, logged        PASS ✓
Gate 3 isolation:   per-customer memory scoping, no shared-state leak    PASS ✓
Gate 4 audit:       owned call signed with bank KMS, logs dual-written   PASS ✓
Gate 5 model call:  bank owns inference; vendor never calls for it       PASS ✓
→ Survives the sieve. Proceeds to the rubric (file 03).
Same gates on a cross-cloud runtime: Gate 1 FAILS (fragmented proof) → ELIMINATED.
Same gates on a closed SaaS: Gates 4,5 FAIL (vendor owns the call) → ELIMINATED.
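
The sieve in Step 1 can be expressed as five predicates that run over each candidate's facts before any scoring. In the sketch below, the `Candidate` fields are hypothetical and their values simply mirror the walk above. That includes the assumption that the closed SaaS and the cross-cloud runtime pass every gate not listed as failing. Every gate is evaluated rather than stopping at the first failure, so the review sees the full list of reasons.

```python
from dataclasses import dataclass

REQUIRED_BOUNDARY = ("aws", "ap-south-1")


@dataclass(frozen=True)
class Candidate:
    name: str
    state_boundaries: frozenset  # (cloud, region) pairs where data and memory rest
    pii_masked_pre_model: bool
    per_customer_memory_scope: bool
    signed_with_bank_key: bool
    bank_invokes_model: bool


# Each gate is a pass/fail predicate; none of them produces a score.
GATES = {
    "residency": lambda c: c.state_boundaries == frozenset({REQUIRED_BOUNDARY}),
    "PII": lambda c: c.pii_masked_pre_model,
    "isolation": lambda c: c.per_customer_memory_scope,
    "audit": lambda c: c.signed_with_bank_key,
    "model call": lambda c: c.bank_invokes_model,
}


def sieve(candidates):
    """Split candidates into survivors (for the rubric) and eliminations with reasons."""
    survivors, eliminated = [], {}
    for c in candidates:
        failed = [gate for gate, check in GATES.items() if not check(c)]
        if failed:
            eliminated[c.name] = failed
        else:
            survivors.append(c)
    return survivors, eliminated


aws_only = frozenset({REQUIRED_BOUNDARY})
candidates = [
    Candidate("AgentCore", aws_only, True, True, True, True),
    Candidate("cross-cloud runtime",
              frozenset({REQUIRED_BOUNDARY, ("other-cloud", "in-country-region")}),
              True, True, True, True),
    Candidate("closed SaaS", aws_only, True, True, False, False),
]

survivors, eliminated = sieve(candidates)
print([c.name for c in survivors])  # ['AgentCore']
print(eliminated)  # {'cross-cloud runtime': ['residency'], 'closed SaaS': ['audit', 'model call']}
```

Only `survivors` is handed to the rubric from file 03; the score a gate-failing platform might have earned is never computed.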

Step 2 — Your turn. Run the gate sieve on the bank's internal-ops agent, where the data is more sensitive (KYC, compliance memos). Tighten each gate (residency may demand the sovereign region from file 04; isolation is per-RM and per-customer; audit must capture every document accessed) and decide which candidates survive. Then run the five gates on one agent from your own backlog and sort each of its requirements into gate-or-axis before scoring anything.

Step 3 — Reproduce from memory. Redraw the gate sieve (five pass/fail forks before the rubric) cold, then connect it to file 03: the rubric is compensatory (weigh and sum), the gates are non-compensatory (pass all or out), and the gates run first so the rubric only ever scores legal platforms. If you can name the five gates, explain why they can't live inside the weighted sum, and show which property decides three of them, you own this chapter.


Operational memory

This chapter added the dimension that overrides every quantitative pass: the procurement gate. The important idea is that hard security, residency, PII, isolation, and audit requirements are gates, not axes — pass/fail filters that run before the capability rubric and eliminate any platform that fails one, regardless of how it scored on capability, cost, or portability. Folding a gate into the weighted rubric is a category error, because a sum is compensatory (a high score offsets a low one) and a legal gate is non-compensatory (failing it is fatal). Gates live outside the sum, as a sieve that runs first.

You learned to sort every requirement into gate-or-axis, run the five-gate sieve before scoring, and recognize that three gates (residency, audit, model-call ownership) are decided by the same ownership boundary that earlier files framed as control, gravity, and portability. That solves the opening failure. Run last, the bank's security review eliminates a cross-cloud runtime on residency after two months of build; run first, the sieve eliminates it before anyone writes code and leaves only legal candidates for the rubric to score.

Carry this diagnostic forward: run security and legal at selection, not at launch, and refuse to discuss capability for a platform that hasn't cleared the gates. When a requirement arrives, sort it first — is it legal-binary (a gate) or graded-preferential (an axis)? If it's a hard gate, no rubric score and no mitigation buys it back; the platform either physically meets it or it's out.

Remember:

  • Hard security/residency/PII/isolation/audit requirements are gates (pass/fail), not rubric axes (weighed).
  • Gates are non-compensatory: failing one is fatal regardless of every other score; they live outside the sum.
  • Run the five-gate sieve before the rubric, so you never build on a platform a gate will kill.
  • Ownership of the data and the model call decides residency, audit, and liability — the control boundary is the compliance boundary.
  • Sort each requirement as legal-binary (gate) or graded-preferential (axis); mitigations move soft gates, never hard ones.
  • Security signs off at selection, not at launch; a late review means gates are run as formalities.

Bridge. We now have the full evaluation: gates sieve out the illegal, the rubric scores capability among the legal, the cost curve projects the bill, and the escape hatch keeps leaving cheap. Run in that order, the bank's decision is defensible and survives procurement. But notice what we've assumed throughout: that the players, prices, capabilities, and even the gate-passing facts are stable. They are not. This market reshuffles every quarter: runtimes ship features, SaaS vendors change pricing, frameworks merge, and a platform that failed a gate last quarter may pass it next. The final file steps back to the contested claims and fast-moving boundaries of this whole module: what to hold loosely, what to re-evaluate in six months, and how to keep a decision honest in a market that won't sit still. → 07-boundary-tradeoff-review.md