
Phase 2 — Bound the blast

Covers chapters 08–15. By the end, the Phase 1 agent is safe — every tool is mapped to a blast-radius class, irreversible writes carry idempotency keys, the OR-gate stopping rule fires before the loop runs you, the read-fanout-write-chain schedule is honest about dependencies, an approval gate sits in front of the ₹50,000 threshold, the cost-and-latency budget is named with numbers, and the agent serves two tenants with zero context bleed.


What you will add this phase

Nine layers, in order. Each one builds on the previous; each one is a place a real production agent has caught fire.

  1. Blast-radius classification of every tool, with safeguards stacked per class.
  2. Idempotency keys derived from intent for every state-changing tool.
  3. The OR-gate stopping rule across iterations, tokens, cost, time, no-progress, and repeated errors.
  4. Read-fanout-write-chain scheduling — parallel where independent, chained where dependent.
  5. A retrieval tool with required region filter and confidence scores.
  6. A scratchpad with five named state keys and a session memory layer.
  7. An approval gate above ₹50,000 with full spec (trigger, reviewer, packet, timeout, resume).
  8. The cost-and-latency budget with traffic-class routing.
  9. Per-tenant isolation across the four surfaces from chapter 15.

That looks like a lot. It is — Phase 2 is the densest of the four because production safety is dense. Plan two or three sessions on this phase rather than trying to land it in one.


Chapters to read first

The acceptance check leans most heavily on chapters 08, 13, and 15.


The build

Step 1 — Classify every tool by blast radius

Open chapter 08's four-class taxonomy. For each of your five tools, assign a class:

  • find_customer_by_email — Class 1, read-only.
  • list_orders — Class 1.
  • get_refund_policy — Class 1.
  • issue_refund — Class 4, irreversible write. Money leaves the account.
  • send_customer_email — Class 3, reversible-but-expensive. The send is itself reversible only via a correction email, which the customer has already read.

Add a column to your design-notes.md titled safeguards. For each tool, list the safeguards required by its class — idempotency key (Class 2 and up), rate cap (Class 3 and up), dry-run support (Class 3 and 4), approval gate above a threshold (Class 4 above policy), audit-log retention (all writes), kill switch coverage (Class 3 and 4 primary).

The matrix should look uneven — Class 1 rows nearly empty, Class 4 rows fully stacked. If it is uniformly populated or uniformly empty, the classification is theatre.
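The matrix can also be generated from the classification rather than typed by hand, which makes the unevenness easy to check. The following is a minimal sketch, not the course's reference implementation: the enum labels (in particular the name for Class 2, which this page does not give) and the safeguard identifiers are assumptions, and audit-log retention is attached to Class 2 and up on the assumption that every class above 1 is a write.

```python
from enum import IntEnum


class BlastClass(IntEnum):
    READ_ONLY = 1
    REVERSIBLE_WRITE = 2        # assumed label; this page does not name Class 2
    REVERSIBLE_EXPENSIVE = 3
    IRREVERSIBLE_WRITE = 4


# Classification taken from the Step 1 list.
TOOL_CLASSES = {
    "find_customer_by_email": BlastClass.READ_ONLY,
    "list_orders": BlastClass.READ_ONLY,
    "get_refund_policy": BlastClass.READ_ONLY,
    "issue_refund": BlastClass.IRREVERSIBLE_WRITE,
    "send_customer_email": BlastClass.REVERSIBLE_EXPENSIVE,
}

# (safeguard name, lowest class that requires it). Names are illustrative.
SAFEGUARD_FLOORS = [
    ("idempotency_key", 2),
    ("audit_log_retention", 2),            # "all writes", assuming Class 2+ are writes
    ("rate_cap", 3),
    ("dry_run", 3),
    ("kill_switch", 3),
    ("approval_gate_above_threshold", 4),
]


def required_safeguards(blast_class: BlastClass) -> list[str]:
    """Return every safeguard whose floor is at or below the given class."""
    return [name for name, floor in SAFEGUARD_FLOORS if blast_class >= floor]


def safeguard_matrix() -> dict[str, list[str]]:
    """Map each tool to the safeguards its class demands."""
    return {tool: required_safeguards(cls) for tool, cls in TOOL_CLASSES.items()}


if __name__ == "__main__":
    for tool, safeguards in safeguard_matrix().items():
        print(f"{tool:24} {', '.join(safeguards) or '(none)'}")
```

Under these assumptions the three read-only tools come out with no safeguards, send_customer_email with five, and issue_refund with all six.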

Step 2 — Derive idempotency keys from intent

Phase 1 left idempotency_key in the schema but did not populate it. Now you populate it. The key must encode what the agent is trying to do, not which retry attempt this is. Examples:

  • For Priya's refund on order 448100 due to delay: refund_448100_delay.
  • For Suresh's refund on order 882741 due to double-charge: refund_882741_duplicate.
  • For the customer email about Priya's refund: notify_refund_448100.

The model is responsible for picking the key in the tool call; the system prompt tells it the format. Add to the system prompt: "For every state-changing tool call, generate idempotency_key as <verb>_<primary_id>_<reason> using only lowercase alphanumeric and underscores."

Verify with a retry test: make issue_refund artificially time out on the first attempt and complete on the second; check that the second attempt carries the same key and the backend recognises the duplicate. Phase 2 should not double-issue a refund under network failure.
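The retry test can be sketched against a mock backend. Everything here is hypothetical scaffolding: MockRefundBackend, issue_refund_with_retry, and the ₹1,200 amount in the usage note are invented for illustration, and the mock simulates the nastier variant of the timeout, where the refund commits but the response is lost.

```python
import re

KEY_FORMAT = re.compile(r"[a-z0-9_]+")


def make_idempotency_key(verb: str, primary_id: object, reason: str) -> str:
    """Build <verb>_<primary_id>_<reason>: no attempt counter, no timestamp."""
    key = f"{verb}_{primary_id}_{reason}"
    if not KEY_FORMAT.fullmatch(key):
        raise ValueError(f"key must be lowercase alphanumerics and underscores: {key!r}")
    return key


class MockRefundBackend:
    """Hypothetical backend that commits, then loses the first N responses."""

    def __init__(self, lost_responses: int = 1):
        self.ledger: dict[str, dict] = {}
        self.refunds_issued = 0
        self._lost_responses = lost_responses

    def issue_refund(self, order_id: int, amount_inr: int, idempotency_key: str) -> dict:
        if idempotency_key in self.ledger:
            # Same intent seen before: return the original result, move no money.
            return {**self.ledger[idempotency_key], "duplicate": True}
        self.refunds_issued += 1
        self.ledger[idempotency_key] = {
            "order_id": order_id,
            "amount_inr": amount_inr,
            "duplicate": False,
        }
        if self._lost_responses > 0:
            self._lost_responses -= 1
            raise TimeoutError("refund committed, response lost")
        return self.ledger[idempotency_key]


def issue_refund_with_retry(backend, order_id, amount_inr, reason, max_attempts=3):
    """Derive the key once from intent, then reuse it on every attempt."""
    key = make_idempotency_key("refund", order_id, reason)
    for _ in range(max_attempts):
        try:
            return backend.issue_refund(order_id, amount_inr, key)
        except TimeoutError:
            continue
    raise TimeoutError(f"gave up after {max_attempts} attempts")
```

Calling issue_refund_with_retry(MockRefundBackend(), 448100, 1200, "delay") should leave refunds_issued at 1 and return a result flagged as a duplicate. Swap the derived key for a fresh UUID per attempt and the same test issues two refunds.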

Step 3 — Wire the OR-gate stopping rule

Replace the simple iteration cap from Phase 1 with chapter 09's OR-gate. The orchestrator checks each condition at the top of every iteration; any one firing stops the loop. Concrete values:

STOP_RULES = {
    "max_iterations": 8,
    "max_input_tokens": 80_000,
    "max_output_tokens": 4_000,
    "max_wall_clock_s": 60,
    "max_cost_usd": 0.50,
    "max_consecutive_no_progress": 3,  # same observation hash three times in a row
    "max_consecutive_tool_errors": 3,
}

Implement no-progress detection by hashing the most recent tool result and comparing across iterations. Implement consecutive-error detection by counting errors from the same tool until reset by a successful call.

Write a good give-up message when any rule fires. Not "I was unable to complete the task" — instead, "I tried X, Y, Z; I hit [stop_rule]; here is what I found so far; here is the specific next step that would unblock me." Phase 2 traces should show good give-up messages on the deliberate failure cases you construct.

Step 4 — Build a retrieve_refund_policy tool

Phase 1 had get_refund_policy(region, tier) returning a flat dict. That is fine when there is one policy document; in practice the policy KB has many documents and the agent needs ranked retrieval. Build a new tool retrieve_refund_policy_chunks(query, region_filter, time_range, top_k, return) following chapter 11b's typed retrieval schema:

  • region_filter is required, enum ["IN", "US", "EU", "APAC", "ANY"].
  • time_range is enum ["current", "last_quarter", "all"].
  • return defaults to chunks_with_metadata.
  • Response chunks carry doc_id, version, region, and confidence score per chunk; the response echoes the applied filter.

Seed the underlying policy KB with three documents: current Indian policy (21-day window), older Indian policy from 2024 (14-day window, no longer applies), and a US supplement (7-day window). The retrieval test: a query about "refund window for digital payouts" with region_filter="IN" should return only the Indian chunks; if you accidentally pass region_filter="ANY", you should see the US chunk surface, and that should worry you.

Decide whether the agent retrieves on every turn (always-retrieve) or only when it judges it needs grounding (decide-to-retrieve). Document the choice in design-notes.md and the failure mode you're accepting.
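The region-filter behaviour can be prototyped in memory before any real index exists. This is a toy sketch with several assumptions: the seed text, doc_id and version values are made up (only the regions and window lengths come from this step), the confidence score is a placeholder word-overlap ratio rather than a real retriever score, time_range is assumed to default to "current", and "last_quarter" is not modelled separately from "all". The return parameter is left out; note that return is a reserved word in Python, so a Python-side handler would need a different name for it.

```python
REGIONS = {"IN", "US", "EU", "APAC", "ANY"}
TIME_RANGES = {"current", "last_quarter", "all"}

# Made-up seed text; only the regions and window lengths come from the lab.
POLICY_KB = [
    {"doc_id": "in-refund-current", "version": "v2", "region": "IN", "is_current": True,
     "text": "refund window for digital payouts is 21 days"},
    {"doc_id": "in-refund-2024", "version": "v1", "region": "IN", "is_current": False,
     "text": "refund window for digital payouts is 14 days"},
    {"doc_id": "us-supplement", "version": "v1", "region": "US", "is_current": True,
     "text": "refund window for digital payouts is 7 days"},
]


def _overlap_score(query: str, text: str) -> float:
    """Placeholder confidence: share of query words that appear in the chunk."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def retrieve_refund_policy_chunks(query, *, region_filter, time_range="current", top_k=3):
    """region_filter is keyword-only with no default, so omitting it is a TypeError."""
    if region_filter not in REGIONS:
        raise ValueError(f"region_filter must be one of {sorted(REGIONS)}")
    if time_range not in TIME_RANGES:
        raise ValueError(f"time_range must be one of {sorted(TIME_RANGES)}")

    docs = [d for d in POLICY_KB if region_filter == "ANY" or d["region"] == region_filter]
    if time_range == "current":
        docs = [d for d in docs if d["is_current"]]
    # "last_quarter" would need effective dates, which this toy KB does not carry.

    chunks = [
        {"doc_id": d["doc_id"], "version": d["version"], "region": d["region"],
         "text": d["text"], "confidence": _overlap_score(query, d["text"])}
        for d in docs
    ]
    chunks.sort(key=lambda c: c["confidence"], reverse=True)
    return {
        "applied_filter": {"region_filter": region_filter, "time_range": time_range},
        "chunks": chunks[:top_k],
    }
```

With region_filter="IN" and time_range="all" the sketch returns both Indian documents and nothing else; with region_filter="ANY" the US supplement appears alongside the Indian policy, which is the leak the retrieval test is meant to make visible.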

Step 5 — Schedule reads and writes correctly

Phase 1's loop ran tools serially because that was the default. Phase 2 makes the schedule deliberate. For the refund flow:

  • find_customer_by_email → blocking; nothing else can proceed.
  • After customer is known, fan out: list_orders and retrieve_refund_policy_chunks in parallel. They are independent reads.
  • Gather. Decide. Then chain the write: issue_refund followed by send_customer_email.

Implement the fan-out using the provider's parallel tool call support (Anthropic and OpenAI both expose this). The gather step inspects every branch — if list_orders succeeded but retrieve_refund_policy_chunks failed, decide whether to continue with degraded context or stop and ask. Document your join policy (all-or-nothing, best-effort, required-plus-optional) per call site.

Write a bad schedule deliberately as a comparison artifact: fan out issue_refund and send_customer_email in parallel. Capture the trace. Note in design-notes.md what went wrong even when both calls "succeed."
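On the orchestrator side, the good schedule might look like the sketch below. It uses asyncio with stub tools rather than any provider's API; the stub return values, the events list used to record call order, and the all-or-nothing join policy are all assumptions made for illustration.

```python
import asyncio


async def find_customer_by_email(email, events):
    events.append("find_customer_by_email")
    return {"customer_id": "cust_1", "region": "IN"}        # stub data


async def list_orders(customer_id, events):
    events.append("list_orders")
    return [{"order_id": 448100}]


async def retrieve_refund_policy_chunks(query, region_filter, events, fail=False):
    events.append("retrieve_refund_policy_chunks")
    if fail:
        raise TimeoutError("policy KB unavailable")          # simulated branch failure
    return {"chunks": [{"doc_id": "in-refund-current", "confidence": 0.9}]}


async def issue_refund(order_id, events):
    events.append("issue_refund")
    return {"status": "refunded", "order_id": order_id}


async def send_customer_email(order_id, events):
    events.append("send_customer_email")
    return {"status": "sent"}


async def run_refund_flow(email, events, policy_fails=False):
    # 1. Blocking read: nothing else can start without the customer.
    customer = await find_customer_by_email(email, events)

    # 2. Fan out the two independent reads.
    orders, policy = await asyncio.gather(
        list_orders(customer["customer_id"], events),
        retrieve_refund_policy_chunks("refund window", customer["region"], events,
                                      fail=policy_fails),
        return_exceptions=True,
    )

    # 3. Gather: inspect every branch. Join policy here is all-or-nothing.
    branches = (("list_orders", orders), ("retrieve_refund_policy_chunks", policy))
    failed = [name for name, result in branches if isinstance(result, BaseException)]
    if failed:
        return {"status": "stopped", "failed_branches": failed}

    # 4. Chain the writes: the email is sent only after the refund confirms.
    refund = await issue_refund(orders[0]["order_id"], events)
    if refund["status"] != "refunded":
        return {"status": "stopped", "failed_branches": ["issue_refund"]}
    await send_customer_email(refund["order_id"], events)
    return {"status": "done", "refund": refund}
```

Running asyncio.run(run_refund_flow("priya@example.com", [])) with policy_fails=True should stop before any write. Moving issue_refund and send_customer_email into a single gather call reproduces the bad schedule for the comparison trace.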

Step 6 — Build the scratchpad

Implement chapter 12's five-key scratchpad as a Python dataclass:

from dataclasses import dataclass

@dataclass
class Scratchpad:
    goal: str
    last_tool_result: dict | None
    open_questions: list[str]
    rejected_paths: list[str]
    next_action: str

After every tool call, the agent must update the scratchpad through a fixed mutation path — a record_observation() function that takes the tool name, result, and updated reasoning. The scratchpad gets serialised into the prompt on every turn as a fixed-template block.

Add a session memory layer: per-conversation Scratchpad survives across user turns. Persist to SQLite keyed by session_id. On the next turn, the agent loads the scratchpad and continues. This is what makes a multi-turn conversation actually multi-turn rather than five independent single-turns.

Step 7 — Wire the approval gate above ₹50,000

This is the chapter 13 gate. Specify it with all five fields:

HIGH_VALUE_REFUND_GATE = {
    "trigger": lambda call: call["tool"] == "issue_refund" and call["input"]["amount_inr"] > 50_000,
    "reviewer": "finance_lead",  # fallback: "finance_oncall"
    "packet": ["order_id", "amount_inr", "reason", "customer_tenure_months",
               "dispute_history", "policy_citation", "agent_rationale"],
    "timeout_s": 900,  # 15 minutes
    "resume": {
        "approve": "fire_tool_with_reviewed_amount",
        "edit": "re_enter_pipeline_at_top",  # critical
        "reject": "compose_decline_with_reason",
        "timeout": "escalate_finance_director",
    },
}

The agent intercepts the tool call before dispatch, checks the trigger, and if it fires, writes the packet to an "approval queue" file (runs/phase-2/approvals.jsonl) instead of calling the tool. The agent then composes an interim customer message ("your refund is under review, expected within 15 minutes") and ends the turn.

For the hands-on lab, you play both roles: the agent and the reviewer. Open the queue, review Suresh's case (order=882741, amount=427000, reason=duplicate, with 12-Apr double-charge evidence), and approve it. Re-run the agent with the approval; it should fire issue_refund with the same idempotency key it would have used if the gate hadn't fired.

Test the edit path: instead of approving Suresh's case, edit the amount to ₹4,20,000 (some fee retained). The agent's resume logic must re-enter the gate pipeline at the top: the new amount still exceeds ₹50,000, so the gate fires again. If your resume logic fires the tool directly on edit, you've built the silent bypass that chapter 13's interview Q&A warned about.
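The property that matters is that an edited call goes back through the same entry point as a fresh one. A minimal sketch, assuming hypothetical dispatch and resume functions and a queue that is any writable text stream (a file opened in append mode on runs/phase-2/approvals.jsonl would do); for brevity it queues the raw tool call rather than the full reviewer packet.

```python
import json

THRESHOLD_INR = 50_000


def gate_fires(call: dict) -> bool:
    """Same trigger as HIGH_VALUE_REFUND_GATE: issue_refund above the threshold."""
    return call["tool"] == "issue_refund" and call["input"]["amount_inr"] > THRESHOLD_INR


def dispatch(call: dict, approval_queue, execute) -> dict:
    """Single entry point: every call, original or edited, passes the trigger."""
    if gate_fires(call):
        approval_queue.write(json.dumps(call) + "\n")
        return {"status": "pending_approval"}
    return {"status": "executed", "result": execute(call)}


def resume(decision: str, call: dict, approval_queue, execute, edited_amount_inr=None) -> dict:
    """Apply the reviewer's decision to a queued call."""
    if decision == "approve":
        return {"status": "executed", "result": execute(call)}
    if decision == "edit":
        if edited_amount_inr is None:
            raise ValueError("edit requires edited_amount_inr")
        edited = {**call, "input": {**call["input"], "amount_inr": edited_amount_inr}}
        # Critical: re-enter at the top instead of calling execute() directly.
        return dispatch(edited, approval_queue, execute)
    if decision == "reject":
        return {"status": "declined"}
    if decision == "timeout":
        return {"status": "escalated", "to": "finance_director"}
    raise ValueError(f"unknown decision: {decision!r}")
```

With Suresh's call queued at ₹4,27,000, an edit to ₹4,20,000 should come back as pending_approval with a second line in the queue and no tool execution. The edited call keeps the original input fields, so the idempotency key is unchanged.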

Step 8 — Set the cost-and-latency budget

Open chapter 14. Pick a model for the orchestration loop and write down the budget in numbers:

| Traffic class | Share | Model | Max iters | Token cap | Time cap | Cost target |
|---|---|---|---|---|---|---|
| Simple FAQ | 60% | small | 1 | 3,000 in / 250 out | 3s p95 | ≤ ₹0.05 |
| Standard refund | 30% | small + tools | 4 | 8,000 in / 400 out | 8s p95 | ≤ ₹0.20 |
| Multi-step (Suresh-class) | 10% | large + tools | 8 | 25,000 in / 800 out | 22s p95 | ≤ ₹1.50 |

Compute the expected weighted cost. At 50,000 turns/day, your total should fit inside a chosen monthly ceiling (pick one — say ₹4 lakhs). Document the math.
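The arithmetic is short enough to keep as a script beside design-notes.md. The sketch below uses the shares and cost targets from the table; the 30-day month is an assumption, and because the cost targets are ceilings the result is an upper bound, not a forecast.

```python
from decimal import Decimal

# (traffic class, share of turns, cost target per turn in INR)
TRAFFIC_CLASSES = [
    ("Simple FAQ", Decimal("0.60"), Decimal("0.05")),
    ("Standard refund", Decimal("0.30"), Decimal("0.20")),
    ("Multi-step (Suresh-class)", Decimal("0.10"), Decimal("1.50")),
]
TURNS_PER_DAY = 50_000
DAYS_PER_MONTH = 30                       # assumption: a 30-day month
MONTHLY_CEILING_INR = Decimal("400000")   # the ₹4 lakh example ceiling

# Shares must cover all traffic.
assert sum(share for _, share, _ in TRAFFIC_CLASSES) == 1

# 0.60*0.05 + 0.30*0.20 + 0.10*1.50 = 0.03 + 0.06 + 0.15 = 0.24
weighted_cost_per_turn = sum(share * cost for _, share, cost in TRAFFIC_CLASSES)

# 0.24 * 50,000 * 30 = 360,000
monthly_cost_inr = weighted_cost_per_turn * TURNS_PER_DAY * DAYS_PER_MONTH
headroom_inr = MONTHLY_CEILING_INR - monthly_cost_inr

print(f"weighted cost per turn: INR {weighted_cost_per_turn:.2f}")
print(f"monthly cost:           INR {monthly_cost_inr:,.0f}")
print(f"headroom under ceiling: INR {headroom_inr:,.0f}")
```

At the stated targets this works out to ₹0.24 per turn and ₹3,60,000 per month, which leaves ₹40,000 of headroom under a ₹4 lakh ceiling. The multi-step class is 10% of traffic but ₹0.15 of the ₹0.24, so misrouting into that tier is what moves the total most.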

For the hands-on lab, implement a simple intent classifier (a tiny model or a heuristic) that routes Priya to the standard tier, Karthik to the standard tier (he gets denied, which is faster), and Suresh to the multi-step tier (he gets gated). If the routing is wrong, the cost overshoots predictably.

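Wire the budget caps into the OR-gate from Step 3: max_cost_usd is now set per traffic class, not as a single number. The table above states its targets in rupees while the stop rule is named in USD, so convert to one currency before comparing.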

Step 9 — Add Acme as a second tenant

Phase 1 served one tenant. Phase 2 introduces a second — Acme, on the enterprise plan, with different refund policy (no threshold; all refunds auto-process under a higher trust assumption). Apply chapter 15's four-surface isolation:

  • Prompt context. Tenant A's prior turns must never appear in tenant B's prompt. Per-session scratchpads, namespaced by tenant_id.
  • Memory store. The SQLite tables for session memory and the policy KB are namespaced by tenant_id. Cache keys for any memoised tool result are sha256(tenant_id + ":" + key_components), not sha256(key_components).
  • Tool credentials. If your mock backends had credentials, they would be per-tenant. Even with mocks, structure the code so credentials are fetched as vault.get(tenant_id, scope=...), not as a global constant. This is the discipline that lets a real production swap not be a rewrite.
  • Rate / cost limits. The OR-gate's budget is enforced per tenant, not per process. Acme on the enterprise plan gets a higher cost cap (say ₹5 lakhs/month) than NimbusPay's small business plan tenants would get.

Construct a deliberate cross-tenant cache-key test: cache a "summarise last 10 messages" tool result for tenant A; switch to tenant B with the same last_10_message_ids; verify the cache miss. If it hits, you have a tenant_id-less cache key somewhere.
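The cache-key test is small enough to sketch in full. TenantScopedCache and the tenant identifiers are hypothetical; the key construction follows the sha256(tenant_id + ":" + key_components) form given above.

```python
import hashlib


def cache_key(tenant_id: str, key_components: str) -> str:
    """Tenant is part of the hashed key, not a filter applied after lookup."""
    return hashlib.sha256((tenant_id + ":" + key_components).encode("utf-8")).hexdigest()


class TenantScopedCache:
    """In-memory stand-in for whatever store backs memoised tool results."""

    def __init__(self):
        self._store: dict[str, object] = {}

    def put(self, tenant_id: str, key_components: str, value: object) -> None:
        self._store[cache_key(tenant_id, key_components)] = value

    def get(self, tenant_id: str, key_components: str):
        # There is no code path that looks up a key without a tenant_id.
        return self._store.get(cache_key(tenant_id, key_components))


if __name__ == "__main__":
    cache = TenantScopedCache()
    last_10_message_ids = "m1,m2,m3,m4,m5,m6,m7,m8,m9,m10"
    cache.put("tenant_a", "summarise:" + last_10_message_ids, "summary for tenant A")
    # Same message ids, different tenant: must be a miss.
    assert cache.get("tenant_b", "summarise:" + last_10_message_ids) is None
    assert cache.get("tenant_a", "summarise:" + last_10_message_ids) == "summary for tenant A"
```

One caveat on the plain concatenation: if tenant identifiers can themselves contain ":", two different (tenant_id, key_components) pairs can produce the same string, so either restrict the identifier alphabet or hash a structured encoding instead.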


Worked example

Here is the approval-gate spec for Suresh's ₹4,27,000 refund as it would appear in design-notes.md:

gate: high_value_refund
trigger:
  rule: amount_inr > 50000 AND tool == "issue_refund"
  fires_on: Suresh order_id=882741 amount=427000
reviewer:
  role: finance_lead
  fallback: finance_oncall
  staffing: M-F 09:00-18:00 IST; weekend on-call rotates
packet:
  - account_id: 88-2741
  - tenure_months: 51
  - balance_inr: 427000
  - dispute_history: ["12-Apr double-charge: two ₹2400 charges for same order"]
  - policy_citation: "4.2.1 - full refund permitted on confirmed double charges within 60d"
  - agent_rationale: "Customer requested full-balance refund; evidence supports duplicate charge"
timeout_s: 900
on_approve: |
  fire issue_refund(order_id=882741, amount_inr=427000,
                    reason="duplicate", idempotency_key="refund_882741_duplicate")
  then fire send_customer_email(...)
on_edit: |
  new_amount enters gate pipeline at top
  if new_amount > 50000: gate fires again, reviewer re-reviews
  if new_amount <= 50000: tool fires directly (subject to other safeguards)
on_reject: |
  agent composes decline with reviewer-provided reason
  decline message routes through post-output gate (Phase 4)
on_timeout: |
  escalate to finance_director
  customer receives second interim: "still under review, new expected window..."

That is the level of specification chapter 13 demands. If your gate spec is less concrete than this, the gate is theatre.


Acceptance check

Before Phase 3:

  1. Show me your safeguard matrix. Every tool has a class; Class 4 rows are full; Class 1 rows are nearly empty. If your matrix has every cell filled or every cell empty, the classification is wrong.
  2. Demonstrate that retrying a refund with the same idempotency key returns the original result, not a second refund. Run the test; capture the trace; commit it as runs/phase-2/idempotency-test.json.
  3. Trigger the OR-gate's no-progress detection. Construct a deliberately stuck case (the agent calls the same tool with the same arguments three times). Show that the loop stops at iteration 3 with a meaningful give-up message.
  4. Walk me through the approval gate for Suresh. You should be able to recite trigger, reviewer, packet contents, timeout, and the four resume paths (approve, edit, reject, timeout) without re-reading the chapter.
  5. Show me what happens when you edit Suresh's amount. If the edited amount remains above ₹50,000, the gate should re-fire; if you edit it to ₹49,999, the tool should fire directly but still pass through the other safeguards (idempotency key, audit log). Verify the audit-log entry has the original amount, the reviewed amount, and the reviewer identity.
  6. Try to retrieve the US policy chunk while running for an Indian tenant. The required region_filter should make this impossible at the schema level. If you can pass region_filter="US" for an Indian tenant, the isolation has a hole — find it.

If any answer is uncertain, the layer isn't really built. The acceptance check exists precisely because the failure modes Phase 2 prevents are the ones most likely to ship without it.


Common stumbles

Stumble 1 — random UUIDs as idempotency keys. Symptom: retries don't deduplicate, the backend treats every attempt as a new refund. Diagnosis: the key was generated per-call rather than per-intent. Fix: regenerate the key from order and reason; the same logical refund must produce the same key on every attempt.

Stumble 2 — soft caps in the system prompt instead of hard caps in code. Symptom: the model exceeds the iteration cap because the prompt said "you should not exceed 8 iterations" and the model interpreted that as advice. Fix: the cap belongs in the orchestrator's for loop, not in prose to the model.

Stumble 3 — fan-out parallelism on dependent calls. Symptom: the agent issues a refund and sends the email concurrently, the email goes out before the refund confirms. Fix: read the chapter-10 independence rule; if branch B depends on branch A's success, they chain, not fan out.

Stumble 4 — the approval-gate "edit" path that bypasses the gate. Symptom: reviewer edits Suresh's amount from ₹4,27,000 to ₹4,20,000; the tool fires directly because the edit was "already approved." But ₹4,20,000 is still above ₹50,000, and the gate should have re-fired. Fix: edit re-enters pipeline at top, full stop. The chapter-13 interview Q&A is explicit about this.

Stumble 5 — tenant_id as a filter after the lookup, not part of the key. Symptom: cache hits cross tenants because the cache key is sha256(message_hash) and the filter happens after the lookup. Fix: the tenant identifier is part of the key, not a filter — chapter-15 is non-negotiable on this.


Reflection prompts

  • Walk through how Suresh's request would have been handled in Phase 1 versus Phase 2. The Phase 1 trace fires the ₹4,27,000 refund directly; the Phase 2 trace stops at the gate, queues the packet, sends an interim message. Quantify the difference in blast radius if Suresh's case had been adversarial — what would Phase 1 have done that Phase 2 catches?
  • Your retrieval tool has a confidence floor. What floor did you set? On what evidence? Run a query you expect to return no relevant chunks and confirm the agent escalates rather than paraphrasing the highest-scoring-but-irrelevant chunk.
  • You serve two tenants now. What single bug in your scratchpad code, today, would cause Acme's customer data to surface in Initech's session? If you can name a candidate, fix it before Phase 3. If you can't, you haven't looked hard enough.
  • Your monthly budget envelope is ₹X. Where does the budget go if NimbusPay grows to a third tenant? Does the architecture scale, or do you need to revisit the routing tiers?

Continue to phase-3-survive-production.md.