02. The sandbox surfaces¶
~10 min read. The previous chapter showed why unsandboxed execution fails. This chapter is the prescription — the six surfaces of the sandbox as a single architecture, so each later chapter can develop one surface in detail.
Continues from 01-why-unsandboxed-execution-fails.md. Surfaces: isolation, resource, policy, credentials, approval, observability.
The shape of the sandbox is the shape of the failures it absorbs. Six failure families; six defending surfaces; one architecture that composes them.
What the sandbox is, anatomically¶
A tool execution sandbox is the named composition of six surfaces — isolation, resource limits, policy, credentials, approvals, observability — designed so that a tool call's blast radius is bounded by enforcement at each surface.
Read right to left.
- Bounded by enforcement at each surface — every surface contributes a layer; depth is the discipline.
- Tool call's blast radius — the worst-case damage from one call. The sandbox's output is a bounded blast radius.
- Six surfaces — not one mechanism. The sandbox is composite.
If a team has fewer than six surfaces, the sandbox has a gap. The most common pattern is two: isolation (often only container-level) and credentials (often the agent's full set rather than a per-tool scope). Resource limits, policy, approval, and observability are usually under-invested.
The six surfaces, input / output / owner¶
Surface 1 — Isolation¶
Input. The tool's runtime requirements (language, system calls needed, network reach needed, filesystem access needed).
Output. The execution boundary — a process, container, language VM, or hardware VM that confines the tool.
Owner. The platform team owns the isolation infrastructure; the tool author owns the per-tool configuration.
The isolation layer is the sandbox's perimeter. Chapter 03 develops the choices.
Surface 2 — Resource limits¶
Input. The tool's expected resource profile.
Output. Enforced caps on CPU time, wall time, memory, file handles, network connections, I/O.
Owner. The platform team owns the enforcement mechanism; the tool author requests caps within platform policy.
Chapter 04 develops it.
Surface 3 — Policy¶
Input. The tool's filesystem, network, and syscall needs.
Output. Allowlists and denylists per tool, enforced by the runtime.
Owner. The platform team owns the policy mechanism; the tool author defines the per-tool policy.
Chapter 05 develops it.
Surface 4 — Credentials¶
Input. The tool's purpose and the scope of access required.
Output. A scoped credential (token, certificate, IAM role) specific to this tool, broker-issued at call time.
Owner. The platform team owns the broker; the tool author defines the scope requirement.
Chapter 06 develops it.
Surface 5 — Approval¶
Input. The tool's action classification — reversible, recoverable, irreversible.
Output. A human-in-the-loop gate for actions classified as irreversible.
Owner. The platform team owns the gate mechanism; the tool author classifies actions.
Chapter 07 develops it.
Surface 6 — Observability¶
Input. Tool call events: caller, parameters, syscalls, resource use, outcome.
Output. Per-call audit records and aggregate dashboards.
Owner. The platform team owns the observability infrastructure; the tool author ensures their tool's events flow.
Chapter 11 develops it.
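The input/output/owner split above can be captured as a per-tool manifest. The sketch below is a minimal, hypothetical Python shape for one. The names `SurfaceConfig`, `ToolManifest`, and `missing_surfaces` are illustrative, not a real platform API. It flags the surfaces a tool has left unconfigured, using the common two-surface pattern as its example.

```python
from dataclasses import dataclass, field

# Canonical order of the six surfaces, as named in this chapter.
SURFACES = ("isolation", "resource", "policy", "credentials", "approval", "observability")


@dataclass
class SurfaceConfig:
    """One surface's entry in a hypothetical per-tool manifest."""
    input: str   # what the tool declares (e.g. its resource profile)
    output: str  # what the platform enforces (e.g. caps, allowlists)
    owner: str   # who owns this tool's configuration of the surface


@dataclass
class ToolManifest:
    tool_name: str
    surfaces: dict = field(default_factory=dict)  # surface name -> SurfaceConfig

    def missing_surfaces(self) -> list:
        """Return the unconfigured surfaces in canonical order."""
        return [s for s in SURFACES if s not in self.surfaces]


# The common two-surface pattern: isolation and credentials only.
manifest = ToolManifest(
    tool_name="python_exec",
    surfaces={
        "isolation": SurfaceConfig("Python 3, no network", "microVM per call", "tool author"),
        "credentials": SurfaceConfig("read analytics DB", "per-call scoped token", "tool author"),
    },
)
print(manifest.missing_surfaces())
# ['resource', 'policy', 'approval', 'observability']
```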
The sandbox as a service¶
          +----------------------+
          | Agent runtime        |
          | (model + tool        |
          | dispatcher)          |
          +----------+-----------+
                     |
                     | tool call (tool name, params)
                     v
          +----------+-----------+
          | Approval layer (5)   |
          | irreversible?        |
          +----------+-----------+
                     | (after approval if needed)
                     v
          +----------+-----------+
          | Credential broker (4)|
          | mint scoped token    |
          +----------+-----------+
                     |
                     v
+--------------------+-------------------+
| Isolation (1) + Resource (2) +         |
| Policy (3): combined enforcement       |
|          +-----------------+           |
|          | the tool runs   |           |
|          +-----------------+           |
+--------------------+-------------------+
                     |
                     | result, syscalls observed, resource use
                     v
          +----------+-----------+
          | Observability (6)    |
          | audit + dashboards   |
          +----------+-----------+
                     |
                     v
          +----------+-----------+
          | Result returned to   |
          | agent (with output   |
          | validation)          |
          +----------------------+
The path is sequential at the boundaries (approval, credentials) and concurrent during execution (isolation + resource + policy enforced together). Observability spans the entire flow.
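The diagram's shape (sequential at the approval and credential boundaries, combined enforcement during execution, observability throughout) can be sketched as a small orchestrator. This is a hypothetical illustration, not a real runtime. `approve`, `mint_token`, and `execute` are injected callables standing in for the approval gate, the credential broker, and the confined execution step. The audit log is an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    tool_name: str
    params: dict
    irreversible: bool = False  # set by the tool author's action classification


@dataclass
class Sandbox:
    """Hypothetical orchestration of the six surfaces for one tool call."""
    approve: Callable[[ToolCall], bool]      # surface 5: human-in-the-loop gate
    mint_token: Callable[[ToolCall], str]    # surface 4: credential broker
    execute: Callable[[ToolCall, str], str]  # surfaces 1-3: confined execution
    audit_log: list = field(default_factory=list)  # surface 6: audit records

    def run(self, call: ToolCall) -> str:
        self.audit_log.append(("received", call.tool_name))
        # Sequential boundary 1: approval, only for irreversible actions.
        if call.irreversible and not self.approve(call):
            self.audit_log.append(("denied", call.tool_name))
            raise PermissionError(f"{call.tool_name}: approval denied")
        # Sequential boundary 2: a token scoped to this call only.
        token = self.mint_token(call)
        self.audit_log.append(("token_minted", call.tool_name))
        # Isolation, resource caps and policy all apply together inside
        # execute(); this sketch treats that enforcement as one step.
        result = self.execute(call, token)
        self.audit_log.append(("completed", call.tool_name))
        return result


minted = []


def fake_broker(call: ToolCall) -> str:
    """Hypothetical stand-in for the credential broker."""
    minted.append(call.tool_name)
    return "tok-123"


sandbox = Sandbox(approve=lambda call: False,  # the user declines
                  mint_token=fake_broker,
                  execute=lambda call, token: "ok")
try:
    sandbox.run(ToolCall("cleanup", {"path": "/data"}, irreversible=True))
except PermissionError as err:
    print(err)   # cleanup: approval denied
print(minted)    # []  (no credential was minted for the denied call)
```

Because approval runs before the broker, a declined irreversible action never receives a credential at all. Observability still records the attempt and the denial.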
A worked example — sandboxing the Python tool¶
The Chennai data-engineering SaaS rebuilds its Python execution tool after the data-deletion incident. Walking the six surfaces:
Isolation. The tool runs in a Firecracker microVM per call. The VM has no network interface by default; the tool author opts in to specific network reach (e.g., the analytics database) via the policy layer.
Resource limits. CPU: 4 vCPUs and 10 seconds of CPU time. Wall time: 30 seconds. Memory: 1 GB. File handles: 64. Network: 2 outbound connections. The runtime enforces every cap; exceeding any one terminates the call.
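Several of these caps map onto POSIX resource limits. The sketch below is an assumption-laden Linux illustration, not the worked example's actual runtime. It runs tool code in a child Python process and sets CPU-time, address-space, and file-descriptor limits with the standard `resource` module. It enforces wall time with `subprocess.run(timeout=...)`. The vCPU count and the outbound-connection cap are not expressible as rlimits; they belong to the microVM or cgroup configuration and are omitted here.

```python
import resource
import subprocess
import sys

# Hypothetical caps mirroring the worked example (Linux; `resource` is Unix-only).
CPU_SECONDS = 10       # CPU time, enforced by the kernel via RLIMIT_CPU
WALL_SECONDS = 30      # wall time, enforced by the parent via timeout
MEMORY_BYTES = 2**30   # "1 GB" read as 2**30 bytes of address space (RLIMIT_AS)
MAX_OPEN_FILES = 64    # file descriptors, including sockets (RLIMIT_NOFILE)


def _apply_limits() -> None:
    """Runs in the child after fork and before exec, so only the tool is capped."""
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))
    resource.setrlimit(resource.RLIMIT_NOFILE, (MAX_OPEN_FILES, MAX_OPEN_FILES))


def run_capped(code: str, wall_seconds: float = WALL_SECONDS) -> subprocess.CompletedProcess:
    """Run Python source in a child process under the caps above.

    On a wall-time overrun, subprocess.run kills the child and raises
    subprocess.TimeoutExpired. A CPU overrun ends the child with a signal;
    a memory overrun typically surfaces as MemoryError inside the child.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=_apply_limits,
        capture_output=True,
        text=True,
        timeout=wall_seconds,
    )


print(run_capped("print(sum(range(10)))").stdout.strip())  # 45
try:
    run_capped("while True: pass", wall_seconds=1)
except subprocess.TimeoutExpired:
    print("wall-time cap hit; call terminated")
```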
Policy. Filesystem: /workspace/<tenant-id>/<call-id> read-write; everything else read-only or denied. Network: only the analytics database endpoint allowed. Syscalls: a curated allowlist (no unlink, no umount, no kexec).
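The filesystem and network lines of that policy can be expressed as small predicates. The sketch below is hypothetical: the endpoint name, port, and `/mnt/customer-data` path are placeholders. It shows a check a policy engine might run, not an enforcement mechanism on its own. Syscall allowlists would be enforced in the kernel (for example with seccomp filters) and are not shown. The path check normalises `..` segments but does not resolve symlinks, which a real enforcement layer must also handle.

```python
import posixpath

# Hypothetical allowlisted endpoint standing in for "the analytics database".
ALLOWED_ENDPOINTS = {("analytics-db.internal", 5432)}


def workspace_root(tenant_id: str, call_id: str) -> str:
    return f"/workspace/{tenant_id}/{call_id}"


def is_write_allowed(path: str, tenant_id: str, call_id: str) -> bool:
    """Allow write/delete only inside the per-call workspace.

    normpath collapses '..' so a path is judged by where it points
    lexically; symlinks are NOT resolved here.
    """
    root = workspace_root(tenant_id, call_id)
    normalized = posixpath.normpath(path)
    # The trailing "/" stops "/workspace/t1/c99" matching root "/workspace/t1/c9".
    return normalized == root or normalized.startswith(root + "/")


def is_connect_allowed(host: str, port: int) -> bool:
    """Deny every outbound connection except allowlisted (host, port) pairs."""
    return (host, port) in ALLOWED_ENDPOINTS


print(is_write_allowed("/workspace/t1/c9/tmp/out.csv", "t1", "c9"))  # True
print(is_write_allowed("/mnt/customer-data/tmp", "t1", "c9"))        # False
print(is_write_allowed("/workspace/t1/c9/../../t2/c1", "t1", "c9"))  # False
```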
Credentials. A scoped token minted at call time: read access to the tenant's analytics database, write access to the per-call workspace directory. No IAM role, no broad cloud credentials, no instance metadata access.
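A broker that mints per-call scoped tokens can be sketched as below. Everything here is hypothetical: the `(action, resource)` scope format, the resource names, and the 60-second lifetime are assumptions for illustration. A production broker would issue signed credentials (for example, short-lived cloud credentials) rather than opaque random strings checked in memory.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedToken:
    value: str
    scopes: frozenset  # set of (action, resource) pairs
    expires_at: float  # Unix timestamp


def mint_token(tenant_id: str, call_id: str, ttl_seconds: float = 60.0) -> ScopedToken:
    """Hypothetical broker: mint a short-lived token for one tenant and one call."""
    scopes = frozenset({
        ("read", f"analytics-db/{tenant_id}"),
        ("write", f"workspace/{tenant_id}/{call_id}"),
    })
    return ScopedToken(secrets.token_urlsafe(32), scopes, time.time() + ttl_seconds)


def authorises(token: ScopedToken, action: str, resource_name: str) -> bool:
    """Deny by default: only an exact, unexpired (action, resource) grant passes."""
    return time.time() < token.expires_at and (action, resource_name) in token.scopes


tok = mint_token("t1", "c9")
print(authorises(tok, "read", "analytics-db/t1"))        # True
print(authorises(tok, "delete", "customer-storage/t1"))  # False
```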
Approval. Any tool action classified as irreversible (data deletion, schema change, export to external systems) requires a UI confirmation from the user. The data assistant's "clean up" action is classified as irreversible if it touches > 100 MB or > 100 files.
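The classification rule for the clean-up action is simple enough to state as code. This hypothetical sketch makes three assumptions. It reads "100 MB" as 100 * 10**6 bytes, because the text does not say whether MB or MiB is meant. The action-kind names are placeholders. The thresholds are strict, matching "more than".

```python
# Thresholds from the worked example; "100 MB" read as 100 * 10**6 bytes (assumption).
CLEANUP_MAX_BYTES = 100 * 10**6
CLEANUP_MAX_FILES = 100

# Hypothetical action kinds the tool author has classified as irreversible.
IRREVERSIBLE_KINDS = {"data_deletion", "schema_change", "external_export"}


def requires_confirmation(kind: str, total_bytes: int = 0, file_count: int = 0) -> bool:
    """True if the action must wait for the user's UI confirmation."""
    if kind == "cleanup":
        # Clean-up counts as irreversible only above either threshold.
        return total_bytes > CLEANUP_MAX_BYTES or file_count > CLEANUP_MAX_FILES
    return kind in IRREVERSIBLE_KINDS


print(requires_confirmation("cleanup", total_bytes=40 * 10**6, file_count=12))   # False
print(requires_confirmation("cleanup", total_bytes=40 * 10**6, file_count=250))  # True
print(requires_confirmation("schema_change"))                                     # True
```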
Observability. Every call records: tenant ID, user ID, tool name, parameters, syscalls observed, resource use, outcome, output hash, duration. The audit is queryable; the dashboard aggregates by tenant and tool.
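One possible shape for such a record is sketched below; it is hypothetical. The field names mirror the list above. The output is stored as a SHA-256 hash rather than verbatim. That keeps tenant data out of the audit store while still letting a reviewer match an output to its record. Records serialise to one JSON line each, which a log pipeline could index. The aggregation helper stands in for the per-tenant, per-tool dashboard.

```python
import hashlib
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class AuditRecord:
    tenant_id: str
    user_id: str
    tool_name: str
    parameters: dict
    syscalls_observed: list
    resource_use: dict
    outcome: str
    output_hash: str   # SHA-256 of the raw output, not the output itself
    duration_s: float


def make_record(tenant_id, user_id, tool_name, parameters, syscalls,
                resource_use, outcome, output: bytes,
                started: float, finished: float) -> AuditRecord:
    return AuditRecord(tenant_id, user_id, tool_name, parameters, syscalls,
                       resource_use, outcome,
                       output_hash=hashlib.sha256(output).hexdigest(),
                       duration_s=finished - started)


def to_json_line(record: AuditRecord) -> str:
    """One queryable JSON object per call."""
    return json.dumps(asdict(record), sort_keys=True)


def calls_by_tenant_and_tool(records) -> Counter:
    """Dashboard-style aggregate: call counts keyed by (tenant, tool)."""
    return Counter((r.tenant_id, r.tool_name) for r in records)


rec = make_record("t1", "u7", "python_exec", {"code": "print(1)"},
                  ["openat", "read", "write"], {"cpu_s": 0.02, "peak_mem_mb": 31},
                  "ok", b"1\n", started=100.0, finished=100.25)
print(to_json_line(rec))
```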
Months later, the same user asks the assistant to clean up temp files. The model writes the same shutil.rmtree call. The policy layer denies the filesystem path, the tool returns an error, and the model asks the user for clarification. The incident does not happen.
Why all six surfaces are required¶
Each surface alone has a known failure mode:
- Isolation without resource limits. The tool is confined but can starve the host.
- Resource limits without isolation. The tool is constrained but can read the host's secrets and files.
- Policy without credentials. The tool is policy-restricted but its broad credentials authorise actions the policy was meant to block.
- Credentials without policy. The tool has scoped credentials but the policy layer is permissive; the tool acts at the boundary of its credentials.
- Approval without observability. The approval gate exists but the audit is absent; teams cannot review what was approved or denied.
- Observability without approval. The audit captures incidents after the fact; the irreversible action has already landed.
The sandbox's value is multiplicative across surfaces, not additive. Three of six surfaces is closer to half the value than to three-quarters, because the failure paths left open by the missing surfaces multiply.
Operational signals¶
Healthy. Every tool in production runs under a sandbox with all six surfaces. New tools require a sandbox configuration as part of the production-readiness review. Sandbox infrastructure is shared across tools to reduce per-tool cost.
First degrading metric. A new tool ships without one or more surfaces configured. The team's discipline is degrading.
Misleading metric. Number of tools. The metric to watch is surface coverage per tool, not tool count.
Expert graph. The matrix of tools × surfaces, with cell colour reflecting whether the surface is enforced for that tool, plus the aggregate coverage trend over time.
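Once each tool's enforced surfaces are known, the tools × surfaces matrix and its aggregate coverage are straightforward to compute. This is a minimal sketch that assumes the matrix is available as a mapping from tool name to the set of enforced surfaces. How that inventory is gathered is platform-specific and not shown, and the tool names are hypothetical. Recording the coverage fraction at regular intervals gives the trend line.

```python
SURFACES = ("isolation", "resource", "policy", "credentials", "approval", "observability")


def coverage_report(matrix: dict) -> tuple:
    """matrix maps tool name -> set of surfaces enforced for that tool.

    Returns (gaps, coverage): gaps lists the missing surfaces for each tool
    that has any; coverage is the fraction of tool x surface cells enforced
    (1.0 for an empty matrix, vacuously).
    """
    gaps = {}
    covered = 0
    for tool, enforced in matrix.items():
        missing = [s for s in SURFACES if s not in enforced]
        if missing:
            gaps[tool] = missing
        covered += len(SURFACES) - len(missing)
    cells = len(matrix) * len(SURFACES)
    return gaps, (covered / cells if cells else 1.0)


matrix = {
    "python_exec": set(SURFACES),
    "send_email": {"isolation", "credentials"},
}
gaps, coverage = coverage_report(matrix)
print(gaps)               # {'send_email': ['resource', 'policy', 'approval', 'observability']}
print(f"{coverage:.0%}")  # 67%
```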
Boundary of applicability¶
Strong fit. Production AI agents calling tools that touch external resources. The full six-surface sandbox is justified.
Pathology. A read-only, internal-only tool with no destructive potential. The full sandbox is overkill; the basics (isolation, resource limits, observability) may be enough. The pathology is skipping surfaces for such a tool and never revisiting that decision when the tool later gains destructive potential.
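The pathology is easier to catch if the required surface set is recomputed whenever a tool's properties change. This hypothetical sketch assumes each tool declares three boolean properties (`read_only`, `internal_only`, `destructive`). It uses the chapter's split between the basics and the full six.

```python
ALL_SURFACES = ("isolation", "resource", "policy", "credentials", "approval", "observability")
BASICS = ("isolation", "resource", "observability")


def required_surfaces(read_only: bool, internal_only: bool, destructive: bool) -> tuple:
    """Basics suffice only for read-only, internal-only, non-destructive tools."""
    if read_only and internal_only and not destructive:
        return BASICS
    return ALL_SURFACES


def unmet(properties: dict, configured: set) -> list:
    """Surfaces the tool now requires but has not configured."""
    return [s for s in required_surfaces(**properties) if s not in configured]


configured = set(BASICS)  # what shipped while the tool was read-only
print(unmet({"read_only": True, "internal_only": True, "destructive": False}, configured))
# []
# Later the tool gains a delete action; rerunning the check exposes the gap.
print(unmet({"read_only": False, "internal_only": True, "destructive": True}, configured))
# ['policy', 'credentials', 'approval']
```

Running a check like this on every change to a tool's declared properties turns "the tool evolved" into a visible failure rather than a silent gap.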
Scale limit. Very large platforms have hundreds of tools; the sandbox becomes a platform service. The pattern is shared infrastructure with per-tool policy.
Failure-prone assumption¶
The seductive wrong belief: the sandbox is the platform team's problem; the tool author just calls the runtime. Half-true — the platform owns the infrastructure, but the tool author owns the per-tool configuration (resource caps, policy envelope, credential scope, action classification). A tool author who treats the sandbox as opaque ships under-configured tools; the platform's defaults are not the same as the tool's actual needs.
The correct belief: the sandbox is a shared responsibility. The platform owns the mechanism; the tool author owns the configuration.
Where this appears in production¶
- A devops AI uses a microVM-per-call sandbox; each call is isolated; resource caps prevent runaway.
- A coding assistant uses gVisor for isolation; the kernel surface visible to user code is reduced.
- A data-engineering SaaS rebuilt its sandbox to all six surfaces after a data-deletion incident; subsequent incidents are bounded.
- A retail AI has shared sandbox infrastructure; per-tool config is a YAML manifest in the tool's repo.
- A fintech has the credential broker as a separate service; tokens are minted per call with scoped permissions.
- A healthcare AI has the approval layer integrated with the clinician's UI; irreversible actions confirm.
- A telecom AI has observability streamed to a SIEM; sandbox events are correlated with security signals.
- A consumer chatbot ships tools with isolation only; later incidents drive resource and policy investment.
- A government AI has the sandbox as a compliance artefact; auditors verify the matrix.
- A B2B SaaS has the production-readiness review check the sandbox matrix; tools without configuration do not ship.
- A travel AI has the policy layer as a per-tool YAML; tool authors write it; platform reviews it.
- A legal AI has the credential broker integrated with the firm's secrets management; scope is enforced centrally.
- A logistics AI has the resource layer enforced at the container level (cgroups + namespaces).
- A media AI has the observability layer feeding the on-call apparatus; sandbox alerts fire as paging conditions.
- A staffing AI has the approval layer for mass-message tools; sent messages require confirmation.
- A search-ops AI treats the sandbox as a launch gate; new tools have configured matrices.
- A document AI has separate sandboxes for trusted (internal) and untrusted (user-supplied code) tools.
- A real-estate AI has shared sandbox infrastructure but per-tool policy; common surface, custom configuration.
- An ad-tech AI has the sandbox matrix on the leadership dashboard; coverage is visible.
- A small SaaS has only isolation and credentials; the other four are deferred; the next incident exposes the gap.
Recall / checkpoint¶
- Name the six sandbox surfaces.
- For each, name the input, the output, and the typical owner.
- Why is the sandbox a shared responsibility between platform and tool author?
- What is the typical pattern when teams ship partial sandboxes?
- How does the sandbox compose at execution time (sequential vs. concurrent)?
- Why is "the platform owns the sandbox" half-true?
- What metric distinguishes sandbox-shaped infrastructure from actual sandbox coverage?
Interview Q&A¶
Q1. A team has isolation (containers) and credentials (scoped tokens) for their tools. Walk through the gaps. Four surfaces missing: resource, policy, approval, observability. The known failure modes: a tool exhausting host resources (no caps), a tool reaching unintended filesystem or network resources (no policy), an irreversible action executing without confirmation (no approval), incidents being invisible after the fact (no audit). The remediation is to add the four surfaces; each maps to a known failure family from chapter 01. Common wrong answer to avoid: "isolation is the main thing" — multiplicative value; each surface defends a different family.
Q2. Walk through the responsibility split between platform team and tool author. Platform owns mechanism: the runtime (Firecracker, gVisor, containers), the credential broker, the approval gate UI, the observability infrastructure, the policy enforcement engine. Tool author owns configuration: the resource caps for this tool, the policy envelope (filesystem, network, syscall lists), the credential scope, the action classification (which actions are irreversible). The split is necessary — platform cannot know each tool's needs; tool author cannot rebuild infrastructure per tool. The contract is the per-tool manifest. Common wrong answer to avoid: "platform owns everything" — produces overly-permissive defaults.
Q3. Walk through the worked example (the rebuilt Python tool). What surface caught the destructive call? The policy layer. The filesystem policy denied write/delete on paths outside /workspace/<tenant-id>/<call-id>, so the model's shutil.rmtree on the original mount returned a policy-denial error. The credential layer would also have blocked the call (the scoped token did not authorise delete on the customer-storage bucket), and so would the isolation layer (the mount was not present in the microVM). Defence-in-depth: multiple surfaces would have caught the same incident. Common wrong answer to avoid: "any one surface is enough"; depth is the discipline because each surface has its own failure modes.
Q4. The team uses containers for isolation. Is that sufficient? Depends on the threat model. For trusted internal tools with no destructive potential, possibly. For tools that touch production data, money, customer resources, or execute untrusted code, container isolation alone is insufficient — containers share the kernel with the host, and a kernel vulnerability or misconfiguration is an escape vector. The pattern for high-stakes tools is microVMs (Firecracker) or gVisor for kernel surface reduction. The choice is a tradeoff; chapter 03 develops it. Common wrong answer to avoid: "containers are isolation" — they are one layer; the depth needed depends on the tool's blast radius.
Q5. A team's tool authors complain that sandbox configuration is too much work. How do you respond? Empathise structurally: per-tool configuration is real work. Reduce it by providing well-chosen platform defaults (sensible resource caps, conservative policy envelopes, narrow credential scopes) that apply automatically, so tool authors opt out of a default when a tool needs more rather than opting in to each protection. Provide tooling: a generator that produces a starter manifest from a tool's declared inputs and outputs. Treat the configuration as part of the tool's API, reviewed in code review like any other API. The work is bounded; the cost of skipping is unbounded. Common wrong answer to avoid: "the platform should just figure it out"; the platform cannot know each tool's needs.
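A starter-manifest generator with conservative defaults might look like the hypothetical sketch below. The default values and platform maximums are placeholders, not recommendations. The point is the shape: protections are on by default, and a tool author widens them only through explicit overrides. Those overrides are checked against platform limits and reviewed like any other API change.

```python
import copy
from typing import Optional

# Hypothetical platform defaults: restrictive unless a tool overrides them.
PLATFORM_DEFAULTS = {
    "cpu_seconds": 10,
    "wall_seconds": 30,
    "memory_mb": 512,
    "max_files": 64,
    "network_allow": [],      # no outbound network by default
    "credential_scopes": [],  # no credentials by default
}
# Hypothetical ceilings a tool author may not exceed without platform sign-off.
PLATFORM_MAX = {"cpu_seconds": 60, "wall_seconds": 300, "memory_mb": 4096, "max_files": 1024}


def starter_manifest(tool_name: str, overrides: Optional[dict] = None) -> dict:
    """Platform defaults plus the tool's explicit, bounded overrides."""
    # deepcopy so one tool's manifest never mutates the shared defaults.
    manifest = {"tool": tool_name, **copy.deepcopy(PLATFORM_DEFAULTS)}
    for key, value in (overrides or {}).items():
        if key not in PLATFORM_DEFAULTS:
            raise KeyError(f"unknown manifest key: {key}")
        if key in PLATFORM_MAX and value > PLATFORM_MAX[key]:
            raise ValueError(f"{key}={value} exceeds platform maximum {PLATFORM_MAX[key]}")
        manifest[key] = value
    return manifest


m = starter_manifest("report_export", {"wall_seconds": 120,
                                       "network_allow": ["analytics-db.internal:5432"]})
print(m["wall_seconds"], m["memory_mb"])  # 120 512
```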
Q6. How does the sandbox interact with prompt injection defences? They are complementary, not substitutable. Prompt injection defences (in the model layer) reduce the rate of malicious or adversarial tool calls. The sandbox bounds the damage of each tool call regardless of cause — model error, hallucination, adversarial prompt, or genuine user mistake. A team that relies on injection defences alone is one bypass away from the failure; a team with the sandbox is bounded by the sandbox even if injection succeeds. Both belong in the design. Common wrong answer to avoid: "if we block injection, we don't need the sandbox" — injection defences are not perfect; the sandbox is the structural defence.
Design / debug exercise (10 minutes)¶
Modelled example. Walk through the worked example (the Chennai Python tool). Verify the six surfaces are populated with input, output, owner, and at least one named artefact (resource cap, policy line, credential scope).
Your turn. Pick one tool. For each surface, fill in: what exists today, what exists only in partial form, and what is absent. Estimate the gap relative to the tool's potential blast radius.
Reproduce from memory. Draw the sandbox-as-a-service diagram. The signal of internalisation is that the surfaces, owners, and sequential/concurrent enforcement land in under three minutes.
Operational memory¶
This chapter explained the sandbox as a six-surface architecture — isolation, resource, policy, credentials, approval, observability — designed so each surface defends a specific failure family. The important idea is that sandbox value is multiplicative across surfaces; missing one degrades the whole architecture.
You learned to name each surface, its input, its output, its owner, and the failure mode of skipping it. That addresses the opening failure: a team can now see which surfaces it is missing, and the rest of the module develops each surface in turn.
Carry this diagnostic forward: when someone says "we have a sandbox," ask which of the six surfaces they have. The honest answer is usually fewer than they thought.
Remember:
- Six surfaces: isolation, resource, policy, credentials, approval, observability.
- Each surface has an input, an output, and an owner.
- Sandbox value is multiplicative across surfaces.
- The sandbox is a shared responsibility: platform owns mechanism, tool author owns configuration.
- Defence-in-depth: multiple surfaces defend the same failure shape from different angles.
Bridge. The architecture is set. Isolation is the perimeter; the next chapter is the discipline of choosing the right isolation model — process, container, VM, language — for each tool. → 03-isolation-models.md