
00. Tool execution sandboxes — First-principles overview

Module 19_tool_integration_contracts defined the AI-side contract — what the model is told it can call. Module 01_model_gateway_provider_ops enforced the provider-side boundary. This module is the third boundary — between the AI's tool calls and the systems they touch.


A platform engineer at a Chennai data-engineering SaaS is paged at 04:00 IST on a Wednesday because production cloud storage has been wiped clean. The investigation arrives at an embarrassing answer: the company's new AI-powered "data assistant" includes a Python-execution tool the model uses to analyse customer data. A user, neither malicious nor unusual, asked the assistant to "clean up the temporary files in our workspace." The model wrote an rm -rf command targeting a path it inferred from the system prompt. The Python tool executed the command. The path resolved, due to a misconfigured base directory, to a production bucket mount. Five terabytes of customer data, gone in 12 seconds.

The cause is not the model, not the prompt, not the user. The cause is that the Python tool ran with the credentials, filesystem mounts, and network access of the AI service itself: no sandbox, no allowlist, no resource limit, no audit trail. By 09:00 the team is restoring from backup and writing a postmortem that will lead to a sandbox redesign. The honest sentence the lead writes: nothing structural prevented this.

That structural prevention is the tool execution sandbox. It is the boundary between what the model can call and what those calls can do. Every chapter of this module is one surface of that boundary. The opening incident is what happens when the boundary is missing; the rest of the module is what is on either side of it once the boundary exists.


What a tool execution sandbox is, in one sentence

A tool execution sandbox is the production boundary between the AI's tool calls and the systems those calls touch, designed so that the blast radius of a wrong, malicious, or model-hallucinated call is bounded by enforcement, not by trust.

Read the sentence right to left.

  • Bounded by enforcement, not by trust — the sandbox does not rely on the model behaving correctly. The model is treated as an untrusted input to the tool layer.
  • Blast radius of a wrong call — the sandbox shapes what damage is possible, not whether wrong calls occur. Wrong calls are inevitable; bounded blast radius is the design goal.
  • Tool calls and the systems they touch — the boundary is between the agent's intent and the production systems (filesystems, databases, networks, credentials).
  • Production boundary — not a library abstraction inside the agent runtime. A real enforcement layer with its own SLOs, audits, and on-call.

If a team has shipped an agent that executes code, calls APIs, or manipulates data on behalf of users, and does not have a sandbox layer, the question is not whether an incident will happen but how large it will be.
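To make "bounded by enforcement, not by trust" concrete, here is a minimal sketch of one narrow check: confining a model-supplied path to a workspace root before any tool touches it. The names `confine` and `PathEscapeError` are hypothetical, chosen for illustration. The check is a sketch of the idea only and is not a substitute for the OS-level isolation the later chapters cover.

```python
from pathlib import Path


class PathEscapeError(Exception):
    """Raised when a requested path resolves outside the workspace root."""


def confine(workspace_root: str, requested: str) -> Path:
    """Resolve `requested` under `workspace_root`, rejecting any escape."""
    root = Path(workspace_root).resolve()
    # Joining with an absolute `requested` discards `root`, which is exactly
    # the kind of escape the check below is meant to catch.
    candidate = (root / requested).resolve()
    # resolve() collapses ".." and follows symlinks, so the comparison is
    # made on the path the operating system would actually touch.
    if candidate != root and root not in candidate.parents:
        raise PathEscapeError(f"{requested!r} resolves outside {root}")
    return candidate
```

A call such as `confine("/srv/workspaces/tenant-a", "../../mnt/prod")` raises instead of returning a path, regardless of what the model intended. The check still trusts that the root itself is configured correctly, which is the kind of gap the isolation and credential layers are meant to cover.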


The six sandbox surfaces

Every production sandbox has six load-bearing surfaces. Memorise them; the rest of the module is consequences.

Surface | One-liner | Pressure it answers
The isolation layer | Run the tool in a process, container, or VM that cannot reach the host | confinement: the tool should not be able to access systems beyond its allowance
The resource limit layer | CPU, memory, time, I/O caps enforced by the runtime | exhaustion: a tool that loops or allocates without bound starves the host
The policy layer | Filesystem, network, syscall allowlists / denylists per tool | least privilege: each tool can only do what its purpose requires
The credential layer | Tool-specific credentials with scoped permissions, never the agent's | secret leakage: a compromised tool should not yield the agent's keys
The approval layer | Irreversible or high-blast-radius actions require human approval | reversibility: some actions cannot be undone after the fact
The observability layer | Per-call audit, syscalls observed, resource use measured | accountability: every tool call leaves a trail
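As one illustration of the resource limit layer, the sketch below runs a snippet of tool code in a child process with CPU, memory, and wall-clock caps. It assumes a POSIX host (the standard-library `resource` module is not available on Windows), and `run_limited` and its defaults are hypothetical. It covers only the resource surface: the child still shares the parent's filesystem, network, and environment, so it is not an isolation boundary.

```python
import resource
import subprocess
import sys


def run_limited(code: str, cpu_seconds: int = 1, memory_bytes: int = 1 << 30,
                wall_seconds: float = 5.0) -> dict:
    """Run `code` in a child Python process with CPU, memory, and time caps."""

    def apply_limits() -> None:
        # Runs in the child just before exec, so the caps bind the tool only.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            preexec_fn=apply_limits,
            capture_output=True,
            text=True,
            timeout=wall_seconds,  # wall-clock cap, enforced by the parent
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "wall-clock timeout", "stdout": ""}
    if proc.returncode != 0:
        return {"ok": False, "reason": f"exit status {proc.returncode}",
                "stdout": proc.stdout}
    return {"ok": True, "reason": "completed", "stdout": proc.stdout}
```

Calling `run_limited("while True: pass")` returns a failed result once the CPU cap is hit, rather than spinning until someone notices. The Python documentation warns that `preexec_fn` is unsafe in multi-threaded parents, so a production runtime would more likely apply limits through cgroups or the container runtime.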

The module's thirteen files explore each surface in turn, then synthesise. The final two files are the architect checklist and the honest admission.


What this module is not about

  • Designing the tool contracts. That is 19_tool_integration_contracts. This module is the enforcement layer the contracts assume exists.
  • Prompt injection defence in general. Covered in 03_ai_security_safety/01_prompt_injection_security. This module's job is to bound the damage when injection succeeds, not to prevent injection itself.
  • General application security. OWASP, network security, identity management — assumed as a baseline.
  • Model-side controls. Refusals, capability restrictions, system-prompt design — covered upstream. The sandbox treats the model's output as untrusted regardless of upstream controls.

The recurring vocabulary

These terms appear in every chapter.

Name | Surface | What it is
the isolation boundary | Isolation | the process, container, or VM that runs the tool
the syscall allowlist | Policy | the set of system calls the tool is permitted to make
the resource budget | Resource | the CPU, memory, time, and I/O caps enforced per call
the scoped credential | Credential | the tool-specific permission, never the agent's full access
the approval gate | Approval | the human-in-the-loop checkpoint for irreversible actions
the audit record | Observability | the per-call record: caller, tool, parameters, outcome, cost
the escape vector | All | a path by which a tool reaches beyond its sandbox
the blast radius | All | the worst-case scope of damage a tool can cause
the policy envelope | Policy | the union of filesystem, network, syscall, resource policies
the multi-tenant isolation | All | the guarantee that one tenant's tool calls cannot affect another
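Two of these terms, the approval gate and the audit record, can be sketched together. The record's fields follow the vocabulary (caller, tool, parameters, outcome, cost); everything else, including `gated_call`, the `IRREVERSIBLE_TOOLS` registry, and measuring cost in seconds, is an assumption made for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class AuditRecord:
    # Fields follow the vocabulary: caller, tool, parameters, outcome, cost.
    caller: str
    tool: str
    parameters: dict
    outcome: str
    cost_seconds: float
    timestamp: float = field(default_factory=time.time)


# Hypothetical registry of tools whose effects cannot be undone.
IRREVERSIBLE_TOOLS = {"delete_objects", "drop_table"}


def gated_call(caller, tool, parameters, run_tool, audit_log, approved_by=None):
    """Run `run_tool(parameters)` unless the tool is irreversible and unapproved.

    Every call, blocked or not, appends one JSON line to `audit_log`.
    """
    start = time.monotonic()
    if tool in IRREVERSIBLE_TOOLS and approved_by is None:
        outcome, result = "blocked: approval required", None
    else:
        try:
            result = run_tool(parameters)
            outcome = "ok"
        except Exception as exc:  # the failure itself is part of the trail
            outcome, result = f"error: {exc}", None
    record = AuditRecord(caller, tool, parameters, outcome,
                         time.monotonic() - start)
    audit_log.append(json.dumps(asdict(record), sort_keys=True))
    return result, record
```

The point of the shape is that the blocked path and the executed path both leave a record, so the trail shows what was attempted as well as what ran. The sketch assumes `parameters` is JSON-serialisable; a real audit sink would also need to be append-only and outside the tool's reach.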

The journey: bound the boundary, then operate it

This module has two acts.

Act 1 — Build the sandbox (files 01–07). The case for the sandbox, the surfaces, isolation models, resource limits, filesystem/network policy, credentials, approval gates. By file 07 the sandbox exists as a defensible production layer.

Act 2 — Operate the sandbox (files 08–11). Output validation, escape vectors and defences, multi-tenant isolation, observability. The sandbox does not become more powerful; it becomes resilient to time, scale, and attack.

Synthesis (files 12–13). Architect checklist and honest admission.


Memory map

# | File | Surface | Pressure answered | What it adds
01 | why-unsandboxed-execution-fails | | the cost of unbounded tool calls | the case that forces the boundary
02 | the-sandbox-surfaces | All | what the sandbox actually does | the six surfaces as one architecture
03 | isolation-models | Isolation | one isolation does not fit all | process, container, VM, language isolation
04 | resource-limits | Resource | tools can exhaust the host | CPU, memory, time, I/O caps
05 | filesystem-and-network-policy | Policy | filesystem and network are the largest blast radius | allowlists, denylists, namespacing
06 | secrets-and-credentials-in-tools | Credential | the tool's credentials are not the agent's | scoped tokens, broker patterns, rotation
07 | irreversible-actions-and-approvals | Approval | some actions cannot be undone after the fact | approval gates, dry-runs, idempotency
-- milestone: sandbox is defensible --
08 | output-validation-and-sanitisation | All | tool output flows back to the model | sanitisation, injection prevention
09 | escape-vectors-and-defenses | All | sandboxes are escaped | known vectors, hardening, monitoring
10 | multi-tenant-isolation | All | one tenant's calls must not affect another | per-tenant boundaries, namespacing
11 | observability-and-audit | Observability | every call must leave a trail | per-call audit, syscall tracing, cost
-- milestone: sandbox is operable --
12 | architect-checklist | Synthesis | completeness | 20-item design / build / launch / operate
13 | honest-admission | Boundaries | humility | what sandbox design cannot solve

Three traversal paths use this map. Prerequisite path — top to bottom. Failure path — when a security incident points to a tool, find which surface failed. Synthesis path — pick two surfaces and ask how they compose (Isolation + Credential = how tightly-scoped tokens limit a sandbox escape).
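The synthesis example (Isolation + Credential) can be sketched in a few lines. The idea is that even if a tool escapes its isolation boundary, the credential it carries only authorises a narrow set of actions for a short time. `ScopedCredential`, `authorize`, and the action names below are all hypothetical, not part of any real broker API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    tool: str
    allowed_actions: frozenset
    expires_at: float  # seconds since the epoch


def authorize(cred: ScopedCredential, action: str, now: float) -> bool:
    """Allow `action` only if it is in scope and the credential is unexpired."""
    return now < cred.expires_at and action in cred.allowed_actions


# Hypothetical credential for a Python-execution tool: read and write inside
# a scratch workspace, nothing else, valid for five minutes from t=1000.
cred = ScopedCredential(
    tool="python_exec",
    allowed_actions=frozenset({"workspace:read", "workspace:write"}),
    expires_at=1000.0 + 300.0,
)
```

With this credential, `authorize(cred, "workspace:read", 1100.0)` is allowed while `authorize(cred, "bucket:delete", 1100.0)` is not, so an escaped tool holding it cannot delete a production bucket. In a real system the check would live in the resource server or a credential broker, not in the tool process itself.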


How this module relates to its neighbours


Top resources

  • gVisor — Application Kernel for Containers — https://gvisor.dev/
  • Firecracker — microVMs for serverless — https://firecracker-microvm.github.io/
  • Open Container Initiative (OCI) runtime spec — https://opencontainers.org/
  • AWS Lambda execution environment — https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
  • OWASP API Security Top 10 — https://owasp.org/API-Security/

These are the technical baselines. The AI-specific surface on top is the contribution of this module.


What's coming

  1. 01-why-unsandboxed-execution-fails.md — the cost of unbounded tool calls.
  2. 02-the-sandbox-surfaces.md — six surfaces as a service architecture.
  3. 03-isolation-models.md — process, container, VM, language.
  4. 04-resource-limits.md — CPU, memory, time, I/O.
  5. 05-filesystem-and-network-policy.md — allowlists, denylists, namespacing.
  6. 06-secrets-and-credentials-in-tools.md — scoped tokens, broker patterns.
  7. 07-irreversible-actions-and-approvals.md — approval gates, dry-runs, idempotency.
  8. 08-output-validation-and-sanitisation.md — tool output as untrusted input.
  9. 09-escape-vectors-and-defenses.md — known sandbox escapes.
  10. 10-multi-tenant-isolation.md — per-tenant boundaries.
  11. 11-observability-and-audit.md — per-call audit.
  12. 12-architect-checklist.md — twenty items.
  13. 13-honest-admission.md — limits.

Bridge. Before designing isolation, resource limits, or approval gates, we feel why the sandbox is needed at all. The opening incident showed what happens without a sandbox; the first chapter walks through the failure shapes the sandbox must absorb. → 01-why-unsandboxed-execution-fails.md