06. Tool abuse and action boundaries — text becomes impact
~12 min read. The most dangerous AI security failures happen when persuasive text turns into an API call, payment, email, file write, or permission change.
Continues from 05-data-exfiltration-and-secrets.md. The sealed envelope protected data. Now the guard rails must protect actions.
The previous chapter separated reading from revealing: the model may see some context without earning permission to disclose it. But many AI products do more than speak: they call tools, write tickets, send emails, refund money, or change state. This chapter moves from confidentiality to authority over action.
1) The wall — model output is not an authorization decision
Suppose an agent proposes a tool call, for example a refund with an account ID and an amount that the model filled in.
The model may have good intent. It may also be confused, manipulated, stale, or operating on wrong context. The application must not treat model-generated arguments as trusted.
The security boundary is:
model suggests action
-> application validates shape
-> server checks authorization
-> policy checks risk
-> approval if needed
-> tool executes
Skipping those steps turns language into authority.
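The chain above can be sketched as an ordered list of checks that run outside the model. This is a minimal illustration, not a reference implementation: the tool names, the in-memory `SCHEMAS`, `PERMISSIONS`, and `NEEDS_APPROVAL` tables, and the `decide` function are all hypothetical stand-ins for whatever the application actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProposedCall:
    tool: str
    args: dict
    user_id: str                       # from the authenticated session, not the model
    approved_by: Optional[str] = None  # set by an approval workflow, not the model


@dataclass(frozen=True)
class Decision:
    allowed: bool
    stage: str   # the stage that rejected the call, or "execute"
    reason: str = ""


# Hypothetical in-memory tables; a real system would back these with services.
SCHEMAS = {"create_ticket": {"subject": str}, "issue_refund": {"order_id": str}}
PERMISSIONS = {"u-1": {"create_ticket", "issue_refund"}, "u-2": {"create_ticket"}}
NEEDS_APPROVAL = {"issue_refund"}


def check_schema(call):
    schema = SCHEMAS.get(call.tool)
    if schema is None:
        return "unknown tool"
    if set(call.args) != set(schema):
        return "unexpected or missing arguments"
    for name, expected_type in schema.items():
        if not isinstance(call.args[name], expected_type):
            return f"{name} has the wrong type"
    return None


def check_auth(call):
    if call.tool not in PERMISSIONS.get(call.user_id, set()):
        return "user is not permitted to use this tool"
    return None


def check_policy(call):
    if call.tool == "create_ticket" and len(call.args["subject"]) > 200:
        return "subject too long"
    return None


def check_approval(call):
    if call.tool in NEEDS_APPROVAL and call.approved_by is None:
        return "approval required"
    return None


# Order matters: each check returns None to pass or a rejection reason.
PIPELINE = [
    ("schema", check_schema),
    ("auth", check_auth),
    ("policy", check_policy),
    ("approval", check_approval),
]


def decide(call):
    for stage, check in PIPELINE:
        reason = check(call)
        if reason is not None:
            return Decision(False, stage, reason)
    return Decision(True, "execute")
```

In this sketch, a refund proposed for user `u-1` without an approver stops at the `approval` stage, and the same proposal for `u-2` stops earlier at `auth`. The tool only runs when `decide` returns an allowed decision.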
2) Tool risk levels
Not all tools deserve the same controls.
| Tool type | Example | Boundary |
|---|---|---|
| Read-only public | search docs | source trust and rate limit |
| Read-only private | fetch account | tenant/user authorization |
| Draft-only | draft email | human review before send |
| Reversible write | create ticket | schema + scope + audit |
| Irreversible action | refund, delete, transfer | approval + limit + step-up auth |
| Admin action | change permissions | usually no autonomous access |
The vault map should mark tool risk before the agent ships.
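One way to make the table enforceable is to encode it as a registry that every tool must appear in before it can be called. The sketch below mirrors the table rows; the tool names, control labels, and helper functions are illustrative assumptions, not part of any particular framework.

```python
from enum import Enum


class Risk(Enum):
    READ_PUBLIC = "read-only public"
    READ_PRIVATE = "read-only private"
    DRAFT_ONLY = "draft-only"
    REVERSIBLE_WRITE = "reversible write"
    IRREVERSIBLE = "irreversible action"
    ADMIN = "admin action"


# Controls per tier, taken from the table above (labels are illustrative).
CONTROLS = {
    Risk.READ_PUBLIC: {"source_trust", "rate_limit"},
    Risk.READ_PRIVATE: {"tenant_user_authorization"},
    Risk.DRAFT_ONLY: {"human_review_before_send"},
    Risk.REVERSIBLE_WRITE: {"schema", "scope", "audit"},
    Risk.IRREVERSIBLE: {"approval", "limit", "step_up_auth"},
    Risk.ADMIN: {"no_autonomous_access"},
}

# Hypothetical tool names for a support agent.
TOOL_RISK = {
    "search_docs": Risk.READ_PUBLIC,
    "fetch_account": Risk.READ_PRIVATE,
    "draft_email": Risk.DRAFT_ONLY,
    "create_ticket": Risk.REVERSIBLE_WRITE,
    "refund": Risk.IRREVERSIBLE,
    "change_permissions": Risk.ADMIN,
}


def required_controls(tool):
    # Fail closed: a tool with no classification cannot be used.
    if tool not in TOOL_RISK:
        raise KeyError(f"tool {tool!r} has no risk classification")
    return CONTROLS[TOOL_RISK[tool]]


def unclassified(registered_tools):
    # Useful as a pre-ship check: this list should be empty.
    return sorted(set(registered_tools) - set(TOOL_RISK))
```

Running `unclassified` over the agent's registered tools in CI is one way to ensure risk is marked before the agent ships.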
3) Worked example — refund tool
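Weak design: the model calls the refund tool directly, and the account and amount it supplies are executed as given.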
Stronger design:
user asks for refund
-> model asks eligibility tool
-> server computes eligibility from policy and account state
-> model drafts explanation
-> refund execution requires explicit approved workflow
-> amount and account are server-side validated
The model can explain. The server decides.
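A minimal sketch of the eligibility step, assuming a hypothetical 30-day refund window and an in-memory `ORDERS` table. The point it illustrates is that the account comes from the authenticated session and the amount comes from the order record, so the model never supplies either.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REFUND_WINDOW = timedelta(days=30)  # hypothetical policy, for illustration only


@dataclass(frozen=True)
class Order:
    order_id: str
    account_id: str
    amount_cents: int
    purchased_on: date
    refunded: bool = False


# Stand-in for the real order store.
ORDERS = {"o-100": Order("o-100", "acct-1", 4_999, date(2024, 5, 1))}


def check_eligibility(session_account_id, order_id, today):
    """Return (eligible, amount_cents, reason).

    The model may propose order_id, but the account is taken from the session
    and the amount is read from the order record.
    """
    order = ORDERS.get(order_id)
    # Same answer for "missing" and "belongs to someone else" to avoid leaking
    # whether another account's order exists.
    if order is None or order.account_id != session_account_id:
        return (False, 0, "order not found for this account")
    if order.refunded:
        return (False, 0, "already refunded")
    if today - order.purchased_on > REFUND_WINDOW:
        return (False, 0, "outside refund window")
    return (True, order.amount_cents, "eligible")
```

The model can turn the returned reason into a customer-facing explanation. Executing the refund would be a separate, approved step that re-runs this check on the server.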
4) Why not let the model choose any tool argument
The tempting alternative is to expose flexible tools because they make the agent capable. Flexible tools are useful in prototypes and dangerous in production.
Tool arguments are attack surfaces. If the model can choose arbitrary IDs, filters, queries, recipients, file paths, or amounts, an attacker can try to steer those values through language.
Use narrow tools:
- typed schemas
- enum values instead of free text
- server-derived IDs
- tenant-scoped credentials
- allowlisted destinations
- idempotency keys
- dry-run mode
- approval gates for high-risk actions
Capability should be granted through product design, not through an all-powerful tool.
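Several items from the list above can be combined in one narrow tool. The following is a sketch under stated assumptions: `create_ticket`, `Session`, `TicketCategory`, and the in-memory `_tickets` store are hypothetical names chosen for illustration. It shows a typed schema, an enum instead of free text, server-derived tenant and ticket IDs, and an idempotency key.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class TicketCategory(Enum):
    BILLING = "billing"
    SHIPPING = "shipping"
    TECHNICAL = "technical"


@dataclass(frozen=True)
class Session:
    tenant_id: str  # derived from authentication on the server
    user_id: str


@dataclass(frozen=True)
class CreateTicketArgs:
    category: TicketCategory
    summary: str


def parse_args(raw):
    """Validate model-proposed arguments; reject anything outside the schema."""
    if set(raw) != {"category", "summary"}:
        raise ValueError("unexpected or missing fields")
    summary = raw["summary"]
    if not isinstance(summary, str) or not 1 <= len(summary) <= 200:
        raise ValueError("summary must be 1-200 characters")
    # Enum lookup raises ValueError for any value not in the allowed set.
    return CreateTicketArgs(TicketCategory(raw["category"]), summary)


_tickets = {}  # stand-in for a database, keyed by (tenant, idempotency key)


def create_ticket(session, raw_args, idempotency_key):
    args = parse_args(raw_args)
    key = (session.tenant_id, idempotency_key)
    if key not in _tickets:  # a retried call returns the original ticket
        _tickets[key] = {
            "id": str(uuid.uuid4()),         # server-generated ID
            "tenant_id": session.tenant_id,  # never taken from model output
            "user_id": session.user_id,
            "category": args.category.value,
            "summary": args.summary,
        }
    return _tickets[key]
```

Because the schema rejects extra fields, a proposal that tries to smuggle in a `tenant_id` fails validation instead of being silently honored.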
5) Production signals — action boundary health
The first metric is rejected tool-call rate by reason: schema, auth, policy, approval, risk limit, idempotency.
The misleading metric is tool success rate. A high success rate can mean the tool is under-validated.
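As a rough sketch, the rejection rate by reason can be computed from a stream of per-call outcomes. The outcome labels and the `rejection_rates` helper are assumptions made for this example; real systems would typically emit these as tagged metrics instead.

```python
from collections import Counter

# Rejection reasons named in the text above.
REASONS = ("schema", "auth", "policy", "approval", "risk_limit", "idempotency")


def rejection_rates(outcomes):
    """Compute the share of tool calls rejected for each reason.

    `outcomes` is an iterable where each item is either "executed" or one of
    REASONS. The denominator is all proposed calls, not only rejected ones.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return {reason: 0.0 for reason in REASONS}
    counts = Counter(outcomes)
    return {reason: counts[reason] / len(outcomes) for reason in REASONS}
```

A breakdown that is zero in every category over a long period is worth investigating: it may mean the checks never fire, which is the under-validation case that a high success rate can hide.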
The expert artifact is a tool decision trace:
model plan -> proposed args -> schema validation -> auth check -> policy check -> execution / rejection
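A decision trace like the one above could be captured as a small structured record. This sketch assumes a hypothetical `DecisionTrace` class that serializes to JSON; the field names follow the stages in the trace line and are not drawn from any specific logging library.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DecisionTrace:
    model_plan: str
    tool: str
    proposed_args: dict
    checks: list = field(default_factory=list)
    outcome: str = "pending"

    def record(self, stage, passed, detail=""):
        """Append one check result; the first failure fixes the outcome."""
        self.checks.append({"stage": stage, "passed": passed, "detail": detail})
        if not passed and self.outcome == "pending":
            self.outcome = f"rejected:{stage}"

    def finish(self):
        """Mark the call executed if nothing rejected it, and serialize."""
        if self.outcome == "pending":
            self.outcome = "executed"
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the proposed arguments next to each check result is what lets a reviewer later see what the model asked for and which stage stopped it.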
6) Boundary — not every tool needs a human
Human approval is expensive. Use it for irreversible, high-value, high-risk, or ambiguous actions. Low-risk read and draft tools can be automated with proper scopes and logging.
The pathology lies at either extreme: every tool autonomous, or every tool manually approved. Mature design assigns friction by risk.
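Assigning friction by risk can be reduced to a small routing rule. The sketch below is one possible shape, assuming a hypothetical `Action` record and an arbitrary `HIGH_VALUE_CENTS` threshold. The real criteria would come from the product's own policy.

```python
from dataclasses import dataclass

HIGH_VALUE_CENTS = 10_000  # hypothetical threshold, chosen only for illustration


@dataclass(frozen=True)
class Action:
    tool: str
    reversible: bool
    value_cents: int = 0
    ambiguous: bool = False  # e.g. the request could match several orders


def needs_human_approval(action):
    """Route irreversible, high-value, or ambiguous actions to a human."""
    return (
        not action.reversible
        or action.value_cents >= HIGH_VALUE_CENTS
        or action.ambiguous
    )
```

Under these assumptions, drafting an email or creating a low-value ticket proceeds automatically with scopes and logging, while a refund or an ambiguous request waits for a person.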
Recall checkpoint
- Why is model output not authorization?
- How do tool risk levels change controls?
- Why are narrow tools safer than flexible tools?
- What belongs in a tool decision trace?
Interview Q&A
Q: How do you secure tool-calling agents? A: Use least privilege, typed schemas, server-side authorization, tenant-scoped credentials, approval gates, idempotency, audit traces, and risk-based tool tiers.
Common wrong answer to avoid: "Trust the model to call tools correctly." Tool execution must be validated outside the model.
Q: What is dangerous about flexible tools? A: They let model-generated text control IDs, filters, destinations, amounts, or file paths, which creates injection and abuse paths.
Common wrong answer to avoid: "Flexible tools make the agent smarter." They also make the blast radius larger.
Q: When should a human approve an agent action? A: For irreversible, high-value, regulated, ambiguous, or broad-impact actions.
Common wrong answer to avoid: "Human review for everything." That destroys usability and causes alert fatigue.
Apply now (10 min)
Model the exercise. Classify five tools in a support agent by risk level and control.
Your turn. Rewrite one broad tool into two narrower typed tools.
Reproduce from memory. Explain why the server decides and the model suggests.
What you should remember
This chapter explained tool abuse and action boundaries. The important idea is that model-generated tool calls are proposals, not permissions.
Carry this diagnostic forward: every tool needs an independent validation and authorization path.
Remember:
- Text becomes impact at the tool boundary.
- The model suggests; the server decides.
- Flexible tools enlarge blast radius.
- Human approval should be risk-based.
Bridge. Tools are one action surface. Memory creates another: information can persist, cross sessions, and later influence behavior. → 07-memory-and-cross-tenant-risk.md