Agentjacking Defense Checklist for AI Agents

Agentjacking is useful shorthand for an agent hijacking failure: the agent's tools, context, or execution path are steered away from the user's goal and toward an attacker-controlled action. The label is newer than the underlying risk, so document the concrete attack path in your system: indirect prompt injection, malicious tool output, unsafe browser content, overbroad credentials, or an unreviewed tool action.

1. Map the agent's blast radius

List every tool the agent can call, every credential it can use, every network destination it can reach, and every user-visible or system-visible state it can change. Mark tools as read-only, reversible write, irreversible write, payment, identity, code execution, or external communication.

2. Separate trusted instructions from untrusted content

Treat webpages, emails, PDFs, tickets, screenshots, retrieved passages, code comments, and tool outputs as data. They can contain instructions, but they should not become the agent's authority. Use clear boundaries, source labels, and retrieval metadata so the model can distinguish system instructions, developer instructions, user requests, and untrusted evidence.

3. Put policy at the tool boundary

Do not rely only on the model to decide whether a tool call is safe. Validate arguments, enforce identity-based permissions, block unexpected domains, rate-limit high-risk actions, and require approval for purchases, account changes, deletions, external messages, legal acceptance, or production writes.

4. Sandbox computer-use and browser agents

Computer-use agents need extra controls because screenshots and web pages become model input. Run them in a dedicated virtual machine or container, remove sensitive accounts, use minimal privileges, and restrict network access with an allowlist when possible.

5. Log enough to debug and audit

Store the user request, retrieved sources, tool-call arguments, tool results, approval decisions, and final answer. Redact secrets, but keep enough context to explain why the agent acted. Without tool logs, agentjacking investigations become guesswork.

6. Test with adversarial fixtures

Add regression tests containing malicious webpages, poisoned documents, hostile support tickets, and misleading tool outputs. A good test asks: did the agent preserve the user's goal, refuse or escalate unsafe instructions, and avoid unauthorized tool calls?

Sources

Anthropic computer use security considerations: https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool
OWASP LLM Top 10 prompt injection guidance: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Agentjacking Defense Checklist for AI Agents

Key insights

Use cases

Limitations & trade-offs

Agentjacking Defense Checklist for AI Agents

1. Map the agent's blast radius

2. Separate trusted instructions from untrusted content

3. Put policy at the tool boundary

4. Sandbox computer-use and browser agents

5. Log enough to debug and audit

6. Test with adversarial fixtures

Sources