GenAIWiki
intermediate

Agentjacking Defense Checklist for AI Agents

A practical checklist for defending AI agents against agentjacking, indirect prompt injection, unsafe tool calls, and computer-use hijacking.
agent securityagentjackingprompt injectioncomputer usetool calling

11 min read

Updated todayVerified recentlyInformation score 92

Key insights

Concrete technical or product signals.

  • Agentjacking is best prevented at tool, permission, and environment boundaries.
  • Untrusted content should be treated as evidence, not authority.
  • Computer-use agents need sandboxing because screenshots and webpages are prompt surfaces.

Use cases

Where this shines in production.

  • Preparing a production-readiness checklist for browser or coding agents.
  • Designing approval gates for support, finance, or operations agents.
  • Creating security regression tests for RAG and agent workflows.

Limitations & trade-offs

What to watch for.

  • The term agentjacking is informal; risk reports should name the specific failure mode.
  • No checklist replaces threat modeling for the actual tools and credentials in use.

Agentjacking Defense Checklist for AI Agents

Agentjacking is useful shorthand for an agent hijacking failure: the agent's tools, context, or execution path are steered away from the user's goal and toward an attacker-controlled action. The label is newer than the underlying risk, so document the concrete attack path in your system: indirect prompt injection, malicious tool output, unsafe browser content, overbroad credentials, or an unreviewed tool action.

1. Map the agent's blast radius

List every tool the agent can call, every credential it can use, every network destination it can reach, and every user-visible or system-visible state it can change. Mark tools as read-only, reversible write, irreversible write, payment, identity, code execution, or external communication.

2. Separate trusted instructions from untrusted content

Treat webpages, emails, PDFs, tickets, screenshots, retrieved passages, code comments, and tool outputs as data. They can contain instructions, but they should not become the agent's authority. Use clear boundaries, source labels, and retrieval metadata so the model can distinguish system instructions, developer instructions, user requests, and untrusted evidence.

3. Put policy at the tool boundary

Do not rely only on the model to decide whether a tool call is safe. Validate arguments, enforce identity-based permissions, block unexpected domains, rate-limit high-risk actions, and require approval for purchases, account changes, deletions, external messages, legal acceptance, or production writes.

4. Sandbox computer-use and browser agents

Computer-use agents need extra controls because screenshots and web pages become model input. Run them in a dedicated virtual machine or container, remove sensitive accounts, use minimal privileges, and restrict network access with an allowlist when possible.

5. Log enough to debug and audit

Store the user request, retrieved sources, tool-call arguments, tool results, approval decisions, and final answer. Redact secrets, but keep enough context to explain why the agent acted. Without tool logs, agentjacking investigations become guesswork.

6. Test with adversarial fixtures

Add regression tests containing malicious webpages, poisoned documents, hostile support tickets, and misleading tool outputs. A good test asks: did the agent preserve the user's goal, refuse or escalate unsafe instructions, and avoid unauthorized tool calls?

Sources