Agent security

Agentjacking

Agentjacking is an informal term for hijacking an AI agent's tools, context, or execution path so it performs attacker-directed actions.

Expanded definition

Agentjacking is not a formal standard term, but it is useful shorthand for agent hijacking risk. It covers cases where untrusted content, compromised tools, poisoned instructions, or unsafe environment access causes an agent to ignore the user's goal and take actions chosen by an attacker. It is closely related to prompt injection, indirect prompt injection, tool misuse, computer-use security, and confused-deputy failures. Practical defenses include least-privilege tools, isolated sandboxes, allowlisted network access, confirmation for consequential actions, clear trust boundaries between instructions and data, and audit logs for tool calls.

Related terms

Explore adjacent ideas in the knowledge graph.

agent hijacking prompt injection indirect prompt injection computer use tool calling

Comparisons, tools, and models that connect to this idea.