Agentjacking
Expanded definition
Agentjacking is not a formally standardized term, but it is a useful shorthand for the risk of an agent being hijacked. It covers cases where untrusted content, compromised tools, poisoned instructions, or unsafe environment access causes an agent to abandon the user's goal and take actions chosen by an attacker. It is closely related to prompt injection, indirect prompt injection, tool misuse, computer-use security, and confused-deputy failures. Practical defenses include least-privilege tools, isolated sandboxes, allowlisted network access, confirmation for consequential actions, clear trust boundaries between instructions and data, and audit logs for tool calls.
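Several of the defenses listed above can be combined in one checkpoint that every tool call passes through before it runs. The following is a minimal sketch, not a reference implementation: the ToolGate class, its field names, and the example tool names (fetch_url, send_email, delete_file) are all hypothetical and chosen only for illustration. It assumes the agent runtime can intercept each tool call and that network-using tools pass their target in a "url" argument.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple
from urllib.parse import urlparse


@dataclass
class ToolGate:
    """Hypothetical gate that checks each tool call before execution."""
    allowed_tools: Set[str]                 # least privilege: tools the agent may call
    allowed_hosts: Set[str]                 # network allowlist for "url" arguments
    consequential: Set[str]                 # tools that need explicit user confirmation
    confirm: Callable[[str, dict], bool]    # callback that asks the user (assumed)
    audit_log: List[dict] = field(default_factory=list)

    def authorize(self, tool: str, args: dict) -> bool:
        """Return True if the call may proceed; record every decision."""
        allowed, reason = self._decide(tool, args)
        self.audit_log.append(
            {"tool": tool, "args": args, "allowed": allowed, "reason": reason}
        )
        return allowed

    def _decide(self, tool: str, args: dict) -> Tuple[bool, str]:
        if tool not in self.allowed_tools:
            return False, "tool not allowlisted"
        url = args.get("url")
        if url is not None:
            # Deny anything that is not a string URL with an allowlisted host.
            if not isinstance(url, str) or urlparse(url).hostname not in self.allowed_hosts:
                return False, "host not allowlisted"
        if tool in self.consequential and not self.confirm(tool, args):
            return False, "user declined"
        return True, "ok"


# Example: a gate whose confirmation callback always declines.
gate = ToolGate(
    allowed_tools={"fetch_url", "send_email"},
    allowed_hosts={"docs.example.com"},
    consequential={"send_email"},
    confirm=lambda tool, args: False,
)
print(gate.authorize("fetch_url", {"url": "https://docs.example.com/page"}))  # True
print(gate.authorize("fetch_url", {"url": "https://evil.example.net/x"}))     # False
print(gate.authorize("send_email", {"to": "a@example.com"}))                  # False
```

The gate decides only on the tool name, the arguments, and the user's confirmation, never on text the agent has read, so injected instructions in a fetched page cannot widen the agent's permissions. A gate like this does not replace sandboxing; it narrows what a hijacked agent can reach and leaves a record of what it attempted.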