Agent security

Indirect prompt injection

Indirect prompt injection happens when untrusted external content, such as a webpage, email, document, or tool result, contains instructions that try to steer an AI system.

Expanded definition

Indirect prompt injection is a prompt-injection pattern where the malicious instruction is not typed directly by the user. Instead, the model encounters it inside retrieved documents, webpages, screenshots, emails, support tickets, code comments, or tool outputs. Agent and RAG systems are especially exposed because they routinely ingest external content and may have tools that act on the model's interpretation. Defenses include labeling untrusted content, restricting tools by context, validating tool arguments, using allowlists, isolating browser or computer-use environments, and requiring approval for consequential actions.

Related terms

Explore adjacent ideas in the knowledge graph.

prompt injection agentjacking RAG tool calling computer use

Comparisons, tools, and models that connect to this idea.