Agent security
Indirect prompt injection
Indirect prompt injection happens when untrusted external content, such as a webpage, email, document, or tool result, contains instructions that try to steer an AI system.
Expanded definition
Indirect prompt injection is a prompt-injection pattern where the malicious instruction is not typed directly by the user. Instead, the model encounters it inside retrieved documents, webpages, screenshots, emails, support tickets, code comments, or tool outputs. Agent and RAG systems are especially exposed because they routinely ingest external content and may have tools that act on the model's interpretation. Defenses include labeling untrusted content, restricting tools by context, validating tool arguments, using allowlists, isolating browser or computer-use environments, and requiring approval for consequential actions.
Related terms
Explore adjacent ideas in the knowledge graph.
Related
Comparisons, tools, and models that connect to this idea.