Safety

Prompt injection

Prompt injection is an attack or failure mode where untrusted text tries to override system instructions or steer a model into unsafe behavior.

Expanded definition

Prompt injection can appear in user messages, retrieved documents, web pages, emails, or tool outputs. Defenses include separating trusted instructions from untrusted content, limiting tool permissions, quoting retrieved text, using allowlists, adding policy checks, and monitoring for suspicious instructions. It is especially important in RAG and agent systems.

Related terms

Explore adjacent ideas in the knowledge graph.

system prompt guardrails RAG tool use

Comparisons, tools, and models that connect to this idea.

Azure Openai Vs Amazon Bedrock (comparisons)
Generative Model (glossary)
Claude 3 5 Sonnet (models)
Adversarial Training (glossary)
Generative Adversarial Network Gan (glossary)
Graph Machine Learning (glossary)