Inference
RAG
Retrieval-augmented generation augments LLM prompts with retrieved documents or structured context so answers can stay grounded in your data instead of relying only on model weights.
Expanded definition
Engineering RAG usually means chunking sources, embedding the chunks into a vector index (often with metadata for tenancy), retrieving the top-k passages per query, optionally re-ranking them, and injecting the results into the prompt with citations. Teams should measure faithfulness, handle empty retrieval gracefully, enforce access control at query time, and re-embed and version the index when the embedding model changes. RAG is the default pattern for enterprise assistants, support bots, and internal knowledge search.
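The pipeline above can be sketched in miniature. This is an illustrative toy, not a production implementation: bag-of-words counts stand in for a real embedding model, a dict stands in for a vector index, and the `min_score` relevance floor is an assumed heuristic for handling empty retrieval.

```python
# Toy RAG retrieval sketch: embed -> retrieve top-k -> build cited prompt.
# A real system would use an embedding model and a vector database.
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, k=2, min_score=0.1):
    # Score every indexed passage against the query, keep the top-k,
    # and drop anything below a relevance floor (empty-retrieval handling).
    q = embed(query)
    scored = sorted(
        ((cosine(q, vec), doc_id, text) for doc_id, (vec, text) in index.items()),
        reverse=True,
    )
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score >= min_score]

def build_prompt(query, passages):
    # Inject retrieved passages into the prompt with citation tags;
    # if nothing relevant was retrieved, tell the model to say so.
    if not passages:
        return f"Question: {query}\nNo relevant context found; say so rather than guessing."
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations like [doc-1]."

docs = {
    "doc-1": "Chunking splits source documents into passages before embedding.",
    "doc-2": "Re-ranking reorders retrieved passages by relevance.",
}
index = {doc_id: (embed(text), text) for doc_id, text in docs.items()}
print(build_prompt("What is chunking?", retrieve("What is chunking?", index)))
```

In practice each piece shown inline here is a separate service: the index is a vector store with per-tenant metadata filters, re-ranking sits between `retrieve` and `build_prompt`, and access control is applied to the candidate set before anything reaches the prompt.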
Related terms
Explore adjacent ideas in the knowledge graph.