GenAIWiki

RAG

RAG retrieves passages relevant to a query (often via embedding similarity) and conditions generation on them, so the model relies less on parametric memory alone.

Expanded definition

Retrieval-augmented generation (RAG) combines a retriever (vector search, keyword search such as BM25, or a hybrid of the two) with a generator (usually an LLM). Production systems add chunking, metadata filters, re-ranking, citations, query-time access control, and evaluation loops, because retrieval quality, rather than model size, usually dominates answer quality.
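
The retrieve-then-generate flow can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the corpus, the `team` metadata field, and the function names `retrieve` and `build_prompt` are hypothetical, and plain term-frequency cosine similarity stands in for a real embedding or BM25 retriever. The generator call is left out; the sketch stops at the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: each chunk has an id, a metadata field, and text.
CHUNKS = [
    {"id": "doc1#0", "team": "support",
     "text": "Reset your password from the account settings page."},
    {"id": "doc2#0", "team": "support",
     "text": "Refunds are issued within five business days."},
    {"id": "doc3#0", "team": "finance",
     "text": "Quarterly refunds totals are reported to finance."},
]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2, team=None):
    """Return up to k chunks with nonzero similarity to the query.

    The optional `team` argument is a stand-in for metadata filtering
    or access control applied at query time (an assumption of this sketch).
    """
    q = Counter(tokenize(query))
    allowed = [c for c in chunks if team is None or c["team"] == team]
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in allowed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Condition generation on retrieved passages, tagged by id for citation."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "How long do refunds take?"
    hits = retrieve(question, CHUNKS, k=2, team="support")
    print(build_prompt(question, hits))
```

In a real system the scoring function would typically be replaced by a vector index or BM25, and a re-ranker could reorder the top results before the prompt is built; the filter-then-rank-then-prompt shape stays the same.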
