Inference
RAG
RAG retrieves relevant passages (often via embedding search) and conditions generation on them, reducing reliance on the model's parametric memory alone.
Expanded definition
Retrieval-augmented generation (RAG) combines a retriever (vector search, keyword search such as BM25, or a hybrid of the two) with a generator, usually an LLM. Production systems add chunking, metadata filters, re-ranking, citations, query-time access control, and evaluation loops, because retrieval quality, not model size, usually dominates answer quality.
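The retrieve-then-generate flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: bag-of-words cosine similarity stands in for a real embedding model or BM25 index, and the `Chunk` class, its `allowed_groups` field, and the `retrieve` and `build_prompt` helpers are hypothetical names invented for this example. The generator call is left out; the sketch stops at the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical access-control metadata


def tokenize(text):
    # Lowercase word/number tokens; a stand-in for a real tokenizer or embedder.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, user_group, k=2):
    q = Counter(tokenize(query))
    # Query-time access control: drop chunks the caller may not see.
    visible = [c for c in chunks if user_group in c.allowed_groups]
    scored = [(cosine(q, Counter(tokenize(c.text))), c) for c in visible]
    # Keep only chunks with some overlap, best score first.
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query, retrieved):
    # Condition generation on retrieved text and ask for citations by id.
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in retrieved)
    return (
        "Answer using only the context below and cite sources by id.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Toy corpus, invented for illustration only.
CHUNKS = [
    Chunk("kb-1", "BM25 is a keyword ranking function used in search.",
          frozenset({"public", "staff"})),
    Chunk("kb-2", "Re-ranking reorders retrieved passages with a stronger model.",
          frozenset({"public", "staff"})),
    Chunk("kb-3", "Internal salary bands for search engineers.",
          frozenset({"staff"})),
]

if __name__ == "__main__":
    question = "What is BM25 keyword search?"
    hits = retrieve(question, CHUNKS, user_group="public")
    print(build_prompt(question, hits))
```

Filtering by `allowed_groups` before scoring means restricted text never reaches the prompt, which is why access control belongs in the retriever rather than in the generator's instructions. Swapping `cosine` over word counts for an embedding or BM25 score would change ranking quality without changing the overall shape of the pipeline.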