Inference
RAG
Retrieval-augmented generation augments LLM prompts with retrieved documents or structured context so answers can stay grounded in your data instead of relying only on model weights.
Expanded definition
Engineering RAG usually means chunking sources, embedding the chunks into a vector index (often with metadata for tenancy), retrieving the top-k passages per query, optionally re-ranking them, and injecting the results into the prompt with citations. Teams should measure faithfulness, handle empty retrieval gracefully, enforce access control at query time, and re-embed and version the index when the embedding model changes. RAG is the default pattern for enterprise assistants, support bots, and internal knowledge search.
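The pipeline above can be sketched in miniature. This is an illustrative toy, not a production implementation: bag-of-words counts stand in for a real embedding model, a dict stands in for a vector index, and the `min_score` relevance floor is an assumed heuristic for handling empty retrieval.

```python
# Toy RAG retrieval sketch: embed -> retrieve top-k -> build cited prompt.
# A real system would use an embedding model and a vector database.
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, k=2, min_score=0.1):
    # Score every indexed passage against the query, keep the top-k,
    # and drop anything below a relevance floor (empty-retrieval handling).
    q = embed(query)
    scored = sorted(
        ((cosine(q, vec), doc_id, text) for doc_id, (vec, text) in index.items()),
        reverse=True,
    )
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score >= min_score]

def build_prompt(query, passages):
    # Inject retrieved passages into the prompt with citation tags;
    # if nothing relevant was retrieved, tell the model to say so.
    if not passages:
        return f"Question: {query}\nNo relevant context found; say so rather than guessing."
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations like [doc-1]."

docs = {
    "doc-1": "Chunking splits source documents into passages before embedding.",
    "doc-2": "Re-ranking reorders retrieved passages by relevance.",
}
index = {doc_id: (embed(text), text) for doc_id, text in docs.items()}
print(build_prompt("What is chunking?", retrieve("What is chunking?", index)))
```

In practice each piece shown inline here is a separate service: the index is a vector store with per-tenant metadata filters, re-ranking sits between `retrieve` and `build_prompt`, and access control is applied to the candidate set before anything reaches the prompt.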
Related terms
Explore adjacent ideas in the knowledge graph.