What is RAG?
A technique that combines AI language models with external knowledge retrieval for more accurate answers.
Stands for: Retrieval Augmented Generation
Pronunciation: /ræɡ/
RAG explained in plain English
Retrieval Augmented Generation (RAG) is a method that improves AI responses by first searching a knowledge base for relevant information, then feeding that information to a language model along with the user's question. Instead of relying solely on what the model learned during training, RAG lets it "look things up" before answering.

This approach reduces hallucinations, keeps answers current, and allows organisations to use proprietary documents without retraining the entire model.
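The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: retrieval here uses simple word-overlap (bag-of-words cosine similarity) as a stand-in for the embedding search that real systems typically use, the knowledge base is made-up toy data, and the function names (retrieve, build_prompt) are hypothetical. The final call to a language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine_similarity(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, passages):
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Toy knowledge base; the policy details are invented for illustration.
KNOWLEDGE_BASE = [
    "Full-time employees receive 25 vacation days per year.",
    "Expense reports must be submitted within 30 days of purchase.",
    "The office is closed on public holidays.",
]

question = "How many vacation days do I get?"
passages = retrieve(question, KNOWLEDGE_BASE, top_k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

In a real system, the prompt would then be passed to a language model, which writes the answer from the retrieved passage rather than from memory alone.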
Analogy
RAG is like giving a student an open-book exam instead of a closed-book one. The student (language model) still needs to understand and synthesise information, but they can consult reference materials (retrieved documents) to give better answers.
Example
A company chatbot uses RAG to answer employee questions about HR policies. When asked "How many vacation days do I get?", the system searches the employee handbook, retrieves the relevant section, and the LLM generates a clear answer based on that specific document.
How is RAG used?
RAG is widely used in enterprise chatbots, customer support systems, internal knowledge bases, and any application where accurate, up-to-date information from specific documents is required.
Common misconceptions about RAG
RAG does not guarantee perfect accuracy — retrieval quality matters. Poor document chunking or irrelevant search results can still lead to incorrect answers.
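To see why chunking matters, consider the rough sketch below. It assumes fixed-size word chunks, which is only one of several possible chunking strategies; the function name chunk_words, the chunk sizes, and the handbook sentence are all made up for illustration. Without overlap, the key fact is cut in half at a chunk boundary, so no single chunk contains it; with overlap, one chunk keeps it whole.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of chunk_size words.

    Consecutive chunks share `overlap` words, so a fact that straddles
    a boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Invented handbook sentence used only for this demonstration.
handbook = "Full-time employees receive 25 vacation days per year after probation ends"
fact = "25 vacation days"

no_overlap = chunk_words(handbook, chunk_size=4)
with_overlap = chunk_words(handbook, chunk_size=4, overlap=2)

# Without overlap the fact is split between the first and second chunk.
print(any(fact in chunk for chunk in no_overlap))
# With overlap one chunk contains the whole fact.
print(any(fact in chunk for chunk in with_overlap))
```

Overlap is a trade-off: it raises the chance that a retrieved chunk is self-contained, at the cost of storing and searching more, partly redundant, text.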
History
RAG was introduced by Lewis et al. in 2020. It quickly became a widely adopted approach for building production AI applications that need domain-specific knowledge.
Related terms
- LLM
A type of AI model trained on vast amounts of text to understand and generate human language.
- Embedding
A numerical representation of text, images, or other data that captures semantic meaning.
- Chain-of-Thought Prompting
Asking an AI to show its reasoning step by step before giving a final answer, which often improves accuracy on complex tasks.
- GPT
A family of large language models developed by OpenAI that generate human-like text.
- Inference
The phase when a trained model is actually used — taking new input and producing a prediction or response.
- Prompt
The input text or instruction given to an AI model to guide its response.
- Token
The basic unit of text that AI language models process, which may be a word, part of a word, or punctuation.
- AI slop
Output from a generative AI system that favors quantity over quality.
- average precision at k
A metric for summarizing a model's performance on a single prompt that generates ranked results, such as a numbered list of book recommendations.
- BERT
A model architecture for text representation.