
RAG: letting a model read your own documents

RAG (retrieval-augmented generation) is a pattern where, instead of relying only on what a model learned in training, you first search your own documents for passages relevant to the question and paste them into the prompt. The model then answers from that retrieved context. It is how a general model answers questions about your private, current, or niche material without being retrained.

At a glance

  • What it is: search your documents first, then let the model answer from what was found
  • Why use it: answers about private or fast-changing material, without retraining
  • The weak link: retrieval. A confident answer from the wrong passage is still wrong
  • Cheaper than: fine-tuning, when the knowledge changes often

How a RAG query flows

The model never sees your whole library, only the passages retrieval picked for this one question.

1. Your question
2. Retrieve passages: search the documents for the most relevant chunks
3. Stuff into the prompt: paste those passages alongside the question
4. Model answers from them: grounded in your material, not only its training
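
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production design: retrieval here is plain word overlap, a toy stand-in for whatever search a real system would use, and the function names (`tokenize`, `retrieve`, `build_prompt`) and sample chunks are made up for this example. The final model call is left out; the sketch stops at the prompt that would be sent.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks that share the most words with the question.

    Word overlap is a toy scoring rule, assumed here for illustration only.
    """
    q_words = Counter(tokenize(question))
    scored = []
    for chunk in chunks:
        # Counter intersection keeps only the words both sides contain.
        overlap = sum((q_words & Counter(tokenize(chunk))).values())
        if overlap > 0:
            scored.append((overlap, chunk))
    # Stable sort: chunks with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Paste the retrieved passages alongside the question."""
    context = "\n\n".join(passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up sample documents; the tool name is hypothetical.
chunks = [
    "The deploy tool is called shipit and runs from the ops repo.",
    "Lunch is served at noon in the main kitchen.",
    "Rollbacks with shipit restore the previous release.",
]
question = "How do I roll back a deploy with shipit?"
passages = retrieve(question, chunks)   # step 2: retrieve
prompt = build_prompt(question, passages)  # step 3: stuff into the prompt
```

Note that the model would only ever see `prompt`: the lunch chunk never reaches it, which is the point of the sentence above about the model not seeing the whole library.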

What is RAG?

A model only knows what it saw during training. Ask it about your own notes, last week’s incident, or an internal tool, and it will either admit ignorance or, worse, invent a confident answer. RAG (retrieval-augmented generation) closes that gap without retraining the model. When a question comes in, a search step finds the most relevant passages in your own documents, and those passages are pasted into the prompt next to the question. The model then answers from what was retrieved. The model stays general; the knowledge stays yours and current.

When does RAG help, and when not?

RAG earns its keep when the answer lives in material the model never trained on, especially material that changes often, where retraining would always lag. It also lets the model show its sources, which matters when a wrong answer is expensive.
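
One way to let the model point at its sources is to number each retrieved passage in the prompt and ask for those numbers back in the answer. The sketch below assumes passages arrive as `(source_name, text)` pairs; the helper names, the prompt wording, and the file names are all hypothetical, and whether a given model follows the citation instruction reliably is something you would need to check.

```python
def format_sources(passages: list[tuple[str, str]]) -> str:
    """Number each (source, text) pair so an answer can cite [1], [2], ..."""
    return "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(passages, start=1)
    )


def build_cited_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Ask the model to cite the bracketed number of each passage it uses."""
    return (
        "Answer from the numbered sources below and cite them like [1].\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"{format_sources(passages)}\n\n"
        f"Question: {question}"
    )


# Made-up sample passages; the file names are hypothetical.
retrieved = [
    ("runbook.md", "Restart the queue worker after changing its config."),
    ("incident-notes.md", "Last week's outage was caused by a stale config."),
]
cited_prompt = build_cited_prompt("What caused last week's outage?", retrieved)
```

Because each number maps back to a named source, a reader can open the cited passage and check the answer against it.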

It does not fix everything. The whole pattern leans on retrieval: if the search returns the wrong passages, the model answers confidently from the wrong context, and the retrieved text only makes the mistake look well sourced. RAG also cannot supply reasoning the model lacks. And if your documents are small enough to fit in the prompt, you may not need RAG at all; you can hand them to the model directly and skip the machinery.
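
The "small documents" case can be checked with rough arithmetic before building anything. The sketch below assumes about four characters per token, a common rule of thumb for English text that varies by model and language, and a `budget_tokens` value you would pick yourself from your model's context limit minus room for the question and the answer. Both the function name and the numbers are illustrative.

```python
def fits_in_prompt(
    documents: list[str],
    budget_tokens: int,
    chars_per_token: float = 4.0,  # rough assumption, not a measured value
) -> bool:
    """Estimate whether all documents could be pasted into one prompt."""
    estimated_tokens = sum(len(doc) for doc in documents) / chars_per_token
    return estimated_tokens <= budget_tokens


# Two short notes of 1,000 characters each: about 500 estimated tokens.
notes = ["x" * 1000, "y" * 1000]
small_enough = fits_in_prompt(notes, budget_tokens=2000)
```

If the estimate fits comfortably, pasting the documents directly avoids the retrieval step and the failure mode that comes with it. For a real decision, count tokens with the tokenizer your model actually uses rather than this estimate.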

RAG helps when

  • The answer lives in your own documents, not in general knowledge
  • The material changes often, so retraining could never keep up
  • You want the model to point at where an answer came from

RAG will not fix

  • Retrieval returns the wrong passages; bad context gives a confident wrong answer
  • The question needs reasoning the model simply cannot do, whatever the context
  • Documents small enough to paste into the prompt directly; there RAG is only extra machinery

Reviewed: June 2026