RAG (retrieval-augmented generation) is a pattern where, instead of relying only on what a model learned in training, you first search your own documents for passages relevant to the question and paste them into the prompt. The model then answers from that retrieved context. It is how a general model answers questions about your private, current, or niche material without being retrained.
At a glance
What it is: Search your documents first, then let the model answer from what was found
Why use it: Answers about private or fast-changing material, without retraining
The weak link: Retrieval. A confident answer from the wrong passage is still wrong
Cheaper than: Fine-tuning, when the knowledge changes often
Flow
How a RAG query flows
The model never sees your whole library, only the passages retrieval picked for this one question.
1. Your question
2. Retrieve passages: search the documents for the most relevant chunks
3. Stuff into the prompt: paste those passages alongside the question
4. Model answers from them: grounded in your material, not only its training
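The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: retrieval here is plain word overlap (real systems more commonly use embeddings or a search index), the sample passages are invented, and the names `retrieve` and `build_prompt` are hypothetical. The final model call is left out, since it depends on whichever model API you use.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(question: str, passage: str) -> int:
    """Count word occurrences shared by the question and the passage."""
    shared = Counter(tokenize(question)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 2: return the k passages that overlap most with the question."""
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Step 3: paste the retrieved passages alongside the question."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(retrieved, 1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Invented sample documents, for illustration only.
PASSAGES = [
    "The deploy tool is called shipit and runs from the ops repository.",
    "Last week's incident was caused by an expired TLS certificate.",
    "The cafeteria serves lunch between noon and two.",
]

question = "What caused last week's incident?"  # step 1
top = retrieve(question, PASSAGES, k=1)          # step 2
prompt = build_prompt(question, top)             # step 3
# Step 4 would send `prompt` to a model; that call is omitted here.
```

Note that the model only ever receives `prompt`, which contains one passage out of three. Everything else in the library stays invisible to it for this question.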
What is RAG?
A model only knows what it saw during training. Ask it about your own notes,
last week’s incident, or an internal tool, and it will either admit ignorance or,
worse, invent a confident answer. RAG (retrieval-augmented generation) closes
that gap without retraining the model. When a question comes in, a search step
finds the most relevant passages in your own documents, and those passages are
pasted into the prompt next to the question. The model then answers from what was
retrieved. The model stays general; the knowledge stays yours and current.
When does RAG help, and when not?
RAG earns its keep when the answer lives in material the model never trained on,
especially material that changes often, where retraining would always lag. It
also lets the model show its sources, which matters when a wrong answer is
expensive.
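One way to let the model show its sources is to label each retrieved passage with an identifier and then check which labels the answer cites. The sketch below assumes a bracketed-label convention and uses invented file names and an invented model answer; `Passage`, `format_context`, and `cited_sources` are hypothetical names, not part of any library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical identifier, such as a file name
    text: str


def format_context(passages: list[Passage]) -> str:
    """Prefix each passage with its source label so the model can cite it."""
    return "\n".join(f"[{p.source}] {p.text}" for p in passages)


def cited_sources(answer: str, passages: list[Passage]) -> list[str]:
    """Return cited labels that match a retrieved passage, in order of first use.

    Labels that match no retrieved passage are dropped, since the model
    may invent a citation.
    """
    known = {p.source for p in passages}
    cited: list[str] = []
    for label in re.findall(r"\[([^\[\]]+)\]", answer):
        if label in known and label not in cited:
            cited.append(label)
    return cited


passages = [
    Passage("runbook.md", "Restart the queue worker before replaying jobs."),
    Passage("incident-log.md", "The outage began after a certificate expired."),
]
context = format_context(passages)

# Invented model output, for illustration only.
answer = "The outage followed an expired certificate [incident-log.md] [made-up.md]."
sources = cited_sources(answer, passages)
```

A citation that survives this check only shows that the label was among the retrieved passages. It does not prove the passage actually supports the claim.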
It does not fix everything. The whole pattern leans on retrieval: if the search
returns the wrong passages, the model answers confidently from the wrong context,
and you have dressed up a mistake. It also cannot supply reasoning the model
lacks. And if your documents are small, you may not need RAG at all; you can hand
them to the model directly and skip the machinery.
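The "small enough to paste" test can be made concrete with a rough size check. The sketch below assumes about four characters per token, which is only a rule of thumb that varies by tokenizer and language; the budget and reserve figures are placeholders, not the limits of any particular model.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_in_prompt(
    documents: list[str],
    budget_tokens: int,
    reserve_tokens: int = 1000,
) -> bool:
    """True if all documents, plus room for the question and answer, fit the budget."""
    total = sum(estimate_tokens(d) for d in documents)
    return total + reserve_tokens <= budget_tokens


# Two invented documents of 4,000 characters each.
documents = ["a" * 4000, "b" * 4000]

paste_directly = fits_in_prompt(documents, budget_tokens=8000)
needs_retrieval = not fits_in_prompt(documents, budget_tokens=3000)
```

If the check passes, handing the documents over whole avoids the retrieval step and the errors that come with it.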
RAG helps when
The answer lives in your own documents, not in general knowledge
The material changes often, so retraining could never keep up
You want the model to point at where an answer came from
RAG will not fix
Retrieval returns the wrong passages; bad context gives a confident wrong answer
The question needs reasoning the model simply cannot do, whatever the context
Your documents are small enough to paste into the prompt directly
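The first item in that list can be softened, though not eliminated, by refusing to answer when nothing retrieved looks relevant. The sketch below assumes the retriever returns a relevance score between 0 and 1 for each passage; the scores, the 0.5 threshold, and the function names are all hypothetical. A threshold only catches the case where every passage scores poorly. It cannot catch a wrong passage that scores well.

```python
ABSTAIN = "No sufficiently relevant passage was found."


def select_context(
    scored: list[tuple[str, float]],
    min_score: float = 0.5,
    k: int = 3,
) -> list[str]:
    """Keep at most k passages whose score meets the threshold, best first."""
    kept = [(p, s) for p, s in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [p for p, _ in kept[:k]]


def prompt_or_abstain(
    question: str,
    scored: list[tuple[str, float]],
    min_score: float = 0.5,
) -> str:
    """Build a prompt from relevant passages, or abstain if there are none."""
    context = select_context(scored, min_score)
    if not context:
        return ABSTAIN
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


# Invented passages and scores, for illustration only.
good = [
    ("Lunch is served at noon.", 0.12),
    ("The outage followed an expired certificate.", 0.81),
]
weak = [
    ("Lunch is served at noon.", 0.12),
    ("Parking permits renew in March.", 0.08),
]

with_context = prompt_or_abstain("What caused the outage?", good)
without_context = prompt_or_abstain("What caused the outage?", weak)
```

Here `with_context` carries only the certificate passage, while `without_context` is the abstention message rather than a prompt built on irrelevant text.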