
Chunking: splitting documents before you embed them

Chunking is the step that splits a document into smaller passages before each passage is turned into an embedding and stored for retrieval. The size and boundaries of those chunks decide what a retrieval system can later find, so a quiet preprocessing choice ends up shaping every answer.

At a glance

What it is: Splitting documents into passages before embedding them
Why it matters: Chunk boundaries decide what retrieval can later find
The tension: Too small loses context, too big dilutes the match
Where it sits: The first step of a retrieval pipeline, before embedding

Flow

From document to retrievable passages

A document is split into passages, each passage becomes an embedding, and the embeddings go into the store.

1. Whole document: too large to embed or match as one piece
2. Split into chunks: passages sized to hold one coherent idea
3. Each chunk embedded and stored: now individually retrievable by meaning

Why split documents at all?

A retrieval system does not search whole documents; it searches passages. Before that search can happen, each document has to be cut into pieces small enough to embed and match individually. That cutting is chunking. Each chunk becomes one embedding (a numeric vector representing its meaning) and one searchable unit. A query then finds the chunks closest to it, not the documents.
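The split, embed, and search steps described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `toy_embed` is a hypothetical stand-in that counts words, where a real system would call an embedding model, and the sample document and the helper names (`chunk_words`, `build_store`, `search`) are assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(document, size):
    """One (chunk, vector) pair per chunk: each chunk is one searchable unit."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_words(document, size)]


def search(store, query, k=1):
    """Return the k chunks whose vectors are closest to the query vector."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


document = (
    "Refunds are accepted within 30 days of purchase. "
    "Shipping takes five business days within the country. "
    "Support is available by email on weekdays."
)
store = build_store(document, size=8)
print(search(store, "how many days for refunds"))
# prints ['Refunds are accepted within 30 days of purchase.']
```

The search returns a chunk, never the document: whatever the splitter produced is the only thing the query can match.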

This sounds like plumbing, and it is, but it is plumbing that decides the ceiling. A retrieval system can only return a passage that exists as a chunk. If the sentence you needed got split down the middle, or buried in a chunk that is mostly about something else, no amount of clever searching downstream will recover it cleanly.
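To make the "split down the middle" failure concrete, the sketch below compares a fixed-width cut with a sentence-aware cut on the same text. Both splitters and the sample text are illustrative assumptions; the sentence rule in particular is deliberately naive.

```python
import re


def fixed_chunks(text, width):
    """Cut every `width` characters, ignoring sentence boundaries."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def sentence_chunks(text):
    """Cut after sentence-ending punctuation. A naive rule: it will
    mis-split abbreviations such as "e.g." in real text."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = "The cache expires after ten minutes. Restart the worker to clear it sooner."
needed = "Restart the worker to clear it sooner."

# The fixed-width cut lands inside the second sentence, so no chunk holds it whole.
print(any(needed in chunk for chunk in fixed_chunks(text, 50)))  # False

# The sentence-aware cut keeps it intact as its own chunk.
print(needed in sentence_chunks(text))  # True
```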

How big should a chunk be?

There is no single right answer, only a tension. Chunk too small and a passage loses the surrounding context that made it meaningful: a sentence that made sense in its paragraph now reads as a fragment, and one idea ends up scattered across several thin chunks. Chunk too big and a precise match gets diluted by paragraphs of unrelated text, so the store returns a long passage where the useful line is buried and the model’s context window fills with filler.
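A toy calculation shows both sides of this tension. It assumes word-count vectors in place of a real embedding model, so the numbers only illustrate the direction of the effect; learned embeddings behave differently in detail. The query and the three chunks are made up for the example.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = toy_embed("refund window in days")

fragment = "is 30 days."                    # too small: the subject is gone
focused = "The refund window is 30 days."   # one coherent idea
padded = focused + " " + (                  # too big: same idea plus filler
    "Our offices are closed on public holidays and parking is limited. " * 3
)

scores = {
    name: cosine(query, toy_embed(chunk))
    for name, chunk in [("fragment", fragment), ("focused", focused), ("padded", padded)]
}
for name, score in scores.items():
    print(f"{name:9s} {score:.3f}")
```

Under these assumptions the focused chunk scores highest: the fragment shares too few terms with the query, and the padded chunk's extra text inflates its vector without adding anything the query matches.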

The usable middle holds one coherent idea per chunk, often with a little overlap so nothing falls between the cuts. The honest approach is to treat chunk size as a setting you test against real questions, not a constant you set once and trust.
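One way to treat chunk size as a tested setting is to sweep a few sizes and measure how often the top chunk contains the expected answer. The sketch below assumes a sliding-window splitter with word overlap, a word-count `toy_embed` in place of a real embedding model, and a hypothetical two-question test set; real evaluations would use your own documents and questions.

```python
import math
import re
from collections import Counter


def chunk_with_overlap(text, size, overlap):
    """Sliding window of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_rate(document, cases, size, overlap):
    """Fraction of (question, expected phrase) cases whose top chunk
    contains the expected phrase."""
    store = [(c, toy_embed(c)) for c in chunk_with_overlap(document, size, overlap)]
    hits = 0
    for question, expected in cases:
        q = toy_embed(question)
        best_chunk = max(store, key=lambda item: cosine(q, item[1]))[0]
        hits += expected in best_chunk
    return hits / len(cases)


document = (
    "Refunds are accepted within 30 days of purchase. "
    "Shipping takes five business days within the country. "
    "Support is available by email on weekdays."
)
# Hypothetical test set: each question paired with a phrase the answer must contain.
cases = [
    ("how many days for refunds", "30 days"),
    ("when is support available", "weekdays"),
]
for size in (4, 8, 16):
    print(size, hit_rate(document, cases, size, overlap=size // 4))
```

Hit rate only checks that the answer text is present in the top chunk. It does not penalise a chunk for being mostly filler, so it is worth reading the retrieved chunks as well as the score.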

Chunks too small

  • Lose the surrounding context that gave a sentence meaning
  • Split one idea across several passages that each look thin
  • Match on fragments that read out of context

Chunks too big

  • Dilute a precise match with paragraphs of unrelated text
  • Return long passages where the useful line is buried
  • Waste the model's context window on filler around the answer

Reviewed: June 2026