Why split documents at all?
A retrieval system does not search whole documents; it searches passages. Before that search can happen, each document has to be cut into pieces small enough to embed and match individually. That cutting is chunking. Each chunk becomes one embedding (a numeric vector representing its meaning) and one searchable unit. A query then finds the chunks closest to it, not the documents.
This sounds like plumbing, and it is, but it is plumbing that sets the ceiling on retrieval quality. A retrieval system can only return a passage that exists as a chunk. If the sentence you needed got split down the middle, or buried in a chunk that is mostly about something else, no amount of clever searching downstream will recover it cleanly.
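The flow described in this section, one vector per chunk and a query matched against chunks rather than documents, can be sketched in a few lines. This is an illustration under loud assumptions: `embed`, `cosine`, and `search` are hypothetical names, and the "embedding" here is a plain word count standing in for a real embedding model, which would capture meaning rather than exact word overlap.

```python
import math
from collections import Counter


def embed(text):
    # ASSUMPTION: toy stand-in for a real embedding model.
    # A word-count vector only matches shared words, not meaning.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, k=1):
    # Rank chunks (not documents) by similarity to the query.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "cats purr when they are content",
    "the invoice is due in thirty days",
    "bread needs yeast to rise",
]
print(search("when is the invoice due", chunks))
# ['the invoice is due in thirty days']
```

Note that the search can only ever return one of the three strings in `chunks`. Whatever was not cut into a chunk of its own is invisible to it.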
How big should a chunk be?
There is no single right answer, only a tension. Chunk too small and a passage loses the surrounding context that made it meaningful: a sentence that made sense in its paragraph now reads as a fragment, and one idea ends up scattered across several thin chunks. Chunk too big and a precise match gets diluted by paragraphs of unrelated text, so the search returns a long passage where the useful line is buried and the model’s context window fills with filler.
The usable middle holds one coherent idea per chunk, often with a little overlap so nothing falls between the cuts. The honest approach is to treat chunk size as a setting you test against real questions, not a constant you set once and trust.
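As a rough sketch of chunking with overlap, the function below slides a fixed-size window over the words of a text, stepping forward by less than the window size so that neighbouring chunks share a few words. The name `chunk_words` and its `size` and `overlap` parameters are hypothetical, and counting by words is an assumption made for simplicity; a real pipeline might count tokens or cut on sentence and paragraph boundaries instead.

```python
def chunk_words(text, size=200, overlap=20):
    # ASSUMPTION: sizes are measured in whitespace-separated words.
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Because `size` and `overlap` are ordinary arguments, the same function can be rerun with several settings and each result checked against a set of real questions, which is the kind of testing the paragraph above recommends.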