What does BM25 actually measure?
BM25 (Best Matching 25) answers one question: given a search query, which documents match it best? It scores each document from two main signals. The first is how often the query words appear in that document, so a page that mentions your term repeatedly scores higher than one that mentions it once. The second is how rare each word is across the whole collection, so a rare, specific word counts for far more than a common one that turns up everywhere. There is also a brake on the first signal: once a word has appeared many times, each further mention adds less, so a document cannot win by sheer repetition. Standard BM25 adds one more correction, for document length, so a long document does not outrank a short one simply by containing more words.
That is the whole idea. No learned model, no training, no embeddings. It looks at the words the reader typed, counts them, weights them by rarity, and ranks. The result is cheap to compute, easy to read, and easy to debug, because you can always trace exactly why a document scored the way it did.
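To make the scoring concrete, here is a minimal sketch in Python. It is one common formulation of BM25, not the only one: the whitespace tokenizer, the function names, and the sample documents are my own placeholders, and k1=1.5 and b=0.75 are typical defaults rather than tuned values. It uses the smoothed rarity weight that keeps scores non-negative, and it includes the document length correction that standard BM25 applies.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. Real systems usually
    # also strip punctuation and may stem words.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document, in the same order as docs."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rarity signal: words found in few documents get a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Frequency signal with the brake: this ratio grows with tf but
            # never exceeds k1 + 1. b controls the length correction.
            length_factor = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_factor)
        scores.append(score)
    return scores


# Placeholder documents, purely for illustration.
docs = [
    "bm25 ranks documents by keyword overlap",
    "dense retrieval ranks documents by embedding similarity",
    "the cat sat on the mat",
]
for doc, score in sorted(zip(docs, bm25_scores("ranks documents bm25", docs)),
                         key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```

Every number in the output can be traced back to a count in the collection, which is where the easy debugging comes from. For real use, a maintained search library is a better bet than this sketch.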
Why does it keep beating the fancy approach?
The fashionable alternative is dense retrieval, which turns both the query and the documents into embeddings (vectors of numbers) and compares them by meaning rather than by shared words. That helps when a reader and a document say the same thing in different words. It is genuinely better at synonyms.
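For contrast, here is a rough sketch of the comparison step in dense retrieval. The three-dimensional vectors are made-up numbers standing in for real embeddings, which come from a trained model and usually have hundreds of dimensions. Cosine similarity is one common way to compare them, and I am assuming it here.

```python
import math


def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, invented for illustration only.
query_vec = [0.9, 0.1, 0.0]  # stands in for the query "car repair"
doc_vecs = {
    "automobile maintenance guide": [0.8, 0.2, 0.1],
    "banana bread recipe": [0.0, 0.1, 0.9],
}

ranked = sorted(doc_vecs,
                key=lambda name: cosine(query_vec, doc_vecs[name]),
                reverse=True)
print(ranked)
```

The query and the top document share no words at all, yet they land close together. That is the synonym advantage, and it is also why the ranking is harder to explain: the reason lives inside the model that produced the vectors.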
But meaning matching has a cost, and it has failure modes of its own. On a collection that is small, curated, and already well tagged, the documents tend to use the same words a reader would search for, and that is exactly the home ground where keyword scoring is strongest. I found this out the hard way: on my own tagged corpus, plain keyword scoring like BM25 did as well as the dense embeddings, so I rolled the embeddings back and kept the simpler layer. The fancy method has to earn its place by beating the baseline, and here it did not.
So BM25 is worth knowing for two reasons. It is often the right answer on its own, for small, well-organised collections. And even when you do reach for embeddings, it is the honest baseline you measure them against, so you know whether the extra machinery is actually paying for itself.
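Measuring against the baseline can be as simple as the sketch below: run the same queries through both systems and compare one metric. I am assuming recall at k here, and the query ids, document ids, and judgments are placeholders, not my actual results. You would substitute your own test queries and hand-labelled relevant documents.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant documents that appear in the top k results.
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(run, judgments, k):
    # run: query id -> ranked list of document ids from one system.
    # judgments: query id -> set of document ids judged relevant.
    values = [recall_at_k(run.get(q, []), rel, k) for q, rel in judgments.items()]
    return sum(values) / len(values)


# Placeholder data, invented for illustration only.
judgments = {"q1": {"d1", "d4"}, "q2": {"d2"}}
bm25_run = {"q1": ["d1", "d3", "d4"], "q2": ["d2", "d5", "d1"]}
dense_run = {"q1": ["d4", "d2", "d1"], "q2": ["d5", "d2", "d3"]}

for name, run in [("bm25", bm25_run), ("dense", dense_run)]:
    print(name, mean_recall_at_k(run, judgments, k=2))
```

If the two numbers come out level on your own queries, the simpler system wins by default.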