What is an embedding?
An embedding is a list of numbers, a vector, that a model produces to stand in for a piece of text. The trick is that the model is trained so that texts with similar meaning produce vectors that sit close together. “How do I reset my password” and “I forgot my login” share almost no words, but their embeddings land near each other. That closeness is what makes meaning-based search possible: you embed the query, embed the documents, and rank the documents by how close their vectors are to the query's, typically using a measure such as cosine similarity.
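To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The three vectors are hand-written toy values chosen for illustration, not the output of any real embedding model, and real embeddings usually have hundreds of dimensions rather than three.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this example (assumption: a real model
# would place the two login-related texts near each other like this).
reset_password = [0.9, 0.1, 0.3]   # "How do I reset my password"
forgot_login = [0.8, 0.2, 0.4]     # "I forgot my login"
pizza_recipe = [0.1, 0.9, 0.2]     # an unrelated text

similar = cosine_similarity(reset_password, forgot_login)
unrelated = cosine_similarity(reset_password, pizza_recipe)

print(f"password vs login: {similar:.3f}")
print(f"password vs pizza: {unrelated:.3f}")
```

A score near 1 means the vectors point in almost the same direction. The login pair should score much higher than the pizza pair, even though the two login sentences share almost no words.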
How are embeddings used?
The common use is retrieval. You embed a corpus once and store the vectors. At query time you embed the question and pull back the nearest vectors, which point to the most relevant chunks. This is the heart of retrieval-augmented generation, where those chunks get handed to the generating model as context. The embedding model is separate from the model writing the answer and usually much smaller, so it often runs comfortably on local hardware.
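The retrieval loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: the vectors are hand-written stand-ins for what an embedding model would return, the name `top_k` is hypothetical, and a real system would typically use a vector store instead of scanning a Python list.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, index, k=2):
    """Return the k (score, chunk) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), chunk)
              for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Index step, done once: each chunk is stored next to its vector.
# Toy vectors, made up for this example.
index = [
    ("To reset your password, open Settings > Security.", [0.9, 0.1, 0.2]),
    ("Invoices are emailed on the first of each month.", [0.1, 0.9, 0.1]),
    ("Locked out? Use the account recovery link.", [0.8, 0.2, 0.3]),
]

# Query step: stand-in for embedding the question "I forgot my login".
query_vector = [0.85, 0.15, 0.25]

hits = top_k(query_vector, index, k=2)

# The retrieved chunks become the context handed to the generating model.
context = "\n".join(chunk for _, chunk in hits)
print(context)
```

With these toy values, the two account-access chunks are retrieved and the invoice chunk is left out. The linear scan is fine for a handful of chunks; larger corpora usually call for an approximate nearest-neighbour index.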
When are embeddings not the answer?
Embeddings shine when the corpus is large and the wording varies. On a small, well-tagged corpus, plain keyword scoring can match or beat them, with no vector store to maintain and nothing to keep in sync. They also do not replace exact matching: if someone searches for a precise error code, you want the literal string, not the nearest neighbour. Reach for embeddings when meaning matters more than the exact words, and measure before you assume they help.
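For comparison, here is a rough sketch of the non-embedding route: try a literal match first, then fall back to plain keyword overlap. The function names, the scoring rule, and the sample documents are all hypothetical choices for illustration; established keyword rankers such as BM25 weight terms more carefully than this simple count.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def keyword_score(query, document):
    """Number of distinct query terms that also appear in the document."""
    return len(tokenize(query) & tokenize(document))

def search(query, documents):
    """Return literal substring hits if any, else rank by keyword overlap."""
    exact = [doc for doc in documents if query.lower() in doc.lower()]
    if exact:
        return exact
    ranked = sorted(documents,
                    key=lambda doc: keyword_score(query, doc),
                    reverse=True)
    # Drop documents that share no terms with the query at all.
    return [doc for doc in ranked if keyword_score(query, doc) > 0]

# Made-up sample corpus for this example.
documents = [
    "E4012: disk quota exceeded on volume",
    "E4021: network timeout while syncing",
    "How to reset a forgotten password",
]

print(search("E4012", documents))
print(search("password reset", documents))
```

The error-code query returns only the document containing that literal string, which is the behaviour a nearest-neighbour lookup cannot guarantee for two codes that differ by a swapped digit. A baseline like this is also a useful yardstick when measuring whether embeddings improve results on a given corpus.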