Embedding: text turned into a vector

An embedding is a fixed-length list of numbers (a vector) that a model produces to represent a piece of text, an image, or other input. Texts with similar meaning produce vectors that sit close together, so closeness in this number space stands in for similarity in meaning. Embeddings are what let a search compare a query and documents by meaning, not just by shared words.

What is an embedding?

An embedding is a list of numbers, a vector, that a model produces to stand in for a piece of text. The trick is that the model is trained so that texts with similar meaning produce vectors that sit close together. “How do I reset my password” and “I forgot my login” share almost no words, but their embeddings land near each other. That closeness is what makes meaning-based search possible: you embed the query, embed the documents, and compare the vectors by how near they are.
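To make "how near they are" concrete, here is a minimal sketch of one common closeness measure, cosine similarity. The three vectors are hand-written toy values chosen for illustration, not output from any real embedding model, and real embeddings typically have hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors (assumption): NOT produced by a real model.
reset_password = [0.9, 0.1, 0.3]   # "How do I reset my password"
forgot_login = [0.8, 0.2, 0.4]     # "I forgot my login"
pizza_recipe = [0.1, 0.9, 0.2]     # an unrelated text, added for contrast

sim_related = cosine_similarity(reset_password, forgot_login)
sim_unrelated = cosine_similarity(reset_password, pizza_recipe)

# The two account-access texts score far higher than the unrelated pair.
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

Cosine similarity ignores vector length and looks only at direction, which is one reason it is a frequent choice here; other measures such as dot product or Euclidean distance are also used, depending on the model.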

How are embeddings used?

The common use is retrieval. You embed a corpus once and store the vectors. At query time you embed the question and pull back the nearest vectors, which point to the most relevant chunks. This is the heart of retrieval-augmented generation, where those chunks get handed to the generating model as context. The embedding model is separate from the model that writes the answer. It is usually much smaller, and it often runs comfortably on local hardware.
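The embed-once, search-at-query-time flow can be sketched as below. Everything named here is hypothetical: `build_index`, `search`, and `toy_embed` are illustrative helpers, and `toy_embed` is a lookup of hand-written vectors standing in for a real embedding model. A real system would call a model and would likely use a vector store instead of a Python list.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_index(chunks, embed):
    """Embed every chunk once and keep (vector, chunk) pairs in memory."""
    return [(embed(chunk), chunk) for chunk in chunks]

def search(query, index, embed, k=2):
    """Embed the query and return the k chunks whose vectors are nearest."""
    query_vec = embed(query)
    ranked = sorted(
        index,
        key=lambda pair: cosine_similarity(query_vec, pair[0]),
        reverse=True,
    )
    return [chunk for _, chunk in ranked[:k]]

# Hypothetical stand-in for an embedding model: hand-written toy vectors.
TOY_VECTORS = {
    "To reset your password, open Settings and choose Security.": [0.9, 0.1, 0.2],
    "Invoices are emailed on the first of each month.": [0.1, 0.9, 0.1],
    "Locked out? Use the account recovery link on the sign-in page.": [0.8, 0.2, 0.3],
    "I forgot my login": [0.85, 0.15, 0.3],
}

def toy_embed(text):
    return TOY_VECTORS[text]

query = "I forgot my login"
chunks = [text for text in TOY_VECTORS if text != query]

index = build_index(chunks, toy_embed)           # done once, ahead of time
top_chunks = search(query, index, toy_embed, 2)  # done per query

# In retrieval-augmented generation, top_chunks would be passed to the
# generating model as context alongside the question.
print(top_chunks)
```

The sort over the whole index is a brute-force scan, which is fine for a small corpus. Larger corpora usually rely on an approximate nearest-neighbour index instead.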

When are embeddings not the answer?

Embeddings shine when the corpus is large and the wording varies. On a small, well-tagged corpus, plain keyword scoring can match or beat them, with no vector store to maintain and nothing to keep in sync. They also do not replace exact matching: if someone searches for a precise error code, you want the literal string, not the nearest neighbour. Reach for embeddings when meaning matters more than the exact words, and measure before you assume they help.
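As a rough illustration of the two alternatives mentioned above, the sketch below shows a literal substring match for exact lookups and a very simple keyword-overlap score that could serve as a baseline to measure embeddings against. The error codes, documents, and function names are all made up for this example, and the overlap score is a deliberately crude stand-in for a proper keyword ranking function.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def keyword_score(query, doc):
    """Fraction of query terms that also appear in the document."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(doc)) / len(query_terms)

def literal_matches(query, docs):
    """Documents containing the query as an exact, case-sensitive substring."""
    return [doc for doc in docs if query in doc]

# Hypothetical corpus: the two error codes differ by a single transposition,
# the kind of near-miss a nearest-neighbour search could confuse.
docs = [
    "ERR_4012: upload rejected because the file exceeds the size limit.",
    "ERR_4021: upload rejected because the file type is not allowed.",
    "Billing runs on the first day of each month.",
]

# Exact lookup: only the document with the literal code comes back.
exact = literal_matches("ERR_4012", docs)

# Keyword baseline: rank documents by shared terms with the query.
ranked = sorted(docs, key=lambda d: keyword_score("upload size limit", d), reverse=True)

print(exact)
print(ranked[0])
```

Running the same set of test queries through a baseline like this and through an embedding search is one straightforward way to check whether embeddings actually help on a given corpus.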

