
Token: the unit a model reads and writes

A token is the unit of text a language model reads and writes: a short chunk that is often a whole word, sometimes a piece of one, sometimes a single character or punctuation mark. The model never sees letters or words directly; it sees a sequence of tokens, and everything it produces comes out one token at a time.

At a glance

  • What it is: the chunk of text a model processes, roughly a word or word-piece
  • Rough rule of thumb: in English, a token is a little under one word on average
  • Why it matters: context length, speed, and cost are all counted in tokens, not words
  • How output is made: the model emits one token at a time, each based on all the tokens so far

From your text to the model and back

Your prompt is split into tokens, the model reads them, then produces new tokens one by one. Those tokens are turned back into text you can read.

  1. Your text: the raw words and characters you typed
  2. Tokenizer splits it: text becomes a list of tokens, each a small chunk
  3. Model reads tokens, emits tokens: one new token at a time, in order
  4. Detokenized back to text: the tokens become readable output
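
The four steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real tokenizer or model works internally: the vocabulary, the greedy longest-match rule, the END marker, and the scripted make_toy_model stand-in are all hypothetical and exist only to make the flow visible.

```python
import string

# Hypothetical toy vocabulary: a few whole words and word pieces,
# plus single characters as a fallback.
PIECES = ["the", "token", "model", "read", "s", "ize", "r", " "]
VOCAB = PIECES + [c for c in string.ascii_lowercase + ".,!?" if c not in PIECES]
TOKEN_TO_ID = {piece: i for i, piece in enumerate(VOCAB)}
END = -1  # hypothetical end-of-sequence marker


def encode(text):
    """Step 2: split text into token IDs by greedy longest match."""
    ids, pos = [], 0
    while pos < len(text):
        # Raises ValueError if a character is not covered by the toy vocabulary.
        match = max((p for p in VOCAB if text.startswith(p, pos)), key=len)
        ids.append(TOKEN_TO_ID[match])
        pos += len(match)
    return ids


def decode(ids):
    """Step 4: turn token IDs back into readable text."""
    return "".join(VOCAB[i] for i in ids)


def make_toy_model(prompt_len, reply_ids):
    """Stand-in for a model: maps 'all tokens so far' to one next token."""
    def next_token(context_ids):
        # A real model would score every vocabulary entry given the context.
        # This stand-in just replays a scripted reply, one token per call.
        step = len(context_ids) - prompt_len
        return reply_ids[step] if step < len(reply_ids) else END
    return next_token


def generate(prompt_ids, next_token, max_new_tokens=20):
    """Step 3: emit one token at a time, feeding each back into the context."""
    context = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(context)
        if tok == END:
            break
        context.append(tok)
    return context[len(prompt_ids):]


prompt_ids = encode("the tokenizer")          # step 1 -> step 2
print([VOCAB[i] for i in prompt_ids])         # the pieces the "model" sees
model = make_toy_model(len(prompt_ids), encode(" reads tokens."))
print(decode(generate(prompt_ids, model)))    # step 3 -> step 4
```

In this toy, the two-word prompt becomes five tokens: one whole word, a space, and three word pieces. That is the same kind of mix a real tokenizer produces, although the actual splits depend entirely on the tokenizer in use.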

What is a token?

A language model does not read letters or words. Before it sees anything, your text is run through a tokenizer that splits it into tokens: short chunks that are often a whole common word, sometimes a piece of a longer or rarer word, and sometimes a single character or a punctuation mark. The model works on that sequence of tokens, and when it answers, it produces tokens, which are turned back into text for you to read.

A loose rule of thumb for English is that a token is a little under one word on average. Code, other languages, and unusual words tend to split into more tokens than that, so never treat token counts and word counts as the same number.

Why count tokens instead of words?

Almost everything you care about when running a model is measured in tokens. The context length, the maximum amount of text a model can hold at once, is a token count. Speed is reported as tokens per second. The key-value (KV) cache, the working memory of a request, grows with each token in the context. And hosted models typically bill per million tokens, counting both the tokens you send in and the tokens that come out.

So when a prompt feels expensive, slow, or too long to fit, the honest unit to think in is tokens. Word count is a polite approximation. Token count is what the machine actually pays for.
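
To make that concrete, here is a minimal sketch of the arithmetic. The function names, the prices, the context length, and the throughput figure are all hypothetical placeholders, not real rates for any service. It also assumes the prompt and the generated output share one context window, which is common but worth checking for the model you use.

```python
def fits_in_context(prompt_tokens, max_output_tokens, context_length):
    """True if prompt plus requested output stays within the context length."""
    return prompt_tokens + max_output_tokens <= context_length


def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request when input and output are priced per million tokens."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def generation_seconds(output_tokens, tokens_per_second):
    """Rough time to produce the output at a given tokens-per-second rate."""
    return output_tokens / tokens_per_second


# Hypothetical request: 6,000 prompt tokens, up to 500 output tokens.
print(fits_in_context(6_000, 500, context_length=8_192))
print(request_cost(6_000, 500,
                   input_price_per_million=3.0,
                   output_price_per_million=15.0))
print(generation_seconds(500, tokens_per_second=50))
```

Every input to these functions is a token count. Substituting a word count would skew all three answers by however far the tokenizer's real ratio drifts from one token per word.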

Counted in tokens

  • Context length: the most tokens a model can hold at once
  • Speed: usually reported as tokens per second
  • The key-value (KV) cache: it grows with every token in context
  • Cost on hosted models: priced per million tokens in and out
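
The KV cache item can be made concrete with a rough size estimate. The sketch below assumes a standard transformer that stores one key vector and one value vector per layer, per KV head, per token, with no cache compression or sliding window. The model shape in the example is hypothetical.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size for one request.

    The leading 2 accounts for storing both a key and a value vector.
    bytes_per_value=2 assumes 16-bit values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_cache_bytes(1, n_layers=32, n_kv_heads=8, head_dim=128)
full = kv_cache_bytes(8_192, n_layers=32, n_kv_heads=8, head_dim=128)
print(per_token / 1024, "KiB per token")
print(full / 1024**3, "GiB at 8,192 tokens")
```

The estimate is linear in n_tokens, which is the point of the list item: under these assumptions, doubling the tokens in context doubles the cache.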

Not the same as

  • Words: a token is often shorter than a word, so the counts differ
  • Characters: one token can cover several characters or just one
  • Sentences: a single sentence can be many tokens
  • Bytes on disk: a token count is not a file size, since one token can span several bytes
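
A quick way to see that these units disagree is to count the same string three ways with nothing but the Python standard library. The sample string is arbitrary. The token count is left out on purpose: it is a further, separate number that depends on which tokenizer is used.

```python
# Arbitrary sample with two accented letters, written with escapes
# so the exact characters are unambiguous.
text = "na\u00efve caf\u00e9"

n_chars = len(text)                    # Unicode characters
n_bytes = len(text.encode("utf-8"))    # bytes when stored as UTF-8
n_words = len(text.split())            # whitespace-separated words

print(n_chars, n_bytes, n_words)
```

The three counts already differ from each other for this short string, so none of them can stand in for a token count.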

Reviewed: June 2026