A token is the unit of text a language model reads and writes: a short chunk that is often a whole word, sometimes a piece of one, sometimes a single character or punctuation mark. The model never sees letters or words directly; it sees a sequence of tokens, and everything it produces comes out one token at a time.
At a glance
What it is: the chunk of text a model processes, roughly a word or word-piece
Rough rule of thumb: in English, a token is a little under one word on average
Why it matters: context length, speed, and cost are all counted in tokens, not words
How output is made: the model emits one token at a time, each based on all the tokens so far
Flow
From your text to the model and back
Your prompt is split into tokens, the model reads them, then produces new tokens one by one. Those tokens are turned back into text you can read.
1. Your text: the raw words and characters you typed
2. Tokenizer splits it: text becomes a list of tokens, each a small chunk
3. Model reads tokens, emits tokens: one new token at a time, in order
4. Detokenized back to text: the tokens become readable output
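The four steps can be sketched in a few lines of Python. This is a toy under loud assumptions, not how any real tokenizer or model works internally: the vocabulary, the greedy longest-match splitting, and the lookup table standing in for the model (`toy_next_token`) are all hypothetical and chosen only so the example runs on its own.

```python
# Hypothetical toy vocabulary, chosen only for this illustration.
# Real tokenizers learn tens of thousands of pieces from data.
EOS = "<eos>"  # special end-of-sequence token, never matched against text
PIECES = ["The", " the", " cat", " sat", " on", " mat", "."]
SINGLE_CHARS = list("abcdefghijklmnopqrstuvwxyz ")
VOCAB = [EOS] + PIECES + SINGLE_CHARS
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# Longest pieces first, so a whole-word piece wins over single characters.
MATCH_ORDER = sorted(PIECES + SINGLE_CHARS, key=len, reverse=True)


def encode(text):
    """Step 2: split text into token ids by greedy longest match."""
    ids, pos = [], 0
    while pos < len(text):
        for piece in MATCH_ORDER:
            if text.startswith(piece, pos):
                ids.append(TOKEN_TO_ID[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


def decode(ids):
    """Step 4: turn token ids back into readable text."""
    return "".join(VOCAB[i] for i in ids if VOCAB[i] != EOS)


# Stand-in for the model. A real model conditions on every token so far;
# this lookup table only inspects the last one.
NEXT_PIECE = {" cat": " sat", " sat": " on", " on": " the",
              " the": " mat", " mat": "."}


def toy_next_token(ids):
    last = VOCAB[ids[-1]] if ids else None
    return TOKEN_TO_ID[NEXT_PIECE.get(last, EOS)]


def generate(prompt, max_new_tokens=16):
    """Step 3: read the prompt tokens, then emit one new token at a time."""
    ids = encode(prompt)
    for _ in range(max_new_tokens):
        next_id = toy_next_token(ids)
        if VOCAB[next_id] == EOS:
            break
        ids.append(next_id)  # the new token becomes part of the context
    return decode(ids)


print(encode("The cat"))    # two token ids, one per piece
print(generate("The cat"))  # The cat sat on the mat.
```

The loop in `generate` is the part worth noticing: each new token is appended to the sequence before the next one is chosen, which is why output arrives one token at a time and in order.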
What is a token?
A language model does not read letters or words. Before it sees anything, your
text is run through a tokenizer that splits it into tokens: short chunks that
are often a whole common word, sometimes a piece of a longer or rarer word, and
sometimes a single character or a punctuation mark. The model works on that
sequence of tokens, and when it answers, it produces tokens, which are turned
back into text for you to read.
A loose rule of thumb for English is that a token is a little under one word on
average. Code, other languages, and unusual words break that rule, so never
treat token counts and word counts as the same number.
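A rough way to see the rule of thumb bend is to count both units on the same text. The sketch below uses a crude stand-in for a tokenizer (`rough_tokens`, a hypothetical helper that splits on words and punctuation and cuts long words into pieces); real tokenizers split differently, so only the direction of the comparison is meaningful, not the exact numbers.

```python
import re


def rough_tokens(text, max_piece=6):
    """Crude stand-in for a tokenizer (an assumption, not a real one):
    words and punctuation marks become tokens, and words longer than
    max_piece characters are cut into several pieces."""
    tokens = []
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        for start in range(0, len(chunk), max_piece):
            tokens.append(chunk[start:start + max_piece])
    return tokens


def words_per_token(text):
    """Whitespace-separated words divided by rough token count."""
    return len(text.split()) / len(rough_tokens(text))


prose = "The cat sat on the mat and looked out of the window."
code = "for (i = 0; i < n; i++) { total += values[i]; }"

print(f"prose: {words_per_token(prose):.2f} words per token")
print(f"code:  {words_per_token(code):.2f} words per token")
```

Even with this simplistic splitter, plain prose lands a little under one word per token, while the line of code, dense with punctuation, lands well below that.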
Why count tokens instead of words?
Almost everything you care about when running a model is measured in tokens. The
context length, the maximum amount of text a model can hold at once, is a token
count. Speed is reported as tokens per second. The key-value (KV) cache, the
working memory of a request, grows with each token in the context. And hosted
models bill per million tokens.
So when a prompt feels expensive, slow, or too long to fit, the honest unit to
think in is tokens. Word count is a polite approximation. Token count is what
the machine actually pays for.
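The arithmetic behind "expensive, slow, or too long to fit" is simple once everything is expressed in tokens. The sketch below is illustrative only: the prices, the generation speed, and the context length are made-up numbers, and it assumes the prompt and the generated output share one context budget, which you should confirm for the model you use.

```python
def request_cost(input_tokens, output_tokens,
                 price_in_per_million, price_out_per_million):
    """Cost of one request when input and output are billed per million tokens."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000


def generation_seconds(output_tokens, tokens_per_second):
    """Time to produce the output at a given generation speed."""
    return output_tokens / tokens_per_second


def fits_in_context(input_tokens, max_output_tokens, context_length):
    """Assumes prompt and output must both fit inside the context length."""
    return input_tokens + max_output_tokens <= context_length


# Hypothetical numbers, for illustration only.
prompt_tokens, answer_tokens = 12_000, 800

cost = request_cost(prompt_tokens, answer_tokens,
                    price_in_per_million=3.00, price_out_per_million=15.00)
print(f"cost: {cost:.3f}")                                    # cost: 0.048
print(f"time: {generation_seconds(answer_tokens, 40):.0f} s")  # time: 20 s
print(fits_in_context(prompt_tokens, answer_tokens, 8_192))    # False
```

None of these functions takes a word count as input, which is the point: to budget a prompt, count its tokens first.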
Counted in tokens
Context length: the most tokens a model can hold at once
Speed: usually reported as tokens per second
The key-value (KV) cache: it grows with every token in context
Cost on hosted models: priced per million tokens in and out
Not the same as
Words: a token is often shorter than a word, so the counts differ
Characters: one token can cover several characters or just one
Sentences: a single sentence can be many tokens
Bytes on disk: a token count is not a file size
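Of the quantities counted in tokens, the KV cache is the one whose growth is easiest to put a number on. The sketch below uses the usual sizing for standard transformer attention, where each token stores one key vector and one value vector per layer and per key-value head. The model shape in the example (32 layers, 8 key-value heads, head dimension 128, 2 bytes per value) is hypothetical and not taken from any particular model.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size for one request.

    The leading 2 counts one key vector plus one value vector stored
    per token, per layer, per key-value head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Hypothetical model shape, for illustration only.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2)

per_token = kv_cache_bytes(1, **shape)
print(per_token)  # 131072 bytes, i.e. 128 KiB per token

for n_tokens in (1_000, 2_000, 4_000):
    mib = kv_cache_bytes(n_tokens, **shape) / 2**20
    print(f"{n_tokens} tokens: {mib:.0f} MiB")
```

Doubling the tokens in context doubles the cache, regardless of how many words or characters those tokens happen to cover.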