Top-p: the probability cutoff for sampling

Top-p (nucleus sampling) is a cutoff applied while choosing the next token: candidate tokens are sorted by probability, the smallest group whose probabilities sum to at least p is kept, and the next token is sampled from only that group. A low p keeps the choice tight and safe; a high p lets in the long tail of unlikely tokens.

At a glance

What it is: A cutoff that keeps the most likely tokens whose probabilities sum to at least p
What it controls: How much of the unlikely long tail can be chosen
Pairs with: Temperature, which reshapes the probabilities before the cutoff
When it is off: Set p to 1.0 and the cutoff keeps every token

How top-p trims the candidate list

Tokens are sorted by probability, highest first. Top-p keeps the smallest group whose probabilities add up to at least p (the nucleus) and discards the rest before one token is drawn.

1. Most likely tokens (the nucleus, kept): the smallest set whose probabilities sum to p
2. Middle tokens: kept only if p is set high enough to reach them
3. Long tail (discarded): unlikely tokens the cutoff removes before sampling

What does top-p actually do?

At every step, a model produces a probability for each token it could say next. Top-p (nucleus sampling) does not sample from every one of them. It sorts the tokens from most to least likely, then walks down the list adding up probabilities until the running total reaches p. That group, the nucleus, is the only set it will draw from, with the kept probabilities rescaled so they sum to 1. Everything below the line is thrown out for this step.
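The sort, walk, and draw described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function names `top_p_nucleus` and `sample_top_p` and the five-token distribution are made up for the example.

```python
import random

def top_p_nucleus(probs, p):
    """Return the smallest list of (token, prob) pairs whose total reaches p.

    `probs` maps each candidate token to its probability (hypothetical input).
    """
    # Sort candidates from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    nucleus = []
    running_total = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        running_total += prob
        if running_total >= p:
            break  # the nucleus now covers at least p of the mass
    return nucleus

def sample_top_p(probs, p, rng=random):
    """Draw one token from the nucleus only."""
    nucleus = top_p_nucleus(probs, p)
    tokens = [token for token, _ in nucleus]
    weights = [prob for _, prob in nucleus]
    # random.choices treats weights as relative, which has the same effect
    # as rescaling the kept probabilities to sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up next-token distribution, for illustration only.
example = {"cat": 0.5, "dog": 0.25, "bird": 0.125, "fish": 0.0625, "newt": 0.0625}
print(top_p_nucleus(example, 0.9))
print(sample_top_p(example, 0.9))
```

With p set to 0.9, the walk stops once the running total passes 0.9, so the last token in this made-up list never enters the draw.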

So p is a dial on how much of the unlikely tail survives. At p of 0.9 the model keeps just enough tokens to cover ninety percent of the probability mass and ignores the scattered remainder. At p of 1.0 nothing is cut, and you are back to plain sampling over the full distribution.
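To see the dial turn, the short sketch below counts how many tokens survive at a few values of p. The distribution is invented for the example, and `nucleus_size` is a hypothetical helper rather than a library function.

```python
from itertools import accumulate

# Made-up probabilities, already sorted from most to least likely.
sorted_probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

def nucleus_size(sorted_probs, p):
    """Count how many of the top tokens are needed for the total to reach p."""
    for count, running_total in enumerate(accumulate(sorted_probs), start=1):
        if running_total >= p:
            return count
    # If rounding keeps the total just under p, every token is kept.
    return len(sorted_probs)

for p in (0.5, 0.9, 1.0):
    print(f"p={p}: keeps {nucleus_size(sorted_probs, p)} of {len(sorted_probs)} tokens")
```

With these numbers, p of 0.5 keeps one token, p of 0.9 keeps four, and p of 1.0 keeps all five.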

How is it different from temperature?

The two get confused because both affect variety, but they act at different stages. Temperature reshapes the probabilities first, flattening or sharpening them. Top-p then draws a cutoff line through whatever shape is left. You can run them together: temperature decides how steep the slope is, top-p decides how far down the slope you are still willing to pick from.
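The two stages can be sketched as follows, assuming temperature is applied in the usual way by dividing the model's raw scores (logits) by the temperature before the softmax. The logits and the helper names `softmax_with_temperature` and `nucleus_size` are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Stage 1: reshape the distribution. Low temperature sharpens, high flattens."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def nucleus_size(probs, p):
    """Stage 2: count how many top tokens the top-p cutoff keeps."""
    running_total = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        running_total += prob
        if running_total >= p:
            return count
    return len(probs)

# Made-up logits for five candidate tokens.
logits = [4.0, 3.0, 2.0, 1.0, 0.0]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus_size(probs, 0.9)
    print(f"temperature={temperature}: top-p 0.9 keeps {kept} tokens")
```

With these logits and p fixed at 0.9, the nucleus grows from two tokens at temperature 0.5 to three at 1.0 and four at 2.0: the same cutoff reaches further down a flatter slope.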

A common operator habit is to set a moderate temperature and a top-p around 0.9, then leave one of them fixed while tuning the other. Changing both at once makes it hard to tell which knob caused a change in the output, the same trap that bites people tuning any pair of coupled settings.

Low top-p (for example 0.5)

  • Keeps only the safest handful of next tokens
  • Output stays focused and on-topic
  • Good when you want predictable, factual answers
  • Can feel repetitive on creative tasks

High top-p (for example 0.95)

  • Lets the unlikely long tail into the draw
  • Output is more varied and surprising
  • Good for brainstorming and creative writing
  • Raises the chance of an off-topic or odd token

Reviewed: June 2026