Top-p: the probability cutoff for sampling

Top-p (nucleus sampling) is a cutoff applied while choosing the next token. The model sorts candidate tokens by probability, keeps the smallest group whose probabilities sum to at least p, and samples only from that group. A low p keeps the choice tight and predictable; a high p lets in the long tail of unlikely tokens.

What does top-p actually do?

At every step, a model assigns a probability to each token it could produce next. Top-p (nucleus sampling) does not sample from all of them. It sorts the tokens from most to least likely, then walks down the list adding up probabilities until the running total reaches or exceeds p. That group, the nucleus, is the only set it will draw from. Everything below the line is discarded for this step, and the surviving probabilities are rescaled so they sum to 1 before the draw.
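Here is a minimal Python sketch of that procedure. It assumes the model's next-token probabilities arrive as a plain dict. The names top_p_filter and sample_top_p are hypothetical and do not come from any particular library. The sketch also includes the rescaling step applied to the surviving tokens.

```python
import random

def top_p_filter(probs, p):
    """Return the nucleus as {token: rescaled probability}.

    probs: dict mapping token -> probability (assumed to sum to 1).
    p: cutoff in (0, 1].
    """
    # Sort candidates from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        # Stop as soon as the running total reaches p.
        if total >= p:
            break
    # Rescale the survivors so they sum to 1 again.
    return {token: prob / total for token, prob in nucleus}

def sample_top_p(probs, p, rng=random):
    """Draw one token, choosing only from the nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Hypothetical next-token probabilities.
probs = {"the": 0.5, "a": 0.25, "this": 0.125, "zebra": 0.125}
print(list(top_p_filter(probs, 0.7)))  # ['the', 'a']
print(sample_top_p(probs, 0.7))        # prints the or a; this and zebra are cut
```

With p = 0.7, "the" alone covers only 0.5, so "a" is added to bring the total to 0.75. The remaining 0.25 of the mass is dropped for this step.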

So p is a dial on how much of the unlikely tail survives. At p = 0.9 the model keeps just enough tokens to cover at least ninety percent of the probability mass and ignores the scattered remainder. At p = 1.0 nothing is cut, and you are back to plain sampling over the full distribution.
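To see the dial in action, the following sketch counts how many tokens survive at different values of p. The distribution is made up for illustration: one strong candidate followed by a thinning tail. The helper nucleus_size is hypothetical.

```python
def nucleus_size(probs, p):
    """Count how many tokens survive a top-p cutoff.

    probs: list of token probabilities (assumed to sum to 1).
    """
    total = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        total += prob
        if total >= p:
            return count
    # Floating-point shortfall below p: treat it as nothing cut.
    return len(probs)

# Hypothetical distribution: one strong candidate and a thinning tail.
probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
for p in (0.5, 0.9, 1.0):
    print(p, nucleus_size(probs, p))
# 0.5 1
# 0.9 4
# 1.0 6
```

At p = 0.9 the last two tokens (about six percent of the mass combined) are cut. At p = 1.0 every token stays in play.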

How is it different from temperature?

The two get confused because both affect variety, but they act at different stages. Temperature reshapes the probabilities first, flattening or sharpening them. Top-p then draws a cutoff line through whatever shape is left. You can run them together: temperature decides how steep the slope is, top-p decides how far down the slope you are still willing to pick from.
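The two stages can be sketched in Python as shown below. The sketch assumes the model outputs raw scores (logits) and, as described above, applies temperature before the top-p cut. Individual libraries may order or implement these steps differently. The function names and logit values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Stage 1: turn raw scores into probabilities.

    Temperature below 1 sharpens the distribution; above 1 flattens it.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_indices(probs, p):
    """Stage 2: indices of the smallest most-likely set whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical scores for five tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)  # reshape the slope
    nucleus = top_p_indices(probs, 0.9)          # then cut at p = 0.9
    print(t, len(nucleus))
```

In this toy example, a temperature of 0.5 concentrates enough mass in the top two tokens that the same 0.9 cutoff keeps only those two. At 1.0 and 2.0, four tokens survive.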

A common operator habit is to set a moderate temperature and a top-p around 0.9, then leave one of them fixed while tuning the other. Changing both at once makes it hard to tell which knob caused a change in the output, the same trap that bites people tuning any pair of coupled settings.
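Here is a rough sketch of the one-knob-at-a-time habit. Temperature is pinned while top-p is swept, and a fixed random seed keeps the runs comparable. Counting distinct sampled tokens is a crude stand-in for "variety," chosen only for illustration. All names and values here are hypothetical.

```python
import math
import random

def sample_token(logits, temperature, top_p, rng):
    """Temperature first, then a top-p cut, then one draw from the nucleus."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    # random.choices accepts unnormalized weights, so no rescale is needed here.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

def distinct_tokens(logits, temperature, top_p, draws=500, seed=0):
    """Crude variety measure: how many different tokens appear across many draws."""
    rng = random.Random(seed)  # same seed for every setting in the sweep
    return len({sample_token(logits, temperature, top_p, rng) for _ in range(draws)})

logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical scores
TEMPERATURE = 0.8  # held fixed for the whole sweep
for top_p in (0.5, 0.7, 0.9, 1.0):
    print(top_p, distinct_tokens(logits, TEMPERATURE, top_p))
```

Because only top_p changes between rows, any shift in the count can be traced to that one knob. To tune temperature next, you would pin top_p and sweep TEMPERATURE instead.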

