Temperature: the randomness knob on a model

Temperature is a sampling setting that scales how much randomness goes into picking each next token. Low temperature makes the model favour its most likely choice, so output is focused and close to repeatable. High temperature flattens the odds so less likely words get picked more often, which reads as more creative and more erratic. At temperature zero the model is effectively deterministic: the same prompt returns the same answer.

What does temperature actually change?

At every step a language model produces a list of candidate next tokens, each with a probability. Temperature decides how literally it takes those probabilities. At a low temperature the model leans hard on its top pick, so it stays on the obvious, safe path. At a high temperature the gaps between candidates shrink, so a word the model thought unlikely still gets its turn now and then. That is the whole effect: it does not make the model smarter or dumber, it changes how willing the model is to gamble on its less favoured candidates.
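A minimal sketch of the reweighting described above, assuming the common formulation in which each candidate's raw score is divided by the temperature before a softmax turns the scores into probabilities. The function name and the three example scores are hypothetical, and a real model works over a vocabulary of many thousands of tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing each by the temperature."""
    if temperature <= 0:
        # Division by zero is undefined; temperature 0 is usually handled
        # separately as "always take the top-scoring candidate".
        raise ValueError("temperature must be greater than 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, the top candidate's share of the probability grows. As it rises, the three shares move closer together.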

The visible result is variety. Low temperature gives you focused, near-repeatable text. High temperature gives you range, and with it the risk that the model wanders somewhere wrong and says it with the same confidence as everything else.

What should you set it to?

There is no universally correct value, only a correct value for the job. If the output feeds something that has to be trusted or reproduced, such as a retrieval lookup, a code generation step, or a fact extraction, push temperature down. Near zero the model becomes effectively deterministic: the same prompt returns the same answer, and a result you can reproduce is a result you can debug.

If the job is creative and there is no single right answer, as with draft titles, prose options, or idea lists, raise it. You trade reproducibility for range. A sound operating habit is to default low and raise it only where variety is the point, not the other way round.
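The trade between reproducibility and range can be seen in a toy sampler. This is a sketch under stated assumptions, not how any particular server implements sampling: temperature 0 is treated as "always take the top score", anything higher draws at random from the temperature-scaled weights, and the vocabulary and scores are made up for illustration.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index: greedy at temperature 0, weighted random draw otherwise."""
    if temperature == 0:
        # Greedy choice: the same scores always give the same index.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

tokens = ["cat", "dog", "fish"]  # hypothetical vocabulary
logits = [2.0, 1.0, 0.1]         # hypothetical scores
rng = random.Random(0)           # seeded only so the demo itself is repeatable

for t in (0, 1.5):
    picks = [tokens[sample_token(logits, t, rng)] for _ in range(10)]
    print(t, picks)
```

At temperature 0 every pick is the top-scoring token. At 1.5 the lower-scoring tokens start to appear, which is the range you are paying for with reproducibility.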

Check it yourself

curl -s localhost:8000/v1/completions -H 'content-type: application/json' -d '{"model":"local","prompt":"Write one word:","temperature":0,"max_tokens":5}'

Run it twice at temperature 0 and the outputs should match. Raise temperature and the two runs start to diverge. Adjust the URL and model name to your own server.
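The same check can be scripted instead of compared by eye. The sketch below uses only the Python standard library and mirrors the curl request. It assumes the server returns an OpenAI-style completions response with the generated text at choices[0].text, which the curl example does not confirm, and the helper names build_request, complete, and runs_match are hypothetical.

```python
import json
import urllib.request

def build_request(url, temperature, prompt="Write one word:", model="local"):
    """Build the same POST request as the curl command, with a chosen temperature."""
    body = {"model": model, "prompt": prompt,
            "temperature": temperature, "max_tokens": 5}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )

def complete(url, temperature):
    """Send one request and return the generated text.

    Assumption: the response is JSON with the text at choices[0].text.
    """
    with urllib.request.urlopen(build_request(url, temperature)) as resp:
        return json.load(resp)["choices"][0]["text"]

def runs_match(fetch, temperature, runs=2):
    """Call fetch(temperature) several times and report whether all outputs agree."""
    outputs = [fetch(temperature) for _ in range(runs)]
    return len(set(outputs)) == 1
```

With a server running, runs_match(lambda t: complete("http://localhost:8000/v1/completions", t), 0) is expected to return True, while the same call with a higher temperature will often return False. As with the curl command, adjust the URL and model name to your own server.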
