How does a model choose the next word?
A language model does not output a sentence. It outputs, one step at a time, a ranked list of possible next tokens, each with a probability. Sampling is the rule that turns that list into a single choice. The simplest rule is greedy: take the most likely token, every time. That is steady and repeatable, but it can also march the model into dull repetition, because the safest word is not always the best one.
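The greedy rule and its tendency to loop can be sketched with a toy table. Everything here is hypothetical: `NEXT` stands in for a model by mapping each token to made-up next-token probabilities, and `greedy_next` and `greedy_decode` are illustrative names rather than any library's API.

```python
# Hypothetical toy "model": for each current token, probabilities of the next one.
NEXT = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"on": 0.9, "down": 0.1},
    "on": {"the": 0.8, "a": 0.2},
}

def greedy_next(probs):
    """Greedy rule: return the single most likely token."""
    return max(probs, key=probs.get)

def greedy_decode(start, steps):
    """Apply the greedy rule repeatedly, one token per step."""
    tokens = [start]
    for _ in range(steps):
        tokens.append(greedy_next(NEXT[tokens[-1]]))
    return tokens

print(" ".join(greedy_decode("the", 8)))
```

Because each step takes the top entry, the toy run cycles through the same four words. It is perfectly repeatable, and it never leaves the loop.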
The common alternative is top-p, also called nucleus sampling. Instead of always taking the top token, the engine keeps the smallest set of top-ranked candidates whose probabilities add up to at least a chosen share p, then draws one from that set at random, in proportion to each candidate's probability. A handful of other knobs exist, but top-p plus temperature, which sharpens or flattens the probabilities before the draw, covers most of what an operator touches.
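A minimal sketch of the top-p rule described above, using only the Python standard library. The probabilities are made up, and `nucleus` and `sample_top_p` are hypothetical helper names. A real engine works on the full vocabulary and usually applies temperature first, which this sketch leaves out.

```python
import random

def nucleus(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose probabilities reach top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return kept

def sample_top_p(probs, top_p, rng):
    """Draw one token from the nucleus, weighted by its probability."""
    kept = nucleus(probs, top_p)
    tokens = [t for t, _ in kept]
    weights = [p for _, p in kept]  # choices() normalizes the weights itself
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up distribution for illustration.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
print(nucleus(probs, 0.75))
print(sample_top_p(probs, 0.75, random.Random()))
```

With a share of 0.75, only the two most likely tokens survive the cut, so the unlikely tail can never be drawn. Raising the share toward 1.0 lets more of the tail back in.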
Why does the sampling choice matter?
Because it sets the trade-off between reproducibility and naturalness. Draw at random and the same prompt gives a different answer each run, which reads as more human and suits creative work. Take the top token every time and you get output you can test and trust, which suits code, extraction, and any retrieval step that has to return the same thing twice.
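The trade-off can be seen by running the same made-up distribution many times under each rule. This is an illustration only: `PROBS` is invented, and `greedy`, `sample`, and `distinct_answers` are hypothetical names.

```python
import random

# Hypothetical next-token probabilities for one fixed prompt.
PROBS = {"blue": 0.5, "grey": 0.3, "clear": 0.15, "falling": 0.05}

def greedy(probs):
    """Deterministic: always the most likely token."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Random: draw a token in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

def distinct_answers(runs=200):
    """Collect the set of different answers each rule gives over many runs."""
    rng = random.Random()
    greedy_out = {greedy(PROBS) for _ in range(runs)}
    sampled_out = {sample(PROBS, rng) for _ in range(runs)}
    return greedy_out, sampled_out

greedy_out, sampled_out = distinct_answers()
print("greedy:", greedy_out)
print("sampled:", sampled_out)
```

The greedy set holds a single answer however many runs are made, which is what makes it testable. The sampled set holds several, which is where the variety comes from.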
The practical habit: if a result has to be debugged or reproduced, lean toward the deterministic end. If it has to feel fresh, open the sampling up. And remember that when you measure matters too: measure an engine before it has warmed up and you may end up measuring the warmup rather than the model.
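One way to keep warmup out of a measurement, sketched under assumptions: `fn` is a hypothetical stand-in for a call into the engine, and the warmup and run counts are arbitrary placeholders rather than recommended values.

```python
import time

def time_calls(fn, warmup=3, runs=5):
    """Time fn over `runs` calls after discarding `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # warmup calls: executed but not recorded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in workload; replace with a real engine call.
timings = time_calls(lambda: sum(range(10_000)))
print(f"median of {len(timings)} timed runs: {sorted(timings)[len(timings) // 2]:.6f}s")
```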