Beam search: keeping several drafts at once

Beam search is a way of choosing a model's output where, instead of picking one next token, the decoder keeps a fixed number of partial sequences (the beams) and at every step extends and re-ranks them, keeping only the best few. It trades speed and variety for a more globally likely sequence.

At a glance

What it is: A decoder that keeps several candidate sequences alive at once
The beam width: How many candidates it carries; more is thorough but slower
What it optimises: The overall likelihood of the whole sequence, not each token alone
When it is rare: Open-ended chat, where sampling gives more natural variety

How does beam search work?

Most chat models commit to one token at each step and never revisit the choice. Beam search refuses to commit that early. It keeps a fixed number of partial sequences alive (that number is the beam width), and at each step it extends every one of them, scores the results, and keeps only the best handful. By carrying several drafts forward it can recover from a token that looked good locally but led somewhere worse, which a one-token greedy choice cannot.
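The extend, score, and prune loop described above can be sketched in a few lines. This is a minimal illustration, not a production decoder: TOY_MODEL and next_token_probs are hypothetical stand-ins for a real model, and the sketch leaves out end-of-sequence handling and length normalisation, which real implementations usually add.

```python
import math

# Hypothetical toy "model": maps a prefix to next-token probabilities.
# "A" looks best at step one, but the best full sequence starts with "B".
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"z": 0.9, "w": 0.1},
}

def next_token_probs(prefix):
    """Return the toy next-token distribution for a prefix (empty if unknown)."""
    return TOY_MODEL.get(tuple(prefix), {})

def beam_search(beam_width, steps):
    """Return up to beam_width (sequence, log_probability) pairs, best first."""
    beams = [((), 0.0)]  # start with one empty sequence at log-probability 0
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Extend every surviving sequence by every possible next token.
            for token, prob in next_token_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        if not candidates:
            break  # nothing left to extend
        # Re-rank all extensions and keep only the best beam_width of them.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

greedy = beam_search(beam_width=1, steps=2)[0]
wide = beam_search(beam_width=2, steps=2)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))  # ('A', 'x') 0.33
print(wide[0], round(math.exp(wide[1]), 2))      # ('B', 'z') 0.36
```

With a beam width of 1 the search behaves like greedy decoding and is stuck with "A". With a width of 2 it keeps "B" alive long enough to find the more likely two-token sequence.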

The payoff is a sequence that usually has a higher overall likelihood than greedy decoding finds. The cost is work: a wider beam means more candidates to extend and score at every step, so it is slower and uses more memory than carrying a single sequence forward.
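To put rough numbers on that cost, the sketch below counts how many candidate extensions get scored. It assumes every live sequence is extended by every token in the vocabulary at each step, and that the first step starts from a single empty sequence. The function name and the 50,000-token vocabulary are illustrative choices, not figures from any particular model.

```python
def scored_candidates(beam_width, vocab_size, steps):
    """Count candidate extensions scored over a whole decode (rough estimate)."""
    if steps <= 0:
        return 0
    first_step = vocab_size  # only one (empty) sequence exists at the start
    later_steps = (steps - 1) * beam_width * vocab_size
    return first_step + later_steps

print(scored_candidates(beam_width=1, vocab_size=50_000, steps=10))  # 500000
print(scored_candidates(beam_width=4, vocab_size=50_000, steps=10))  # 1850000
```

Under these assumptions the work grows roughly in proportion to the beam width, and so does the number of partial sequences held in memory.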

When would you use it, and when not?

Beam search earns its keep where there is a “correct” target to converge on: translation, summarisation, structured extraction, anything where you want the most probable faithful rendering rather than a creative one. Because it is deterministic, the same input yields the same output, which is handy when you need repeatable results.

For open-ended chat it is usually the wrong tool. Optimising hard for the most likely sequence tends to produce flat, generic, sometimes looping text, which is why sampling with temperature and top-p is the default for conversational models. The rule of thumb: beam search when there is one right answer to find, sampling when variety is the point.
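For contrast, here is a minimal sketch of one sampling step with temperature and top-p. The function name sample_top_p and the example logits are hypothetical, and real decoders work on tensors rather than dictionaries, but the logic is the usual one: rescale the scores, keep the smallest set of top tokens whose probability reaches top_p, then draw from that set.

```python
import math
import random

def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=random):
    """Draw one token from {token: logit} using temperature and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature rescales the logits: lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Keep the most likely tokens until their cumulative probability hits top_p.
    nucleus, cumulative = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    tokens = [tok for tok, _ in nucleus]
    weights = [prob for _, prob in nucleus]
    # random.choices renormalises the weights within the nucleus.
    return rng.choices(tokens, weights=weights, k=1)[0]

example_logits = {"the": 2.0, "a": 1.0, "zebra": -3.0}  # hypothetical scores
rng = random.Random(0)  # fixing the seed makes runs repeatable
print([sample_top_p(example_logits, rng=rng) for _ in range(5)])
```

Unlike beam search, each call makes a single local draw, so two runs can differ unless the random seed is fixed.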

Beam search

  • Keeps several candidate sequences and ranks them
  • Aims for the most likely overall sequence
  • Deterministic: same input gives the same output
  • Can sound flat or repetitive on open-ended text

Sampling

  • Draws one token at a time from the distribution
  • Chooses locally, step by step, with no look-ahead
  • Varies between runs unless a seed is fixed
  • Reads as more natural for chat and creative work

Reviewed: June 2026