
MTP: predicting several tokens at once

Multi-token prediction (MTP) is a speculative-decoding technique in which a model proposes several upcoming tokens per step instead of a single token, verifies them together in one pass, and keeps the longest run that the full model agrees with. When the guesses are good, each step yields more tokens, which raises decode speed without changing the model's output.

At a glance

What it is: A speculative-decoding method that proposes several tokens per step
What it changes: Speed only; the final tokens match plain decoding
How it pays off: Each accepted guess saves a separate decoding step
Relation to EAGLE: An alternative draft strategy to EAGLE, with different sensitivity to workload

How multi-token prediction adds speed

Propose several tokens, verify them together, keep the ones that hold up. Accepted guesses skip separate decoding steps; the rejected tail is discarded and regenerated by ordinary decoding.

1. Propose several next tokens at once: a cheap draft of what comes next
2. Verify them together in one pass: the full model checks the draft
3. Keep the accepted run, redo the rest: good guesses become free speed

What is multi-token prediction?

Normal text generation produces one token, runs the model again, produces the next, and so on. Multi-token prediction (MTP) tries to do better by proposing several upcoming tokens in a single step, then checking them all together in one pass and keeping the longest run that holds up. It is a form of speculative decoding: you make a cheap guess at what comes next, then verify it with the real model. When the guess is right, you get several tokens for the cost of one step. Crucially, the output is the same as plain decoding, because the full model confirms every token that is kept. MTP changes how fast tokens arrive, not what they say.
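As a rough illustration of the propose, verify, keep loop, here is a minimal Python sketch of greedy speculative decoding. It is not how any particular MTP implementation works: `toy_target`, `toy_draft`, `plain_decode`, and `speculative_decode` are hypothetical names, the "models" are tiny deterministic functions, and the inner verification loop stands in for what a real system would do in one batched pass of the full model.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the full model's greedy next token."""
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical cheap draft: agrees with the target except after token 5."""
    return 0 if ctx[-1] == 5 else toy_target(ctx)


def plain_decode(target: NextFn, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target step per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(
    target: NextFn, draft: NextFn, prompt: List[Token], n: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Return (generated tokens, number of target verification steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n:
        # 1. Propose k tokens with the cheap draft.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. Verify. Counted as ONE step: a real system would score all
        #    positions in a single batched pass (assumption of this sketch).
        steps += 1
        kept: List[Token] = []
        for tok in proposal:
            expected = target(out + kept)
            if tok != expected:
                kept.append(expected)  # first mismatch: take the target's token
                break
            kept.append(tok)           # draft token confirmed by the target
        else:
            kept.append(target(out + kept))  # all accepted: one extra token

        # 3. Keep the accepted run; the rejected tail is simply dropped.
        out.extend(kept)
    return out[len(prompt):len(prompt) + n], steps


reference = plain_decode(toy_target, [1], 12)
tokens, target_steps = speculative_decode(toy_target, toy_draft, [1], 12, k=4)
print(tokens == reference, target_steps)
```

Every kept token is either confirmed or supplied by the target function, so under greedy decoding the result matches `plain_decode` exactly; only the number of target steps changes.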

When does MTP actually pay off?

MTP pays off when the next few tokens are predictable enough that the proposals get accepted, because every accepted token skips a separate decoding step. When the guesses miss, the drafting and verification work spent on the rejected tokens is wasted and the gain shrinks. MTP is one draft strategy among several. EAGLE is another, and the two behave differently across workloads, so the right choice depends on what you run. The takeaway for an operator: speculative decoding is a speed knob. It belongs to the inference-stack decision rather than the model-quality decision, and it is worth measuring on your own traffic rather than trusting a headline number.
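To make the dependence on acceptance concrete, here is a back-of-the-envelope Python sketch. It assumes each drafted token is accepted independently with the same probability `p`, which is a simplification, and it treats one verification pass as costing the same as one ordinary decoding step. The function names and the `draft_cost` parameter (cost of drafting one token, as a fraction of a full-model step) are hypothetical, chosen for illustration only.

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens produced per verification step with k drafted tokens.

    The step always yields at least one token (the target's own), plus the
    i-th drafted token whenever the first i drafts were all accepted:
    1 + p + p**2 + ... + p**k.
    """
    return sum(p ** i for i in range(k + 1))


def estimated_speedup(p: float, k: int, draft_cost: float = 0.0) -> float:
    """Rough speedup over plain decoding under the assumptions above.

    Each speculative step costs one full-model pass plus k draft tokens,
    each costing `draft_cost` of a full-model pass (an assumed ratio).
    """
    return expected_tokens_per_step(p, k) / (1.0 + k * draft_cost)


# Compare a predictable workload with an unpredictable one.
for p in (0.9, 0.5, 0.1):
    print(p, round(estimated_speedup(p, k=4, draft_cost=0.05), 2))
```

The estimate drops toward 1, and can fall below it once draft cost is counted, as the acceptance rate falls. Real acceptance rates vary by position and by prompt, so a number like this is a sanity check, not a substitute for measuring on your own traffic.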

MTP helps when

  • The next few tokens are easy to guess, so drafts get accepted
  • You want higher decode speed with identical output
  • The draft strategy suits your workload class

MTP does not help when

  • Guesses are rejected often, so the extra work is wasted
  • You expected better answers; it changes speed, not quality
  • Memory capacity is the bottleneck rather than decoding steps; MTP saves steps, not memory

Reviewed: June 2026