MTP: predicting several tokens at once

Multi-token prediction (MTP) is a speculative-decoding technique in which a model proposes several upcoming tokens per step rather than one, verifies them in a single pass, and keeps the longest run that holds up. When the guesses are good, each step yields more tokens, which raises decode speed without changing the model's output.

At a glance

What it is: A speculative-decoding method that proposes several tokens per step

What it changes: Speed only; the final tokens match plain decoding

How it pays off: Each accepted guess saves a separate decoding step

Relation to EAGLE: An alternative draft strategy to EAGLE, with different sensitivity to workload

What is multi-token prediction?

Normal text generation produces one token, runs the model again, produces the next, and so on. Multi-token prediction (MTP) tries to do better: it proposes several upcoming tokens in a single step, checks them all together in one pass, and keeps the longest run that holds up. It is a form of speculative decoding, in which you make a cheap guess at what comes next and then verify it with the real model. When the guess is right, you get several tokens for the cost of one step. Crucially, the output is the same as plain decoding. MTP changes how fast tokens arrive, not what they say.
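The propose-then-verify loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular inference engine implements MTP: it assumes greedy decoding, and the names `toy_target`, `toy_propose`, `verify_draft`, and `speculative_decode` are hypothetical stand-ins. A real system checks all draft positions in one batched forward pass; the loop inside `verify_draft` only simulates that check position by position.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]          # context -> greedy next token
Proposer = Callable[[List[int], int], List[int]]  # context, k -> k draft tokens


def verify_draft(context: List[int], draft: List[int], target_next: NextToken) -> List[int]:
    """Keep the longest draft prefix the target agrees with, plus one target token."""
    emitted: List[int] = []
    for tok in draft:
        expected = target_next(context + emitted)
        if tok != expected:
            emitted.append(expected)  # first miss: take the target's token and stop
            return emitted
        emitted.append(tok)
    # Every draft token was accepted; the verify pass also yields one extra token.
    emitted.append(target_next(context + emitted))
    return emitted


def plain_decode(prompt: List[int], target_next: NextToken, n: int) -> List[int]:
    """Baseline: one token per step."""
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(prompt: List[int], target_next: NextToken,
                       propose: Proposer, n: int, k: int = 3) -> Tuple[List[int], int]:
    """Generate n tokens, counting how many propose-and-verify steps were needed."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n:
        out.extend(verify_draft(out, propose(out, k), target_next))
        steps += 1
    return out[len(prompt):len(prompt) + n], steps


def toy_target(ctx: List[int]) -> int:
    """Hypothetical 'real model': counts upward and wraps after 9."""
    return (ctx[-1] + 1) % 10


def toy_propose(ctx: List[int], k: int) -> List[int]:
    """Hypothetical cheap guesser: counts upward but never wraps, so it misses after 9."""
    return [ctx[-1] + i for i in range(1, k + 1)]


if __name__ == "__main__":
    baseline = plain_decode([0], toy_target, 12)
    fast, steps = speculative_decode([0], toy_target, toy_propose, 12, k=3)
    print(baseline == fast, steps)
```

In this toy run the speculative loop produces the same 12 tokens as the baseline in 4 steps instead of 12. The one miss (the guesser proposes 10 where the target wraps to 0) costs only the rejected tail of that draft, and the emitted sequence is unaffected.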

When does MTP actually pay off?

It pays off when the next few tokens are predictable enough that the proposals get accepted, because every accepted token skips a separate decoding step. When the guesses miss, the verification work is mostly wasted and the gain shrinks. MTP is one draft strategy among several. EAGLE is another, and the two behave differently across workloads, so the right choice depends on what you run. The takeaway for an operator: speculative decoding is a speed knob. It belongs to the inference-stack decision rather than the model-quality decision, and it is worth measuring on your own traffic rather than trusting a headline number.
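To make the dependence on acceptance concrete, here is a back-of-the-envelope sketch. It rests on a simplifying assumption that is not from the text above: each of the `k` proposed tokens is accepted independently with the same probability `p`, and verification stops at the first miss. Under that assumption the expected number of tokens per step is 1 + p + p^2 + ... + p^k. The function names and the `step_cost` parameter (the cost of one propose-and-verify step relative to one plain decoding step) are hypothetical, and real acceptance rates vary by position and by workload, which is why measuring on your own traffic matters.

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per step with k draft tokens and acceptance rate p.

    Assumes independent, identical acceptance per position (a simplification).
    The leading 1 is the token the verification pass always yields.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return sum(p ** i for i in range(k + 1))


def estimated_speedup(p: float, k: int, step_cost: float = 1.0) -> float:
    """Rough speedup over plain decoding.

    step_cost is a hypothetical knob: how much one propose-and-verify step
    costs relative to one plain decoding step (1.0 means no overhead).
    """
    if step_cost <= 0:
        raise ValueError("step_cost must be positive")
    return expected_tokens_per_step(p, k) / step_cost


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9):
        print(p, round(estimated_speedup(p, k=3, step_cost=1.25), 3))
```

With `p = 0` every step still yields one token, so any overhead (`step_cost` above 1) makes speculative decoding slower than plain decoding. As `p` approaches 1 the yield approaches `k + 1` tokens per step.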
