What is multi-token prediction?
Standard text generation produces one token, runs the model again, produces the next, and so on. Multi-token prediction (MTP) tries to do better by proposing several upcoming tokens in a single step, then checking them all together in one pass and keeping the longest leading run that holds up. It is a form of speculative decoding: you make a cheap guess at what comes next, then verify it with the real model. When the guess is right, you get several tokens for the cost of one step. Crucially, the output is the same as plain decoding, because the real model has the final say on every token that is kept. MTP changes how fast tokens arrive, not what they say.
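The propose-then-verify loop can be sketched in a few lines of Python. This is a toy illustration, not how any particular inference engine implements it: `toy_target`, `toy_propose`, and `speculative_decode` are hypothetical names, the "model" is a deterministic counter, decoding is greedy, and the single verification pass is simulated by calling the target once per position.

```python
from typing import Callable, List, Tuple

Token = int


def accept_longest_prefix(draft: List[Token], verified: List[Token]) -> List[Token]:
    """Keep draft tokens until the first disagreement with the target model.

    verified[i] is the target's own choice at position i, given the context
    plus draft[:i], so it has len(draft) + 1 entries. The target always
    contributes one token: a correction on a miss, or a bonus token when
    the whole draft is accepted.
    """
    kept: List[Token] = []
    for i, tok in enumerate(draft):
        if tok != verified[i]:
            kept.append(verified[i])  # replace the first wrong guess
            return kept
        kept.append(tok)
    kept.append(verified[len(draft)])  # every guess held up: bonus token
    return kept


def speculative_decode(
    target_next: Callable[[List[Token]], Token],
    propose: Callable[[List[Token], int], List[Token]],
    prompt: List[Token],
    max_new: int,
    k: int,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns the tokens and the step count."""
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < max_new:
        draft = propose(tokens, k)
        # A real system scores all positions in one forward pass;
        # here that pass is simulated position by position.
        verified = [target_next(tokens + draft[:i]) for i in range(len(draft) + 1)]
        tokens.extend(accept_longest_prefix(draft, verified))
        steps += 1
    return tokens[: len(prompt) + max_new], steps


def plain_decode(
    target_next: Callable[[List[Token]], Token], prompt: List[Token], max_new: int
) -> List[Token]:
    """Baseline: one token per model call."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target_next(tokens))
    return tokens


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'real model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_propose(ctx: List[Token], k: int) -> List[Token]:
    """Hypothetical cheap drafter: right except that it guesses 0 after a 4."""
    draft: List[Token] = []
    last = ctx[-1]
    for _ in range(k):
        last = 0 if last == 4 else (last + 1) % 10
        draft.append(last)
    return draft


if __name__ == "__main__":
    fast, steps = speculative_decode(toy_target, toy_propose, [0], max_new=12, k=3)
    slow = plain_decode(toy_target, [0], max_new=12)
    print(fast == slow, steps)
```

In this toy setup both decoders emit the same 12 tokens, but the speculative loop needs fewer verification steps than the baseline needs model calls. The drafter's one systematic mistake costs a step where only a single token is gained.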
When does MTP actually pay off?
It pays off when the next few tokens are predictable enough that the proposals get accepted, because every accepted token skips a separate decoding step. When the guesses miss, the work spent proposing and checking the rejected tokens is wasted and the gain shrinks. MTP is one draft strategy among several. EAGLE is another, and the two behave differently across workloads, so the right choice depends on what you run. The takeaway for an operator: speculative decoding is a speed knob. It belongs to the inference-stack decision rather than the model-quality decision, and it is worth measuring on your own traffic rather than trusting a headline number.
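To get a feel for how strongly the payoff depends on acceptance, here is a rough back-of-the-envelope model. It assumes, as a simplification, that each proposed token is accepted independently with the same probability `p`, and that each step proposes `k` tokens; real traffic will not be this uniform. The function name `expected_tokens_per_step` is made up for this sketch.

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per verification step.

    Simplifying assumption: each of the k proposed tokens is accepted
    independently with probability p. The i-th proposal only counts if
    all earlier ones were accepted (probability p**i), and the verifier
    always contributes one token of its own, hence the sum from i = 0.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return sum(p**i for i in range(k + 1))


if __name__ == "__main__":
    for p in (0.3, 0.6, 0.9):
        print(f"p={p}: {expected_tokens_per_step(p, k=4):.2f} tokens per step")
```

Under this model a step never yields fewer than one token and never more than `k + 1`. Tokens per step is not the same as wall-clock speedup, since a step that proposes and verifies costs more than a plain decoding step, which is one more reason to measure on your own traffic.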