EAGLE: a feature-predicting draft head

EAGLE is a speculative-decoding scheme in which the draft is a small extra head trained to predict the base model's own features for the next token (the hidden state just before the output layer). Drafting from those features raises how often the proposed tokens are accepted.

At a glance

What it is: A lightweight draft head added to the base model
How it drafts: It predicts the base model's own features for the next token
Why that helps: Drafts agree with the base model more often, so acceptance is higher
What to watch: Acceptance varies with the content generated and with the serving build

How does EAGLE work?

EAGLE is a specific way to build the draft inside speculative decoding. Instead of a fully separate small model, it adds a lightweight extra head trained to predict the base model’s own features for the next token, that is, the hidden state the base model would compute just before its output layer. Because the draft is anchored to the base model’s internal representations, its proposed tokens tend to match what the base model would pick.
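The drafting step described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not EAGLE's actual architecture: the sizes, the random weights, and the names `draft_step` and `draft_run` are all hypothetical, and a single linear layer with a residual update stands in for the small trained network a real draft head would use. What it does show is the shape of the idea: the head predicts the next feature vector, and the base model's own output layer turns that feature into a token.

```python
import math
import random

random.seed(0)
HIDDEN, VOCAB = 8, 16  # toy sizes, not real model dimensions


def rand_matrix(rows, cols):
    """Small random matrix standing in for trained weights."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Frozen pieces borrowed from the base model (random stand-ins here).
embedding = rand_matrix(VOCAB, HIDDEN)  # token id -> embedding
lm_head = rand_matrix(VOCAB, HIDDEN)    # feature -> vocabulary logits

# The only trainable part in this sketch: the draft head's weights.
draft_weights = rand_matrix(HIDDEN, 2 * HIDDEN)


def draft_step(feature, token_id):
    """Predict the next feature vector, then decode it with the base LM head."""
    head_input = feature + embedding[token_id]  # list concatenation
    update = [math.tanh(v) for v in matvec(draft_weights, head_input)]
    next_feature = [f + u for f, u in zip(feature, update)]  # residual update
    logits = matvec(lm_head, next_feature)
    next_token = max(range(VOCAB), key=lambda i: logits[i])  # greedy pick
    return next_feature, next_token


def draft_run(feature, token_id, steps):
    """Draft several tokens by feeding each predicted feature back in."""
    tokens = []
    for _ in range(steps):
        feature, token_id = draft_step(feature, token_id)
        tokens.append(token_id)
    return tokens


if __name__ == "__main__":
    print(draft_run([0.0] * HIDDEN, token_id=3, steps=4))
```

Reusing the base model's output layer is the design point: the draft head only has to learn to imitate features, and token selection stays consistent with the base model.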

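Higher agreement means a higher acceptance rate, so more of each draft run gets accepted per verification pass. The base model still verifies every drafted token and rejects any it would not have produced, so the final output is the same as decoding with the base model alone (token for token under greedy decoding, and in distribution under sampling). EAGLE only tries to make the draft a closer match.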
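A minimal sketch of the verification step, assuming greedy decoding and a simple chain of draft tokens. The function name `verify_greedy` and the input layout are hypothetical, chosen only for illustration; real serving stacks batch this and often verify a tree of candidates rather than a single chain.

```python
def verify_greedy(draft_tokens, base_tokens):
    """Accept the longest draft prefix the base model agrees with.

    base_tokens[i] is the base model's greedy pick given the context plus
    draft_tokens[:i], so it has len(draft_tokens) + 1 entries, all computed
    in one verification pass.
    """
    if len(base_tokens) != len(draft_tokens) + 1:
        raise ValueError("base_tokens needs one more entry than draft_tokens")

    accepted = 0
    for drafted, wanted in zip(draft_tokens, base_tokens):
        if drafted != wanted:
            break  # first disagreement: everything after it is discarded
        accepted += 1

    # The base model's own token at the stopping point is always emitted:
    # a correction after a mismatch, or a bonus token if all drafts matched.
    emitted = list(draft_tokens[:accepted]) + [base_tokens[accepted]]
    return emitted, accepted


if __name__ == "__main__":
    print(verify_greedy([5, 7, 9], [5, 7, 2, 4]))  # mismatch at third token
    print(verify_greedy([5, 7, 9], [5, 7, 9, 4]))  # every draft accepted
```

Every emitted token is one the base model chose, which is why the output does not change. A better draft only changes how many tokens each pass yields.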

Why does it matter?

A better-aligned draft is the lever that turns speculative decoding from “sometimes faster” into “reliably faster”. That is the appeal of EAGLE: it raises acceptance without changing what the model says.
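To see why acceptance is the lever, here is a rough back-of-the-envelope model. It assumes a chain draft of fixed length and an independent, constant per-token acceptance probability, which real workloads do not strictly satisfy; the function name is hypothetical. Under those assumptions the expected number of tokens emitted per verification pass is (1 - a^(k+1)) / (1 - a) for acceptance probability a and draft length k.

```python
def expected_tokens_per_pass(acceptance_rate, draft_len):
    """Expected tokens emitted per verification pass (simplified model).

    Assumes each drafted token is accepted independently with probability
    acceptance_rate, and that the base model always contributes one token
    of its own (a correction or a bonus token).
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance_rate == 1.0:
        return float(draft_len + 1)  # limit of the geometric sum
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


if __name__ == "__main__":
    for rate in (0.6, 0.8):
        print(rate, expected_tokens_per_pass(rate, draft_len=4))
```

With a draft length of 4, moving the acceptance probability from 0.6 to 0.8 lifts the estimate from roughly 2.3 to roughly 3.4 tokens per pass in this model, before accounting for the cost of drafting itself.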

The catch is that the gain depends on conditions. Acceptance can shift with the content being generated, and the scheme can be sensitive to the serving build. On this stack, EAGLE regressed on one SGLang nightly, so it is worth measuring acceptance and throughput on your own build rather than assuming the win holds.
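One way to check this on your own build is to log how many draft tokens were accepted on each verification pass and summarize the counts. The sketch below assumes you can already collect such a list; how to extract it depends on the serving framework and is not shown here. The function name and the returned keys are hypothetical.

```python
def summarize_acceptance(accepted_per_pass, draft_len):
    """Summarize logged per-pass acceptance counts for one build or workload.

    accepted_per_pass: number of draft tokens accepted on each pass.
    draft_len: number of tokens drafted per pass (assumed constant here).
    """
    passes = len(accepted_per_pass)
    if passes == 0 or draft_len <= 0:
        raise ValueError("need at least one pass and a positive draft_len")
    if any(not 0 <= n <= draft_len for n in accepted_per_pass):
        raise ValueError("each count must lie between 0 and draft_len")

    accepted = sum(accepted_per_pass)
    return {
        # Share of drafted tokens the base model accepted.
        "acceptance_rate": accepted / (passes * draft_len),
        # Accepted drafts plus the one token the base model adds per pass.
        "tokens_per_pass": (accepted + passes) / passes,
    }


if __name__ == "__main__":
    print(summarize_acceptance([3, 0, 2, 4], draft_len=4))
```

Running the same prompts through two builds and comparing these summaries, alongside end-to-end tokens per second, makes a regression visible rather than assumed.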

EAGLE draft

  • Predicts the base model's internal features for closer agreement

Separate small model

  • Generates tokens independently of the base model's internals

Reviewed: June 2026