MXFP4: four-bit weights with a shared scale

MXFP4 (micro-scaled 4-bit floating point) is a way of storing model weights in four bits each, where every small block of weights shares a single separate scale factor. The shared scale lets a block stretch to fit its own range, recovering most of the accuracy a flat four-bit format throws away. It is an open industry standard, published through the Open Compute Project, not a single vendor’s format.

At a glance

  • What it is: A four-bit floating-point weight format with a per-block shared scale
  • Why the scale: Each block stretches to its own range, keeping more accuracy
  • Standard, not proprietary: An open format backed by several hardware vendors
  • The catch: Full speed needs kernels built for the exact GPU architecture

Comparison

Flat four-bit versus micro-scaled four-bit

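|                 | Flat 4-bit                 | MXFP4 (micro-scaled)                |
|-----------------|----------------------------|-------------------------------------|
| Bits per weight | Four                       | Four                                |
| Scale factor    | One range for everything   | One per small block of weights      |
| Accuracy kept   | Less; outliers get clipped | More; each block fits its own range |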

What is MXFP4?

MXFP4 stands for micro-scaled 4-bit floating point. Each weight is stored in just four bits (one sign bit, two exponent bits, one mantissa bit), so a weight can take only sixteen codes. The format adds one trick: every small block of weights, thirty-two in the published specification, shares a single separate scale factor, stored as an 8-bit power of two. That scale lets each block stretch to fit its own range of values. A flat four-bit format has to cover everything with one range and clips the outliers; the per-block scale buys most of that lost accuracy back. The “MX” is the micro-scaling; the “FP4” is the four-bit float.
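
To make the per-block scale concrete, here is a minimal Python sketch, not a reference implementation. It assumes the eight magnitudes a four-bit float with two exponent bits and one mantissa bit can represent, a 32-weight block, and a power-of-two scale chosen from the block's largest magnitude, which is one simple rule among several. The helper names (`nearest_fp4`, `pow2_scale`, `roundtrip`, and so on) and the sample weights are hypothetical, and ties round toward the smaller magnitude rather than following any particular hardware rounding mode.

```python
import math

# Magnitudes a 4-bit float (1 sign, 2 exponent, 1 mantissa bit) can represent.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # assumed block size


def nearest_fp4(x):
    """Round x to the nearest signed grid value; anything beyond 6 saturates."""
    magnitude = min(FP4_GRID, key=lambda g: abs(g - abs(x)))
    return math.copysign(magnitude, x)


def pow2_scale(values):
    """Pick a power-of-two scale so the largest magnitude lands near the top of the grid."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0
    return 2.0 ** (math.floor(math.log2(amax)) - 2)


def roundtrip(values, scale):
    """Quantize to four bits with the given scale, then dequantize."""
    return [nearest_fp4(v / scale) * scale for v in values]


def flat_roundtrip(weights):
    """Flat four-bit: one scale for the whole tensor."""
    return roundtrip(weights, pow2_scale(weights))


def mx_roundtrip(weights, block_size=BLOCK_SIZE):
    """Micro-scaled four-bit: each block gets its own scale."""
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        out.extend(roundtrip(block, pow2_scale(block)))
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights: one block of small values next to one block of large ones.
small = [0.01 * (i - 16) for i in range(BLOCK_SIZE)]
large = [0.5 * (i - 16) for i in range(BLOCK_SIZE)]
weights = small + large

flat_err = mean_abs_error(weights, flat_roundtrip(weights))
mx_err = mean_abs_error(weights, mx_roundtrip(weights))
print(f"flat 4-bit error: {flat_err:.4f}")
print(f"per-block error:  {mx_err:.4f}")
```

In this sketch the flat scale is sized to the largest weight, so the small block collapses to zero; sizing it to the small block instead would clip the large weights. With a scale per block, the small block keeps its own finer grid and the overall error drops.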

Why does MXFP4 matter on a small box?

Four-bit weights are how a large model fits into a modest memory budget, and MXFP4 makes those four bits accurate enough to be worth using. It is an open industry standard rather than one company’s invention, backed by several hardware makers, and a model can be trained with the format in mind instead of being squeezed down afterward. The honest caveat is speed: a format being supported by the silicon is not the same as the fast code being compiled for your exact GPU. A model in MXFP4 can still crawl on a brand-new chip until the right kernels ship, which is a kernel problem, not a format problem.
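
The memory argument is simple arithmetic, sketched below under stated assumptions: a 32-weight block with an 8-bit shared scale, weights only (no activations or cache), and every weight stored in MXFP4, which a real checkpoint may not do. The function names and the 20-billion-parameter model are hypothetical, chosen only for illustration.

```python
def mxfp4_bits_per_weight(block_size=32, scale_bits=8, element_bits=4):
    """Effective bits per weight once the shared scale is spread over its block."""
    return element_bits + scale_bits / block_size


def weight_memory_gib(num_params, bits_per_weight):
    """Memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def fits(num_params, budget_gib, bits_per_weight):
    """True if the weights alone fit in the given memory budget."""
    return weight_memory_gib(num_params, bits_per_weight) <= budget_gib


# Hypothetical model size and memory budget, for illustration only.
params = 20e9
budget_gib = 16

print(mxfp4_bits_per_weight())  # 4.25
print(f"16-bit weights: {weight_memory_gib(params, 16):.1f} GiB")
print(f"MXFP4 weights:  {weight_memory_gib(params, mxfp4_bits_per_weight()):.1f} GiB")
print(fits(params, budget_gib, 16), fits(params, budget_gib, mxfp4_bits_per_weight()))
```

The shared scale costs a quarter of a bit per weight under these assumptions, so the format is slightly larger than a bare four bits but still roughly a quarter the size of 16-bit weights.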

MXFP4 helps with

  • Fitting a large model into a small memory budget
  • Keeping accuracy that a flat four-bit format would lose
  • Running on hardware that supports the format natively

MXFP4 will not fix

  • A model that is still too big even at four bits
  • Speed when the fast kernels are not compiled for your GPU yet
  • Capability the model never had; quantization preserves, it does not add

Reviewed: June 2026