Learn

NF4: a 4-bit number format for quantized fine-tuning

NF4 (4-bit NormalFloat) is a way of storing each model weight in just 4 bits, where the sixteen available levels are spaced to match the bell-curve distribution real weights tend to follow rather than being evenly spread. It became common as the frozen base format in quantized fine-tuning, letting a large model be tuned on modest memory.

At a glance

What it is
A 4-bit format whose levels match how weights are distributed
Why 4 bits
It cuts a model's memory footprint to a fraction of full precision
Where it is used
Quantized fine-tuning, as the frozen base the adapter trains on
The trade
Lower precision per weight in exchange for fitting in far less memory

What makes NF4 different from a plain 4-bit number?

NF4 (4-bit NormalFloat) packs each weight into 4 bits, which means only sixteen distinct values are available to represent it. The clever part is where those sixteen levels sit. Model weights are not spread evenly; they cluster around zero in a bell-curve shape. A naive 4-bit format that spaces its levels evenly wastes most of them on ranges that hold few weights. NF4 instead places its levels to match that distribution, so more of the sixteen slots land where the weights actually are, and the rounding error is smaller for the same 4 bits.

Why is it tied to fine-tuning?

NF4 became well known as the frozen base format in quantized fine-tuning. The idea: load a large model with its weights squeezed into NF4, keep them locked, and train only a small set of extra parameters on top, a low-rank adapter (LoRA). The base never changes, so storing it in 4 bits is fine, and the memory you save is what lets a model that would not otherwise fit be tuned on modest hardware. The adapter trains in higher precision, but it is tiny.

The trade is the one every quantization makes: fewer bits per weight means less precision, accepted because the memory saving is what makes the work possible at all. NF4’s shaping is an attempt to lose as little as possible at that bit count.

NF4 (4-bit)

  • Each weight stored in 4 bits
  • Levels shaped to the bell-curve spread of weights
  • Lets a large model load on modest memory
  • Common as the frozen base for quantized fine-tuning

Full-precision weights

  • Each weight stored in 16 or 32 bits
  • Far larger memory footprint
  • More precision per weight
  • What you start from before quantizing

Related terms

← All terms Reviewed: June 2026