NF4: a 4-bit number format for quantized fine-tuning

NF4 (4-bit NormalFloat) stores each model weight in just 4 bits, with the sixteen available levels spaced to match the bell-curve distribution that trained weights tend to follow rather than spread evenly. It became common as the frozen base format in quantized fine-tuning, where it lets a large model be fine-tuned within a modest memory budget.

What makes NF4 different from a plain 4-bit number?

NF4 (4-bit NormalFloat) packs each weight into 4 bits, which means only sixteen distinct values are available to represent it. The clever part is where those sixteen levels sit. Model weights are not spread evenly; they cluster around zero in a bell-curve shape. A naive 4-bit format that spaces its levels evenly spends many of them on the outer ranges, which hold few weights, and leaves the crowded region near zero coarsely covered. NF4 instead derives its levels from the quantiles of a normal distribution, so the levels sit close together near zero and farther apart in the tails. More of the sixteen slots land where the weights actually are, and the typical rounding error is smaller for the same 4 bits.
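To make the spacing argument concrete, here is a minimal sketch that compares sixteen evenly spaced levels with sixteen levels taken from evenly spaced quantiles of a standard normal distribution. It is a simplification and not the actual NF4 table, which is built somewhat differently (for instance, it includes an exact zero). The function names, the sample size, the 0.02 standard deviation, and the use of a single scale for the whole list are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

def uniform_levels(n=16):
    """Evenly spaced levels on [-1, 1]."""
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

def normal_quantile_levels(n=16):
    """Levels at evenly spaced quantiles of N(0, 1), rescaled to [-1, 1].

    Simplified stand-in for NF4's shaping, not the published NF4 values.
    """
    nd = NormalDist()
    q = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [v / q[-1] for v in q]

def mean_abs_error(weights, levels):
    """Scale by the largest magnitude, snap each weight to the nearest level,
    and return the mean absolute rounding error in the original units."""
    scale = max(abs(w) for w in weights)
    total = 0.0
    for w in weights:
        nearest = min(levels, key=lambda lv: abs(lv - w / scale))
        total += abs(w - nearest * scale)
    return total / len(weights)

random.seed(0)
# Synthetic bell-curve "weights"; real weights are only approximately normal.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

even = uniform_levels()
shaped = normal_quantile_levels()
err_even = mean_abs_error(weights, even)
err_shaped = mean_abs_error(weights, shaped)
print(f"evenly spaced levels: {err_even:.6f}")
print(f"bell-curve levels:    {err_shaped:.6f}")
```

On bell-shaped data like this, the quantile-based levels should come out with the smaller average error, because their gaps are narrowest exactly where most of the samples fall.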

Why is it tied to fine-tuning?

NF4 became well known as the frozen base format in quantized fine-tuning. The idea: load a large model with its weights squeezed into NF4, keep them locked, and train only a small set of extra parameters on top, known as a low-rank adapter (LoRA). Because the base is never updated, it needs no gradients or optimizer state of its own and can stay in 4 bits; its weights only have to be expanded back to a higher precision when they are used in a computation. The memory saved this way is what lets a model that would not otherwise fit be tuned on modest hardware. The adapter trains in higher precision, but it is tiny.
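The frozen-base-plus-adapter arrangement can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how any particular library implements it: the sizes, learning rate, target, and helper names (`matvec`, `forward`, `train_step`) are hypothetical, and the `base` matrix simply stands in for 4-bit weights that have already been expanded for computation.

```python
import random

random.seed(0)
D_IN, D_OUT, RANK = 8, 8, 2  # toy sizes, chosen only for illustration

# Frozen base: stands in for the 4-bit weights after dequantization.
# Tuples make it immutable, mirroring "keep them locked".
base = tuple(tuple(random.gauss(0, 0.02) for _ in range(D_IN)) for _ in range(D_OUT))

# Adapter: A is small and random, B starts at zero, so the adapted
# model initially computes exactly what the base computes.
A = [[random.gauss(0, 0.1) for _ in range(D_IN)] for _ in range(RANK)]
B = [[0.0] * RANK for _ in range(D_OUT)]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def forward(x):
    """y = base @ x + B @ (A @ x)"""
    h = matvec(A, x)
    return [b + d for b, d in zip(matvec(base, x), matvec(B, h))]

def train_step(x, target, lr=0.01):
    """One gradient step on squared error; returns the loss before the step.

    Only A and B are updated. The base is never touched.
    """
    h = matvec(A, x)
    y = forward(x)
    g = [2 * (yi - ti) for yi, ti in zip(y, target)]  # dLoss/dy
    gh = [sum(g[i] * B[i][k] for i in range(D_OUT)) for k in range(RANK)]  # dLoss/dh
    for i in range(D_OUT):
        for k in range(RANK):
            B[i][k] -= lr * g[i] * h[k]
    for k in range(RANK):
        for j in range(D_IN):
            A[k][j] -= lr * gh[k] * x[j]
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

x = [1.0] * D_IN
target = [v + 0.1 for v in matvec(base, x)]  # made-up target near the base output
losses = [train_step(x, target) for _ in range(200)]
print(f"loss before: {losses[0]:.5f}, after: {losses[-1]:.5f}")
print(f"base params: {D_IN * D_OUT}, adapter params: {RANK * (D_IN + D_OUT)}")
```

At these toy sizes the adapter is not much smaller than the base, but the ratio changes quickly with scale: for a hypothetical 4096 by 4096 layer with rank 16, the adapter would hold 131,072 parameters against roughly 16.8 million in the base.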

The trade is the one every quantization makes: fewer bits per weight means less precision, accepted because the memory saving is what makes the work possible at all. NF4’s shaping is an attempt to lose as little as possible at that bit count.
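The size of that saving is simple arithmetic, sketched below. The 7-billion-parameter count is a hypothetical example, and the figure of one 32-bit scale per block of 64 weights is an assumed layout used only to show that bookkeeping adds a little on top of the raw 4 bits; real implementations may store their scales differently.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 7_000_000_000  # hypothetical model size

fp16_gb = weight_memory_gb(N_PARAMS, 16)
four_bit_gb = weight_memory_gb(N_PARAMS, 4)

# Assumed layout: one 32-bit scale shared by each block of 64 weights,
# which adds 32 / 64 = 0.5 bits per weight.
four_bit_with_scales_gb = weight_memory_gb(N_PARAMS, 4 + 32 / 64)

print(f"16-bit weights:          {fp16_gb:.2f} GB")
print(f"4-bit weights:           {four_bit_gb:.2f} GB")
print(f"4-bit weights + scales:  {four_bit_with_scales_gb:.2f} GB")
```

These numbers cover the stored weights only; activations, the adapter, and its optimizer state need memory as well.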
