INT4: four-bit integers, the small end of quantization : Learn

INT4 (4-bit integer) stores each model weight as a 4-bit integer, mapped back to a real value by a scale and offset learned per group of weights. At four bits per weight it uses about a quarter of the memory a 16-bit model needs, which is what lets large models fit on a single box. The accuracy cost is real and depends heavily on how the quantization was done.

What is INT4?

INT4 (4-bit integer) stores each weight as a small whole number, four bits wide, which can hold sixteen distinct values. On its own that is far too coarse, so the format also keeps a scale factor (and often an offset) for each small group of weights. To use a weight, the engine multiplies the integer by its group’s scale to recover an approximate real value. The payoff is memory: four bits per weight is about a quarter of 16-bit, which is often the difference between a large model loading on one box and not loading at all.

Why is the method what matters?

Four bits is not much room, so how you choose the integers and scales decides whether the model survives the squeeze. Two INT4 quantizations of the same model can score very differently on the same task. That is why “INT4” alone tells you the size but not the quality. The size is a fact about the file. The quality is a fact about the method, and you only learn it by measuring.

When do you reach for INT4?

Reach for INT4 when a model will not fit in the memory you have at a larger format, or when you want to free room for a longer context. It is the small end of the practical range. Below it, accuracy usually falls off a cliff. Treat a 4-bit model as guilty until your own evaluation proves it innocent.

INT4: four-bit integers, the small end of quantization

At a glance

INT4 versus 16-bit weights

What is INT4?

Why is the method what matters?

When do you reach for INT4?

INT4 helps with

INT4 does not

Related terms

At a glance

INT4 versus 16-bit weights

What is INT4?

Why is the method what matters?

When do you reach for INT4?

INT4 helps with

INT4 does not

Related terms

Go deeper