What is INT4?
INT4 (4-bit integer) stores each weight as a small whole number, four bits wide, which can hold sixteen distinct values. On its own that is far too coarse, so the format also keeps a scale factor (and often an offset) for each small group of weights. To use a weight, the engine multiplies the integer by its group’s scale to recover an approximate real value. The payoff is memory: four bits per weight is about a quarter of 16-bit, which is often the difference between a large model loading on one box and not loading at all.
Why is the method what matters?
Four bits is not much room, so how you choose the integers and scales decides whether the model survives the squeeze. Two INT4 quantizations of the same model can score very differently on the same task. That is why “INT4” alone tells you the size but not the quality. The size is a fact about the file. The quality is a fact about the method, and you only learn it by measuring.
When do you reach for INT4?
Reach for INT4 when a model will not fit in the memory you have at a larger format, or when you want to free room for a longer context. It is the small end of the practical range. Below it, accuracy usually falls off a cliff. Treat a 4-bit model as guilty until your own evaluation proves it innocent.