What does BF16 actually store?
A floating-point number splits its bits between range (how large or small a value can be) and precision (how finely a value can be pinned down inside that range). A standard 32-bit float uses 1 sign bit, 8 exponent bits for range, and 23 mantissa bits for precision. BF16, the 16-bit Brain Floating Point format, cuts the total to 16 bits but spends them in a deliberate way: it keeps all 8 exponent bits, and with them essentially the full range of the 32-bit float, and shrinks the mantissa to 7 bits. So a BF16 value can be as large or as tiny as a 32-bit one; it just cannot be specified as exactly, carrying roughly 2 to 3 significant decimal digits instead of about 7.
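Because BF16 shares its sign and exponent layout with the 32-bit float, a BF16 value is in effect the top 16 bits of the 32-bit encoding. The sketch below illustrates that relationship using only Python's standard library. The function names are made up for this example, the rounding mode (nearest, ties to even) is an assumption about what a converter would typically do, and NaN inputs are deliberately not handled.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Round x to BF16 and return its 16-bit pattern (NaN is not handled)."""
    # Reinterpret x as the 32 bits of an IEEE 754 single-precision float.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits about to be dropped.
    bits32 += 0x7FFF + ((bits32 >> 16) & 1)
    # BF16 is the top half: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    return (bits32 >> 16) & 0xFFFF

def bf16_bits_to_float(bits16: int) -> float:
    """Widen a BF16 bit pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]

def bf16_fields(bits16: int) -> tuple:
    """Split a BF16 bit pattern into (sign, exponent, mantissa)."""
    return bits16 >> 15, (bits16 >> 7) & 0xFF, bits16 & 0x7F

pi_bits = float_to_bf16_bits(3.14159265)
print(hex(pi_bits), bf16_fields(pi_bits))  # 0x4049 (0, 128, 73)
print(bf16_bits_to_float(pi_bits))         # 3.140625
```

The round trip turns 3.14159265 into 3.140625: the magnitude survives intact, and only the digits after the third are lost.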
For neural networks this is a good bargain. Models are forgiving about small rounding in each weight, but they break badly when a value overflows past the largest number the format can hold and turns into infinity. Keeping the wide range avoids that cliff, and the 16-bit width still halves the memory a model takes compared with 32-bit floats.
How does it compare to the other 16-bit format?
The other common half-size format is FP16, the older 16-bit float defined by IEEE 754. FP16 spends 10 of its bits on the mantissa (precision) and only 5 on the exponent (range), which makes it more exact for values in its comfort zone but prone to overflow or underflow at the extremes: its largest finite value is 65504. BF16 made the opposite call, and for large models that call has mostly won: the wide range is worth more than the extra precision. In practice BF16 is often the reference quality a model starts from, and smaller formats like 8-bit and 4-bit are judged by how close they stay to it. When a mixed-precision scheme keeps a few sensitive layers in BF16 while squeezing the rest, it is using BF16 as the safe, high-quality anchor.
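The range-versus-precision trade can be worked out directly from the bit widths. The sketch below assumes the usual IEEE-style layout (1 sign bit, a biased exponent, an implicit leading 1 in the mantissa) and uses the standard field widths: 5 and 10 bits for FP16, 8 and 7 for BF16, 8 and 23 for the 32-bit float. The helper name format_limits and the sample value 70000.0 are illustrative only.

```python
def format_limits(exponent_bits: int, mantissa_bits: int) -> dict:
    """Largest finite value, smallest normal value, and the spacing just
    above 1.0 for an IEEE-style binary float with the given field widths."""
    bias = 2 ** (exponent_bits - 1) - 1
    return {
        "max_finite": (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias,
        "min_normal": 2.0 ** (1 - bias),
        "step_above_one": 2.0 ** -mantissa_bits,
    }

fp16 = format_limits(exponent_bits=5, mantissa_bits=10)
bf16 = format_limits(exponent_bits=8, mantissa_bits=7)
fp32 = format_limits(exponent_bits=8, mantissa_bits=23)

for name, limits in (("FP16", fp16), ("BF16", bf16), ("FP32", fp32)):
    print(name, limits)

# A hypothetical intermediate value that a model might produce.
value = 70000.0
print("overflows FP16:", value > fp16["max_finite"])  # True
print("overflows BF16:", value > bf16["max_finite"])  # False
```

FP16 resolves steps of 2^-10 just above 1.0 but tops out at 65504, while BF16 resolves only steps of 2^-7 and shares the 32-bit float's smallest normal value and nearly its largest finite one.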