What is a parameter?
A parameter is a single number inside the model: one weight. A model is, at heart, an enormous pile of these numbers arranged in layers. Training is the slow process of nudging every one of them until the whole pile produces useful output. Once training is done, the numbers are frozen. Running the model, known as inference, only reads the numbers and multiplies by them; it does not change them.
When you see a model named 7B, 35B, or 120B, the B stands for billion and the number is the parameter count, usually rounded. A 7B model carries roughly seven billion learned numbers. The count is the single most quoted fact about a model because it is a quick proxy for two things at once: roughly how much the model could have learned, and roughly how much memory it will demand.
Why does the count drive your hardware?
Every parameter is a number that has to be stored and, during generation, streamed through the chip for each token. More parameters mean more memory to hold the weights and more data to move per token. That is why parameter count, together with the precision each number is stored at, sets the floor on the memory a model needs. It is also why quantization, which stores each number in fewer bits, is the usual lever for making a big model fit.
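As a rough sketch of that floor: the weights alone take about the parameter count times the bits per parameter, divided by eight to get bytes. The function name weight_memory_gb and the convention of 1 GB = 10^9 bytes are illustrative assumptions. The figure covers the weights only and ignores everything else a running model needs, such as activations and the attention cache.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    This is a floor, not a full budget: activations, caches, and
    runtime overhead are deliberately left out.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# A 7B model at three common storage precisions.
for bits in (16, 8, 4):
    print(f"7B at {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
# 7B at 16-bit: 14.0 GB
# 7B at 8-bit: 7.0 GB
# 7B at 4-bit: 3.5 GB
```

Halving the bits per number halves the floor, which is the whole appeal of quantization when a model does not quite fit.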
One caveat keeps the count honest. For a mixture-of-experts model, only a fraction of the parameters are active on any given token, so a large total can run far more cheaply per token than a dense model of the same headline size. Every expert still has to be held in memory, though, so the total count tells you the storage bill. It does not, on its own, tell you the speed.
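The gap between the storage bill and the per-token cost can be sketched with two numbers per model: total parameters and parameters active per token. The class ModelSize and the 120B-total, 12B-active figures below are made up for illustration and do not describe any particular model. The sketch also assumes one token at a time and counts only weight traffic.

```python
from dataclasses import dataclass


@dataclass
class ModelSize:
    """Hypothetical size description; all fields are illustrative."""
    total_params: float    # every parameter that must be stored
    active_params: float   # parameters used per token (equals total if dense)
    bits_per_param: int = 16

    def storage_gb(self) -> float:
        # Memory needed to hold all weights (1 GB = 1e9 bytes).
        return self.total_params * self.bits_per_param / 8 / 1e9

    def read_per_token_gb(self) -> float:
        # Weight data streamed through the chip for one token.
        return self.active_params * self.bits_per_param / 8 / 1e9


# Same headline size, very different per-token traffic.
dense = ModelSize(total_params=120e9, active_params=120e9)
moe = ModelSize(total_params=120e9, active_params=12e9)  # assumed 10% active

for name, m in (("dense", dense), ("moe", moe)):
    print(f"{name}: store {m.storage_gb():.0f} GB, "
          f"read {m.read_per_token_gb():.0f} GB per token")
# dense: store 240 GB, read 240 GB per token
# moe: store 240 GB, read 24 GB per token
```

Both models need the same memory to load, but the mixture-of-experts one moves a tenth of the weight data per token under these assumed figures, which is why the headline count alone says little about speed.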