Parameter: a single learned number in a model

A parameter is one learned number, a weight, inside a model. Training is the process of setting billions of these numbers so the model produces useful output. The headline size of a model, like 7B, is its parameter count: 7B means seven billion parameters. More parameters can mean more capability, and they always mean more numbers to store and stream, so the count drives how much memory the model needs.

At a glance

What it is: One learned number, a weight, inside the model
What 7B means: Seven billion parameters; the B is for billion
Why the count matters: It drives memory needed and hints at capability
Where it comes from: Training sets the values; inference just uses them

What is a parameter?

A parameter is a single number inside the model: one weight. A model is, at heart, an enormous pile of these numbers arranged in layers. Training is the slow process of nudging every one of them until the whole pile produces useful output. Once training is done, the numbers are frozen. Running the model, which is called inference, just reads and multiplies them; it does not change them.

When you see a model named 7B, 35B, or 120B, the letter B stands for billion and the number in front of it is the parameter count. A 7B model carries seven billion learned numbers. The count is the single most quoted fact about a model because it is a quick proxy for two things at once: roughly how much the model could have learned, and roughly how much memory it will demand.
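To make the naming convention concrete, here is a minimal sketch that turns a size label into a parameter count. The function name parse_size_label is hypothetical, and the M suffix for million is an assumption added for illustration; the text above only defines B.

```python
def parse_size_label(label: str) -> int:
    """Turn a size label such as "7B" into a parameter count.

    Hypothetical helper for illustration. "B" = billion follows the text;
    "M" = million is an added assumption.
    """
    suffixes = {"M": 10**6, "B": 10**9}
    label = label.strip().upper()
    number, suffix = label[:-1], label[-1:]
    if suffix not in suffixes:
        raise ValueError(f"unrecognized size suffix in {label!r}")
    return round(float(number) * suffixes[suffix])


for name in ("7B", "35B", "120B"):
    print(f"{name}: {parse_size_label(name):,} parameters")
```

Real model names often carry extra text around the size, so a label would need to be isolated before a helper like this could read it.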

Why does the count drive your hardware?

Every parameter is a number that has to be stored and, during generation, streamed through the chip for each token. More parameters mean more memory to hold the weights and more data to move per token. That is why the parameter count, together with the precision each number is stored at, sets the floor on the memory a model needs. It is also why quantization, which stores each number in fewer bits, is the usual lever for making a big model fit.
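The arithmetic behind that floor is short enough to sketch. The function below multiplies the parameter count by the bits stored per parameter and converts to gigabytes. The name weight_memory_gb and the choice of 16, 8, and 4 bits are illustrative assumptions, and the result covers the weights only, not any other memory a running model uses.

```python
def weight_memory_gb(param_count: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Illustrative helper: parameters x bits per parameter / 8 bits per byte.
    """
    total_bytes = param_count * bits_per_param / 8
    return total_bytes / 1e9


# A 7B model at three example precisions (bit widths chosen for illustration).
for bits in (16, 8, 4):
    print(f"7B at {bits}-bit: about {weight_memory_gb(7e9, bits):.1f} GB")
```

Under these assumptions, halving the bits per parameter halves the memory the weights need, which is the effect quantization relies on.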

One caveat keeps the count honest. For a mixture-of-experts model, only a fraction of the parameters are active on any given token, so a large total can run far cheaper per token than a dense model of the same headline size. The count tells you the storage bill. It does not, on its own, tell you the speed.
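A rough sketch of that gap between stored and active parameters follows. It assumes a deliberately simplified mixture-of-experts layout: some parameters shared by every token, plus a set of equally sized experts of which only a few run per token. Every number and name here is invented for illustration and does not describe any real model.

```python
def moe_param_counts(
    shared: int, per_expert: int, num_experts: int, active_experts: int
) -> tuple[int, int]:
    """Return (total, active-per-token) parameter counts.

    Simplified, hypothetical layout: shared parameters are always used,
    and only `active_experts` of the `num_experts` experts run per token.
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * active_experts
    return total, active


# Invented example values, not taken from any real model.
total, active = moe_param_counts(
    shared=2_000_000_000,
    per_expert=5_000_000_000,
    num_experts=8,
    active_experts=2,
)
print(f"stored: {total / 1e9:.0f}B parameters")
print(f"active per token: {active / 1e9:.0f}B parameters")
```

In this made-up case the full total still has to be stored, while each token touches only the smaller active count.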

What the parameter count tells you

  • Roughly how much memory the weights will need to load
  • A rough ceiling on how much the model can have learned
  • How heavy each token is to compute, for a dense model

What it does not tell you

  • Actual quality; a smaller well-trained model can beat a bigger one
  • How fast it runs, which also depends on memory bandwidth and the engine
  • How many parameters are active per token, which differs for mixture-of-experts models

Reviewed: June 2026