Learn

A3B: three billion active parameters

A3B is a naming suffix on mixture-of-experts models that means roughly three billion active parameters per token. The model stores many more parameters in total, but for any single token it routes through only a small fraction. A 30-billion-total model marked A3B activates about 3B of those per token, so its speed and per-token compute track the 3B, not the 30B.

At a glance

What it means
About three billion active parameters per token
What it does not mean
The total parameter count, which is much larger
Why the split exists
It is a mixture-of-experts model, routing each token to a few experts
What it predicts
Per-token speed and compute track the active count, not the total
Comparison

Total parameters versus active parameters

Total (stored)
Active (per token, A3B)
What it counts
Every weight in the model
Only the weights one token routes through
Sets memory needed
Yes, all weights must be loaded
No, the whole model still loads
Sets per-token speed
No, most weights are idle per token
Yes, this is the work done per token

What does A3B in a model name mean?

A3B reads as “active 3B”, meaning about three billion active parameters per token. It appears on mixture-of-experts (MoE) models, which split their weights into many “experts” and route each token through only a few of them. So a model named with a 30B total and an A3B suffix holds 30 billion parameters on disk and in memory, but any one token only touches around 3 billion of them. The big number is storage; the small number is the work done for each token.

Why is the active count the one to watch?

Because per-token speed and per-token reasoning track the active parameters, not the total. The model still has to load all its weights, so memory needs follow the total. But the arithmetic done to produce each token follows the active count, which is why an A3B model can run far faster than its total size suggests. It also tempers the instinct that a bigger total always wins: the extra weight is stored knowledge sitting in idle experts, not more thinking per token. When you compare two models, line up their active counts before you read anything into the totals.

A3B tells you

  • Roughly how much compute runs per token: about 3B parameters' worth
  • That it is a mixture-of-experts model, not a dense one
  • Why it can be fast for its total size

A3B does not tell you

  • How much memory to load; that is the total parameter count
  • How capable it is overall; stored knowledge lives in the idle experts too
  • That a bigger total beats it; active count drives per-token reasoning

Related terms

← All terms Reviewed: June 2026