A3B: three billion active parameters : Learn

A3B is a naming suffix on mixture-of-experts models that means roughly three billion active parameters per token. The model stores many more parameters in total, but for any single token it routes through only a small fraction. A 30-billion-total model marked A3B activates about 3B of those per token, so its speed and per-token compute track the 3B, not the 30B.

At a glance

What it means

About three billion active parameters per token

What it does not mean

The total parameter count, which is much larger

Why the split exists

It is a mixture-of-experts model, routing each token to a few experts

What it predicts

Per-token speed and compute track the active count, not the total

What does A3B in a model name mean?

A3B reads as “active 3B”, meaning about three billion active parameters per token. It appears on mixture-of-experts (MoE) models, which split their weights into many “experts” and route each token through only a few of them. So a model named with a 30B total and an A3B suffix holds 30 billion parameters on disk and in memory, but any one token only touches around 3 billion of them. The big number is storage; the small number is the work done for each token.

Why is the active count the one to watch?

Because per-token speed and per-token reasoning track the active parameters, not the total. The model still has to load all its weights, so memory needs follow the total. But the arithmetic done to produce each token follows the active count, which is why an A3B model can run far faster than its total size suggests. It also tempers the instinct that a bigger total always wins: the extra weight is stored knowledge sitting in idle experts, not more thinking per token. When you compare two models, line up their active counts before you read anything into the totals.

A3B: three billion active parameters

At a glance

Total parameters versus active parameters

What does A3B in a model name mean?

Why is the active count the one to watch?

A3B tells you

A3B does not tell you

Related terms

At a glance

Total parameters versus active parameters

What does A3B in a model name mean?

Why is the active count the one to watch?

A3B tells you

A3B does not tell you

Related terms

Go deeper