TFLOPS: a compute-rate spec, not a speed promise

TFLOPS (tera floating-point operations per second) is a measure of how many trillion floating-point calculations a processor can perform each second. It is a peak compute-rate spec from the data sheet, not a measured workload result, and it rarely predicts how many tokens per second a running language model will actually produce.

At a glance

What it is

Trillions of floating-point operations per second a chip can do

What it measures

Peak compute rate on the spec sheet, not a real workload

Why it misleads

Token speed is usually set by memory bandwidth, not compute

How to read it

A rough ceiling, never a promise of tokens per second

What does TFLOPS measure?

TFLOPS stands for tera floating-point operations per second: trillions of floating-point calculations per second. It is a peak figure from the data sheet, the most arithmetic a chip could do in a second under ideal conditions. The figure is quoted for a particular number format (for example FP32 or FP16), so two chips are only comparable at the same precision. As a rough measure of raw compute, bigger is broadly better, and it is a fair way to compare chips for compute-heavy work like training or image generation.
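A peak figure like this is typically worked out rather than measured: the number of arithmetic units, times the clock rate, times the floating-point operations each unit can issue per cycle. The sketch below assumes that simple model. The chip it describes (4,096 units at 1.5 GHz) is hypothetical, and it counts one fused multiply-add as two operations, a convention spec sheets commonly use.

```python
def peak_tflops(compute_units: int, clock_ghz: float, flops_per_unit_per_cycle: int) -> float:
    """Theoretical peak: every unit busy on every cycle, nothing ever stalled."""
    flops_per_second = compute_units * clock_ghz * 1e9 * flops_per_unit_per_cycle
    return flops_per_second / 1e12  # convert FLOP/s to TFLOPS


# Hypothetical chip: 4,096 units at 1.5 GHz, each doing one fused
# multiply-add per cycle, counted here as 2 floating-point operations.
print(f"{peak_tflops(4096, 1.5, 2):.2f} TFLOPS")  # prints "12.29 TFLOPS"
```

Nothing in this arithmetic looks at memory, which is why the result is a ceiling on compute and says nothing about whether a real workload can keep those units fed.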

Why does it rarely predict tokens per second?

Here is the trap. For most local language-model work, the chip is not waiting on maths. It is waiting on memory. To produce each token it has to read the model's weights out of memory, and it does only about two floating-point operations (a multiply and an add) for each weight it reads. On typical hardware, moving those bytes takes far longer than the arithmetic done on them. The workload is memory-bound, not compute-bound. So a chip with a high TFLOPS figure can still produce tokens slowly, because the bottleneck sits a step earlier, at memory bandwidth.
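A back-of-the-envelope comparison makes the gap concrete. The sketch below is a rough illustration, not a performance model: it assumes one request at a time, that every weight is read once per token, and about two floating-point operations per weight, and it ignores the KV cache, activations and software overhead. The model size, precision, bandwidth and TFLOPS values are all hypothetical.

```python
def token_rate_ceilings(params: float, bytes_per_param: float,
                        bandwidth_gb_s: float, peak_tflops: float) -> tuple[float, float]:
    """Return (memory ceiling, compute ceiling) in tokens per second."""
    bytes_per_token = params * bytes_per_param   # every weight read once per token
    flops_per_token = 2 * params                 # one multiply + one add per weight
    memory_ceiling = bandwidth_gb_s * 1e9 / bytes_per_token
    compute_ceiling = peak_tflops * 1e12 / flops_per_token
    return memory_ceiling, compute_ceiling


# Hypothetical setup: 7B-parameter model in FP16 (2 bytes per weight),
# 1,000 GB/s of memory bandwidth, 100 TFLOPS of peak compute.
mem, comp = token_rate_ceilings(7e9, 2, 1000, 100)
print(f"memory ceiling:  {mem:,.0f} tokens/s")   # 71 tokens/s
print(f"compute ceiling: {comp:,.0f} tokens/s")  # 7,143 tokens/s
print("bound by:", "memory" if mem < comp else "compute")  # memory
```

Under these assumptions the compute ceiling is roughly a hundred times higher than the memory ceiling, so doubling TFLOPS would change nothing while doubling bandwidth would roughly double the estimate.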

This is why you should treat a TFLOPS number as a ceiling, not a forecast. It tells you what the chip could do if compute were the limit. It does not tell you whether compute is the limit for your model. The only honest way to learn your real tokens per second is to run your model and measure it. The spec sheet sets an upper bound and stops there.
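Measuring can be as simple as timing a streamed generation. The sketch below assumes a hypothetical `generate` callable that yields one token at a time; it is not any particular library's API, and `fake_generate` is only a stand-in so the example runs on its own. The clock starts at the first token, so prompt processing is left out and the result reflects steady generation speed.

```python
import time
from typing import Callable, Iterable


def measure_decode_rate(generate: Callable[[str], Iterable[str]], prompt: str) -> float:
    """Tokens per second from the first token to the last (prompt processing excluded)."""
    tokens = 0
    first_token_time = None
    for _ in generate(prompt):
        tokens += 1
        if first_token_time is None:
            first_token_time = time.perf_counter()  # start timing at the first token
    end = time.perf_counter()
    if tokens < 2 or first_token_time is None or end <= first_token_time:
        return 0.0  # not enough tokens to measure a rate
    return (tokens - 1) / (end - first_token_time)


def fake_generate(prompt: str) -> Iterable[str]:
    """Hypothetical stand-in for a real model: one token roughly every 10 ms."""
    for word in ("token " * 50).split():
        time.sleep(0.01)
        yield word


rate = measure_decode_rate(fake_generate, "Hello")
print(f"{rate:.1f} tokens/s")
```

With a real model, run the same prompt several times and look at the typical result, since the first run often includes one-off warm-up costs.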
