SM: the GPU's unit of parallel work : Learn

A streaming multiprocessor (SM) is the core compute unit inside an NVIDIA GPU (graphics processing unit). A GPU is built from many of them, and each one runs a large group of threads in parallel. The exact SM design is tied to the GPU's architecture, and that design is what code must be compiled for to run at full speed.

What is a streaming multiprocessor?

A GPU (graphics processing unit) does its work by running a vast number of threads at once, and a streaming multiprocessor (SM) is the unit that actually runs them. A single GPU is built from many SMs. When you launch a kernel, the GPU’s program, the work is split into thread groups and those groups are handed out across the SMs to run side by side. More SMs, broadly, means more work in flight at the same time, which is why GPUs scale the way they do.

Why does the SM design matter?

Every GPU generation comes with its own SM design, identified by a compute-capability name. The DGX Spark’s GPU reports as SM121A. This is not trivia: GPU code is compiled against a specific compute capability, the same way a binary is built for one instruction set. Kernels built for an older capability may run on a slower fallback path, or refuse to load, until they are compiled for the right one. So when an operator sees a model crawl on one box and fly on another with similar memory, the SM generation and its compute capability are worth checking before blaming the hardware.

SM: the GPU's unit of parallel work

At a glance

A GPU is many streaming multiprocessors

What is a streaming multiprocessor?

Why does the SM design matter?

Check it yourself

What an SM is

What it is not

Related terms

At a glance

A GPU is many streaming multiprocessors

What is a streaming multiprocessor?

Why does the SM design matter?

Check it yourself

What an SM is

What it is not

Related terms

Go deeper