A streaming multiprocessor (SM) is the core compute unit inside an NVIDIA GPU (graphics processing unit). A GPU is built from many of them, and each one runs a large group of threads in parallel. The SM design is specific to the GPU's architecture, and code must be compiled for that design to run at full speed.
At a glance
What it is
The core parallel-compute unit inside an NVIDIA GPU
How a GPU uses them
It contains many; work is spread across all of them at once
Tied to architecture
Each GPU generation has its own SM design and capability name
On a DGX Spark
Its compute-capability name is SM121A
Stack
A GPU is many streaming multiprocessors
Work is split into thread groups and scheduled across the SMs in parallel. Both the number of SMs and the generation of the SM design affect performance, and the kernel must be compiled for the design it runs on.
3. Each streaming multiprocessor: runs its assigned threads in parallel
2. The pool of streaming multiprocessors: thread groups are scheduled across them
1. Your kernel (the GPU program): split into many parallel thread groups
What is a streaming multiprocessor?
A GPU (graphics processing unit) does its work by running a vast number of
threads at once, and a streaming multiprocessor (SM) is the unit that actually
runs them. A single GPU is built from many SMs. When you launch a kernel (the
program that runs on the GPU), the work is split into thread groups and those groups are handed
out across the SMs to run side by side. More SMs, broadly, means more work in
flight at the same time, which is why GPUs scale the way they do.
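The scaling argument above can be sketched with a toy model. This is a minimal illustration, not how a real GPU scheduler works: the function names, the round-robin policy, and the SM counts in the usage note are all assumptions made for the example. It only shows why spreading the same number of thread groups over more SMs takes fewer rounds.

```python
import math


def schedule_waves(num_groups: int, num_sms: int, groups_per_sm: int = 1) -> int:
    """Return how many rounds ("waves") a toy GPU needs to run all thread groups.

    Assumption: every SM holds `groups_per_sm` thread groups at a time and all
    groups take equally long. Real hardware is more dynamic than this.
    """
    if num_groups < 0 or num_sms <= 0 or groups_per_sm <= 0:
        raise ValueError("counts must be positive (num_groups may be zero)")
    slots = num_sms * groups_per_sm  # thread groups in flight at once
    return math.ceil(num_groups / slots)


def assign_round_robin(num_groups: int, num_sms: int) -> list[list[int]]:
    """Hand thread groups to SMs in round-robin order (a simplification)."""
    if num_sms <= 0:
        raise ValueError("num_sms must be positive")
    assignment: list[list[int]] = [[] for _ in range(num_sms)]
    for group in range(num_groups):
        assignment[group % num_sms].append(group)
    return assignment
```

With hypothetical counts, `schedule_waves(96, 24)` returns 4 while `schedule_waves(96, 48)` returns 2: doubling the SMs halves the number of rounds in this model. `assign_round_robin(5, 2)` returns `[[0, 2, 4], [1, 3]]`.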
Why does the SM design matter?
Every GPU generation comes with its own SM design, identified by a
compute-capability name. The DGX Spark’s GPU reports as SM121A. This is not
trivia: GPU code is compiled against a specific compute capability, the same way
a binary is built for one instruction set. Kernels built for an older capability
may run on a slower fallback path, or refuse to load, until they are compiled for
the right one. So when an operator sees a model crawl on one box and fly on
another with similar memory, the SM generation and its compute capability are
worth checking before blaming the hardware.
A basic device listing reports the GPU and how many are present. The streaming-multiprocessor count and architecture sit one level below this, in the GPU's compute-capability profile.
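To make the compute-capability check concrete, here is a minimal sketch that parses a capability name such as SM121A and compares a kernel's build target against the device. The helper names (`parse_capability`, `classify`) and the three labels are hypothetical, and the comparison is a deliberate simplification: what actually happens on a mismatch depends on how the kernel was built and packaged.

```python
import re
from typing import NamedTuple


class Capability(NamedTuple):
    major: int
    minor: int
    suffix: str  # e.g. "a" in SM121A; empty when absent


# The last digit is the minor version, the digits before it the major version,
# so "sm_90" is 9.0 and "SM121A" is 12.1 with suffix "a".
_PATTERN = re.compile(r"^sm_?(\d+)(\d)([a-z]?)$", re.IGNORECASE)


def parse_capability(name: str) -> Capability:
    """Parse names like 'SM121A', 'sm_121a', or 'sm_90' into their parts."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized capability name: {name!r}")
    major, minor, suffix = match.groups()
    return Capability(int(major), int(minor), suffix.lower())


def classify(kernel_target: str, device: str) -> str:
    """Roughly compare a kernel's build target with the device (simplified)."""
    k = parse_capability(kernel_target)
    d = parse_capability(device)
    if (k.major, k.minor) == (d.major, d.minor):
        return "native"        # built for this SM generation
    if (k.major, k.minor) < (d.major, d.minor):
        return "older-target"  # may hit a slower fallback path or fail to load
    return "newer-target"      # built for a newer SM design than the device has
```

For example, `classify("sm_90", "SM121A")` returns `"older-target"`, which is the case worth ruling out when the same model is slow on one machine and fast on another.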
What an SM is
The unit that runs a kernel's threads in parallel
Replicated many times across one GPU
Tied to a specific GPU architecture and capability name
What it is not
A separate chip you can add; it is part of the GPU die
The same across generations; each architecture redesigns it
Interchangeable for compiled code, which targets one capability