SM: the GPU's unit of parallel work

A streaming multiprocessor (SM) is the core compute unit inside an NVIDIA GPU (graphics processing unit). A GPU is built from many of them, and each one runs a large group of threads in parallel. The exact SM design is tied to the GPU's architecture, and code must be compiled for that design to run at full speed.

At a glance

What it is: The core parallel-compute unit inside an NVIDIA GPU
How a GPU uses them: A GPU contains many SMs; work is spread across all of them at once
Tied to architecture: Each GPU generation has its own SM design and capability name
On a DGX Spark: The GPU's compute-capability name is SM121A
Stack

A GPU is many streaming multiprocessors

Work is split into thread groups and scheduled across the SMs in parallel. Both the number of SMs and the generation of the SM design matter, and the kernel must be compiled for the design it runs on.

3. Each streaming multiprocessor runs its assigned threads in parallel
2. The pool of streaming multiprocessors: thread groups are scheduled across them
1. Your kernel (the GPU program), split into many parallel thread groups

What is a streaming multiprocessor?

A GPU (graphics processing unit) does its work by running a vast number of threads at once, and a streaming multiprocessor (SM) is the unit that actually runs them. A single GPU is built from many SMs. When you launch a kernel (the program that runs on the GPU), the work is split into thread groups, and those groups are handed out across the SMs to run side by side. More SMs, broadly, means more work in flight at the same time, which is why GPU throughput tends to grow with SM count.
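The effect of SM count on work in flight can be sketched with a toy model. This is a minimal illustration, not how the hardware scheduler is implemented: the function names `schedule_waves` and `assign_round_robin` are hypothetical, the model assumes every thread group takes the same time, and the SM counts in the usage note are made-up numbers rather than figures for any real GPU.

```python
import math


def schedule_waves(num_groups: int, num_sms: int, groups_per_sm: int = 1) -> int:
    """Return how many rounds ("waves") are needed to run every thread group.

    Simplified model (an assumption): each SM holds `groups_per_sm` thread
    groups at a time, and all groups finish in the same amount of time.
    """
    if num_groups < 0 or num_sms <= 0 or groups_per_sm <= 0:
        raise ValueError("counts must be positive (num_groups may be zero)")
    slots = num_sms * groups_per_sm  # thread groups that fit in one wave
    return math.ceil(num_groups / slots)


def assign_round_robin(num_groups: int, num_sms: int) -> list:
    """Deal thread-group indices out to SMs in turn.

    Illustrative only: a real GPU hands out groups as SMs free up,
    not in a fixed round-robin order.
    """
    per_sm = [[] for _ in range(num_sms)]
    for group in range(num_groups):
        per_sm[group % num_sms].append(group)
    return per_sm
```

Under this model, `schedule_waves(1000, 48)` gives 21 waves and `schedule_waves(1000, 96)` gives 11, so doubling the hypothetical SM count roughly halves the number of rounds.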

Why does the SM design matter?

Every GPU generation comes with its own SM design, identified by a compute-capability name. The DGX Spark’s GPU reports as SM121A. This is not trivia: GPU code is compiled against a specific compute capability, the same way a binary is built for one instruction set. Kernels built for an older capability may run on a slower fallback path, or refuse to load, until they are compiled for the right one. So when an operator sees a model crawl on one box and fly on another with similar memory, the SM generation and its compute capability are worth checking before blaming the hardware.
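To make the capability name concrete, here is a small sketch that decodes names such as SM121A and applies a simplified compatibility check. It assumes the usual CUDA naming convention, where the last digit is the minor version, the digits before it are the major version, and a trailing "a" marks an architecture-specific target. The helper names `SmTarget`, `parse_sm_name`, and `binary_can_run` are hypothetical, and the rule covers only precompiled binaries: the PTX just-in-time path that can let older code run on newer GPUs is not modeled.

```python
import re
from typing import NamedTuple


class SmTarget(NamedTuple):
    major: int
    minor: int
    arch_specific: bool  # True when the name carries the "a" suffix


# Accepts spellings such as "sm_90", "SM121A", or "sm_100a".
_SM_NAME = re.compile(r"^sm_?(\d{2,3})(a?)$", re.IGNORECASE)


def parse_sm_name(name: str) -> SmTarget:
    """Split an SM name into major version, minor version, and suffix flag."""
    match = _SM_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized SM name: {name!r}")
    digits, suffix = match.groups()
    return SmTarget(int(digits[:-1]), int(digits[-1]), suffix != "")


def binary_can_run(built_for: SmTarget, device: SmTarget) -> bool:
    """Simplified rule for precompiled GPU binaries (PTX JIT not modeled).

    A plain target runs on devices with the same major version and an equal
    or higher minor version. An architecture-specific ("a") target runs only
    on exactly that compute capability.
    """
    same_capability = (built_for.major, built_for.minor) == (device.major, device.minor)
    if built_for.arch_specific:
        return same_capability
    return built_for.major == device.major and built_for.minor <= device.minor
```

With these definitions, `parse_sm_name("SM121A")` yields major 12, minor 1, with the architecture-specific flag set. A binary built for `sm_90` fails the check against that device, which matches the case of a kernel that will not load until it is rebuilt for the right capability.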

Check it yourself

nvidia-smi --query-gpu=name,count --format=csv,noheader

Reports each GPU's name and how many GPUs are present. The streaming-multiprocessor count and architecture sit one level below this, in the device properties that the CUDA runtime reports, which include the compute capability.
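A scripted version of the same check might look like the sketch below. It assumes a driver recent enough that `nvidia-smi` accepts the `compute_cap` query field, and it returns an empty list when the tool is missing or the query fails. The function names `parse_rows` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_rows(csv_text: str) -> list:
    """Turn 'name, compute_cap' CSV lines into (name, capability) tuples."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the capability is always the final field.
        name, _, cap = line.rpartition(",")
        rows.append((name.strip(), cap.strip()))
    return rows


def query_gpus() -> list:
    """Ask nvidia-smi for each GPU's name and compute capability.

    Assumption: the installed driver supports the `compute_cap` field.
    Returns [] if nvidia-smi is not installed or the query fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_rows(result.stdout)


if __name__ == "__main__":
    for gpu_name, capability in query_gpus():
        print(f"{gpu_name}: compute capability {capability}")
```

The SM count itself is not among the fields queried here. In CUDA code it is available as `multiProcessorCount` in the structure filled by `cudaGetDeviceProperties`.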

What an SM is

  • The unit that runs a kernel's threads in parallel
  • Replicated many times across one GPU
  • Tied to a specific GPU architecture and capability name

What it is not

  • A separate chip you can add; it is part of the GPU die
  • The same across generations; each architecture redesigns it
  • Interchangeable for compiled code, which targets one capability

Reviewed: June 2026