
GPU: the chip that runs the model

A GPU (graphics processing unit) is a processor built to do many simple calculations at once. That parallelism is exactly what running a model needs, so the GPU, not the main processor, does the bulk of the work during inference and training.

At a glance

What it is: A processor built for many parallel calculations at once
Why it matters for AI: Model maths is massively parallel, so the GPU carries it
What caps it: Its memory holds the model, its bandwidth sets the speed
On a DGX Spark: The GPU shares one memory pool with the main processor

What does a GPU actually do?

A central processor (CPU) has a few strong cores, each working through complicated, branching tasks one step at a time, quickly. A GPU is the opposite shape: thousands of weaker cores that all do the same simple calculation at the same time. The maths inside a model is mostly multiplying big grids of numbers (matrices), which is exactly that shape, so the GPU does almost all of the work when a model runs. The chip was built to draw graphics; the same trick turned out to run neural networks.
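To make "the same simple calculation, many times" concrete, here is a minimal sketch of multiplying two grids of numbers in plain Python. It is an illustration only, not how a GPU library is written: the function name `matmul` and the tiny 2x2 inputs are made up for this example. The point is that each output cell is computed without looking at any other output cell, so in principle every cell could be handed to a different core.

```python
# Plain-Python matrix multiply, written to show which work is independent.
# Illustrative sketch: real GPU code uses a library, not Python loops.

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Each output cell is its own chain of multiply-and-add steps.
            # No cell reads another cell's result, so all rows * cols cells
            # could be computed at the same time on separate cores.
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = matmul(a, b)
print(result)  # [[19.0, 22.0], [43.0, 50.0]]
```

A CPU walks those loops a few cells at a time. A GPU assigns the cells to its many cores and does them together, which is why the grid-shaped maths of a model suits it.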

What limits a GPU for local AI?

Two things, and people mix them up. First, memory capacity: the model and its working data have to fit in the memory the GPU can reach, or you get an OOM (an out-of-memory error). Second, memory bandwidth: how fast the chip can read those weights from memory, which largely sets how many tokens per second you actually see. A GPU with plenty of capacity but low memory bandwidth will load a big model and then run it slowly.
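The two limits can be sketched as back-of-the-envelope arithmetic. This is a rough estimate under stated assumptions, not a benchmark: it assumes each generated token requires reading every weight from memory once, it ignores batching, caching, and compute time, and every number below (8 billion parameters, 2 bytes per parameter, 4 GB of working data, 24 GB of memory, 400 GB/s of bandwidth) is hypothetical. The function names are made up for this example.

```python
# Rough capacity and bandwidth estimates for running a model locally.
# All figures are hypothetical; real throughput depends on much more.

def weights_gb(params_billion, bytes_per_param):
    # 1 billion parameters at N bytes each is N GB (decimal gigabytes).
    return params_billion * bytes_per_param


def fits(model_gb, working_gb, reachable_gb):
    # Capacity check: weights plus working data must fit in reachable memory.
    return model_gb + working_gb <= reachable_gb


def tokens_per_second_ceiling(bandwidth_gb_s, model_gb):
    # Bandwidth check, assuming every weight is read once per token.
    # This is an upper bound for a single request, not a prediction.
    return bandwidth_gb_s / model_gb


model = weights_gb(params_billion=8, bytes_per_param=2)
print(model)                                                         # 16
print(fits(model, working_gb=4, reachable_gb=24))                    # True
print(tokens_per_second_ceiling(bandwidth_gb_s=400, model_gb=model)) # 25.0
```

Note that the two checks are independent. Doubling `reachable_gb` lets a bigger model load but does nothing for the ceiling, and doubling `bandwidth_gb_s` raises the ceiling but does not help a model that fails the capacity check.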

On a DGX Spark the GPU has no separate graphics memory of its own. The GPU and the main processor draw from one shared pool, which is how the box fits a large model in a small chassis. The flip side: memory your other programs hold is memory the GPU cannot use.
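A shared pool changes the bookkeeping: what the GPU can use is the pool minus whatever everything else is holding. The sketch below shows that subtraction. The function names and all the figures (a 100 GB pool, an 80 GB model, 30 GB or 15 GB held by other programs) are hypothetical and chosen for round arithmetic; they are not DGX Spark specifications.

```python
# Headroom in a shared memory pool. Figures are illustrative only.

def gpu_headroom_gb(total_pool_gb, used_by_other_programs_gb):
    # Memory held by other programs comes out of the same pool.
    return max(0.0, total_pool_gb - used_by_other_programs_gb)


def model_fits_in_pool(model_gb, total_pool_gb, used_by_other_programs_gb):
    return model_gb <= gpu_headroom_gb(total_pool_gb, used_by_other_programs_gb)


# Same pool, same model; only the other programs' footprint changes.
busy = model_fits_in_pool(model_gb=80, total_pool_gb=100, used_by_other_programs_gb=30)
quiet = model_fits_in_pool(model_gb=80, total_pool_gb=100, used_by_other_programs_gb=15)
print(busy)   # False: only 70 GB of headroom
print(quiet)  # True: 85 GB of headroom
```

In practice this means closing memory-hungry programs before loading a large model can be the difference between it fitting and an OOM.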

The GPU is good at

  • Doing the same maths across thousands of numbers at once
  • Keeping a model's weights in fast memory, close to the cores that use them
  • Running many requests in parallel when there is room

It is not the answer for

  • Branchy, one-step-at-a-time logic; that is the main processor's job
  • Fitting a model bigger than its memory; you OOM first
  • Going faster than its memory bandwidth allows, whatever the core count


Reviewed: June 2026