GPU: the chip that runs the model

A GPU (graphics processing unit) is a processor built to do many simple calculations at once. That parallelism is exactly what running a model needs, so the GPU, not the main processor, does the bulk of the work during inference and training.

What does a GPU actually do?

A central processor (CPU) has a few strong cores that work through one complicated task after another, quickly. A GPU is the opposite shape: thousands of weaker cores that all run the same simple calculation at the same time. The maths inside a model, multiplying big grids of numbers, has exactly that shape, so the GPU does almost all of the work when a model runs. The chip was built to draw graphics; it turned out the same trick suits neural networks.
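To make "the same simple calculation at the same time" concrete, here is a toy sketch in plain Python of one model layer: a grid of weights multiplied by an input vector. It runs on the CPU and is not how any GPU library actually schedules work; the names `dot` and `matvec` are illustrative. The point is that every output number is its own independent dot product, so all of them could be computed at once.

```python
# Toy model layer: multiply a weight grid (matrix) by an input vector.
# Illustrative only; real GPU libraries split and schedule this work
# far more cleverly than one-row-per-core.

def dot(row: list[float], x: list[float]) -> float:
    # One simple, repetitive job: multiply pairs of numbers and add them up.
    return sum(w * v for w, v in zip(row, x))

def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    # No output depends on any other output, so the order does not matter.
    # A CPU works through the rows one after another; a GPU can hand
    # different rows (or pieces of rows) to different cores at once.
    return [dot(row, x) for row in weights]

weights = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
x = [1.0, 1.0, 1.0]
print(matvec(weights, x))  # [6.0, 15.0]
```

A real model does this with grids of thousands of rows and columns, many times per token, which is why thousands of simple cores beat a few strong ones here.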

What limits a GPU for local AI?

Two things, and people mix them up. First, memory capacity: the model and its working data have to fit in the memory the GPU can reach, or you get an OOM (an out-of-memory error). Second, memory bandwidth: how fast the chip can pull those weights in from memory, which caps how many tokens per second you actually see. A GPU with plenty of capacity but low bandwidth will load a big model and then run it slowly.
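The two limits can be estimated separately with back-of-envelope arithmetic. The sketch below assumes a dense model answering one request at a time, where each generated token reads every weight from memory roughly once, so bandwidth divided by weight size gives an upper bound on tokens per second. The model size, bytes per parameter, working-data allowance and bandwidth figures are hypothetical placeholders, not measurements of any real GPU, and real throughput lands below this ceiling.

```python
# Back-of-envelope check of the two limits: capacity and bandwidth.
# Assumptions (illustrative, not measured):
#   - dense model, one request at a time
#   - each generated token reads all weights from memory once
#   - "working data" is a flat extra allowance in GB

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    # 1 billion parameters at 1 byte each is 1 GB (decimal gigabytes).
    return params_billions * bytes_per_param

def fits(params_billions: float, bytes_per_param: float,
         capacity_gb: float, working_gb: float) -> bool:
    # Capacity: does the model load at all, or do you hit an OOM?
    return weights_gb(params_billions, bytes_per_param) + working_gb <= capacity_gb

def max_tokens_per_second(params_billions: float, bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    # Bandwidth: an upper bound on speed once the model is loaded.
    return bandwidth_gb_s / weights_gb(params_billions, bytes_per_param)

# Hypothetical 70B-parameter model stored at 1 byte per parameter (70 GB).
print(fits(70, 1.0, capacity_gb=96, working_gb=10))                 # True
print(f"{max_tokens_per_second(70, 1.0, bandwidth_gb_s=1000):.1f}")  # 14.3
print(f"{max_tokens_per_second(70, 1.0, bandwidth_gb_s=250):.1f}")   # 3.6
```

Both hypothetical GPUs pass the same capacity check, but the one with a quarter of the bandwidth tops out at roughly a quarter of the speed: it loads the model fine and then runs it slowly.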

On a DGX Spark there is no separate graphics card with its own memory. The GPU and the main processor draw from one shared pool, which is how the box fits a large model in a small chassis. The catch: memory your other programs hold is memory the GPU cannot use.
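With one shared pool, the GPU's budget is simply whatever the other programs leave behind. The sketch below shows that subtraction; the pool size, model size and usage numbers are hypothetical placeholders, not DGX Spark specifications, and `gpu_budget_gb` and `model_fits` are illustrative names.

```python
# Sketch of a shared (unified) memory budget.
# All numbers are hypothetical placeholders, not DGX Spark specifications.

def gpu_budget_gb(pool_gb: float, held_by_other_programs_gb: float) -> float:
    # One pool: whatever other programs hold comes straight out of what
    # the GPU can use. There is no separate graphics memory to fall back on.
    return max(pool_gb - held_by_other_programs_gb, 0.0)

def model_fits(model_gb: float, pool_gb: float,
               held_by_other_programs_gb: float) -> bool:
    return model_gb <= gpu_budget_gb(pool_gb, held_by_other_programs_gb)

pool = 100.0   # hypothetical shared pool size, GB
model = 80.0   # hypothetical model plus working data, GB
print(model_fits(model, pool, held_by_other_programs_gb=12.0))  # True  (88 GB left)
print(model_fits(model, pool, held_by_other_programs_gb=30.0))  # False (70 GB left)
```

The same model that loads with a quiet desktop can OOM once a browser and a few other programs claim their share of the pool.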
