What does a GPU actually do?
A central processor (CPU) is a few strong cores that do one complicated thing after another, quickly. A GPU (graphics processing unit) is the opposite shape: thousands of weaker cores that all do the same simple calculation at the same time. The maths inside a model is mostly multiplying big grids of numbers (matrices), and that work is exactly that shape, so the GPU does almost all of the work when a model runs. The chip was built to draw graphics. It turned out the same trick runs neural networks.
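To make "the same simple calculation at the same time" concrete, here is a minimal sketch in plain Python. It runs on the CPU, one cell after another, and the function names cell and matmul are made up for illustration. The point is only that every cell of the result is its own small sum that never looks at any other cell, which is why a GPU can hand each one to a different core.

```python
def cell(a, b, i, j):
    # One output cell: row i of a times column j of b, summed.
    # It reads only the inputs, never another output cell.
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    # A CPU walks these cells one by one; a GPU would compute many at once.
    rows, cols = len(a), len(b[0])
    return [[cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

A real model does the same thing with grids that have thousands of rows and columns, so there are millions of these independent cells per multiplication.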
What limits a GPU for local AI?
Two things, and people mix them up. First, memory capacity: the model and its working data have to fit in the memory the GPU can reach, or you get an OOM (an out-of-memory error). Second, memory bandwidth: how fast the chip can pull those weights in from memory. For a typical dense model, producing each new token means reading roughly all of the weights again, so bandwidth sets how many tokens per second you actually see. A GPU with plenty of capacity but low memory bandwidth will load a big model and then run it slowly.
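The two limits can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic, not a benchmark. It assumes a dense model whose weights are all read once per generated token, and every figure in it (70 billion parameters, 1 byte per parameter, 10 GB of working data, 128 GB of capacity, 280 GB/s of bandwidth) is a made-up example, not the spec of any particular machine.

```python
def weights_gb(params_billion, bytes_per_param):
    # 1 billion parameters at N bytes each is N GB (decimal gigabytes).
    return params_billion * bytes_per_param


def fits(model_gb, working_gb, capacity_gb):
    # Capacity limit: weights plus working data must fit, or you hit an OOM.
    return model_gb + working_gb <= capacity_gb


def max_tokens_per_second(bandwidth_gb_s, model_gb):
    # Bandwidth limit: a rough ceiling, assuming every weight is read
    # once per token. Real speeds will be lower.
    return bandwidth_gb_s / model_gb


# Illustrative numbers only (assumptions, not measured values).
model_gb = weights_gb(params_billion=70, bytes_per_param=1)
print(model_gb)                              # 70
print(fits(model_gb, 10, 128))               # True
print(max_tokens_per_second(280, model_gb))  # 4.0
```

Notice that the two answers are independent: this example model fits comfortably, yet the ceiling is only a few tokens per second. Halving the bytes per parameter would both shrink the footprint and double the ceiling.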
On a DGX Spark there is no separate graphics memory soldered to the card. The GPU and the main processor draw from one shared pool, which is how the box fits a large model in a small chassis. Useful to know: the memory your other programs hold is memory the GPU cannot use.
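Here is a small sketch of what the shared pool means in practice. The pool size of 128 GB and the amounts held by other programs are assumed example figures, and gpu_budget_gb is a hypothetical helper, not part of any tool. Check the real numbers on your own machine.

```python
def gpu_budget_gb(pool_gb, held_by_other_programs_gb):
    # In a shared pool, whatever other programs hold is gone for the GPU.
    return max(pool_gb - held_by_other_programs_gb, 0)


POOL_GB = 128        # assumed pool size, for illustration only
MODEL_NEEDS_GB = 100  # assumed: weights plus working data

# Same model, same box, different amounts held by other programs.
for held in (20, 40):
    budget = gpu_budget_gb(POOL_GB, held)
    print(held, budget, MODEL_NEEDS_GB <= budget)
# prints:
# 20 108 True
# 40 88 False
```

The model that loads on a quiet machine can fail with an OOM once a browser and a few other programs have claimed their share, so closing them is a real fix, not a superstition.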