VRAM: the memory that decides what you can run

VRAM (video random-access memory) is the memory attached to a graphics processing unit (GPU), where a model's weights and live request data sit while it runs. How much you have caps how big a model and how long a context you can hold. On a unified-memory box like the DGX Spark there is no separate VRAM: the GPU and the operating system draw from one shared pool.

Why does VRAM decide what you can run?

Everything a model needs while it runs has to fit in memory the GPU can reach: the weights, the key-value (KV) cache that grows with context length, and some scratch space for intermediate results. VRAM is that memory. A 70-billion-parameter (70B) model needs tens of gigabytes for its weights alone, even in a compact format (at 4 bits per weight, 70 billion weights take about 35 GB), and that is before it has served a single token. So the size of model you can run is set, first of all, by how much VRAM you have. Run a longer context and the KV cache grows into the same space, which is why “it fit yesterday” and “it runs out of memory (OOMs) today” can both be true.
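To make that arithmetic concrete, here is a minimal Python sketch of the budget. The model shape (80 layers, 8 KV heads, head dimension 128), the 2-byte KV entries and the 10% scratch allowance are illustrative assumptions, not figures from any particular model card; read the real values from your model's config. The KV cache term counts one key and one value vector per token, per layer, per KV head.

```python
# Back-of-the-envelope memory estimate: weights + KV cache + scratch.
# The model shape used below is HYPOTHETICAL, chosen only for illustration.

GB = 1e9  # decimal gigabytes


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the weights alone."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, batch: int = 1,
                   bytes_per_elem: int = 2) -> float:
    """Bytes for the KV cache: one key and one value vector (the factor 2)
    per token, per layer, per KV head. bytes_per_elem=2 assumes 16-bit entries."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * batch * bytes_per_elem


def total_bytes(weights: float, kv: float, scratch_fraction: float = 0.1) -> float:
    """Add a hypothetical 10% allowance for activations and runtime scratch."""
    return (weights + kv) * (1 + scratch_fraction)


if __name__ == "__main__":
    w = weight_bytes(70e9, bits_per_weight=4)  # 70B weights at 4 bits each
    for ctx in (8_192, 32_768, 131_072):
        kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, context_len=ctx)
        print(f"context {ctx:>7}: weights {w / GB:.1f} GB, "
              f"KV cache {kv / GB:.1f} GB, total ~{total_bytes(w, kv) / GB:.1f} GB")
```

For this hypothetical shape, the KV cache at 131,072 tokens comes to about 43 GB, more than the 35 GB of 4-bit weights: the “it fit yesterday” effect in numbers.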

What changes on a unified-memory box?

A discrete GPU has its own VRAM soldered to the card, separate from the system RAM (random-access memory) the operating system uses. A DGX Spark does not work that way. There is one pool of memory, and the GPU and the OS both draw from it. That is a feature: it is how the box fits a large model in a small chassis. But it has a catch: memory your browser and background services hold is memory your model cannot use. So “how much VRAM do I have” becomes “how much of the shared pool is free right now”, and the answer changes while you work.
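One way to look at the shared pool on a Linux system is the kernel's own accounting in /proc/meminfo. The minimal sketch below assumes that, as on a unified-memory Linux machine, the GPU's pool shows up there as ordinary system RAM. MemAvailable is the kernel's estimate of how much memory can still be handed out without swapping; treat it as a rough reading, not the exact amount a GPU runtime will let you allocate.

```python
from pathlib import Path

# Linux-only sketch. ASSUMPTION: on a unified-memory machine the GPU draws
# from the same pool the kernel reports here as system RAM.


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; values are in kB (KiB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def shared_pool_gib(path: str = "/proc/meminfo") -> tuple:
    """Return (MemTotal, MemAvailable) converted from KiB to GiB."""
    info = parse_meminfo(Path(path).read_text())
    kib_per_gib = 1024 ** 2
    return info["MemTotal"] / kib_per_gib, info["MemAvailable"] / kib_per_gib


if __name__ == "__main__":
    if Path("/proc/meminfo").exists():
        total, available = shared_pool_gib()
        print(f"shared pool: {total:.1f} GiB total, {available:.1f} GiB available right now")
    else:
        print("/proc/meminfo not found; this sketch is Linux-only")
```

Run it with a browser open and again after closing it: the total stays put while the available figure moves.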

The practical upshot: treat the number from nvidia-smi as a live budget, not a fixed spec. The ceiling is fixed. What is free under it is not.
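A minimal sketch of reading that live budget from Python, using nvidia-smi's --query-gpu and --format options (with nounits, the values are plain MiB). The 10% headroom in fits() is a hypothetical safety margin, not an NVIDIA recommendation. On some unified-memory systems nvidia-smi may not report these fields as numbers, so the parser returns None for anything non-numeric rather than guessing.

```python
import subprocess
from typing import Optional

# Per-GPU memory as bare CSV numbers in MiB (because of "nounits").
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(line: str) -> dict:
    """Parse one line such as '24576, 1024, 23552' into MiB values.
    Non-numeric fields (for example '[N/A]') become None."""
    out = {}
    for key, raw in zip(("total", "used", "free"), line.split(",")):
        try:
            out[key] = float(raw.strip())
        except ValueError:
            out[key] = None
    return out


def read_gpu_memory() -> list:
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_memory_csv(l) for l in result.stdout.splitlines() if l.strip()]


def fits(need_mib: float, free_mib: Optional[float], headroom: float = 0.1) -> Optional[bool]:
    """True if need_mib fits in free memory minus a hypothetical headroom margin;
    None if the free figure is unknown."""
    if free_mib is None:
        return None
    return need_mib <= free_mib * (1 - headroom)


if __name__ == "__main__":
    try:
        for i, gpu in enumerate(read_gpu_memory()):
            free = gpu.get("free")
            print(f"GPU {i}: free {free} MiB, room for 40000 MiB? {fits(40_000, free)}")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query nvidia-smi: {err}")
```

Because the free figure moves, the sketch queries once per call instead of caching a value; run it again right before loading a model.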
