What is a CPU for, if the GPU does the work?
A CPU (central processing unit) is the general-purpose processor that runs the operating system, your shell, the inference server, and every program that is not the model itself. It has a few strong cores tuned for sequential, branchy work: making decisions, following one step after another, coordinating. When a model runs, the heavy maths goes to the GPU (graphics processing unit), but the CPU still loads the model, sets up each request, and hands the answer back. It is the stage manager, not the lead actor.
Where does the CPU still slow you down?
In more places than newcomers expect. Turning text into tokens, loading data from disk, and juggling many small requests are all CPU work, and the GPU can sit idle waiting on them. If your GPU is barely busy while throughput is poor, look at the processor and the data path before blaming the model. On a DGX Spark the CPU and GPU share one memory pool, so the two are not as separate as on a desktop with a discrete card. They are neighbours drawing from the same tap.
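One rough way to check whether CPU-side work is the bottleneck is to time each stage of a request separately and see what share of the wall-clock time falls outside the model call. The sketch below is only an illustration under stated assumptions: `profile_stages`, `cpu_share`, `toy_tokenize` and `toy_model` are hypothetical names, and the two toy functions are stand-ins for a real tokenizer and a real GPU call, not part of any library.

```python
import time


def profile_stages(stages, payload):
    """Pass payload through each (name, fn) stage in order.

    Returns the final result and a dict of seconds spent per stage.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


def cpu_share(timings, cpu_stage_names):
    """Fraction of total measured time spent in the named CPU-side stages."""
    total = sum(timings.values())
    if total == 0:
        return 0.0
    return sum(timings[name] for name in cpu_stage_names) / total


def toy_tokenize(text):
    # Hypothetical stand-in for a real tokenizer: one integer id per distinct word.
    vocab = {}
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def toy_model(token_ids):
    # Hypothetical stand-in for the GPU call; a real server would run the model here.
    return sum(token_ids)


if __name__ == "__main__":
    text = "the quick brown fox " * 50_000
    stages = [("tokenize", toy_tokenize), ("model", toy_model)]
    _, timings = profile_stages(stages, text)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
    print(f"CPU-side share: {cpu_share(timings, ['tokenize']):.0%}")
```

If the CPU-side share is large, the processor and data path deserve attention first. With a real GPU framework the call may return before the device has finished, so you would need to wait for the GPU before stopping the clock, or the model stage will look cheaper than it is.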