Learn

CPU: the general-purpose brain of the box

A CPU (central processing unit) is the main, general-purpose processor of a computer. It runs the operating system and your programs and handles one-step-at-a-time logic well, but it is not where a model's heavy maths runs; that goes to the GPU (graphics processing unit).

At a glance

What it is
The main general-purpose processor running the OS and your programs
Good at
Sequential, branchy logic with a few strong cores
Its role in inference
Loads the model, feeds the GPU, handles everything around the maths
When it bottlenecks
Tokenizing, data loading, and serving many small requests

What is a CPU for, if the GPU does the work?

A CPU (central processing unit) is the general-purpose processor that runs the operating system, your shell, the inference server, and every program that is not the model itself. It has a few strong cores tuned for sequential, branchy work: making decisions, following one step after another, coordinating. When a model runs, the heavy maths goes to the GPU (graphics processing unit), but the CPU is still the one that loaded the model, set up the request, and will hand the answer back. It is the stage manager, not the lead actor.

Where does the CPU still slow you down?

More often than newcomers expect. Turning text into tokens, loading data from disk, and juggling many small requests are all CPU work, and a model can sit idle waiting on them. If your GPU is barely busy while throughput is poor, look at the processor and the data path before blaming the model. On a DGX Spark the CPU and GPU share one memory pool, so the two are not as separate as on a desktop with a discrete card. They are neighbours drawing from the same tap.

The CPU handles

  • The operating system, the server process, the request queue
  • One-step-after-another logic with branches and decisions
  • Tokenizing text and shuffling data to and from the model

Leave to the GPU

  • The big matrix maths of running the model
  • Holding the weights and the key-value (KV) cache in fast memory
  • The token-per-second work you measure as inference speed

Related terms

← All terms Reviewed: June 2026