Learn

LoRA: cheap fine-tuning with small adapter matrices

LoRA (Low-Rank Adaptation) is a fine-tuning method that freezes the original model weights and trains only a small pair of low-rank adapter matrices alongside them. You get a customised model for a fraction of the memory and compute that full fine-tuning would cost, and the adapter is a tiny file you can keep separate from the base model.

At a glance

What it is
A fine-tuning method that trains small adapter matrices, not the whole model
Stands for
Low-Rank Adaptation
Why it matters
Far less memory and compute than full fine-tuning, so it runs on modest hardware
What you keep
A small adapter file, separate from the frozen base model

What does LoRA actually do?

Full fine-tuning rewrites every weight in a model. That means holding the whole model plus optimiser state in memory and paying for the compute to update all of it. On your own hardware that is often out of reach.

LoRA (Low-Rank Adaptation) takes a different route. It freezes the original weights and adds a small pair of matrices next to them, the adapter. Only those small matrices train. The trick is that the change a fine-tune needs can be captured by a low-rank update, which is a compact way of saying “a small number of values, not millions”. You get most of the benefit of fine-tuning for a small slice of the cost.

Why does it fit on modest hardware?

Because you are training the adapter, not the model, the memory and compute bill shrinks to match. A box that could never full-fine-tune a 70B model can often train a LoRA adapter against it. The base model still has to fit for inference, but the extra weight of training is small.

The other practical win is the artefact. A LoRA adapter is a tiny file, often megabytes rather than the gigabytes of the base model. You keep it separate, swap between several adapters over one frozen base, and back up only the part that is genuinely yours and not reproducible from a public download. Quantize the base model to make room, and you can fine-tune and serve on the same modest box.

LoRA

  • Trains a small pair of adapter matrices, base weights stay frozen
  • Fits on modest hardware, the scratch space stays small
  • Produces a tiny adapter you can swap in and out
  • Cheap enough to try several variants

Full fine-tuning

  • Updates every weight in the model
  • Needs memory and compute for the whole model plus optimiser state
  • Produces a full new copy of the model
  • Expensive enough that you think twice before each run

Related terms

← All terms Reviewed: June 2026