What does LoRA actually do?
Full fine-tuning rewrites every weight in a model. That means holding the whole model plus optimiser state in memory and paying for the compute to update all of it. On your own hardware that is often out of reach.
LoRA (Low-Rank Adaptation) takes a different route. It freezes the original weights and adds a pair of small matrices next to them, the adapter. Only those small matrices train. The trick is that the change a fine-tune needs can usually be approximated well by a low-rank update: the product of two thin matrices, which takes far fewer numbers to describe than the full weight matrix it adjusts. You get most of the benefit of fine-tuning for a small slice of the cost.
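To make the "pair of small matrices" concrete, here is a minimal NumPy sketch of one adapted layer. The layer size, the rank r, the scaling factor alpha and the function name lora_forward are illustrative assumptions, not values from any particular model or library. The zero initialisation of B follows common LoRA practice, so the adapter starts out as a no-op.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: one square layer and a small adapter rank.
d_out, d_in, r = 1024, 1024, 8
alpha = 16  # scaling hyperparameter (assumed value)

W = rng.standard_normal((d_out, d_in))      # frozen base weight, never updated
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: adapter starts as a no-op

def lora_forward(x):
    """Base output plus the low-rank correction (alpha / r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size                # values a full fine-tune would update
adapter_params = A.size + B.size    # values LoRA actually trains
print(full_params, adapter_params, adapter_params / full_params)
```

With these assumed sizes the adapter holds 16,384 values against 1,048,576 in the frozen matrix, about 1.6% of the layer. A larger rank buys more capacity at a proportional cost.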
Why does it fit on modest hardware?
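Because you are training the adapter, not the model, the training bill shrinks to match: gradients and optimiser state are needed only for the adapter's weights, which is where most of the memory saving comes from. Compute falls less, since every step still runs through the frozen base. A box that could never full-fine-tune a 70B model can often train a LoRA adapter against it. The base model still has to fit in memory, for training as well as inference, but the extra weight of training is small.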
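A rough back-of-envelope calculation shows where the saving comes from. The sketch below assumes an Adam-style optimiser that keeps two extra values per trainable parameter, 4-byte values throughout, and a hypothetical adapter of 100 million trainable parameters. It deliberately ignores the base weights themselves and activation memory, so treat the output as an order-of-magnitude guide, not a sizing tool.

```python
def training_overhead_bytes(trainable_params, bytes_per_value=4):
    """Estimate memory for gradients plus Adam-style optimiser state.

    Assumes one gradient value and two optimiser values per trainable
    parameter. Base weights and activations are NOT included.
    """
    gradients = trainable_params * bytes_per_value
    optimiser_state = 2 * trainable_params * bytes_per_value
    return gradients + optimiser_state

# Full fine-tune: every one of the 70B weights is trainable.
full = training_overhead_bytes(70_000_000_000)

# LoRA: only the adapter trains (100M parameters is a hypothetical size).
adapter = training_overhead_bytes(100_000_000)

print(f"full fine-tune overhead: {full / 1e9:.0f} GB")
print(f"LoRA adapter overhead:   {adapter / 1e9:.1f} GB")
```

Under these assumptions the training overhead drops from hundreds of gigabytes to about one, which is the gap between a rack of accelerators and a single modest machine.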
The other practical win is the artefact. A LoRA adapter is a tiny file, often megabytes rather than the gigabytes of the base model. You keep it separate, swap between several adapters over one frozen base, and back up only the part that is genuinely yours and not reproducible from a public download. Quantise the base model to make room, and you can fine-tune and serve on the same modest box.
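Swapping adapters over one frozen base can be sketched in a few lines of NumPy. The adapter names ("support", "legal"), the sizes and the random stand-in weights are all hypothetical. A real setup would load trained adapters from disk through whatever tooling you use.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 4  # illustrative layer width and adapter rank

# One shared base weight, loaded once and never modified.
base_W = rng.standard_normal((d, d)).astype(np.float32)
base_copy = base_W.copy()  # kept only to show the base stays untouched

def make_adapter():
    """Hypothetical stand-in for a trained adapter (random values here)."""
    return {
        "A": rng.standard_normal((r, d)).astype(np.float32),
        "B": rng.standard_normal((d, r)).astype(np.float32),
    }

adapters = {"support": make_adapter(), "legal": make_adapter()}

def forward(x, adapter_name=None):
    """Run the frozen base, optionally adding one named adapter's correction."""
    y = base_W @ x
    if adapter_name is not None:
        ad = adapters[adapter_name]
        y = y + ad["B"] @ (ad["A"] @ x)
    return y

x = rng.standard_normal(d).astype(np.float32)
y_support = forward(x, "support")
y_legal = forward(x, "legal")

base_bytes = base_W.nbytes
adapter_bytes = sum(m.nbytes for m in adapters["support"].values())
print(base_bytes, adapter_bytes, base_bytes // adapter_bytes)
```

Each adapter here is 64 times smaller than the single base matrix it adjusts, and switching between them is a dictionary lookup. The base is never rewritten, so only the adapters need backing up.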