</details>
## LoRA Fine-Tuning
llama.cpp includes native [LoRA](https://arxiv.org/abs/2106.09685) (Low-Rank Adaptation) fine-tuning across the CPU, Vulkan, Metal, and CUDA backends.
LoRA fine-tuning keeps the base model frozen and represents the weight updates as a low-rank decomposition into two smaller matrices. Only these new matrices are trained to adapt to the new data, which keeps the number of trainable parameters small. This makes training possible on devices with very limited memory, including phones and integrated GPUs. Key capabilities include:
- Train LoRA adapters on any GPU (NVIDIA, AMD, Intel, Apple, Mali, Adreno)
- Full support for FP32/FP16/Q8/Q4 training paths
- Instruction-tuning via assistant-only masked loss
- Checkpointing + resumable training
- Merge LoRA adapters back into a base model (`model.gguf`)
- Compatible with Qwen3, Gemma, LLaMA, TinyLlama, and other GGUF models
The [Finetuning Guide](examples/training/README.md) has more details.
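The low-rank idea above can be sketched in a few lines. This is a minimal NumPy illustration, not llama.cpp's implementation; the dimensions, the zero initialization of `B`, and the `alpha / r` scaling are common LoRA conventions assumed here for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4  # r << d_in, d_out: the low-rank bottleneck
alpha = 8.0                 # LoRA scaling hyperparameter

W = rng.standard_normal((d_in, d_out))      # frozen base weight (not trained)
A = rng.standard_normal((d_in, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d_out))                    # trainable up-projection (zero-init)

def lora_forward(x):
    # Base path plus the low-rank update: y = xW + (alpha/r) * xAB
    return x @ W + (alpha / r) * (x @ A) @ B

x = rng.standard_normal((2, d_in))
y = lora_forward(x)

# With B zero-initialized, the adapter starts as a no-op, so the
# adapted model initially matches the frozen base model exactly.
assert np.allclose(y, x @ W)

# Only A and B are trained: 512 parameters here versus 4096 in W.
print(W.size, A.size + B.size)

# "Merging" folds the update into the base weight, so deployed
# inference needs no extra matmuls: W' = W + (alpha/r) * A @ B
W_merged = W + (alpha / r) * A @ B
assert np.allclose(x @ W_merged, lora_forward(x))
```

The merge identity at the end is why a trained adapter can be baked back into a single `.gguf`: the adapted layer is algebraically equivalent to an ordinary dense layer with the merged weight.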