# llama.cpp/examples/training

## finetune

This directory contains examples related to language model training using llama.cpp/GGML.
So far finetuning is technically functional (for FP32 models and limited hardware setups) but the code is very much WIP.
Finetuning of Stories 260K and LLaMA 3.2 1b seems to work with 24 GB of memory.

``` sh
export model_name=llama_3.2-1b && export quantization=f32
./build/bin/llama-finetune --file wikitext-2-raw/wiki.test.raw -ngl 999 --model models/${model_name}-${quantization}.gguf -c 512 -b 512 -ub 512
./build/bin/llama-perplexity --file wikitext-2-raw/wiki.test.raw -ngl 999 --model models/finetuned-model.gguf
```

The perplexity value of the finetuned model should be lower after training on the test set for 2 epochs.
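
"Lower" here means lower than the base model's perplexity on the same text; you can measure that baseline with the same `llama-perplexity` invocation pointed at the unmodified model file:

``` sh
# Baseline perplexity of the base model, for comparison with the finetuned one
./build/bin/llama-perplexity --file wikitext-2-raw/wiki.test.raw -ngl 999 --model models/${model_name}-${quantization}.gguf
```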

## finetune-lora

LoRA (Low-Rank Adaptation) enables efficient fine-tuning: it trains only a small set of additional parameters while keeping the base model frozen, making it far more memory-efficient than full fine-tuning.
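
As a mental model (this is standard LoRA, not a claim about this implementation's internals): each adapted weight matrix $W \in \mathbb{R}^{d \times k}$ stays frozen, and two small trainable matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are learned instead:

$$W' = W + \frac{\alpha}{r} B A$$

An adapted layer therefore adds only $r(d + k)$ trainable parameters, where $r$ is `--lora-rank` and $\alpha$ is `--lora-alpha` (both described below).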

### Basic Usage

``` sh
# Create a new LoRA adapter with default settings (rank=8, alpha=16, attention modules)
./build/bin/llama-finetune-lora -m model.gguf -f dataset.txt -ngl 999 -c 512 -b 512 -ub 512

# Custom LoRA parameters (creates a new LoRA adapter and trains it from scratch)
./build/bin/llama-finetune-lora -m model.gguf -f dataset.txt -ngl 999 -c 512 -b 512 -ub 512 \
  --lora-rank 16 --lora-alpha 32 --lora-modules "attn_q,attn_k,attn_v,attn_o"

# Fine-tune an existing LoRA adapter
./build/bin/llama-finetune-lora -m base_model.gguf -f dataset.txt --lora existing_adapter.gguf \
  --output-adapter improved_adapter.gguf -ngl 999 -c 512 -b 512 -ub 512
```

### Parameters

#### LoRA Configuration
- `--lora-rank N` - LoRA rank (default: 8)
  - Lower rank = smaller adapter, less capacity
  - Higher rank = larger adapter, more capacity
- `--lora-alpha N` - LoRA alpha scaling factor (default: 16.0)
  - Controls adaptation strength
  - Common rule: alpha = 2 × rank (see the example after these lists)
- `--lora-modules MODULES` - Target modules as a comma-separated list
  - Available: `attn_q`, `attn_k`, `attn_v`, `attn_o`, `ffn_gate`, `ffn_up`, `ffn_down`, `embed`, `output`, `all`
  - Default: `attn_q,attn_k,attn_v,attn_o` (attention modules)
- `--output-adapter PATH` - Output adapter filename (default: auto-generated)

#### Standard Parameters
- `-m MODEL` - Base model file (.gguf)
- `-f FILE` - Training dataset
- `-ngl N` - GPU layers (use 999 for full GPU training)
- `-c N` - Context length (512 recommended for mobile)
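
Putting the parameters together, here is an illustrative run (placeholder model and dataset paths) that follows the alpha = 2 × rank rule and also adapts the FFN modules:

``` sh
# rank 32, alpha 64 (= 2 x rank), attention + FFN modules, explicit output path
./build/bin/llama-finetune-lora -m base_model.gguf -f dataset.txt -ngl 999 -c 512 -b 512 -ub 512 \
  --lora-rank 32 --lora-alpha 64 \
  --lora-modules "attn_q,attn_k,attn_v,attn_o,ffn_gate,ffn_up,ffn_down" \
  --output-adapter my_adapter.gguf
```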

### Using Trained Adapters

After training, you'll get a small adapter file. Use it with the original base model:

``` sh
./build/bin/llama-cli -m base_model.gguf --lora trained_adapter.gguf -ngl 999
```
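
If you want a single self-contained model instead of a base-model-plus-adapter pair, llama.cpp also provides the `llama-export-lora` tool for merging an adapter into the base weights (verify the exact flags with `--help` in your build):

``` sh
# Merge the adapter into the base weights, producing one standalone .gguf
./build/bin/llama-export-lora -m base_model.gguf --lora trained_adapter.gguf -o merged_model.gguf
```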

### Troubleshooting

- **Out of memory**: Reduce the context length (`-c 256`), lower the rank, or target fewer modules (see the example after this list)
- **Poor quality**: Increase the rank, add more target modules, or train longer
- **Large adapter**: Reduce the rank or limit the target modules
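
As an example of the low-memory advice above (illustrative values, not tuned recommendations):

``` sh
# Short context and small batches, low rank, and only two attention projections
./build/bin/llama-finetune-lora -m base_model.gguf -f dataset.txt -ngl 999 -c 256 -b 256 -ub 256 \
  --lora-rank 4 --lora-alpha 8 --lora-modules "attn_q,attn_v"
```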
76
+
77
+ ### Help
78
+
79
Run with `--help` or `-h` to see all available parameters:
80
+ ``` sh
81
+ ./build/bin/llama-finetune-lora --help
82
+ ```