
Commit 0c1ffd1: Update README with finetune-lora
Signed-off-by: vineet <[email protected]>
Parent: 3f295e1

File tree: 1 file changed (+65 −0 lines)

examples/training/README.md

Lines changed: 65 additions & 0 deletions
# llama.cpp/examples/training

## finetune

This directory contains examples related to language model training using llama.cpp/GGML.

So far, finetuning is technically functional (for FP32 models and limited hardware setups), but the code is still very much a work in progress.

Finetuning of Stories 260K and LLaMA 3.2 1B seems to work with 24 GB of memory.
The perplexity value of the finetuned model should be lower after training on the test set for 2 epochs.
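As a reminder of how that perplexity check works, perplexity is the exponential of the mean negative log-likelihood per token. A small self-contained sketch (the log-probabilities below are made-up numbers, not output from llama.cpp):

```python
import math

# Perplexity from per-token log-likelihoods: ppl = exp(-mean log p(token)).
# Hypothetical log-probs for a 5-token evaluation text.
logprobs = [-2.1, -0.7, -1.3, -0.2, -1.9]

nll = -sum(logprobs) / len(logprobs)  # mean negative log-likelihood
ppl = math.exp(nll)                   # lower is better after finetuning
print(f"perplexity: {ppl:.2f}")
```

A model that improves on the evaluation text assigns higher log-probabilities to its tokens, which drives the mean NLL, and hence the perplexity, down.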

## finetune-lora

LoRA (Low-Rank Adaptation) fine-tuning for efficient model training. This approach trains only a small set of additional parameters while keeping the base model frozen, making it memory-efficient.
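The idea can be sketched in a few lines of NumPy (a toy illustration, not the GGML implementation; the dimensions, initialization scale, and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 512, 8, 16.0           # hidden size, LoRA rank, scaling factor
W = rng.standard_normal((d, d))      # frozen base weight: never updated

# LoRA factors: A is small random, B starts at zero, so the fresh
# adapter is a no-op until training moves B away from zero.
A = rng.standard_normal((d, r)) * 0.01
B = np.zeros((r, d))

def forward(x):
    # base path plus low-rank update, scaled by alpha / rank
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.standard_normal((1, d))
assert np.allclose(forward(x), x @ W)  # untrained adapter changes nothing

trainable = A.size + B.size            # 2 * d * r = 8192
full = W.size                          # d * d = 262144
print(f"trainable fraction: {trainable / full:.3f}")
```

Only `A` and `B` receive gradients, which is why the adapter file stays small and why training fits in far less memory than full finetuning.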
### Basic Usage

```sh
# Create a new LoRA adapter with default settings (rank=8, alpha=16, attention modules)
./build/bin/llama-finetune-lora -m model.gguf -f dataset.txt -ngl 999 -c 512 -b 512 -ub 512

# Custom LoRA parameters (creates a new LoRA adapter and trains it from scratch)
./build/bin/llama-finetune-lora -m model.gguf -f dataset.txt -ngl 999 -c 512 -b 512 -ub 512 \
    --lora-rank 16 --lora-alpha 32 --lora-modules "attn_q,attn_k,attn_v,attn_o"

# Fine-tune an existing LoRA adapter
./build/bin/llama-finetune-lora -m base_model.gguf -f dataset.txt --lora existing_adapter.gguf \
    --output-adapter improved_adapter.gguf -ngl 999 -c 512 -b 512 -ub 512
```

### Parameters

#### LoRA Configuration

- `--lora-rank N` - LoRA rank (default: 8)
  - Lower rank = smaller adapter, less capacity
  - Higher rank = larger adapter, more capacity
- `--lora-alpha N` - LoRA alpha scaling factor (default: 16.0)
  - Controls adaptation strength
  - Common rule of thumb: alpha = 2 × rank
- `--lora-modules MODULES` - Target modules as a comma-separated list
  - Available: `attn_q`, `attn_k`, `attn_v`, `attn_o`, `ffn_gate`, `ffn_up`, `ffn_down`, `embed`, `output`, `all`
  - Default: `attn_q,attn_k,attn_v,attn_o` (attention modules)
- `--output-adapter PATH` - Output adapter filename (default: auto-generated)
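To make the rank/size tradeoff concrete, here is a back-of-the-envelope adapter-size estimate. This is a rough sketch: it assumes square `hidden_size × hidden_size` projections for the four attention modules and ignores grouped-query attention (which shrinks the k/v matrices), so treat the numbers as an upper-bound illustration rather than the exact file size:

```python
# Rough fp32 size estimate for a LoRA adapter over attention modules.
hidden_size = 2048   # roughly LLaMA 3.2 1B
n_layers = 16
rank = 8
n_modules = 4        # attn_q, attn_k, attn_v, attn_o

# Each adapted matrix contributes two factors: (hidden_size x rank) and
# (rank x hidden_size), i.e. 2 * hidden_size * rank parameters.
params = n_layers * n_modules * 2 * hidden_size * rank
size_mib = params * 4 / 1024**2  # 4 bytes per fp32 parameter
print(f"{params} params, about {size_mib:.1f} MiB")
```

Doubling the rank roughly doubles the adapter, and adding the FFN modules roughly triples the number of adapted matrices per layer.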

#### Standard Parameters

- `-m MODEL` - Base model file (`.gguf`)
- `-f FILE` - Training dataset
- `-ngl N` - GPU layers (use 999 for full GPU training)
- `-c N` - Context length (512 recommended for mobile)

### Using Trained Adapters

After training, you'll get a small adapter file. Use it with the original base model:

```sh
./build/bin/llama-cli -m base_model.gguf --lora trained_adapter.gguf -ngl 999
```
### Troubleshooting

- **Out of memory**: Reduce the context length (`-c 256`), lower the rank, or use fewer target modules
- **Poor quality**: Increase the rank, add more target modules, or train longer
- **Large adapter**: Reduce the rank or limit the target modules
### Help

Run with `--help` or `-h` to see all available parameters:

```sh
./build/bin/llama-finetune-lora --help
```
