7 changes: 7 additions & 0 deletions examples/README.md
@@ -216,6 +216,13 @@ llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
```

#### Compare fine-tuning methods
```bash
python scripts/finetuning_comparison/cli_yaml_compare.py \
--first examples/train_lora/qwen3_lora_sft.yaml \
--second examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
```

### Extras

#### Full-Parameter Fine-Tuning using GaLore
93 changes: 93 additions & 0 deletions examples/comparison/README.md
@@ -0,0 +1,93 @@
# Fine-Tuning Comparison Feature
This update lets users easily compare fine-tuning strategies for a given fine-tuning configuration.
The feature extends the existing system and is designed to work with pre-defined datasets, algorithms, and metrics.

To compare training metrics, you need existing `.yaml` configuration files that specify the LLM, the dataset, and the fine-tuning strategy. For examples of such configurations, see `/examples`.
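For orientation, a training configuration in LLaMA-Factory style looks roughly like the fragment below. Field names follow the examples in `/examples/train_lora`; the model name and values are placeholders you should replace with your own:

```yaml
### model (placeholder — use your own model path or Hub ID)
model_name_or_path: Qwen/Qwen3-4B

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity
template: qwen3
cutoff_len: 2048

### output
output_dir: saves/qwen3-4b/lora/sft

### train (illustrative values)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```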

## EXPERIMENTAL FEATURE
⚠️ This model evaluation feature has undergone limited testing. The default settings run a very small number of samples for quick smoke tests; increasing the sample size or batch size may increase memory usage and runtime. ⚠️

---

## Installation

Follow the installation guide on LLaMA-Factory's main page (also reproduced below for convenience):

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
pip install -r requirements/metrics.txt
```

### IMPORTANT (for Windows users)
#### Install PyTorch

You need to manually install the GPU version of PyTorch on Windows. Refer to the official PyTorch website for the index URL matching your CUDA version, then run the following (the example uses CUDA 12.6):

```bash
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
python -c "import torch; print(torch.cuda.is_available())"
```

#### Install BitsAndBytes

If you want to enable QLoRA on Windows, install a pre-built version of the bitsandbytes library that supports CUDA 11.1 to 12.2. Choose the appropriate release version based on your CUDA version:

```bash
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
```

## Structure

- **Demo:** `/examples/comparison/ft_comparison_demo.py`
  Runs two fine-tuning algorithms (LoRA and QLoRA) sequentially and saves four metrics at each checkpoint.

- **Core logic:** `/scripts/finetuning_comparison`
Ensures models are loaded correctly and metrics are computed properly.

- **Tests:** `tests/eval`
Minimal tests to ensure compatibility with existing functionality.

## Running the Demos
### Option 1: Using ft_comparison_demo.py

Open `ft_comparison_demo.py` and adjust the following paths:

- `ft_yaml_1` and `ft_yaml_2`: point to existing training configuration files
- `output_dir`: path to save the results (e.g., `data/ft_comparison_results` or `outputs`)

Save and run the script:

```bash
python examples/comparison/ft_comparison_demo.py
```

### Option 2: Using the CLI
```bash
python scripts/finetuning_comparison/cli_yaml_compare.py \
--first examples/train_lora/qwen3_lora_sft.yaml \
--second examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml \
--out data/ft_comparison_results
```

Adjust the configuration paths as needed.

## Features

- Compare two existing training configurations and view these metrics:
  - Evaluation loss: model accuracy indicator
  - Perplexity: the exponential of the evaluation loss; lower is better
  - Latency (ms): responsiveness, relevant for real-time applications
  - Peak VRAM (MB): maximum GPU memory usage during inference
- Export a CSV summarizing these metrics
- Generate a plot comparing all metrics side by side; if GPU resources are insufficient, the CSV and plots are generated with dummy data
- Unit tests include a fallback strategy using dummy data
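As a sketch of how these metrics relate and how a summary CSV could be produced (assuming, as is standard for causal LMs, that perplexity is the exponential of the mean evaluation loss — the function names and values below are illustrative, not the feature's actual API):

```python
import csv
import math

def summarize_run(name, eval_loss, latency_ms, peak_vram_mb):
    """Derive perplexity from the evaluation loss and bundle the four metrics."""
    return {
        "run": name,
        "eval_loss": eval_loss,
        "perplexity": math.exp(eval_loss),  # standard definition for causal LMs
        "latency_ms": latency_ms,
        "peak_vram_mb": peak_vram_mb,
    }

def export_csv(rows, path):
    """Write one CSV row per compared configuration."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

# Dummy numbers standing in for real measurements.
rows = [
    summarize_run("lora", eval_loss=1.20, latency_ms=45.0, peak_vram_mb=8200.0),
    summarize_run("qlora", eval_loss=1.25, latency_ms=52.0, peak_vram_mb=5100.0),
]
export_csv(rows, "ft_comparison_summary.csv")
```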

## Possible Expansions

- Further testing of the feature in environments with enough memory to handle LoRA fine-tuning and merging the adapter weights into the base model.
- Add more relevant metrics (e.g., ROUGE, human evaluation)
- Make input more flexible: accept a dataset, an LLM, and two fine-tuning algorithms (or None), and automatically generate the `.yaml` configs before comparison
- Improve plotting functionality for finer-grained analysis during training
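A side-by-side plot of the four metrics could be sketched as follows (a minimal matplotlib example with made-up numbers, not the feature's actual plotting code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt

# Dummy values standing in for real measurements: (LoRA, QLoRA) per metric.
metrics = {
    "eval_loss": (1.20, 1.25),
    "perplexity": (3.32, 3.49),
    "latency_ms": (45.0, 52.0),
    "peak_vram_mb": (8200.0, 5100.0),
}

# One bar chart per metric, so differently scaled metrics stay readable.
fig, axes = plt.subplots(1, len(metrics), figsize=(12, 3))
for ax, (name, (lora_val, qlora_val)) in zip(axes, metrics.items()):
    ax.bar(["LoRA", "QLoRA"], [lora_val, qlora_val])
    ax.set_title(name)
fig.tight_layout()
fig.savefig("ft_comparison_plot.png")
```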
28 changes: 28 additions & 0 deletions examples/comparison/ft_comparison_demo.py
@@ -0,0 +1,28 @@
#!/usr/bin/env python
#------------------------------IMPORTS---------------------------------#
import os
import sys

# Add the finetuning_comparison folder to the path for imports
root_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
sys.path.insert(0, os.path.join(root_path, "scripts", "finetuning_comparison"))

from yaml_compare import compare_two_yamls

#-----------------------------FUNCTIONS---------------------------------#
def main():
    """
    Call the comparison function in yaml_compare.py.
    This is where you can specify:
    - the two models to compare, by selecting the paths to their YAML configs;
    - the output directory for the evaluation metrics (and plots).
    """
    compare_two_yamls(
        yaml_1=os.path.join(root_path, "examples/train_lora/qwen3_lora_sft.yaml"),  # first model to be fine-tuned
        yaml_2=os.path.join(root_path, "examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml"),  # second model to be fine-tuned
        output_dir=os.path.join(root_path, "data/ft_comparison_results"),  # output directory for evaluation metrics
    )

#-----------------------------MAIN---------------------------------#
if __name__ == "__main__":
    main()
32 changes: 32 additions & 0 deletions scripts/finetuning_comparison/cli_yaml_compare.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python
"""Mapping of the CLI command to compare 2 LLaMA-Factory .yaml files, to the actual function.
"""

#------------------------------IMPORTS---------------------------------#
import argparse

from yaml_compare import compare_two_yamls


#-----------------------------FUNCTIONS---------------------------------#
def parse_args():
    """CLI flag definitions:
    --first: path to the first model's training YAML
    --second: path to the second model's training YAML
    --out: output directory for evaluation metrics and plots (default: data/ft_comparison_results)
    """
    p = argparse.ArgumentParser(description="Compare two YAML trainings.")
    p.add_argument("--first", required=True, help="Path to first YAML")
    p.add_argument("--second", required=True, help="Path to second YAML")
    p.add_argument("--out", default="data/ft_comparison_results", help="Output directory")
    return p.parse_args()

def main():
    """Pass the arguments extracted from the CLI to the comparison function."""
    args = parse_args()
    compare_two_yamls(args.first, args.second, args.out)

#-----------------------------MAIN---------------------------------#
if __name__ == "__main__":
    main()