7 changes: 7 additions & 0 deletions examples/README.md
@@ -216,6 +216,13 @@ llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
```

#### Compare fine-tuning methods
```bash
python scripts/finetuning_comparison/cli_yaml_compare.py \
--first examples/train_lora/qwen3_lora_sft.yaml \
--second examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
```

### Extras

#### Full-Parameter Fine-Tuning using GaLore
93 changes: 93 additions & 0 deletions examples/comparison/README.md
@@ -0,0 +1,93 @@
# Fine-Tuning Comparison Feature
This update lets users easily compare fine-tuning strategies for a given fine-tuning configuration.
The feature extends the existing system and is designed to work with pre-defined datasets, algorithms, and metrics.

To compare training metrics, you need existing `.yaml` configuration files that specify the LLM, the dataset, and the fine-tuning strategy. For examples of such configurations, see `/examples`.
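For orientation, a training configuration in LLaMA-Factory style looks roughly like the fragment below. Field names follow the examples in `/examples/train_lora`; the model name and values are placeholders you should replace with your own:

```yaml
### model (placeholder — use your own model path or Hub ID)
model_name_or_path: Qwen/Qwen3-4B

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity
template: qwen3
cutoff_len: 2048

### output
output_dir: saves/qwen3-4b/lora/sft

### train (illustrative values)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```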

## EXPERIMENTAL FEATURE
⚠️ This model evaluation feature has undergone limited testing. The default settings run a very small number of samples for quick smoke tests; increasing the sample size or batch size may increase memory usage and runtime. ⚠️

---

## Installation

Follow the installation guide on LLaMA-Factory's main page (also reproduced below for convenience):

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
pip install -r requirements/metrics.txt
```

### IMPORTANT (for Windows users)
#### Install PyTorch

You need to manually install the GPU version of PyTorch on Windows. Refer to the official PyTorch website for the index URL matching your CUDA version, then run the following (the example uses CUDA 12.6):

```bash
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
python -c "import torch; print(torch.cuda.is_available())"
```

#### Install BitsAndBytes

If you want to enable QLoRA on Windows, install a pre-built version of the bitsandbytes library that supports CUDA 11.1 to 12.2. Choose the appropriate release version based on your CUDA version:

```bash
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
```

## Structure

- **Demo:** `/examples/comparison/ft_comparison_demo.py`
  Runs two fine-tuning algorithms (LoRA and QLoRA) sequentially and saves four metrics at each checkpoint.

- **Core logic:** `/scripts/finetuning_comparison`
Ensures models are loaded correctly and metrics are computed properly.

- **Tests:** `tests/eval`
Minimal tests to ensure compatibility with existing functionality.

## Running the Demos
### Option 1: Using ft_comparison_demo.py

Open `ft_comparison_demo.py` and adjust the following paths:

- `ft_yaml_1` and `ft_yaml_2`: point to existing training configuration files
- `output_dir`: path to save the results (e.g., `data/ft_comparison_results` or `outputs`)

Save and run the script:

```bash
python examples/comparison/ft_comparison_demo.py
```

### Option 2: Using the CLI
```bash
python scripts/finetuning_comparison/cli_yaml_compare.py \
--first examples/train_lora/qwen3_lora_sft.yaml \
--second examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml \
--out data/ft_comparison_results
```

Adjust the configuration paths as needed.

## Features

- Compare two existing training configurations and view these metrics:
  - Evaluation loss: model accuracy indicator
  - Perplexity: the exponential of the evaluation loss; lower is better
  - Latency (ms): responsiveness, relevant for real-time applications
  - Peak VRAM (MB): maximum GPU memory usage during inference
- Export a CSV summarizing these metrics
- Generate a plot comparing all metrics side by side; if GPU resources are insufficient, the CSV and plots are generated with dummy data
- Unit tests include a fallback strategy using dummy data
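As a sketch of how these metrics relate and how a summary CSV could be produced (assuming, as is standard for causal LMs, that perplexity is the exponential of the mean evaluation loss — the function names and values below are illustrative, not the feature's actual API):

```python
import csv
import math

def summarize_run(name, eval_loss, latency_ms, peak_vram_mb):
    """Derive perplexity from the evaluation loss and bundle the four metrics."""
    return {
        "run": name,
        "eval_loss": eval_loss,
        "perplexity": math.exp(eval_loss),  # standard definition for causal LMs
        "latency_ms": latency_ms,
        "peak_vram_mb": peak_vram_mb,
    }

def export_csv(rows, path):
    """Write one CSV row per compared configuration."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

# Dummy numbers standing in for real measurements.
rows = [
    summarize_run("lora", eval_loss=1.20, latency_ms=45.0, peak_vram_mb=8200.0),
    summarize_run("qlora", eval_loss=1.25, latency_ms=52.0, peak_vram_mb=5100.0),
]
export_csv(rows, "ft_comparison_summary.csv")
```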

## Possible Expansions

- Further testing of the feature in environments with enough memory to handle LoRA fine-tuning and merging the adapter weights into the base model.
- Add more relevant metrics (e.g., ROUGE, human evaluation)
- Make input more flexible: accept a dataset, an LLM, and two fine-tuning algorithms (or None), and automatically generate the `.yaml` configs before comparison
- Improve plotting functionality for finer-grained analysis during training
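A side-by-side plot of the four metrics could be sketched as follows (a minimal matplotlib example with made-up numbers, not the feature's actual plotting code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt

# Dummy values standing in for real measurements: (LoRA, QLoRA) per metric.
metrics = {
    "eval_loss": (1.20, 1.25),
    "perplexity": (3.32, 3.49),
    "latency_ms": (45.0, 52.0),
    "peak_vram_mb": (8200.0, 5100.0),
}

# One bar chart per metric, so differently scaled metrics stay readable.
fig, axes = plt.subplots(1, len(metrics), figsize=(12, 3))
for ax, (name, (lora_val, qlora_val)) in zip(axes, metrics.items()):
    ax.bar(["LoRA", "QLoRA"], [lora_val, qlora_val])
    ax.set_title(name)
fig.tight_layout()
fig.savefig("ft_comparison_plot.png")
```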
28 changes: 28 additions & 0 deletions examples/comparison/ft_comparison_demo.py
@@ -0,0 +1,28 @@
#!/usr/bin/env python
#------------------------------IMPORTS---------------------------------#
import os
import sys

# Add the finetuning_comparison folder to the path for imports
root_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
sys.path.insert(0, os.path.join(root_path, "scripts", "finetuning_comparison"))

from yaml_compare import compare_two_yamls

#-----------------------------FUNCTIONS---------------------------------#
def main():
    """
    Call the comparison function in yaml_compare.py.
    This is where you can specify:
    - the two models to compare, by selecting the paths to their YAML configs;
    - the output directory for the evaluation metrics (and plots).
    """
    compare_two_yamls(
        yaml_1=os.path.join(root_path, "examples/train_lora/qwen3_lora_sft.yaml"),  # first model to be fine-tuned
        yaml_2=os.path.join(root_path, "examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml"),  # second model to be fine-tuned
        output_dir=os.path.join(root_path, "data/ft_comparison_results"),  # output directory for evaluation metrics
    )

#-----------------------------MAIN---------------------------------#
if __name__ == "__main__":
    main()
32 changes: 32 additions & 0 deletions scripts/finetuning_comparison/cli_yaml_compare.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python
"""Mapping of the CLI command to compare 2 LLaMA-Factory .yaml files, to the actual function.
"""

#------------------------------IMPORTS---------------------------------#
import argparse

from yaml_compare import compare_two_yamls


#-----------------------------FUNCTIONS---------------------------------#
def parse_args():
    """CLI flag definitions:
    --first: path to the first model's training YAML
    --second: path to the second model's training YAML
    --out: output directory for evaluation metrics and plots (default: data/ft_comparison_results)
    """
    p = argparse.ArgumentParser(description="Compare two YAML trainings.")
    p.add_argument("--first", required=True, help="Path to first YAML")
    p.add_argument("--second", required=True, help="Path to second YAML")
    p.add_argument("--out", default="data/ft_comparison_results", help="Output directory")
    return p.parse_args()

def main():
    """Pass the arguments extracted from the CLI to the comparison function."""
    args = parse_args()
    compare_two_yamls(args.first, args.second, args.out)

#-----------------------------MAIN---------------------------------#
if __name__ == "__main__":
    main()