
Commit 0c28d3b

jpablomch and Yuan0320 authored
NLS Fast eval and improved output (#3508)
### Changes

Implements fast evaluation in NLS and improves the output.

### Reason for changes

Accelerates NLS evaluations.

### Related tickets

https://jira.devtools.intel.com/browse/CVS-167422

### Tests

Added fast evaluation to NLS test.

---------

Signed-off-by: J. Pablo Muñoz <pablo.munoz@intel.com>
Co-authored-by: Yuan0320 <jinjie.yuan@intel.com>
1 parent f10228a commit 0c28d3b

File tree

3 files changed: +337 −174 lines changed

examples/llm_compression/torch/downstream_qat_with_nls/README.md

Lines changed: 31 additions & 14 deletions
@@ -9,14 +9,30 @@ For detailed information about the methodology and format, please refer to this
 <img src="/examples/llm_compression/torch/downstream_qat_with_nls/pics/lora_vs_nls.png" alt="LoRA vs NLS" width="400"/>
 </p>
 
+## Install requirements
+
+To use this example:
+
+- Create a separate Python* environment and activate it: `python3 -m venv nncf_env && source nncf_env/bin/activate`
+- Install dependencies:
+
+```bash
+pip install -U pip
+pip install -r requirements.txt
+pip install -e ../../../../
+```
+
+## Run Example
+
 [main.py](main.py) supports fine-tuning and evaluating a language model with quantization-aware training and **Neural Low-Rank Adapter Search (NLS)** proposed by [Shears](https://arxiv.org/abs/2404.10934) and [SQFT](https://arxiv.org/abs/2410.03750) on various downstream tasks. For example, to run the script for the task [openbookqa](https://huggingface.co/datasets/allenai/openbookqa), you can use the following command:
 
 ```bash
-python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --task openbookqa --lr 1e-4 --epochs 3 --batch_size 16 --eval_batch_size 64 --lora_rank_space 32 24 16
+python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --fast_eval --task openbookqa --lr 1e-4 --epochs 3 --batch_size 16 --eval_batch_size 64 --lora_rank_space 32 24 16
 ```
 
 - `--pretrained`: The model ID or path of a pretrained Hugging Face model configuration.
 - `--output_dir`: Path to the directory for storing logs, tuning checkpoints, compressed models, and evaluation results.
+- `--fast_eval`: Enable faster evaluation by applying in-place quantization to the model weights.
 - `--task`: The evaluation task to be performed. Choices: ["gsm8k", "hellaswag", "openbookqa", "winogrande", "arc_challenge", "arc_easy"].
 - `--lr`: Learning rate for fine-tuning.
 - `--epochs`: Number of epochs for training.
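A note on the new `--fast_eval` flag: "in-place quantization to the model weights" means the quantize-dequantize result is baked into the weight tensors once, so evaluation no longer re-simulates quantization on every forward pass. Below is a minimal sketch of that idea in plain PyTorch; the helper `quantize_weights_in_place` is hypothetical and is not NNCF's actual implementation.

```python
# Hypothetical sketch only: bake quantize-dequantize into the weights once
# so that subsequent evaluation runs on plain tensors. Not NNCF's API.
import torch
from torch import nn


def quantize_weights_in_place(model: nn.Module, bits: int = 4) -> None:
    q_max = 2 ** (bits - 1) - 1  # symmetric signed range, e.g. [-8, 7] for 4 bits
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            # Per-output-channel symmetric scale.
            scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / q_max
            w_q = torch.clamp(torch.round(w / scale), -q_max - 1, q_max)
            # Overwrite the weights with their dequantized values in place.
            module.weight.data.copy_(w_q * scale)
```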
@@ -26,11 +42,12 @@ python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --task
 - `--eval_only`: Whether to perform evaluation only. If specified, the model will be loaded from the checkpoint for evaluation.
 - `--resume`: Whether to resume training from a checkpoint. If specified, the script will load the trained checkpoint and continue training or evaluation.
 - `--custom_rank_config`: Specifies the LoRA rank of adapters per layer.
+- `--num_min_loss_configs`: Number of configurations to evaluate for the min loss heuristic.
 
-Regarding evaluation, the script will automatically use a heuristic to obtain a good configuration for evaluation. This default strategy takes advantage of some information from the training phase and requires the evaluation of only 7 suggested configurations. This is automatically done in the example script, and only the best configuration from these candidates is returned to the user. More powerful elastic LoRA NLS configurations can be optionally obtained through more advanced search algorithms. We also support testing a custom configuration for evaluation after training. The following command will load the trained checkpoint and test the specified LoRA rank configuration:
+Regarding evaluation, the script will automatically use a heuristic to obtain a good configuration for evaluation. This default strategy takes advantage of some information from the training phase and requires the evaluation of only 7 suggested configurations (median + frequent + 5 min loss). This is automatically done in the example script, and only the best configuration from these candidates is returned to the user. More powerful elastic LoRA NLS configurations can be optionally obtained through more advanced search algorithms. We also support testing a custom configuration for evaluation after training. The following command will load the trained checkpoint and test the specified LoRA rank configuration:
 
 ```bash
-python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --eval_only --resume --task openbookqa --lora_rank_space 32 24 16 --custom_rank_config 32 24 16 24 24 32 24 32 32 16 24 16 24 32 24 16 24 24 32 32 24 32 32 16 32 32 24 32
+python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --fast_eval --resume --eval_only --task openbookqa --lora_rank_space 32 24 16 --custom_rank_config 32 24 16 24 24 32 24 32 32 16 24 16 24 32 24 16 24 24 32 32 24 32 32 16 32 32 24 32
 ```
 
 This script also supports running the vanilla LoRA method. We only need to pass a single number for `--lora_rank_space`, such as `--lora_rank_space 32`. In addition, the training time of LoRA and NLS is very similar, and there is almost no overhead in activating different sub-adapters during training. For instance, fine-tuning the compressed Llama-3.2-3B-Instruct model for 3 epochs on [arc-challenge](https://huggingface.co/datasets/allenai/ai2_arc) takes 161.83 seconds with LoRA and 164.89 seconds with NLS.
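The "7 suggested configurations (median + frequent + 5 min loss)" heuristic in the hunk above can be pictured roughly as follows. This is a sketch under assumptions: the names `history` and `suggest_configs` are hypothetical, and the real script may, for instance, snap the median to the nearest value in `--lora_rank_space`.

```python
# Hypothetical sketch of the candidate-selection heuristic: per-layer median
# rank, per-layer most frequent rank, and the 5 configurations with the lowest
# mean training loss. Names and data layout are assumptions, not actual code.
import statistics
from collections import Counter, defaultdict


def suggest_configs(history: list[tuple[tuple[int, ...], float]], num_min_loss: int = 5):
    configs = [cfg for cfg, _ in history]
    num_layers = len(configs[0])

    # Per-layer median rank across all configurations sampled during training.
    median_cfg = tuple(
        int(statistics.median(cfg[i] for cfg in configs)) for i in range(num_layers)
    )
    # Per-layer most frequently sampled rank.
    frequent_cfg = tuple(
        Counter(cfg[i] for cfg in configs).most_common(1)[0][0] for i in range(num_layers)
    )

    # Configurations with the lowest mean training loss.
    loss_by_cfg = defaultdict(list)
    for cfg, loss in history:
        loss_by_cfg[cfg].append(loss)
    min_loss_cfgs = sorted(loss_by_cfg, key=lambda c: statistics.fmean(loss_by_cfg[c]))[:num_min_loss]

    # Deduplicate while preserving order: at most 2 + num_min_loss candidates.
    return list(dict.fromkeys([median_cfg, frequent_cfg, *min_loss_cfgs]))
```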
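The near-zero overhead of activating different sub-adapters follows from how elastic LoRA layers typically work: choosing rank `r` is just slicing the first `r` components of the low-rank factors, so nothing is copied or rebuilt between steps. A minimal sketch, with a hypothetical `ElasticLoRALinear` wrapper (not the example's actual module):

```python
# Hypothetical sketch: an elastic LoRA layer where switching sub-adapters is a
# tensor slice, which is why NLS training costs about the same as plain LoRA.
import torch
from torch import nn


class ElasticLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, max_rank: int):
        super().__init__()
        self.base = base
        self.lora_a = nn.Parameter(torch.randn(base.in_features, max_rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(max_rank, base.out_features))
        self.active_rank = max_rank  # updated when a sub-adapter is sampled

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        # Only the first r components participate; the rest stay untouched.
        return self.base(x) + (x @ self.lora_a[:, :r]) @ self.lora_b[:r, :]
```

During training, `active_rank` would be re-sampled per layer at each step from the values passed via `--lora_rank_space`, e.g. `layer.active_rank = random.choice([32, 24, 16])`.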
@@ -49,17 +66,17 @@ INT4 (LoRA + PTWC) results are derived from the best BF16 (LoRA) model using the
 
 | Model | BF16 | BF16 (LoRA) | INT4 (LoRA + PTWC) | INT4 (QAT + LoRA) | INT4 (QAT + NLS) |
 |--------------------------------------|-------|-------------|--------------------|-------------------|------------------|
-| meta-llama/Meta-Llama-3-8B | 0.6233| 0.7277 | 0.7167 | 0.7236 | **0.7350** |
-| meta-llama/Meta-Llama-3-8B-Instruct | 0.6286| 0.7148 | 0.7098 | 0.7076 | **0.7128** |
-| meta-llama/Llama-3.1-8B | 0.6310| 0.7330 | 0.7201 | 0.7243 | **0.7297** |
-| meta-llama/Llama-3.1-8B-Instruct | 0.6297| 0.7197 | 0.7160 | 0.7140 | **0.7166** |
-| Qwen/Qwen2.5-7B | 0.6207| 0.7344 | 0.7269 | 0.7366 | **0.7408** |
-| Qwen/Qwen2.5-7B-Instruct | 0.6401| 0.7305 | 0.7234 | 0.7356 | **0.7382** |
-| mistralai/Mistral-7B-v0.3 | 0.6209| 0.7208 | 0.7115 | 0.7164 | **0.7291** |
-| Qwen/Qwen2.5-3B-Instruct | 0.5814| 0.7003 | 0.6839 | 0.6916 | **0.6966** |
-| meta-llama/Llama-3.2-3B-Instruct | 0.5435| 0.6515 | 0.6503 | 0.6510 | **0.6570** |
-| HuggingFaceTB/SmolLM-1.7B-Instruct | 0.4934| 0.5759 | 0.5751 | **0.5765** | 0.5733 |
-| google/gemma-2-2b-it | 0.6133| 0.6806 | 0.6658 | 0.6801 | **0.6843** |
+| meta-llama/Meta-Llama-3-8B | 0.6233| 0.7277 | 0.7167 | 0.7286 | **0.7344** |
+| meta-llama/Meta-Llama-3-8B-Instruct | 0.6286| 0.7148 | 0.7098 | 0.7116 | **0.7160** |
+| meta-llama/Llama-3.1-8B | 0.6310| 0.7330 | 0.7201 | 0.7216 | **0.7306** |
+| meta-llama/Llama-3.1-8B-Instruct | 0.6297| 0.7197 | 0.7160 | 0.7152 | **0.7183** |
+| Qwen/Qwen2.5-7B | 0.6207| 0.7344 | 0.7269 | 0.7317 | **0.7369** |
+| Qwen/Qwen2.5-7B-Instruct | 0.6401| 0.7305 | 0.7234 | 0.7301 | **0.7380** |
+| mistralai/Mistral-7B-v0.3 | 0.6209| 0.7208 | 0.7115 | 0.7219 | **0.7246** |
+| Qwen/Qwen2.5-3B-Instruct | 0.5814| 0.7003 | 0.6839 | 0.6914 | **0.6940** |
+| meta-llama/Llama-3.2-3B-Instruct | 0.5435| 0.6515 | 0.6503 | 0.6564 | **0.6612** |
+| HuggingFaceTB/SmolLM-1.7B-Instruct | 0.4934| 0.5759 | **0.5751** | 0.5714 | 0.5695 |
+| google/gemma-2-2b-it | 0.6133| 0.6806 | 0.6658 | 0.6721 | **0.6768** |
 
 ## Citation
 