Description
Update: Previously the OOM was reported only for BNB, but it is now observed for quantized PEFT in general, even for GPTQ. See #106
The previous description below covers the issue only for BNB.
BNB experiments run out of memory in the new benchmarks that set lora_dropout=0.1:
| Benchmark | framework_config | peft_method | model_name_or_path | num_gpus | per_device_train_batch_size | lora_dropout | Peak Memory (GiB) |
|---|---|---|---|---|---|---|---|
| Reference | accelerated-peft-bnb | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0. | 72.39 |
| New | accelerated-peft-bnb | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0.1 | 0. (OOM) |
By comparison, we do not observe this issue with AutoGPTQ:
| Benchmark | framework_config | peft_method | model_name_or_path | num_gpus | per_device_train_batch_size | lora_dropout | Peak Memory (GiB) |
|---|---|---|---|---|---|---|---|
| Reference | accelerated-peft-autogptq | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0. | 70.14 |
| New | accelerated-peft-autogptq | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0.1 | 71.7 |
There may be a slight memory overhead in the dropout implementation that tips the experiment into OOM for large models; a sketch of the suspected mechanism follows.
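A minimal sketch of the suspicion, assuming a PEFT-style LoRA layer (`LoraAdapter` is a hypothetical stand-in, not the actual implementation): when `lora_dropout > 0.`, the dropout call materializes an extra full-sized activation that autograd keeps alive for the backward pass, whereas the `lora_dropout == 0.` path is an identity and allocates nothing extra.

```python
# Minimal sketch, not PEFT's actual implementation: a simplified LoRA adapter
# illustrating the suspected overhead. PEFT-style layers swap in nn.Identity
# when lora_dropout == 0., so the dropout path costs nothing in that case.
import torch
import torch.nn as nn

class LoraAdapter(nn.Module):  # hypothetical stand-in for the PEFT LoRA layer
    def __init__(self, in_features, out_features, r, lora_dropout):
        super().__init__()
        self.dropout = nn.Dropout(lora_dropout) if lora_dropout > 0. else nn.Identity()
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)

    def forward(self, x):
        # With dropout active, dropout(x) is an extra full-sized activation
        # (batch x seq x hidden) per adapted projection that autograd holds
        # until backward; at 70B scale across q/k/v/o projections this can
        # push a near-full GPU over the limit.
        return self.lora_B(self.lora_A(self.dropout(x)))
```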
Reproduce Issue
lora_dropout=0. enters training:
```sh
export CUDA_VISIBLE_DEVICES=0,1
export ACCELERATION_FRAMEWORK_CONFIG_FILE=/workspace/fms-acceleration/scripts/benchmarks/../../sample-configurations/baseline-peft-bnb-nf4-sample-configuration.yaml
accelerate launch --config_file scripts/benchmarks/accelerate.yaml --num_processes=2 --main_process_port=29500 -m tuning.sft_trainer --model_name_or_path NousResearch/Llama-2-70b-hf --packing True --max_seq_len 4096 --fp16 True --learning_rate 2e-4 --torch_dtype float16 --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0. --target_modules q_proj k_proj v_proj o_proj --use_flash_attn True --response_template '
### Response:' --dataset_text_field output --include_tokens_per_second True --num_train_epochs 1 --gradient_accumulation_steps 1 --gradient_checkpointing True --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --max_steps 100 --training_data_path benchmark_outputs/data/cache.json --per_device_train_batch_size 4 --output_dir benchmark_outputs/exp_35/hf --skip_memory_metrics False
```
lora_dropout=0.1 runs out of memory:
```sh
export CUDA_VISIBLE_DEVICES=0,1
export ACCELERATION_FRAMEWORK_CONFIG_FILE=/workspace/fms-acceleration/scripts/benchmarks/../../sample-configurations/baseline-peft-bnb-nf4-sample-configuration.yaml
accelerate launch --config_file scripts/benchmarks/accelerate.yaml --num_processes=2 --main_process_port=29500 -m tuning.sft_trainer --model_name_or_path NousResearch/Llama-2-70b-hf --packing True --max_seq_len 4096 --fp16 True --learning_rate 2e-4 --torch_dtype float16 --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules q_proj k_proj v_proj o_proj --use_flash_attn True --response_template '
### Response:' --dataset_text_field output --include_tokens_per_second True --num_train_epochs 1 --gradient_accumulation_steps 1 --gradient_checkpointing True --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --max_steps 100 --training_data_path benchmark_outputs/data/cache.json --per_device_train_batch_size 4 --output_dir benchmark_outputs/exp_35/hf --skip_memory_metrics False
```
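For a quick standalone check of the two settings, peak GPU memory can also be read off the standard `torch.cuda` counters (a minimal sketch; this is not the benchmark's own measurement, which the HF Trainer records when `--skip_memory_metrics False` is set):

```python
# Minimal sketch for comparing peak GPU memory between the two dropout
# settings outside the benchmark harness.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run a few training steps with lora_dropout=0. or 0.1 here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gib:.2f} GiB")
```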