Description
Describe the bug
I encountered the following error when running train_text_to_image_lora.py: ValueError: Attempting to unscale FP16 gradients.
The script I am running is as follows:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-naruto-model-lora-clean" \
  --validation_prompt="cute dragon creature" --report_to="wandb"
How can I resolve this error?
Reproduction
Running the command from the description above reproduces the error.
Logs
Traceback (most recent call last):
  File "train_text_to_image_lora.py", line 975, in <module>
    main()
  File "train_text_to_image_lora.py", line 856, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/accelerator.py", line 2396, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/accelerator.py", line 2340, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/amp/grad_scaler.py", line 338, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/amp/grad_scaler.py", line 260, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
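A likely cause, judging from the traceback: `GradScaler.unscale_` rejects any gradient whose dtype is `torch.float16`, and a parameter's gradient always has the same dtype as the parameter. That suggests the trainable LoRA weights were cast to fp16 together with the frozen UNet weights. The minimal sketch below assumes PyTorch and uses a hypothetical `TinyLoraBlock` module plus hypothetical helpers `upcast_trainable_params`, `fp16_grad_params`, and `fake_backward` (none of these come from the training script). It reproduces the dtype situation on CPU and shows the usual workaround: keep frozen weights in fp16 but upcast only the trainable parameters to fp32 before training starts. Recent diffusers releases ship a helper with this behavior, `cast_training_params` in `diffusers.training_utils`; it may be worth checking whether your copy of the script calls it.

```python
import torch
import torch.nn as nn


class TinyLoraBlock(nn.Module):
    """Hypothetical stand-in for a UNet layer with a LoRA adapter (illustration only)."""

    def __init__(self, dim: int = 8, rank: int = 2) -> None:
        super().__init__()
        self.base = nn.Linear(dim, dim)                     # frozen "pretrained" weight
        self.lora_down = nn.Linear(dim, rank, bias=False)   # trainable adapter
        self.lora_up = nn.Linear(rank, dim, bias=False)     # trainable adapter
        self.base.requires_grad_(False)


def upcast_trainable_params(model: nn.Module, dtype: torch.dtype = torch.float32) -> None:
    """Hypothetical helper: cast only params with requires_grad=True; frozen ones keep their dtype."""
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.to(dtype)


def fp16_grad_params(model: nn.Module) -> list:
    """Hypothetical helper: names of params whose .grad is fp16 (what GradScaler.unscale_ rejects)."""
    return [
        name
        for name, p in model.named_parameters()
        if p.grad is not None and p.grad.dtype == torch.float16
    ]


def fake_backward(model: nn.Module) -> None:
    """Hypothetical stand-in for a training step: avoids fp16 matmuls so it runs on CPU."""
    # Each trainable param is upcast, squared and summed; the gradient flowing back
    # into each param has that param's own dtype.
    loss = sum((p.float() ** 2).sum() for p in model.parameters() if p.requires_grad)
    loss.backward()


if __name__ == "__main__":
    # Whole module (including the adapter) cast to fp16: the adapter gets fp16 grads.
    model = TinyLoraBlock().to(torch.float16)
    fake_backward(model)
    print(fp16_grad_params(model))  # ['lora_down.weight', 'lora_up.weight']

    # Same module, but the trainable params are upcast to fp32 first: no fp16 grads.
    model = TinyLoraBlock().to(torch.float16)
    upcast_trainable_params(model)
    fake_backward(model)
    print(fp16_grad_params(model))  # []
```

With the trainable parameters upcast this way, `accelerator.clip_grad_norm_` under `--mixed_precision="fp16"` should only see fp32 gradients on the LoRA weights, while autocast can still run the forward pass in fp16 and the frozen weights keep their fp16 memory footprint.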