ValueError: Attempting to unscale FP16 gradients. #10910

@Messimanda

Describe the bug

I encountered the following error when running train_text_to_image_lora.py: ValueError: Attempting to unscale FP16 gradients.

The command I am running is as follows:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/naruto-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-naruto-model-lora-clean" \
  --validation_prompt="cute dragon creature" --report_to="wandb"

How can I resolve this error?

Reproduction

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/naruto-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-naruto-model-lora-clean" \
  --validation_prompt="cute dragon creature" --report_to="wandb"

Logs

System Info

Traceback (most recent call last):
  File "train_text_to_image_lora.py", line 975, in <module>
    main()
  File "train_text_to_image_lora.py", line 856, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/accelerator.py", line 2396, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/accelerator.py", line 2340, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/amp/grad_scaler.py", line 338, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/amp/grad_scaler.py", line 260, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
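
A likely cause, stated as an assumption rather than a confirmed diagnosis: torch's GradScaler raises this ValueError when a parameter handed to the optimizer has float16 gradients, which happens if the trainable (LoRA) weights themselves are stored in fp16. A minimal sketch of the usual workaround is below: keep the frozen weights in fp16 and upcast only the trainable parameters to fp32 before the optimizer is built. `upcast_trainable_params` and the toy model are hypothetical and are not part of diffusers or accelerate.

```python
import torch
from torch import nn


def upcast_trainable_params(model: nn.Module, dtype: torch.dtype = torch.float32) -> int:
    """Cast only parameters with requires_grad=True to `dtype`.

    Hypothetical helper for illustration. Returns the number of tensors cast.
    """
    count = 0
    for param in model.parameters():
        if param.requires_grad and param.dtype != dtype:
            # Replace the storage in place so existing references stay valid.
            param.data = param.data.to(dtype)
            count += 1
    return count


# Toy stand-in for a half-precision model with a few trainable weights.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)).to(torch.float16)
model[0].requires_grad_(False)  # frozen "base" layer stays in fp16

# The second layer's weight and bias are still trainable, so both get upcast.
changed = upcast_trainable_params(model)
```

In the training script, the equivalent call would go after the LoRA layers are added and before the optimizer is created. Whether that resolves this particular run is untested here.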

Who can help?

No response

Labels: bug
