train_text_to_image_lora.py raise ValueError("Attempting to unscale FP16 gradients.") #6363

@billvsme

Describe the bug

While working through the examples/text_to_image documentation, I ran train_text_to_image_lora.py as shown in the examples. The run fails with ValueError("Attempting to unscale FP16 gradients.").

The cause appears to be the code below. It uses args.mixed_precision to decide whether to upcast the LoRA parameters to float32, but args.mixed_precision defaults to None. The README example sets mixed precision through accelerate (accelerate launch --mixed_precision="fp16") and does not pass --mixed_precision to the script, so args.mixed_precision stays None, the upcast is skipped, and training fails with "Attempting to unscale FP16 gradients."

if args.mixed_precision == "fp16":
    for param in unet.parameters():
        # only upcast trainable parameters (LoRA) into fp32
        if param.requires_grad:
            param.data = param.to(torch.float32)

It might be better to check accelerator.mixed_precision here instead of args.mixed_precision.
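To illustrate the difference, here is a minimal sketch of the check, not the actual patch. The helper upcast_trainable_params is a hypothetical name, and args and accelerator are SimpleNamespace stand-ins for the script's parsed arguments and its Accelerator, with values assumed to match the reproduction below (no --mixed_precision passed to the script, fp16 set through accelerate launch). A small fp16 Linear layer stands in for the UNet with LoRA weights.

```python
from types import SimpleNamespace

import torch


def upcast_trainable_params(model, mixed_precision):
    """Upcast trainable parameters to fp32 when running fp16 mixed precision.

    Hypothetical helper wrapping the check quoted above.
    """
    if mixed_precision != "fp16":
        return
    for param in model.parameters():
        # only upcast trainable parameters (LoRA) into fp32
        if param.requires_grad:
            param.data = param.data.to(torch.float32)


# Stand-ins (assumptions): the script received no --mixed_precision flag,
# while accelerate was launched with --mixed_precision="fp16".
args = SimpleNamespace(mixed_precision=None)
accelerator = SimpleNamespace(mixed_precision="fp16")

# Toy model in fp16; the bias is frozen to mimic non-LoRA weights.
model = torch.nn.Linear(4, 4).half()
model.bias.requires_grad_(False)

# Checking args.mixed_precision skips the upcast: the weight stays fp16.
upcast_trainable_params(model, args.mixed_precision)
dtype_with_args = model.weight.dtype

# Checking accelerator.mixed_precision upcasts the trainable weight.
upcast_trainable_params(model, accelerator.mixed_precision)
dtype_with_accelerator = model.weight.dtype
```

With the args-based check the trainable weight remains in fp16, which is the state that leads to the unscale error; with the accelerator-based check only the trainable weight is upcast and the frozen bias is left in fp16.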

Reproduction

cd diffusers/examples/text_to_image/

accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="lambdalabs/pokemon-blip-captions" --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora" \
  --validation_prompt="cute dragon creature"

Logs

Steps:   0%|                                          | 0/20900 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 945, in <module>
    main()
  File "/content/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 774, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/home/billvsme/venv/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 2040, in clip_grad_norm_
    self.unscale_gradients()
  File "/home/billvsme/venv/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 2003, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/home/billvsme/venv/train/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/home/billvsme/venv/train/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.

System Info

  • diffusers version: 0.25.0.dev0
  • Platform: Linux-6.2.0-39-generic-x86_64-with-glibc2.35
  • Python version: 3.10.13
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Huggingface_hub version: 0.19.4
  • Transformers version: 4.36.2
  • Accelerate version: 0.25.0
  • xFormers version: 0.0.22.post7
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@sayakpaul

Labels

bug, stale
