Description
Describe the bug
While reading the examples/text_to_image documentation, I tried train_text_to_image_lora.py with the example command from the docs, but the run fails with raise ValueError("Attempting to unscale FP16 gradients.").
I think the cause is the check in train_text_to_image_lora.py at lines 468 to 472 (quoted below). It uses args.mixed_precision to decide whether to upcast the LoRA parameters to float32, but args.mixed_precision defaults to None. The README example sets mixed precision through accelerate (accelerate launch --mixed_precision="fp16") and never passes --mixed_precision to the script itself, so args.mixed_precision stays None, the upcast is skipped, and training fails with "Attempting to unscale FP16 gradients."
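To illustrate why args.mixed_precision ends up as None, here is a minimal stdlib-only sketch. parse_script_args is a hypothetical, trimmed-down stand-in for the script's real argument parser (only --mixed_precision with default=None, as described above, plus --seed), and should_upcast_lora is a hypothetical helper that mirrors the "fp16" check. The assumption being modeled is that arguments placed before the script name in accelerate launch are consumed by accelerate, so the script's own parser never sees them.

```python
import argparse


def parse_script_args(argv):
    # Hypothetical, trimmed-down stand-in for the training script's parser:
    # only --mixed_precision (default=None) and --seed are modeled here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--mixed_precision", type=str, default=None)
    parser.add_argument("--seed", type=int, default=None)
    return parser.parse_args(argv)


def should_upcast_lora(mixed_precision):
    # Hypothetical helper mirroring the check: upcast only when the value is "fp16".
    return mixed_precision == "fp16"


# For `accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py --seed=42`,
# only the arguments after the script name reach the script's own parser.
args = parse_script_args(["--seed=42"])

print(args.mixed_precision)                      # None
print(should_upcast_lora(args.mixed_precision))  # False: LoRA params are not upcast
print(should_upcast_lora("fp16"))                # True: the value accelerate was launched with
```

Because the check only looks at the script's flag, the precision that accelerate was launched with never influences whether the LoRA parameters are upcast.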
diffusers/examples/text_to_image/train_text_to_image_lora.py, lines 468 to 472 (commit 1fff527):
    if args.mixed_precision == "fp16":
        for param in unet.parameters():
            # only upcast trainable parameters (LoRA) into fp32
            if param.requires_grad:
                param.data = param.to(torch.float32)
It might be better to check accelerator.mixed_precision here instead of args.mixed_precision.
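A minimal sketch of the proposed change, assuming the upcast loop is pulled into a helper. upcast_trainable_params is a hypothetical name, and the toy nn.Sequential model (a frozen fp16 layer plus a trainable fp16 layer) only stands in for the UNet with LoRA layers. The helper is meant to receive accelerator.mixed_precision rather than the script's CLI flag; it runs on CPU because it only changes dtypes and never does a forward pass.

```python
import torch
from torch import nn


def upcast_trainable_params(model: nn.Module, mixed_precision: str) -> None:
    """Cast only trainable (e.g. LoRA) parameters to fp32 when training in fp16.

    Intended to be called with accelerator.mixed_precision ("no", "fp16", "bf16", ...)
    instead of args.mixed_precision, which may be None.
    """
    if mixed_precision == "fp16":
        for param in model.parameters():
            # only upcast trainable parameters (LoRA) into fp32
            if param.requires_grad:
                param.data = param.to(torch.float32)


# Toy stand-in for the UNet: a frozen fp16 base layer and a trainable fp16 layer.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4)).half()
model[0].requires_grad_(False)  # frozen base weights stay in fp16

upcast_trainable_params(model, "fp16")
print(model[0].weight.dtype, model[1].weight.dtype)  # torch.float16 torch.float32
```

In the training script, the call would presumably become upcast_trainable_params(unet, accelerator.mixed_precision), so the decision follows whatever precision accelerate was actually configured with.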
Reproduction
cd diffusers/examples/text_to_image/
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--dataset_name="lambdalabs/pokemon-blip-captions" --caption_column="text" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir="sd-pokemon-model-lora" \
--validation_prompt="cute dragon creature"
Logs
Steps: 0%| | 0/20900 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/content/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 945, in <module>
main()
File "/content/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 774, in main
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
File "/home/billvsme/venv/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 2040, in clip_grad_norm_
self.unscale_gradients()
File "/home/billvsme/venv/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 2003, in unscale_gradients
self.scaler.unscale_(opt)
File "/home/billvsme/venv/train/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_
optimizer_state["found_inf_per_device"] = self._unscale_grads_(
File "/home/billvsme/venv/train/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
System Info
- diffusers version: 0.25.0.dev0
- Platform: Linux-6.2.0-39-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Huggingface_hub version: 0.19.4
- Transformers version: 4.36.2
- Accelerate version: 0.25.0
- xFormers version: 0.0.22.post7
- Using GPU in script?:
- Using distributed or parallel set-up in script?: