DreamBooth "Something went wrong" - CUDA out of memory Error, please help #2001
Mandingo333 started this conversation in General
2 comments · 18 replies
- are you using the latest notebook? (7 replies)
- xformers still disabled? (11 replies)
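Regarding the xformers question above: memory-efficient attention noticeably reduces the UNet's memory footprint, and the reply suggests that having it disabled can push a ~15 GiB Colab GPU over its limit. A minimal sketch of what enabling it looks like with the diffusers API, assuming the xformers package is installed in the runtime (the model path and fp16 dtype are taken from the failing command further down; this is illustrative, not the notebook's exact code):

```python
# Illustrative sketch: turn on xformers memory-efficient attention for the UNet.
# Assumes the xformers package is installed in the runtime.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "/content/stable-diffusion-v1-5",  # path from the failing command; adjust as needed
    subfolder="unet",
    torch_dtype=torch.float16,
)
unet.enable_xformers_memory_efficient_attention()  # lowers attention memory use
```

The upstream diffusers train_dreambooth.py exposes the same switch as the --enable_xformers_memory_efficient_attention command-line flag; the notebook's fork may wire it up differently.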
- Hi all. I've been getting this error today and never had a problem before (although the last time I trained with DreamBooth was about two months ago). Any idea how to fix it? Thanks.
0% 0/3000 [00:00<?, ?it/s] Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 690, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/utils/operations.py", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.9/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_condition.py", line 632, in forward
sample = upsample_block(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py", line 1821, in forward
hidden_states = upsampler(hidden_states, upsample_size)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/resnet.py", line 142, in forward
hidden_states = self.conv(hidden_states)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 14.75 GiB total capacity; 13.26 GiB already allocated; 10.81 MiB free; 13.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0% 0/3000 [00:09<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions//instance_images', '--output_dir=/content/models/', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions//captions', '--instance_prompt=', '--seed=412293', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong
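A side note, not from the thread itself: the allocator hint in the message above (PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb) only helps when reserved memory far exceeds allocated memory, and it must be set before the training process makes its first CUDA allocation. A minimal sketch, assuming the variable is set in the notebook cell that runs accelerate launch so the child process inherits it; the 512 value is an arbitrary example, not a recommendation from the post:

```python
# Illustrative sketch: set the allocator option mentioned in the OOM message
# before launching training. 512 (MiB) is an example value; tune it for your GPU.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
# Any process started afterwards (e.g. via `accelerate launch`) inherits this
# environment variable, so the CUDA caching allocator reads it at its first allocation.
```

In this trace, though, allocated (13.26 GiB) and reserved (13.42 GiB) are close, so fragmentation is unlikely to be the main problem. Batch size 1, fp16, and 8-bit Adam are already in the command; in the upstream example script, the usual remaining levers are --gradient_checkpointing and xformers attention (see the sketch above).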