DreamBooth "Something went wrong" - CUDA out of memory Error, please help #2001
Mandingo333 started this conversation in General
2 comments · 18 replies
- are you using the latest notebook? (7 replies)
- xformers still disabled? (11 replies)
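Regarding the xformers question above: memory-efficient attention noticeably reduces the UNet's memory footprint, and the reply suggests that having it disabled can push a ~15 GiB Colab GPU over its limit. A minimal sketch of what enabling it looks like with the diffusers API, assuming the xformers package is installed in the runtime (the model path and fp16 dtype are taken from the failing command further down; this is illustrative, not the notebook's exact code):

```python
# Illustrative sketch: turn on xformers memory-efficient attention for the UNet.
# Assumes the xformers package is installed in the runtime.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "/content/stable-diffusion-v1-5",  # path from the failing command; adjust as needed
    subfolder="unet",
    torch_dtype=torch.float16,
)
unet.enable_xformers_memory_efficient_attention()  # lowers attention memory use
```

The upstream diffusers train_dreambooth.py exposes the same switch as the --enable_xformers_memory_efficient_attention command-line flag; the notebook's fork may wire it up differently.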
- Hi all. I've been getting this error today and never had a problem before (although the last time I trained with DreamBooth was about two months ago). Any idea how to fix it? Thanks.
0% 0/3000 [00:00<?, ?it/s] Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 690, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/utils/operations.py", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.9/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_condition.py", line 632, in forward
sample = upsample_block(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py", line 1821, in forward
hidden_states = upsampler(hidden_states, upsample_size)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/resnet.py", line 142, in forward
hidden_states = self.conv(hidden_states)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 14.75 GiB total capacity; 13.26 GiB already allocated; 10.81 MiB free; 13.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0% 0/3000 [00:09<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions//instance_images', '--output_dir=/content/models/', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions//captions', '--instance_prompt=', '--seed=412293', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong
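A side note, not from the thread itself: the allocator hint in the message above (PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb) only helps when reserved memory far exceeds allocated memory, and it must be set before the training process makes its first CUDA allocation. A minimal sketch, assuming the variable is set in the notebook cell that runs accelerate launch so the child process inherits it; the 512 value is an arbitrary example, not a recommendation from the post:

```python
# Illustrative sketch: set the allocator option mentioned in the OOM message
# before launching training. 512 (MiB) is an example value; tune it for your GPU.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
# Any process started afterwards (e.g. via `accelerate launch`) inherits this
# environment variable, so the CUDA caching allocator reads it at its first allocation.
```

In this trace, though, allocated (13.26 GiB) and reserved (13.42 GiB) are close, so fragmentation is unlikely to be the main problem. Batch size 1, fp16, and 8-bit Adam are already in the command; in the upstream example script, the usual remaining levers are --gradient_checkpointing and xformers attention (see the sketch above).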