Describe the bug
Hi @sayakpaul and all others :)
The training script for a ControlNet based on Stable Diffusion 3 does not seem to work. It fails with:
RuntimeError: Given groups=1, weight of size [1536, 17, 2, 2], expected input[4, 16, 64, 64] to have 17 channels, but got 16 channels instead
I followed the documentation on how to train a ControlNet based on SD3 and used a custom dataset that I had already used to train a ControlNet based on SD 1.5. As soon as I run the script, training fails with the tensor channel mismatch shown above.
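The shape mismatch itself can be reproduced in isolation with plain PyTorch. This is only a minimal sketch of what the error message describes (a patch-embedding convolution built for 17 input channels receiving a 16-channel latent), not code taken from the training script:

```python
import torch
import torch.nn as nn

# Conv2d shaped like the controlnet's pos_embed_input projection
# (weight of size [1536, 17, 2, 2] in the error message).
proj = nn.Conv2d(in_channels=17, out_channels=1536, kernel_size=2, stride=2)

# Latent shaped like the conditioning input in the error message.
latent = torch.randn(4, 16, 64, 64)

proj(latent)  # raises the same "expected ... 17 channels, but got 16 channels" RuntimeError
```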
Reproduction
!accelerate launch train_controlnet_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --output_dir="/home/xxx/models/v1/cn-stablediff-v3_out" \
  --dataset_name="StudentYannik/v1-prepared-cn" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --max_train_steps=10000 \
  --train_batch_size=4 \
  --num_train_epochs=10 \
  --gradient_accumulation_steps=4
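In case it helps with triage, here is a rough diagnostic sketch I used to narrow things down. It is not part of the training script; it only mirrors the "Initializing controlnet weights from transformer" path from the log below, and the attribute names are taken from the traceback, so it may need adjusting for other diffusers versions:

```python
import torch
from diffusers import SD3ControlNetModel, SD3Transformer2DModel

model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
transformer = SD3Transformer2DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.float16
)

# The training script initializes the controlnet from the transformer
# ("Initializing controlnet weights from transformer" in the log).
controlnet = SD3ControlNetModel.from_transformer(transformer)

print("transformer latent channels:", transformer.config.in_channels)
print("controlnet pos_embed_input channels:", controlnet.pos_embed_input.proj.in_channels)
# The traceback suggests these differ: the conditioning latent has 16 channels,
# while the pos_embed_input Conv2d weight expects 17.
```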
Logs
11/29/2024 14:35:32 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'base_image_seq_len', 'base_shift', 'max_image_seq_len', 'use_beta_sigmas', 'invert_sigmas', 'use_karras_sigmas', 'use_dynamic_shifting', 'max_shift', 'use_exponential_sigmas'} was not found in config. Values will be initialized to default values.
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 12539.03it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:09<00:00, 4.92s/it]
{'mid_block_add_attention'} was not found in config. Values will be initialized to default values.
{'dual_attention_layers', 'qk_norm'} was not found in config. Values will be initialized to default values.
11/29/2024 14:35:54 - INFO - __main__ - Initializing controlnet weights from transformer
{'dual_attention_layers', 'pos_embed_type', 'qk_norm', 'use_pos_embed', 'force_zeros_for_pooled_projection'} was not found in config. Values will be initialized to default values.
11/29/2024 14:36:14 - INFO - __main__ - ***** Running training *****
11/29/2024 14:36:14 - INFO - __main__ - Num examples = 150
11/29/2024 14:36:14 - INFO - __main__ - Num batches each epoch = 38
11/29/2024 14:36:14 - INFO - __main__ - Num Epochs = 1000
11/29/2024 14:36:14 - INFO - __main__ - Instantaneous batch size per device = 4
11/29/2024 14:36:14 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
11/29/2024 14:36:14 - INFO - __main__ - Gradient Accumulation steps = 4
11/29/2024 14:36:14 - INFO - __main__ - Total optimization steps = 10000
Steps: 0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1412, in <module>
main(args)
File "/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1278, in main
control_block_res_samples = controlnet(
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxxx/repos/control-net/diffusers/src/diffusers/models/controlnets/controlnet_sd3.py", line 365, in forward
hidden_states = hidden_states + self.pos_embed_input(controlnet_cond)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxxx/repos/control-net/diffusers/src/diffusers/models/embeddings.py", line 266, in forward
latent = self.proj(latent)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
return F.conv2d(
RuntimeError: Given groups=1, weight of size [1536, 17, 2, 2], expected input[4, 16, 64, 64] to have 17 channels, but got 16 channels instead
Steps: 0%| | 0/10000 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/home/xxxx/repos/control-net/.venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
simple_launcher(args)
File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/xxxx/repos/control-net/.venv/bin/python', '/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--output_dir=/home/xxxx/models/v1/cn-stablediff-v3_out', '--dataset_name=StudentYannik/v1-prepared-cn', '--resolution=512', '--learning_rate=1e-5', '--max_train_steps=10000', '--validation_image', '/home/xxxx/datasets/v1-raw-blender-valid/cube_52.png', '--validation_prompt', "{'prompt': 'A SMALL BLUE CUBE with background color WHITE', 'objects': [{'form': {'type': 'CUBE', 'color': 'BLUE', 'size': 'SMALL'}, 'position': {'x': 11, 'y': 13, 'z': 0}}]}", '--train_batch_size=4', '--num_train_epochs=10', '--gradient_accumulation_steps=4']' returned non-zero exit status 1.
System Info
diffusers version: commit c96bfa5
python: python3.10
cuda: 12.2
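For completeness, the 16-channel 64x64 conditioning latent in the error matches what the SD3 VAE produces for a 512x512 image, so the extra 17th channel seems to come from the controlnet initialization rather than from my dataset. Again just a sanity-check sketch, not code from the training script:

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", subfolder="vae"
)

# Dummy batch of conditioning images at --resolution=512.
images = torch.randn(4, 3, 512, 512)
with torch.no_grad():
    latents = vae.encode(images).latent_dist.sample()

print(latents.shape)  # torch.Size([4, 16, 64, 64]) -- the input shape from the RuntimeError
```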