Training script for a Controlnet based on SD3 does not work

### Describe the bug

Hi @sayakpaul and all others :)


The training script for a ControlNet based on Stable Diffusion 3 does not seem to work. It fails on the first training step with:

**RuntimeError: Given groups=1, weight of size [1536, 17, 2, 2], expected input[4, 16, 64, 64] to have 17 channels, but got 16 channels instead**



I followed the documentation on how to train a ControlNet based on SD3.
I used a custom dataset that I had already used to train a ControlNet based on SD1.5.

As soon as I run the script, I get a tensor channel mismatch error.
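To make the mismatch easier to read, here is a minimal sketch that pulls the shapes out of the error text. The helper name `parse_conv_mismatch` is hypothetical and not part of diffusers or PyTorch; it only relies on the convention that a conv2d weight is laid out as `[out_channels, in_channels / groups, kH, kW]`.

```python
import re

# Error text copied verbatim from the traceback in this report.
ERROR = (
    "Given groups=1, weight of size [1536, 17, 2, 2], "
    "expected input[4, 16, 64, 64] to have 17 channels, "
    "but got 16 channels instead"
)


def parse_conv_mismatch(message):
    """Extract weight and input shapes from a conv2d channel-mismatch message.

    Hypothetical helper for triage only.
    """
    pattern = (
        r"groups=(\d+), weight of size \[([\d, ]+)\], "
        r"expected input\[([\d, ]+)\]"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("not a conv channel-mismatch message")
    groups = int(match.group(1))
    weight = [int(v) for v in match.group(2).split(",")]
    inp = [int(v) for v in match.group(3).split(",")]
    return {
        # weight is [out, in / groups, kH, kW], so in = weight[1] * groups
        "expected_channels": weight[1] * groups,
        "actual_channels": inp[1],
        "out_channels": weight[0],
        "kernel_size": tuple(weight[2:]),
        "input_shape": tuple(inp),
    }


info = parse_conv_mismatch(ERROR)
print(info)
```

So the layer was built for 17 input channels with a 2x2 kernel and 1536 output channels, while the tensor passed in has 16 channels: the layer expects exactly one channel more than it receives.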


### Reproduction

!accelerate launch train_controlnet_sd3.py \
 --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
 --output_dir="/home/xxx/models/v1/cn-stablediff-v3_out" \
 --dataset_name="StudentYannik/v1-prepared-cn" \
 --resolution=512 \
 --learning_rate=1e-5 \
 --max_train_steps=10000 \
 --train_batch_size=4 \
 --num_train_epochs=10 \
 --gradient_accumulation_steps=4

### Logs

```shell
11/29/2024 14:35:32 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'base_image_seq_len', 'base_shift', 'max_image_seq_len', 'use_beta_sigmas', 'invert_sigmas', 'use_karras_sigmas', 'use_dynamic_shifting', 'max_shift', 'use_exponential_sigmas'} was not found in config. Values will be initialized to default values.
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 12539.03it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:09<00:00,  4.92s/it]
{'mid_block_add_attention'} was not found in config. Values will be initialized to default values.
{'dual_attention_layers', 'qk_norm'} was not found in config. Values will be initialized to default values.
11/29/2024 14:35:54 - INFO - __main__ - Initializing controlnet weights from transformer
{'dual_attention_layers', 'pos_embed_type', 'qk_norm', 'use_pos_embed', 'force_zeros_for_pooled_projection'} was not found in config. Values will be initialized to default values.
11/29/2024 14:36:14 - INFO - __main__ - ***** Running training *****
11/29/2024 14:36:14 - INFO - __main__ -   Num examples = 150
11/29/2024 14:36:14 - INFO - __main__ -   Num batches each epoch = 38
11/29/2024 14:36:14 - INFO - __main__ -   Num Epochs = 1000
11/29/2024 14:36:14 - INFO - __main__ -   Instantaneous batch size per device = 4
11/29/2024 14:36:14 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 16
11/29/2024 14:36:14 - INFO - __main__ -   Gradient Accumulation steps = 4
11/29/2024 14:36:14 - INFO - __main__ -   Total optimization steps = 10000
Steps:   0%|                                          | 0/10000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1412, in <module>
    main(args)
  File "/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1278, in main
    control_block_res_samples = controlnet(
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxxx/repos/control-net/diffusers/src/diffusers/models/controlnets/controlnet_sd3.py", line 365, in forward
    hidden_states = hidden_states + self.pos_embed_input(controlnet_cond)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxxx/repos/control-net/diffusers/src/diffusers/models/embeddings.py", line 266, in forward
    latent = self.proj(latent)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Given groups=1, weight of size [1536, 17, 2, 2], expected input[4, 16, 64, 64] to have 17 channels, but got 16 channels instead
Steps:   0%|                                          | 0/10000 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/home/xxxx/repos/control-net/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/xxxx/repos/control-net/.venv/bin/python', '/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--output_dir=/home/xxxx/models/v1/cn-stablediff-v3_out', '--dataset_name=StudentYannik/v1-prepared-cn', '--resolution=512', '--learning_rate=1e-5', '--max_train_steps=10000', '--validation_image', '/home/xxxx/datasets/v1-raw-blender-valid/cube_52.png', '--validation_prompt', "{'prompt': 'A SMALL BLUE CUBE with background color WHITE', 'objects': [{'form': {'type': 'CUBE', 'color': 'BLUE', 'size': 'SMALL'}, 'position': {'x': 11, 'y': 13, 'z': 0}}]}", '--train_batch_size=4', '--num_train_epochs=10', '--gradient_accumulation_steps=4']' returned non-zero exit status 1.
```
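Reading the traceback, the failing call is `self.pos_embed_input(controlnet_cond)`, so the 16-channel tensor is the conditioning input, not the noisy latents. The sketch below checks that `[4, 16, 64, 64]` is the shape one would expect here. The two constants are assumptions about SD3 (a VAE that downsamples by 8 and produces 16 latent channels), not values read from the logs, and the function names are hypothetical.

```python
# Assumed SD3 constants (not taken from the logs above).
VAE_SCALE_FACTOR = 8   # spatial downsampling of the VAE
LATENT_CHANNELS = 16   # latent channels produced by the VAE


def latent_shape(batch_size, resolution):
    """Shape of a batch of VAE latents for square images."""
    side = resolution // VAE_SCALE_FACTOR
    return (batch_size, LATENT_CHANNELS, side, side)


def patch_tokens(shape, patch_size):
    """Number of tokens after a non-overlapping patch embedding."""
    _, _, height, width = shape
    return (height // patch_size) * (width // patch_size)


# Values from the reproduction command: --train_batch_size=4, --resolution=512
shape = latent_shape(batch_size=4, resolution=512)
print(shape)                              # (4, 16, 64, 64)
print(patch_tokens(shape, patch_size=2))  # 1024
```

Under these assumptions the input shape matches the command line arguments, which suggests the odd one out is the 17-channel weight of `pos_embed_input`. One possible explanation, which the logs do not confirm, is that the ControlNet was initialized with one extra conditioning channel (for example a mask channel, as inpainting-style ControlNets use) while the script feeds it plain 16-channel latents.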


### System Info

diffusers version: commit c96bfa5c80eca798d555a79a491043c311d0f608
python: python3.10
cuda: 12.2

### Who can help?

@sayakpaul, @yiyixuxu, @DN6

Training script for a Controlnet based on SD3 does not work #10055
