Training script for a ControlNet based on SD3 does not work #10055

@Putzzmunta


Describe the bug

Hi @sayakpaul and all others :)

The training script for a ControlNet based on Stable Diffusion 3 (train_controlnet_sd3.py) does not seem to work. Training fails on the first step with:

RuntimeError: Given groups=1, weight of size [1536, 17, 2, 2], expected input[4, 16, 64, 64] to have 17 channels, but got 16 channels instead

I followed the documentation on how to train a ControlNet based on SD3, using a custom dataset that I had already used to train a ControlNet based on SD1.5.

As soon as I run the script, it fails with the channel-mismatch error above.
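How to read the error: in PyTorch's conv2d, the weight has shape (out_channels, in_channels // groups, kernel_h, kernel_w), so [1536, 17, 2, 2] is a 2x2 convolution expecting 17 input channels and producing 1536 features (most likely the hidden size of the SD3 Medium transformer). Per the traceback in the logs, this is the patch embedding `pos_embed_input` applied to `controlnet_cond`. The input [4, 16, 64, 64] is a batch of 4 tensors with only 16 channels at 64x64. The sketch below is a hypothetical helper that re-implements just that channel check to make the mismatch explicit; it is not PyTorch's own code.

```python
from typing import Sequence


def check_conv2d_channels(
    weight_shape: Sequence[int], input_shape: Sequence[int], groups: int = 1
) -> None:
    """Hypothetical helper mirroring conv2d's input-channel check.

    weight_shape is (out_channels, in_channels // groups, kernel_h, kernel_w);
    input_shape is (batch, channels, height, width).
    Raises RuntimeError when the input's channel count does not match.
    """
    expected = weight_shape[1] * groups
    got = input_shape[1]
    if got != expected:
        w = ", ".join(str(d) for d in weight_shape)
        x = ", ".join(str(d) for d in input_shape)
        raise RuntimeError(
            f"Given groups={groups}, weight of size [{w}], expected input[{x}] "
            f"to have {expected} channels, but got {got} channels instead"
        )


# Shapes from the error: a 17-channel 2x2 patch-embedding conv fed a 16-channel batch.
try:
    check_conv2d_channels((1536, 17, 2, 2), (4, 16, 64, 64))
except RuntimeError as err:
    message = str(err)
    print(message)
```

The reconstructed message matches the one in the issue, which confirms the problem is purely a channel count: the conditioning tensor is one channel short of what the ControlNet's patch embedding was built for.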

Reproduction

!accelerate launch train_controlnet_sd3.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
--output_dir="/home/xxx/models/v1/cn-stablediff-v3_out" \
--dataset_name="StudentYannik/v1-prepared-cn" \
--resolution=512 \
--learning_rate=1e-5 \
--max_train_steps=10000 \
--train_batch_size=4 \
--num_train_epochs=10 \
--gradient_accumulation_steps=4
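As a sanity check, the tensor shapes implied by these flags can be worked out by hand. This sketch makes several assumptions that the issue does not state: the SD3 VAE downsamples by 8x and outputs 16 latent channels (consistent with the input [4, 16, 64, 64] in the error), and the patch size of 2 is taken from the 2x2 kernel. EXTRA_COND_CHANNELS = 1 is only a hypothesis for where the 17 comes from, for example a ControlNet configured with one additional conditioning channel such as a mask. Nothing in this issue confirms that.

```python
# ASSUMPTIONS (not stated in the issue):
#   - SD3 VAE downsamples by 8x and yields 16 latent channels
#   - patch size 2 (from the 2x2 kernel in the error)
#   - EXTRA_COND_CHANNELS = 1 is a hypothetical explanation of the 17
RESOLUTION = 512          # --resolution
TRAIN_BATCH_SIZE = 4      # --train_batch_size
VAE_SCALE_FACTOR = 8      # assumed spatial downsampling of the VAE
LATENT_CHANNELS = 16      # assumed VAE latent channels
PATCH_SIZE = 2            # kernel size of the failing conv
EXTRA_COND_CHANNELS = 1   # hypothetical extra conditioning channel(s)


def latent_shape(resolution: int, batch: int) -> tuple:
    """Shape of the VAE-encoded conditioning batch (batch, C, H, W)."""
    side = resolution // VAE_SCALE_FACTOR
    return (batch, LATENT_CHANNELS, side, side)


def num_patches(side: int, patch: int) -> int:
    """Number of tokens a square latent is split into by the patch embedding."""
    return (side // patch) ** 2


cond_shape = latent_shape(RESOLUTION, TRAIN_BATCH_SIZE)
expected_in = LATENT_CHANNELS + EXTRA_COND_CHANNELS

print(cond_shape)                               # (4, 16, 64, 64), the input in the error
print(expected_in)                              # 17, the weight's in_channels
print(num_patches(cond_shape[2], PATCH_SIZE))   # 1024 tokens per image
```

Under these assumptions the conditioning latent fed to the ControlNet is exactly the 16-channel tensor in the error, while the patch embedding was sized for 16 + 1 channels.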

Logs

11/29/2024 14:35:32 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'base_image_seq_len', 'base_shift', 'max_image_seq_len', 'use_beta_sigmas', 'invert_sigmas', 'use_karras_sigmas', 'use_dynamic_shifting', 'max_shift', 'use_exponential_sigmas'} was not found in config. Values will be initialized to default values.
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 12539.03it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:09<00:00,  4.92s/it]
{'mid_block_add_attention'} was not found in config. Values will be initialized to default values.
{'dual_attention_layers', 'qk_norm'} was not found in config. Values will be initialized to default values.
11/29/2024 14:35:54 - INFO - __main__ - Initializing controlnet weights from transformer
{'dual_attention_layers', 'pos_embed_type', 'qk_norm', 'use_pos_embed', 'force_zeros_for_pooled_projection'} was not found in config. Values will be initialized to default values.
11/29/2024 14:36:14 - INFO - __main__ - ***** Running training *****
11/29/2024 14:36:14 - INFO - __main__ -   Num examples = 150
11/29/2024 14:36:14 - INFO - __main__ -   Num batches each epoch = 38
11/29/2024 14:36:14 - INFO - __main__ -   Num Epochs = 1000
11/29/2024 14:36:14 - INFO - __main__ -   Instantaneous batch size per device = 4
11/29/2024 14:36:14 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 16
11/29/2024 14:36:14 - INFO - __main__ -   Gradient Accumulation steps = 4
11/29/2024 14:36:14 - INFO - __main__ -   Total optimization steps = 10000
Steps:   0%|                                          | 0/10000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1412, in <module>
    main(args)
  File "/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1278, in main
    control_block_res_samples = controlnet(
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxxx/repos/control-net/diffusers/src/diffusers/models/controlnets/controlnet_sd3.py", line 365, in forward
    hidden_states = hidden_states + self.pos_embed_input(controlnet_cond)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxxx/repos/control-net/diffusers/src/diffusers/models/embeddings.py", line 266, in forward
    latent = self.proj(latent)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Given groups=1, weight of size [1536, 17, 2, 2], expected input[4, 16, 64, 64] to have 17 channels, but got 16 channels instead
Steps:   0%|                                          | 0/10000 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/home/xxxx/repos/control-net/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/home/xxxx/repos/control-net/.venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/xxxx/repos/control-net/.venv/bin/python', '/home/xxxx/repos/control-net/diffusers/examples/controlnet/train_controlnet_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--output_dir=/home/xxxx/models/v1/cn-stablediff-v3_out', '--dataset_name=StudentYannik/v1-prepared-cn', '--resolution=512', '--learning_rate=1e-5', '--max_train_steps=10000', '--validation_image', '/home/xxxx/datasets/v1-raw-blender-valid/cube_52.png', '--validation_prompt', "{'prompt': 'A SMALL BLUE CUBE with background color WHITE', 'objects': [{'form': {'type': 'CUBE', 'color': 'BLUE', 'size': 'SMALL'}, 'position': {'x': 11, 'y': 13, 'z': 0}}]}", '--train_batch_size=4', '--num_train_epochs=10', '--gradient_accumulation_steps=4']' returned non-zero exit status 1.
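The "Running training" summary in these logs can be reproduced from the launch flags. The sketch below assumes the script follows the usual pattern of the diffusers example scripts, where a set --max_train_steps takes precedence and the epoch count is recomputed from it; that is an assumption about the script, not something stated in the issue.

```python
import math

# Values from the launch command and the "Running training" log lines.
num_examples = 150                 # "Num examples = 150"
train_batch_size = 4               # --train_batch_size
gradient_accumulation_steps = 4    # --gradient_accumulation_steps
max_train_steps = 10_000           # --max_train_steps
num_processes = 1                  # "Num processes: 1"

# Batches per epoch; the final partial batch (150 = 37 * 4 + 2) is kept.
batches_per_epoch = math.ceil(num_examples / train_batch_size)
# One optimizer update every `gradient_accumulation_steps` batches.
updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
# ASSUMPTION: when --max_train_steps is set, epochs are derived from it,
# overriding --num_train_epochs=10.
num_epochs = math.ceil(max_train_steps / updates_per_epoch)
total_batch_size = train_batch_size * num_processes * gradient_accumulation_steps

print(batches_per_epoch, updates_per_epoch, num_epochs, total_batch_size)  # 38 10 1000 16
```

If that assumption holds, --num_train_epochs=10 has no effect in this run, which matches "Num Epochs = 1000" in the log. This is unrelated to the channel error itself.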

System Info

diffusers version: commit c96bfa5
python: 3.10
cuda: 12.2
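These version details can also be gathered programmatically. The diffusers project provides a `diffusers-cli env` command for issue reports; the sketch below is a stdlib-only alternative based on importlib.metadata, and the package list is an assumption. Package metadata reports a version string rather than a git commit, so for a source install the commit hash (c96bfa5 here) still has to be recorded by hand.

```python
import platform
from importlib import metadata


def collect_env(packages=("diffusers", "torch", "accelerate", "transformers")) -> dict:
    """Return Python/platform info plus installed versions of the given packages."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```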

Who can help?

@sayakpaul, @yiyixuxu, @DN6

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working; stale: Issues that haven't received updates

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests