
Exception: Error during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead #348

@neph1

Description


System Info

I've just tried out v0.1.0 with both the ptd and accelerate backends. Both throw an exception after about one epoch (62 samples).

Training steps: 21%|██▏ | 64/300 [04:09<07:50, 1.99s/it, grad_norm=0.00579, global_avg_loss=0.338, global_max_loss=0.338]
Training steps: 22%|██▏ | 65/300 [04:11<07:46, 1.98s/it, grad_norm=0.00579, global_avg_loss=0.338, global_max_loss=0.338]
Training steps: 22%|██▏ | 65/300 [04:11<07:46, 1.98s/it] 2025-03-24 20:50:21,282 - finetrainers - ERROR - Error during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
2025-03-24 20:50:21,283 - finetrainers - ERROR - An error occurred during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
2025-03-24 20:50:21,292 - finetrainers - ERROR - Traceback (most recent call last):
File "train.py", line 70, in main
trainer.run()
File "finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
raise e
File "finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
self._train()
File "finetrainers/trainer/sft_trainer/trainer.py", line 473, in _train
pred, target, sigmas = self.model_specification.forward(
File "finetrainers/models/hunyuan_video/base_specification.py", line 317, in forward
pred = transformer(
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/python3.10/site-packages/accelerate/utils/operations.py", line 819, in forward
return model_forward(*args, **kwargs)
File "/python3.10/site-packages/accelerate/utils/operations.py", line 807, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "/diffusers/src/diffusers/models/transformers/transformer_hunyuan_video.py", line 720, in forward
hidden_states = self.x_embedder(hidden_states)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/diffusers/src/diffusers/models/transformers/transformer_hunyuan_video.py", line 154, in forward
hidden_states = self.proj(hidden_states)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/diffusers/src/diffusers/hooks/hooks.py", line 148, in new_forward
output = function_reference.forward(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
return F.conv3d(
RuntimeError: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead

I'm currently on diffusers commit e7e6d852822b279b88f133395bcc2dd056eb59da.
v0.0.1 still works.
Let me know if this is a configuration issue.
I'm using the UI, but you can see the commands it generates below.
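For context, PyTorch conv layers require the input's channel dimension to equal weight.shape[1] * groups; with groups=1, the x_embedder's Conv3d here was built for 16 latent channels but receives 8. A minimal stdlib-only sketch of that check, using the exact shapes from the traceback (the helper name and error text are illustrative, not the actual PyTorch source):

```python
def check_conv_channels(weight_shape, input_shape, groups=1):
    """Raise if input channels != weight.shape[1] * groups (the conv contract)."""
    expected = weight_shape[1] * groups   # in_channels the layer was built for
    actual = input_shape[1]               # channel dim of the incoming tensor
    if actual != expected:
        raise ValueError(
            f"Given groups={groups}, weight of size {list(weight_shape)}, "
            f"expected input{list(input_shape)} to have {expected} channels, "
            f"but got {actual} channels instead"
        )

# The x_embedder's Conv3d weight is [3072, 16, 1, 2, 2] (expects 16 channels),
# but the precomputed latents arrive as [1, 8, 1, 56, 68] (8 channels):
try:
    check_conv_channels([3072, 16, 1, 2, 2], [1, 8, 1, 56, 68])
except ValueError as e:
    print(e)
```

This reproduces the RuntimeError message from F.conv3d above, which points at a mismatch between the latent channel count produced during precomputation and the transformer's patch-embedding layer, rather than at the conv layer itself.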

Information

  • The official example scripts
  • My own modified scripts

Reproduction

accelerate launch --config_file /accelerate_configs/uncompiled_1.yaml --gpu_ids 0 /train.py --parallel_backend accelerate --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path models/hunyuan --text_encoder_dtype bf16 --text_encoder_2_dtype bf16 --text_encoder_3_dtype bf16 --transformer_dtype bf16 --vae_dtype bf16 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --layerwise_upcasting_skip_modules_pattern patch_embed pos_embed x_embedder context_embedder ^proj_in$ ^proj_out$ norm --dataset_config /dataset_config.json --caption_dropout_p 0.05 --caption_dropout_technique empty --enable_precomputation --precomputation_items 62 --precomputation_once --dataloader_num_workers 0 --training_type lora --seed 425 --batch_size 1 --train_steps 300 --rank 64 --lora_alpha 64 --target_modules to_q to_k to_v to_out.0 --gradient_accumulation_steps 8 --gradient_checkpointing --checkpointing_steps 100 --checkpointing_limit 5 --enable_slicing --enable_tiling --optimizer adamw --lr 0.0002 --lr_scheduler linear --lr_warmup_steps 100 --lr_num_cycles 1 --beta1 0.9 --beta2 0.95 --weight_decay 0.001 --epsilon 1e-8 --max_grad_norm 1 --num_validation_videos 0 --validation_steps 10000 --tracker_name finetrainers --output_dir output/ --nccl_timeout 1800 --report_to none

torchrun --standalone --nnodes 1 --nproc_per_node 1 --rdzv_backend c10d --rdzv_endpoint localhost:0 /train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path models/hunyuan --text_encoder_dtype bf16 --text_encoder_2_dtype bf16 --text_encoder_3_dtype bf16 --transformer_dtype bf16 --vae_dtype bf16 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --layerwise_upcasting_skip_modules_pattern patch_embed pos_embed x_embedder context_embedder ^proj_in$ ^proj_out$ norm --dataset_config /dataset_config.json --caption_dropout_p 0.05 --caption_dropout_technique empty --enable_precomputation --precomputation_items 62 --precomputation_once --dataloader_num_workers 0 --training_type lora --seed 425 --batch_size 1 --train_steps 300 --rank 64 --lora_alpha 64 --target_modules to_q to_k to_v to_out.0 --gradient_accumulation_steps 8 --gradient_checkpointing --checkpointing_steps 100 --checkpointing_limit 5 --enable_slicing --enable_tiling --optimizer adamw --lr 0.0002 --lr_scheduler linear --lr_warmup_steps 100 --lr_num_cycles 1 --beta1 0.9 --beta2 0.95 --weight_decay 0.001 --epsilon 1e-8 --max_grad_norm 1 --num_validation_videos 0 --validation_steps 10000 --tracker_name finetrainers --output_dir output/ --nccl_timeout 1800 --report_to none

Expected behavior

.
