
Exception: Error during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead #348

@neph1

Description


System Info

I've just tried out v0.1.0 with both the ptd and accelerate backends. Both throw an exception after about one epoch (62 samples).

Training steps: 21%|██▏ | 64/300 [04:09<07:50, 1.99s/it, grad_norm=0.00579, global_avg_loss=0.338, global_max_loss=0.338]
Training steps: 22%|██▏ | 65/300 [04:11<07:46, 1.98s/it, grad_norm=0.00579, global_avg_loss=0.338, global_max_loss=0.338]
Training steps: 22%|██▏ | 65/300 [04:11<07:46, 1.98s/it] 2025-03-24 20:50:21,282 - finetrainers - ERROR - Error during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
2025-03-24 20:50:21,283 - finetrainers - ERROR - An error occurred during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
2025-03-24 20:50:21,292 - finetrainers - ERROR - Traceback (most recent call last):
File "train.py", line 70, in main
trainer.run()
File "finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
raise e
File "finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
self._train()
File "finetrainers/trainer/sft_trainer/trainer.py", line 473, in _train
pred, target, sigmas = self.model_specification.forward(
File "finetrainers/models/hunyuan_video/base_specification.py", line 317, in forward
pred = transformer(
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/python3.10/site-packages/accelerate/utils/operations.py", line 819, in forward
return model_forward(*args, **kwargs)
File "/python3.10/site-packages/accelerate/utils/operations.py", line 807, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "/diffusers/src/diffusers/models/transformers/transformer_hunyuan_video.py", line 720, in forward
hidden_states = self.x_embedder(hidden_states)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/diffusers/src/diffusers/models/transformers/transformer_hunyuan_video.py", line 154, in forward
hidden_states = self.proj(hidden_states)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/diffusers/src/diffusers/hooks/hooks.py", line 148, in new_forward
output = function_reference.forward(*args, **kwargs)
File "/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
return F.conv3d(
RuntimeError: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead

I'm currently on diffusers commit e7e6d852822b279b88f133395bcc2dd056eb59da.
v0.0.1 still works.
Let me know if this is a configuration issue.
I'm using the UI, but you can see the commands it generates below.
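For context, PyTorch conv layers require the input's channel dimension to equal weight.shape[1] * groups; with groups=1, the x_embedder's Conv3d here was built for 16 latent channels but receives 8. A minimal stdlib-only sketch of that check, using the exact shapes from the traceback (the helper name and error text are illustrative, not the actual PyTorch source):

```python
def check_conv_channels(weight_shape, input_shape, groups=1):
    """Raise if input channels != weight.shape[1] * groups (the conv contract)."""
    expected = weight_shape[1] * groups   # in_channels the layer was built for
    actual = input_shape[1]               # channel dim of the incoming tensor
    if actual != expected:
        raise ValueError(
            f"Given groups={groups}, weight of size {list(weight_shape)}, "
            f"expected input{list(input_shape)} to have {expected} channels, "
            f"but got {actual} channels instead"
        )

# The x_embedder's Conv3d weight is [3072, 16, 1, 2, 2] (expects 16 channels),
# but the precomputed latents arrive as [1, 8, 1, 56, 68] (8 channels):
try:
    check_conv_channels([3072, 16, 1, 2, 2], [1, 8, 1, 56, 68])
except ValueError as e:
    print(e)
```

This reproduces the RuntimeError message from F.conv3d above, which points at a mismatch between the latent channel count produced during precomputation and the transformer's patch-embedding layer, rather than at the conv layer itself.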

Information

  • The official example scripts
  • My own modified scripts

Reproduction

accelerate launch --config_file /accelerate_configs/uncompiled_1.yaml --gpu_ids 0 /train.py --parallel_backend accelerate --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path models/hunyuan --text_encoder_dtype bf16 --text_encoder_2_dtype bf16 --text_encoder_3_dtype bf16 --transformer_dtype bf16 --vae_dtype bf16 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --layerwise_upcasting_skip_modules_pattern patch_embed pos_embed x_embedder context_embedder ^proj_in$ ^proj_out$ norm --dataset_config /dataset_config.json --caption_dropout_p 0.05 --caption_dropout_technique empty --enable_precomputation --precomputation_items 62 --precomputation_once --dataloader_num_workers 0 --training_type lora --seed 425 --batch_size 1 --train_steps 300 --rank 64 --lora_alpha 64 --target_modules to_q to_k to_v to_out.0 --gradient_accumulation_steps 8 --gradient_checkpointing --checkpointing_steps 100 --checkpointing_limit 5 --enable_slicing --enable_tiling --optimizer adamw --lr 0.0002 --lr_scheduler linear --lr_warmup_steps 100 --lr_num_cycles 1 --beta1 0.9 --beta2 0.95 --weight_decay 0.001 --epsilon 1e-8 --max_grad_norm 1 --num_validation_videos 0 --validation_steps 10000 --tracker_name finetrainers --output_dir output/ --nccl_timeout 1800 --report_to none

torchrun --standalone --nnodes 1 --nproc_per_node 1 --rdzv_backend c10d --rdzv_endpoint localhost:0 /train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path models/hunyuan --text_encoder_dtype bf16 --text_encoder_2_dtype bf16 --text_encoder_3_dtype bf16 --transformer_dtype bf16 --vae_dtype bf16 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --layerwise_upcasting_skip_modules_pattern patch_embed pos_embed x_embedder context_embedder ^proj_in$ ^proj_out$ norm --dataset_config /dataset_config.json --caption_dropout_p 0.05 --caption_dropout_technique empty --enable_precomputation --precomputation_items 62 --precomputation_once --dataloader_num_workers 0 --training_type lora --seed 425 --batch_size 1 --train_steps 300 --rank 64 --lora_alpha 64 --target_modules to_q to_k to_v to_out.0 --gradient_accumulation_steps 8 --gradient_checkpointing --checkpointing_steps 100 --checkpointing_limit 5 --enable_slicing --enable_tiling --optimizer adamw --lr 0.0002 --lr_scheduler linear --lr_warmup_steps 100 --lr_num_cycles 1 --beta1 0.9 --beta2 0.95 --weight_decay 0.001 --epsilon 1e-8 --max_grad_norm 1 --num_validation_videos 0 --validation_steps 10000 --tracker_name finetrainers --output_dir output/ --nccl_timeout 1800 --report_to none

Expected behavior

.
