Description
System Info
I've just tried v0.1.0 with both the ptd and accelerate backends. Both throw an exception after about one epoch (62 samples).
```
Training steps: 21%|██▏ | 64/300 [04:09<07:50, 1.99s/it, grad_norm=0.00579, global_avg_loss=0.338, global_max_loss=0.338]
Training steps: 22%|██▏ | 65/300 [04:11<07:46, 1.98s/it, grad_norm=0.00579, global_avg_loss=0.338, global_max_loss=0.338]
Training steps: 22%|██▏ | 65/300 [04:11<07:46, 1.98s/it]
2025-03-24 20:50:21,282 - finetrainers - ERROR - Error during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
2025-03-24 20:50:21,283 - finetrainers - ERROR - An error occurred during training: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
2025-03-24 20:50:21,292 - finetrainers - ERROR - Traceback (most recent call last):
  File "train.py", line 70, in main
    trainer.run()
  File "finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
    raise e
  File "finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
    self._train()
  File "finetrainers/trainer/sft_trainer/trainer.py", line 473, in _train
    pred, target, sigmas = self.model_specification.forward(
  File "finetrainers/models/hunyuan_video/base_specification.py", line 317, in forward
    pred = transformer(
  File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/python3.10/site-packages/accelerate/utils/operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
  File "/python3.10/site-packages/accelerate/utils/operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/diffusers/src/diffusers/models/transformers/transformer_hunyuan_video.py", line 720, in forward
    hidden_states = self.x_embedder(hidden_states)
  File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/diffusers/src/diffusers/models/transformers/transformer_hunyuan_video.py", line 154, in forward
    hidden_states = self.proj(hidden_states)
  File "/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/diffusers/src/diffusers/hooks/hooks.py", line 148, in new_forward
    output = function_reference.forward(*args, **kwargs)
  File "/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Given groups=1, weight of size [3072, 16, 1, 2, 2], expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead
```
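For context, the shape mismatch can be reproduced in isolation: the patch-embedding conv that fails (weight shape `[3072, 16, 1, 2, 2]`) expects 16-channel latents, but receives an 8-channel input. A minimal sketch with the shapes taken from the traceback (the layer sizes here are only illustrative stand-ins for the HunyuanVideo `x_embedder.proj`):

```python
import torch
import torch.nn as nn

# Conv3d(in_channels=16, out_channels=3072) produces a weight of shape
# [3072, 16, 1, 2, 2], matching the failing layer in the traceback.
proj = nn.Conv3d(16, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2))

# Latents with only 8 channels instead of the expected 16.
latents = torch.randn(1, 8, 1, 56, 68)

try:
    proj(latents)
except RuntimeError as e:
    print(e)  # "... expected input[1, 8, 1, 56, 68] to have 16 channels, but got 8 channels instead"
```

So the question is why the cached/precomputed latents end up with 8 channels when the transformer's input projection expects 16.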
I'm currently on diffusers commit e7e6d852822b279b88f133395bcc2dd056eb59da.
v0.0.1 still works.
Let me know if this is a configuration issue. I'm using the UI, but you can see the commands it generates below.
Information
- The official example scripts / 官方的示例脚本
- My own modified scripts / 我自己修改的脚本和任务
Reproduction
```shell
accelerate launch --config_file /accelerate_configs/uncompiled_1.yaml --gpu_ids 0 /train.py --parallel_backend accelerate --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path models/hunyuan --text_encoder_dtype bf16 --text_encoder_2_dtype bf16 --text_encoder_3_dtype bf16 --transformer_dtype bf16 --vae_dtype bf16 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --layerwise_upcasting_skip_modules_pattern patch_embed pos_embed x_embedder context_embedder ^proj_in$ ^proj_out$ norm --dataset_config /dataset_config.json --caption_dropout_p 0.05 --caption_dropout_technique empty --enable_precomputation --precomputation_items 62 --precomputation_once --dataloader_num_workers 0 --training_type lora --seed 425 --batch_size 1 --train_steps 300 --rank 64 --lora_alpha 64 --target_modules to_q to_k to_v to_out.0 --gradient_accumulation_steps 8 --gradient_checkpointing --checkpointing_steps 100 --checkpointing_limit 5 --enable_slicing --enable_tiling --optimizer adamw --lr 0.0002 --lr_scheduler linear --lr_warmup_steps 100 --lr_num_cycles 1 --beta1 0.9 --beta2 0.95 --weight_decay 0.001 --epsilon 1e-8 --max_grad_norm 1 --num_validation_videos 0 --validation_steps 10000 --tracker_name finetrainers --output_dir output/ --nccl_timeout 1800 --report_to none
```

```shell
torchrun --standalone --nnodes 1 --nproc_per_node 1 --rdzv_backend c10d --rdzv_endpoint localhost:0 /train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path models/hunyuan --text_encoder_dtype bf16 --text_encoder_2_dtype bf16 --text_encoder_3_dtype bf16 --transformer_dtype bf16 --vae_dtype bf16 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --layerwise_upcasting_skip_modules_pattern patch_embed pos_embed x_embedder context_embedder ^proj_in$ ^proj_out$ norm --dataset_config /dataset_config.json --caption_dropout_p 0.05 --caption_dropout_technique empty --enable_precomputation --precomputation_items 62 --precomputation_once --dataloader_num_workers 0 --training_type lora --seed 425 --batch_size 1 --train_steps 300 --rank 64 --lora_alpha 64 --target_modules to_q to_k to_v to_out.0 --gradient_accumulation_steps 8 --gradient_checkpointing --checkpointing_steps 100 --checkpointing_limit 5 --enable_slicing --enable_tiling --optimizer adamw --lr 0.0002 --lr_scheduler linear --lr_warmup_steps 100 --lr_num_cycles 1 --beta1 0.9 --beta2 0.95 --weight_decay 0.001 --epsilon 1e-8 --max_grad_norm 1 --num_validation_videos 0 --validation_steps 10000 --tracker_name finetrainers --output_dir output/ --nccl_timeout 1800 --report_to none
```
Expected behavior
.