

@a-r-r-o-w
Contributor

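The following error is raised on Hopper GPUs when the tensors reaching the GroupNorm layers in the HunyuanVideo VAE ResNet blocks are non-contiguous (in `ChannelsLast3d` memory format):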

```traceback
ERROR:finetrainers:An error occurred during training: Unsupported memory format for group normalization: ChannelsLast3d
ERROR:finetrainers:Traceback (most recent call last):
  File "/fsx/aryan/finetrainers/train.py", line 34, in main
    trainer.train()
  File "/fsx/aryan/finetrainers/finetrainers/trainer.py", line 375, in train
    latent_conditions = self.model_config["prepare_latents"](
  File "/fsx/aryan/finetrainers/finetrainers/hunyuan_video/hunyuan_video_lora.py", line 168, in prepare_latents
    latents = vae.encode(image_or_video).latent_dist.sample(generator=generator)
  File "/fsx/aryan/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 900, in encode
    h = self._encode(x)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 871, in _encode
    return self._temporal_tiled_encode(x)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 1098, in _temporal_tiled_encode
    tile = self.tiled_encode(tile)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 1007, in tiled_encode
    tile = self.encoder(tile)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 571, in forward
    hidden_states = self.mid_block(hidden_states)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 297, in forward
    hidden_states = resnet(hidden_states)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/aryan/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_hunyuan_video.py", line 173, in forward
    hidden_states = self.norm1(hidden_states)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 313, in forward
    return F.group_norm(input, self.num_groups, self.weight, self.bias, self.eps)
  File "/fsx/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2965, in group_norm
    return torch.group_norm(
RuntimeError: Unsupported memory format for group normalization: ChannelsLast3d
```

This does not seem to happen on the other GPUs I tested (A100, L40, 4090), so I think it is Hopper-specific.
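Judging from the merged commit message ("contiguous tensors in resnet"), the fix makes the tensor contiguous before it reaches GroupNorm. The snippet below is a minimal sketch of that idea, assuming it amounts to a `.contiguous()` call before normalization; `ContiguousGroupNorm` and `describe_layout` are hypothetical names, not diffusers APIs. It does not reproduce the Hopper-specific failure; it only shows how to detect a `ChannelsLast3d` tensor and convert it back to the default layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContiguousGroupNorm(nn.GroupNorm):
    """Hypothetical GroupNorm wrapper (not part of diffusers) that forces the
    default contiguous (N, C, D, H, W) layout before calling group_norm."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # .contiguous() returns the tensor itself if it is already in the
        # default layout, and copies a ChannelsLast3d tensor otherwise.
        return F.group_norm(x.contiguous(), self.num_groups, self.weight, self.bias, self.eps)


def describe_layout(x: torch.Tensor) -> str:
    """Report whether a 5D tensor uses the default or the ChannelsLast3d layout."""
    if x.is_contiguous():
        return "contiguous"
    if x.is_contiguous(memory_format=torch.channels_last_3d):
        return "channels_last_3d"
    return "other"


# Video tensors are 5D: (batch, channels, frames, height, width).
x = torch.randn(1, 8, 2, 4, 4).to(memory_format=torch.channels_last_3d)
print(describe_layout(x))               # channels_last_3d
print(describe_layout(x.contiguous()))  # contiguous

norm = ContiguousGroupNorm(num_groups=4, num_channels=8)
out = norm(x)
print(out.shape)  # torch.Size([1, 8, 2, 4, 4])
```

Calling `.contiguous()` costs a copy only when the input is not already in the default layout, so it is cheap to apply unconditionally at the start of a block.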

@a-r-r-o-w requested a review from yiyixuxu on December 19, 2024 22:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w merged commit 151b74c into main on Dec 20, 2024
15 checks passed
@a-r-r-o-w deleted the contiguous-hunyuan-resnet branch on December 20, 2024 06:15
Foundsheep pushed a commit to Foundsheep/diffusers that referenced this pull request Dec 23, 2024
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
contiguous tensors in resnet

Co-authored-by: YiYi Xu <[email protected]>