Potential issue with vae.encode input shape in cogvideox_image_to_video_lora.py #45

@chenyirui

I encountered an issue related to the input shape of vae.encode in the file cogvideox_image_to_video_lora.py at line 659.

Currently, the code looks like this:

noisy_images = images + torch.randn_like(images) * image_noise_sigma[:, None, None, None, None]
image_latent_dist = vae.encode(noisy_images).latent_dist

However, passing noisy_images to the VAE in this form fails with the error shown in the traceback below. I believe vae.encode expects input of shape [B, C, F, H, W], whereas noisy_images is [B, F, C, H, W] at this point. I resolved the issue by permuting the tensor before encoding:

noisy_images = images + torch.randn_like(images) * image_noise_sigma[:, None, None, None, None]  # [B, F, C, H, W]
noisy_images = noisy_images.permute(0, 2, 1, 3, 4)  # [B, C, F, H, W]

Without this change, the following error occurs:

RuntimeError: torch.cat(): expected a non-empty list of Tensors
  File "path/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "path/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1224, in encode
    h = self._encode(x)
  File "path/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1181, in _encode
    return self.tiled_encode(x)
  File "path/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1357, in tiled_encode
    row.append(torch.cat(time, dim=2))
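
The layout change can be checked on dummy tensors without loading the VAE. The following is a minimal sketch, assuming the batch really arrives as [B, F, C, H, W]; the helper name to_vae_layout and the tensor sizes are hypothetical and chosen only for illustration, and the sketch does not call vae.encode itself.

```python
import torch


def to_vae_layout(frames: torch.Tensor) -> torch.Tensor:
    """Reorder [B, F, C, H, W] to [B, C, F, H, W] (hypothetical helper)."""
    if frames.ndim != 5:
        raise ValueError(f"expected a 5D tensor, got {frames.ndim}D")
    # Swap the frame and channel axes; contiguous() gives a dense copy.
    return frames.permute(0, 2, 1, 3, 4).contiguous()


# Dummy batch: 2 samples, 1 frame, 3 channels, 8x8 pixels (illustrative sizes).
images = torch.randn(2, 1, 3, 8, 8)
image_noise_sigma = torch.rand(2)

# Same noising step as in the training script; sigma broadcasts over the batch axis.
noisy_images = images + torch.randn_like(images) * image_noise_sigma[:, None, None, None, None]

vae_input = to_vae_layout(noisy_images)
print(vae_input.shape)  # torch.Size([2, 3, 1, 8, 8])
```

Printing the shape right before the encode call in the real script would show which layout the data loader actually produces.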

Additionally, I did not run prepare_dataset.py before training. Could skipping that step also be contributing to the error, or is the shape transformation above the correct fix?

Any guidance or confirmation would be greatly appreciated!
