I encountered an issue related to the input shape of vae.encode in the file cogvideox_image_to_video_lora.py at line 659.
Currently, the code looks like this:
noisy_images = images + torch.randn_like(images) * image_noise_sigma[:, None, None, None, None]
image_latent_dist = vae.encode(noisy_images).latent_dist

However, this results in a shape mismatch error when noisy_images is passed to the VAE. I believe the input should have shape [B, C, F, H, W] rather than the current [B, F, C, H, W]. The modification I made to resolve the issue is as follows:
noisy_images = images + torch.randn_like(images) * image_noise_sigma[:, None, None, None, None]  # [B, F, C, H, W]
noisy_images = noisy_images.permute(0, 2, 1, 3, 4)  # [B, C, F, H, W]
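
For reference, here is a minimal, self-contained sketch of the shape convention I am assuming: AutoencoderKLCogVideoX.encode takes channels-first video input in [B, C, F, H, W] layout, so frames and channels need to be swapped before encoding. The checkpoint name, tensor sizes, and sigma value below are placeholders, not taken from the training script:

```python
import torch
from diffusers import AutoencoderKLCogVideoX

# Hypothetical checkpoint; the issue does not name the model actually used.
vae = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

# Dummy batch in the layout produced by the dataloader: [B, F, C, H, W].
images = torch.randn(1, 1, 3, 480, 720, dtype=torch.float16, device="cuda")
# Placeholder noise level; the script draws this from a distribution instead.
image_noise_sigma = torch.full((1,), 0.1, dtype=torch.float16, device="cuda")

noisy_images = images + torch.randn_like(images) * image_noise_sigma[:, None, None, None, None]
noisy_images = noisy_images.permute(0, 2, 1, 3, 4)  # -> [B, C, F, H, W]

# With the permute in place, encoding succeeds and the latents follow the
# VAE's channels-first video layout.
image_latent_dist = vae.encode(noisy_images).latent_dist
image_latents = image_latent_dist.sample()
print(image_latents.shape)  # e.g. torch.Size([1, 16, 1, 60, 90])
```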
Without the permute fix, the following error occurs:

File "path/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
  return method(self, *args, **kwargs)
File "path/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1224, in encode
  h = self._encode(x)
File "path/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1181, in _encode
  return self.tiled_encode(x)
File "path/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1357, in tiled_encode
  row.append(torch.cat(time, dim=2))
RuntimeError: torch.cat(): expected a non-empty list of Tensors

Additionally, I did not run prepare_dataset.py before training. I wanted to confirm whether skipping this step could also be contributing to the issue, or whether my proposed shape transformation is the correct fix.
Any guidance or confirmation would be greatly appreciated!