Describe the bug
ZeroDivisionError when performing forward pass with UNet3DConditionModel
I'm encountering a `ZeroDivisionError` when attempting to perform a forward pass with `UNet3DConditionModel`. This seems to be related to the `num_attention_heads` parameter being `None`, which causes `self.inner_dim` to be 0.
Here's the code I'm using:
```python
from diffusers import UNet3DConditionModel
import torch

model = UNet3DConditionModel(
    down_block_types=(
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "DownBlock3D",
    ),
    up_block_types=(
        "UpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
    ),
    block_out_channels=(32, 64, 128, 128),
    norm_num_groups=4,
)
data = torch.randn(1, 4, 32, 32, 32)
model(data, timestep=3, encoder_hidden_states=torch.zeros(1, 4, 32, 32, 32))
```
The error traceback indicates that the issue occurs in the attention processing:
```
ZeroDivisionError: integer division or modulo by zero
```
This seems to be because `num_attention_heads` is `None`, leading to `self.inner_dim = 0` in the transformer configuration.
I noticed that in the `UNet3DConditionModel` implementation there's a check that raises an error if `num_attention_heads` is provided:
```python
if num_attention_heads is not None:
    raise NotImplementedError(
        "At the moment it is not possible to define the number of attention heads via num_attention_heads because of a naming issue as described in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131 . Passing num_attention_heads will only be supported in diffusers v0.19."
    )
```
Given this limitation, I'm unsure how to properly configure the model to avoid this error. Could you provide guidance on:
- How to correctly perform a forward pass with dummy `encoder_hidden_states`
- What parameters I should adjust so the model is properly configured
- Whether there's a workaround for this issue in the current version of diffusers (see the sketch below for what I've been guessing at)
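
For what it's worth, here is a sketch of what I guessed a working configuration might look like. This is only my assumption and not verified against the intended API: I'm guessing that `attention_head_dim` effectively acts as the head count here (per the naming issue linked in the error above), so it should divide every entry of `block_out_channels`, and that `encoder_hidden_states` is expected with shape `(batch, sequence_length, cross_attention_dim)`. The values `attention_head_dim=4`, `cross_attention_dim=64`, the sequence length of 16, and the frame count of 8 are arbitrary choices on my part:
```python
from diffusers import UNet3DConditionModel
import torch

model = UNet3DConditionModel(
    down_block_types=(
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "DownBlock3D",
    ),
    up_block_types=(
        "UpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
    ),
    block_out_channels=(32, 64, 128, 128),
    norm_num_groups=4,
    # Assumption: because of the num_attention_heads naming issue, this value is
    # treated as the number of heads, so it must divide every block_out_channels entry.
    attention_head_dim=4,
    # Assumption: the last dimension of encoder_hidden_states must match this width.
    cross_attention_dim=64,
)

# Input latents: (batch, channels, num_frames, height, width)
sample = torch.randn(1, 4, 8, 32, 32)
# Dummy text conditioning: (batch, sequence_length, cross_attention_dim)
encoder_hidden_states = torch.zeros(1, 16, 64)

with torch.no_grad():
    out = model(sample, timestep=3, encoder_hidden_states=encoder_hidden_states)
print(out.sample.shape)  # expecting (1, 4, 8, 32, 32) with the default out_channels=4
```
If `attention_head_dim` is not the right knob to size the attention heads, please let me know what the supported way is.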
Thank you for your assistance!
Reproduction
```python
from diffusers import UNet3DConditionModel
import torch

model = UNet3DConditionModel(
    down_block_types=(
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "DownBlock3D",
    ),
    up_block_types=(
        "UpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
    ),
    block_out_channels=(32, 64, 128, 128),
    norm_num_groups=4,
)
data = torch.randn(1, 4, 32, 32, 32)
model(data, timestep=3, encoder_hidden_states=torch.zeros(1, 4, 32, 32, 32))
```
Logs
System Info
- Python 3.11.10
- diffusers 0.32.2
- Ubuntu 24.04
Who can help?
No response