
ZeroDivisionError when performing forward pass with UNet3DConditionModel #11042

@txz32102

Description


Describe the bug

ZeroDivisionError when performing forward pass with UNet3DConditionModel

I'm encountering a ZeroDivisionError when attempting to perform a forward pass with the UNet3DConditionModel. This seems to be related to the num_attention_heads parameter being None, which causes self.inner_dim to be 0.

Here's the code I'm using:

from diffusers import UNet3DConditionModel
import torch

model = UNet3DConditionModel(
    down_block_types=(
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "DownBlock3D",
    ),
    up_block_types=(
        "UpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
    ),
    block_out_channels=(32, 64, 128, 128),
    norm_num_groups=4,
)

data = torch.randn(1, 4, 32, 32, 32)

model(data, timestep=3, encoder_hidden_states=torch.zeros(1, 4, 32, 32, 32))

The error traceback indicates that the issue occurs in the attention processing:

ZeroDivisionError: integer division or modulo by zero

This seems to be because num_attention_heads is None, leading to self.inner_dim = 0 in the transformer configuration.

I noticed that in the UNet3DConditionModel implementation, there's a check that raises an error if num_attention_heads is provided:

if num_attention_heads is not None:
    raise NotImplementedError(
        "At the moment it is not possible to define the number of attention heads via num_attention_heads because of a naming issue as described in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131 . Passing num_attention_heads will only be supported in diffusers v0.19."
    )
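For context, here is my reading of what happens (an assumption on my part, based on the check above, not a quote of the diffusers source): since `num_attention_heads` must be left as `None`, it falls back to `attention_head_dim`, whose default of 64 is then treated as a head *count*. With `block_out_channels` starting at 32, the per-head dimension rounds down to 0:

```python
# Assumed reconstruction of the failing arithmetic (not actual diffusers source):
# per the naming issue, num_attention_heads falls back to attention_head_dim.
attention_head_dim = 64                            # assumed UNet3DConditionModel default
num_attention_heads = None or attention_head_dim   # fallback -> 64 "heads"
block_out_channels_first = 32                      # first entry from the repro above
per_head_dim = block_out_channels_first // num_attention_heads
print(per_head_dim)  # 0 -> any subsequent division/modulo by it raises ZeroDivisionError
```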

Given this limitation, I'm unsure how to properly configure the model to avoid this error. Could you provide guidance on:

  1. How to correctly perform a forward pass with dummy encoder hidden states
  2. What parameters I should adjust to ensure the model is properly configured
  3. Whether there's a workaround for this issue in the current version of diffusers

Thank you for your assistance!

Reproduction

from diffusers import UNet3DConditionModel
import torch

model = UNet3DConditionModel(
    down_block_types=(
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "DownBlock3D",
    ),
    up_block_types=(
        "UpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
    ),
    block_out_channels=(32, 64, 128, 128),
    norm_num_groups=4,
)

data = torch.randn(1, 4, 32, 32, 32)

model(data, timestep=3, encoder_hidden_states=torch.zeros(1, 4, 32, 32, 32))

Logs

System Info

- Python 3.11.10
- diffusers 0.32.2
- Ubuntu 24.04

Who can help?

No response

Labels: bug