
In Diffusion Models: Why isn't 'Positional Encoding' used in the self-attention layers? #4

@lionking6792

Description

Thanks for this nice diffusion model (DM) implementation.
I'm curious why 'Positional Encoding' (as used in ViT, the vanilla Transformer, etc.) is not used in the self-attention layers.
Is there a reason for this, and can we be sure that the self-attention in the DDPM U-Net maintains its pixel-wise positional information?
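For concreteness, here is a minimal sketch of what I mean, assuming a PyTorch-style U-Net attention block; the names `sinusoidal_pe_2d` and `AttentionWithPE` are hypothetical and not from this repository. It adds a fixed 2D sinusoidal positional encoding to the features right before self-attention, the way ViT/vanilla Transformers inject position:

```python
# Hypothetical sketch (not this repo's code): a DDPM-style spatial self-attention
# block with an explicit fixed 2D sinusoidal positional encoding added before attention.
import math
import torch
import torch.nn as nn


def sinusoidal_pe_2d(channels: int, height: int, width: int) -> torch.Tensor:
    """Fixed 2D positional encoding: first half of channels encodes y, second half x."""
    assert channels % 4 == 0, "channels must be divisible by 4"
    pe = torch.zeros(channels, height, width)
    c = channels // 2
    div = torch.exp(torch.arange(0, c, 2) * (-math.log(10000.0) / c))
    pos_y = torch.arange(height).unsqueeze(1) * div  # (H, c/2)
    pos_x = torch.arange(width).unsqueeze(1) * div   # (W, c/2)
    pe[0:c:2] = torch.sin(pos_y).T.unsqueeze(2).expand(-1, -1, width)
    pe[1:c:2] = torch.cos(pos_y).T.unsqueeze(2).expand(-1, -1, width)
    pe[c::2] = torch.sin(pos_x).T.unsqueeze(1).expand(-1, height, -1)
    pe[c + 1::2] = torch.cos(pos_x).T.unsqueeze(1).expand(-1, height, -1)
    return pe  # (C, H, W)


class AttentionWithPE(nn.Module):
    """Self-attention over all H*W spatial positions, with PE injected at the input."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.GroupNorm(8, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pe = sinusoidal_pe_2d(c, h, w).to(x.device, x.dtype)
        y = self.norm(x) + pe                       # inject position before attention
        y = y.flatten(2).transpose(1, 2)            # (B, H*W, C) token sequence
        y, _ = self.attn(y, y, y, need_weights=False)
        return x + y.transpose(1, 2).reshape(b, c, h, w)  # residual connection


if __name__ == "__main__":
    x = torch.randn(2, 64, 16, 16)
    print(AttentionWithPE(64)(x).shape)  # torch.Size([2, 64, 16, 16])
```

For context, the attention blocks in the original DDPM U-Net (Ho et al.) indeed use no such encoding. One commonly given explanation, which may or may not be the author's reasoning here, is that the zero-padded convolutions preceding each attention block already leak positional information into the feature maps, so an explicit PE can be redundant, unlike in ViT, where attention sees raw patch embeddings with no convolutional context.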
