Difference between self-attention and cross-attention in diffusion model U-Net #8555
Ahmad-Omar-Ahsan started this conversation in General
Hi, I haven't worked with this exact implementation, but in general, if you only have a few discrete labels, self-attention is usually fine; the model will learn to condition on those labels. Cross-attention is really useful when your conditioning input has more structure (clinical features, text, etc.), since it lets the network focus on different parts of the condition dynamically instead of treating the label as a single embedding. Hope this helps :)
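To make the "simple embedding" route concrete, here is a minimal pure-Python sketch for a few discrete labels: each class gets its own vector, which is added to the timestep embedding that the U-Net's residual blocks consume, while the attention layers stay plain self-attention. This is an assumption about how the U-Net in question uses its number-of-classes parameter (it is a common pattern in diffusion U-Nets, not confirmed here); `timestep_embedding`, `LabelEmbedding`, and `conditioning_vector` are hypothetical names, and a real model learns the table and usually passes the timestep embedding through a small MLP first.

```python
import math
import random


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a diffusion timestep (dim assumed even).
    Real models typically feed this through an MLP; omitted here for brevity."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


class LabelEmbedding:
    """Hypothetical lookup table: one vector per discrete class.
    Random here; in a trained model these vectors are learned parameters."""

    def __init__(self, num_classes, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]

    def __call__(self, label):
        return self.table[label]


def conditioning_vector(t, label, label_embedding, dim):
    """Add the class vector to the timestep vector; the residual blocks
    consume this sum, so no cross-attention is needed for the label."""
    t_emb = timestep_embedding(t, dim)
    return [a + b for a, b in zip(t_emb, label_embedding(label))]


labels = LabelEmbedding(num_classes=3, dim=8)
emb_a = conditioning_vector(t=10, label=0, label_embedding=labels, dim=8)
emb_b = conditioning_vector(t=10, label=2, label_embedding=labels, dim=8)
print(len(emb_a))      # 8
print(emb_a != emb_b)  # True: same timestep, different class, different vector
```

Because the label only shifts a global vector, the model sees the class everywhere at once, which is usually enough when the condition is a single categorical value.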
Hello everyone,
I am training a 2D diffusion model conditioned on different labels. At the moment, I am only changing the number-of-classes parameter in the U-Net. I noticed that there is a `context-embed` argument, which goes along with the `with_conditioning` argument. Going through the code, it looks like if `with_conditioning` is set to `True`, the U-Net uses cross-attention; otherwise, it uses self-attention.
Which would be better, cross-attention or self-attention? Secondly, if I decide to use cross-attention, what should the size of my context embedding be?
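On the second question, a minimal sketch of single-head attention in plain Python shows where the context size enters. In self-attention, keys and values are projected from the image tokens themselves; in cross-attention, they are projected from a separate context sequence, so the key/value projection expects every context token to have a fixed feature length. Here that length is the hypothetical parameter `kv_dim`; in a real U-Net it would correspond to whatever argument sets the context size (the `context-embed` argument from the question, or something like `cross_attention_dim` depending on the library), which is an assumption. The number of context tokens is free, but each token's length must match.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned projection matrices."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Attention:
    """Single-head attention (hypothetical, for illustration).
    kv_dim=None -> self-attention: keys/values come from the image tokens.
    kv_dim=D    -> cross-attention: keys/values come from context tokens of length D."""

    def __init__(self, query_dim, inner_dim, kv_dim=None, seed=0):
        rng = random.Random(seed)
        self.kv_dim = query_dim if kv_dim is None else kv_dim
        self.w_q = random_matrix(query_dim, inner_dim, rng)
        self.w_k = random_matrix(self.kv_dim, inner_dim, rng)
        self.w_v = random_matrix(self.kv_dim, inner_dim, rng)
        self.scale = 1.0 / math.sqrt(inner_dim)

    def __call__(self, x, context=None):
        # x: N image tokens of length query_dim; context: M tokens of length kv_dim (any M)
        source = x if context is None else context
        if any(len(tok) != self.kv_dim for tok in source):
            raise ValueError(f"key/value tokens must have length {self.kv_dim}")
        q = matmul(x, self.w_q)
        k = matmul(source, self.w_k)
        v = matmul(source, self.w_v)
        # Scaled dot-product scores: one row per image token, one column per source token
        scores = [[sum(a * b for a, b in zip(qi, kj)) * self.scale for kj in k] for qi in q]
        weights = [softmax(row) for row in scores]
        return matmul(weights, v)


rng = random.Random(1)
x = random_matrix(5, 4, rng)         # 5 image tokens (e.g. flattened pixels), 4 channels each
context = random_matrix(3, 16, rng)  # 3 context tokens, 16 features each

self_attn = Attention(query_dim=4, inner_dim=8)
cross_attn = Attention(query_dim=4, inner_dim=8, kv_dim=16)

out_self = self_attn(x)
out_cross = cross_attn(x, context)
print(len(out_self), len(out_self[0]))    # 5 8
print(len(out_cross), len(out_cross[0]))  # 5 8
```

With a single context token (for example, one embedded class label), the softmax over one score is always 1, so every image token receives the same projected vector. Cross-attention only becomes selective when the context holds several tokens, which is why it pays off mainly for richer conditioning such as text or multiple clinical features.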