Purpose of grid reshaping in 2D sinusoidal positional embeddings #11203
jinhong-ni asked this question in Q&A
Hi all,
I'm a bit confused about this line of code (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py#L282).
Specifically, `grid_size` is the tuple consisting of the height `H` and width `W` of the image. The `grid` computed in L280 should have the shape `2*H*W`, and L282 reshapes it into `2*1*W*H`. The `W*H` dimensions will later be flattened to match the dimensions of the latent.

However, if you continue to `PatchEmbed` (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py#L549), you will notice that the latent with shape `BCHW` is flattened into `B(H*W)C`. This flattening does not seem to match the `grid` from L282. Does this reordering mess up the ordering of dimensions during flattening when `H` and `W` are not equal?

I appreciate all responses and assistance in advance.
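For context, here is a minimal sketch of the two shape paths I am comparing. The values of `H` and `W` and the variable names are just for illustration, not the actual diffusers code:

```python
import numpy as np
import torch

# Hypothetical non-square grid, just for illustration.
H, W = 4, 6

# Grid construction roughly as in get_2d_sincos_pos_embed (L280-L282).
grid_h = np.arange(H, dtype=np.float32)
grid_w = np.arange(W, dtype=np.float32)
grid = np.meshgrid(grid_w, grid_h)    # two arrays, each of shape (H, W)
grid = np.stack(grid, axis=0)         # shape (2, H, W)
grid = grid.reshape([2, 1, W, H])     # the reshape in question: (2, 1, W, H)
print(grid.shape)                     # (2, 1, 6, 4)

# Latent flattening roughly as in PatchEmbed (L549): BCHW -> B(H*W)C.
latent = torch.randn(1, 8, H, W)              # (B, C, H, W)
latent = latent.flatten(2).transpose(1, 2)    # (B, H*W, C)
print(latent.shape)                           # torch.Size([1, 24, 8])
```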