Purpose of grid reshaping in 2D sinusoidal positional embeddings #11203
jinhong-ni asked this question in Q&A
Hi all,
I'm a bit confused about this line of code (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py#L282).
Specifically, `grid_size` is the tuple consisting of the height `H` and width `W` of the image. The `grid` computed in L280 should have the shape `2*H*W`, and L282 reshapes it into `2*1*W*H`. The `W*H` dimensions will later be flattened to match the dimensions of the latent.

However, if you continue to `PatchEmbed` (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py#L549), you will notice that the latent with shape `BCHW` is flattened into `B(H*W)C`. This flattening does not seem to match the `grid` from L282. Does this reordering mess up the ordering of dimensions during flattening when `H` and `W` are not equal?

I appreciate all responses and assistance in advance.
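For context, here is a minimal sketch of the two shape paths I am comparing. The values of `H` and `W` and the variable names are just for illustration, not the actual diffusers code:

```python
import numpy as np
import torch

# Hypothetical non-square grid, just for illustration.
H, W = 4, 6

# Grid construction roughly as in get_2d_sincos_pos_embed (L280-L282).
grid_h = np.arange(H, dtype=np.float32)
grid_w = np.arange(W, dtype=np.float32)
grid = np.meshgrid(grid_w, grid_h)    # two arrays, each of shape (H, W)
grid = np.stack(grid, axis=0)         # shape (2, H, W)
grid = grid.reshape([2, 1, W, H])     # the reshape in question: (2, 1, W, H)
print(grid.shape)                     # (2, 1, 6, 4)

# Latent flattening roughly as in PatchEmbed (L549): BCHW -> B(H*W)C.
latent = torch.randn(1, 8, H, W)              # (B, C, H, W)
latent = latent.flatten(2).transpose(1, 2)    # (B, H*W, C)
print(latent.shape)                           # torch.Size([1, 24, 8])
```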