Replies: 1 comment 1 reply
-
I assume you are talking about the vision transformers. The position embeddings are trainable position representations instead of hard-coded position indices. cc @ahatamiz
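For concreteness, here is a minimal PyTorch-style sketch of what "trainable position representations" means in practice. This is not the exact implementation in this repo; the class name, shapes, and initialization below are illustrative. The point is that the position embedding is an `nn.Parameter` learned with the rest of the network, rather than a fixed function of position indices, and it is added to the patch tokens because self-attention by itself is permutation-invariant:

```python
import torch
import torch.nn as nn


class PatchTokensWithLearnedPos(nn.Module):
    """Minimal sketch: learnable (trainable) position embeddings for ViT-style patch tokens."""

    def __init__(self, num_patches: int, embed_dim: int):
        super().__init__()
        # One trainable vector per patch position; optimized together with the rest
        # of the network, instead of being computed from hard-coded position indices.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim), e.g. flattened 2D/3D image patches.
        # Adding pos_embed tells each token where its patch came from, since the
        # attention layers that follow have no built-in notion of token order.
        return patch_tokens + self.pos_embed


# Hypothetical usage: 216 patches from a 3D volume split into 6 x 6 x 6 patches, embed dim 768.
tokens = torch.randn(2, 216, 768)
out = PatchTokensWithLearnedPos(num_patches=216, embed_dim=768)(tokens)
print(out.shape)  # torch.Size([2, 216, 768])
```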
1 reply
-
I can't understand what the point of adding position_embeddings to the sequence is. For image patches, when 2D or 3D images are transformed into a 1D sequence (or a 1D sequence is transformed back into 2D or 3D images), the operation is carried out in strict order, and the order of the sequence is not disrupted afterwards, so it seems that position embeddings for the sequence are no longer needed. In addition, I don't understand how the position_embeddings are calculated. Could you give me a general explanation?