While reading the source code of Whisper, I noticed that models of different sizes all have a set of attention heads specifically designed for alignment. I visualized the weight distributions of these attention heads during decoding with matplotlib, and found that they all exhibit good monotonic alignment properties. I'm wondering how researchers at OpenAI achieved this property for a specific set of attention heads.
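For context, here is a minimal sketch of the kind of visualization described above, assuming the openai-whisper package and an illustrative input file `audio.wav`. In that codebase, each decoder block's `cross_attn` module returns `(output, qk)`, so a forward hook can capture the raw attention logits (the same pattern `whisper/timing.py` uses); the layer/head indices below are arbitrary.

```python
import matplotlib.pyplot as plt
import torch
import whisper

try:
    # Newer releases route attention through SDPA and return qk=None
    # unless SDPA is disabled for the forward pass.
    from whisper.model import disable_sdpa
except ImportError:
    from contextlib import nullcontext as disable_sdpa

model = whisper.load_model("base")

# Encode 30 seconds of audio and decode it once to get a token sequence.
audio = whisper.pad_or_trim(whisper.load_audio("audio.wav"))  # illustrative file
mel = whisper.log_mel_spectrogram(audio).to(model.device)
result = whisper.decode(model, mel, whisper.DecodingOptions(without_timestamps=True))

# Re-run the decoder over the full predicted sequence, capturing the
# pre-softmax QK matrix from every cross-attention layer via hooks.
qks = [None] * len(model.decoder.blocks)
hooks = [
    block.cross_attn.register_forward_hook(
        lambda _, ins, outs, i=i: qks.__setitem__(i, outs[-1])
    )
    for i, block in enumerate(model.decoder.blocks)
]
tokens = torch.tensor(result.tokens, device=model.device).unsqueeze(0)
with torch.no_grad(), disable_sdpa():
    model.decoder(tokens, model.encoder(mel.unsqueeze(0)))
for hook in hooks:
    hook.remove()

# A monotonically aligned head shows a roughly diagonal band of attention
# mass over the audio frames.
layer, head = 3, 2  # illustrative indices
weights = torch.softmax(qks[layer][0, head], dim=-1)
plt.imshow(weights.cpu(), aspect="auto", origin="lower", interpolation="none")
plt.xlabel("audio frames")
plt.ylabel("decoded tokens")
plt.show()
```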
+1 this would be super cool to know!
Hi! The heads were not specifically designed or constrained to be monotonically aligned, but some heads in the cross-attention layers naturally learned to have attention weights matching the time alignment. The …
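The released code ships a hand-annotated set of such heads per model size (the `_ALIGNMENT_HEADS` masks loaded in `whisper/__init__.py`), and word-level timestamps run dynamic time warping over exactly those heads' attention weights (see `whisper/timing.py`). A minimal sketch of exercising that path, assuming a recent openai-whisper release and an illustrative `audio.wav`:

```python
import whisper

model = whisper.load_model("base")

# word_timestamps=True aligns tokens to audio frames using the annotated
# alignment heads' cross-attention weights plus dynamic time warping.
result = model.transcribe("audio.wav", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['start']:6.2f}s -> {word['end']:6.2f}s  {word['word']}")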