This repository was archived by the owner on Dec 24, 2025. It is now read-only.
Implementation of Mutual Self Attention #31
Hello, I have been reading your paper and using it as a baseline to develop editing algorithms for specific tasks. Fantastic paper, by the way.
I have noticed that in #22 the inconsistency between the description and the implementation of mutual self-attention, specifically with regard to the value vectors, was raised.
Lines 246 to 251 in eaac91e:

```python
qu = torch.cat([qu[:num_heads], qu[:num_heads], qu[:num_heads]])
qc = torch.cat([qc[:num_heads], qc[:num_heads], qc[:num_heads]])
ku = torch.cat([ku[:num_heads], ku[:num_heads], ku[:num_heads]])
kc = torch.cat([kc[:num_heads], kc[:num_heads], kc[:num_heads]])
vu = torch.cat([vu[:num_heads * 2], vu[:num_heads]])
vc = torch.cat([vc[:num_heads * 2], vc[:num_heads]])
```
However, unless I'm mistaken, it has not been addressed or updated.

Assuming the attention tensors are split into `[src, tgt, layout]` chunks, wouldn't a more correct implementation of Algorithm 3 during the self-edit step be:

```python
qu = torch.cat([qu[:num_heads], qu[:num_heads], qu[:num_heads]])
qc = torch.cat([qc[:num_heads], qc[:num_heads], qc[:num_heads]])
ku = torch.cat([ku[:num_heads], ku[:num_heads], ku[:num_heads]])
kc = torch.cat([kc[:num_heads], kc[:num_heads], kc[:num_heads]])
vu = torch.cat([vu[:num_heads], vu[:num_heads], vc[:num_heads]])
vc = torch.cat([vc[:num_heads], vc[:num_heads], vc[:num_heads]])
```
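For concreteness, here is a minimal runnable sketch of the difference, using hypothetical toy tensors in place of the real UNet attention values (the shapes and names are illustrative assumptions, not the actual code). Both concatenations yield the same output shape; they differ only in which values the third (layout) group of heads attends over:

```python
import torch

num_heads = 2  # toy value; the real model uses the UNet's head count
seq, head_dim = 3, 4

# Stand-ins for the stacked per-branch value tensors, laid out as
# [src, tgt, layout] chunks of num_heads each along dim 0.
vu = torch.arange(3 * num_heads * seq * head_dim, dtype=torch.float32)
vu = vu.reshape(3 * num_heads, seq, head_dim)
vc = vu + 100.0  # made distinct so the two variants are distinguishable

# Repository version (eaac91e): keep the first two chunks of the branch's
# own values, then repeat the first chunk for the layout slot.
vu_repo = torch.cat([vu[:num_heads * 2], vu[:num_heads]])

# Proposed version: src and tgt slots reuse the branch's own first chunk,
# while the layout slot takes values from the conditional branch vc.
vu_prop = torch.cat([vu[:num_heads], vu[:num_heads], vc[:num_heads]])

print(vu_repo.shape, vu_prop.shape)  # identical shapes
print(torch.equal(vu_repo[2 * num_heads:], vu_prop[2 * num_heads:]))  # layout chunks differ
```

So the disagreement is invisible at the shape level and only shows up in which branch supplies the values for the layout heads.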
Can you please clarify my confusion?
Thank you