How to replace the attention mechanism in TransformerConv with FlashAttention or causal attention? #7457
HelloWorldLTY started this conversation in Ideas
Replies: 1 comment 1 reply
-
I tried, but I still got different output.
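One likely source of the mismatch, sketched with a toy example below (the tensor shapes and the self-loop-only mask are made up for illustration): TransformerConv normalises attention only over each node's incoming edges, while a plain `F.scaled_dot_product_attention` call without a neighbourhood mask runs the softmax over all nodes, so the two cannot agree unless the graph is fully connected.

```python
# Toy comparison, not from the thread: dense vs. neighbourhood-masked attention.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 1, 4, 8)                      # (batch, heads, nodes, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# Dense attention: every node attends to every node.
dense = F.scaled_dot_product_attention(q, k, v)

# Graph-restricted attention: a boolean mask limits the softmax support
# (here a toy mask with self-loops only).
mask = torch.eye(4, dtype=torch.bool)
masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

print(torch.allclose(dense, masked))             # False: different softmax supports
```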
-
Hi, I think there should be ways to optimize the attention computation in the TransformerConv code. How can it be replaced with the multi-head attention provided by PyTorch, so that it benefits from FlashAttention acceleration, or with a functional approach like causal attention? Thanks a lot.
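A minimal sketch of one possible direction, assuming PyTorch >= 2.0 and torch_geometric-style inputs (`x` of shape `[num_nodes, in_channels]`, `edge_index` of shape `[2, num_edges]`). The module name `GraphSDPA`, the dense boolean mask built from `edge_index`, and the added self-loops are assumptions for illustration, not the PyG implementation of `TransformerConv`:

```python
# A sketch, not the TransformerConv implementation: multi-head attention over
# nodes, restricted to the graph's edges via a boolean mask and dispatched
# through torch.nn.functional.scaled_dot_product_attention.
import torch
import torch.nn.functional as F
from torch import nn


class GraphSDPA(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, heads: int = 4):
        super().__init__()
        self.heads, self.dim = heads, out_channels
        self.q = nn.Linear(in_channels, heads * out_channels)
        self.k = nn.Linear(in_channels, heads * out_channels)
        self.v = nn.Linear(in_channels, heads * out_channels)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        n = x.size(0)
        # Reshape to (1, heads, num_nodes, head_dim) as expected by SDPA.
        q = self.q(x).view(n, self.heads, self.dim).transpose(0, 1).unsqueeze(0)
        k = self.k(x).view(n, self.heads, self.dim).transpose(0, 1).unsqueeze(0)
        v = self.v(x).view(n, self.heads, self.dim).transpose(0, 1).unsqueeze(0)

        # Boolean mask: entry (i, j) is True if edge j -> i exists, so the softmax
        # runs only over each node's incoming edges, as in TransformerConv.
        mask = torch.zeros(n, n, dtype=torch.bool, device=x.device)
        mask[edge_index[1], edge_index[0]] = True
        # Self-loops keep isolated nodes from producing an all-masked (NaN) row.
        idx = torch.arange(n, device=x.device)
        mask[idx, idx] = True

        out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        return out.squeeze(0).transpose(0, 1).reshape(n, self.heads * self.dim)


# Usage with random data:
conv = GraphSDPA(16, 32, heads=4)
x = torch.randn(10, 16)
edge_index = torch.randint(0, 10, (2, 40))
out = conv(x, edge_index)                        # shape: (10, 128)
```

Two caveats: `scaled_dot_product_attention` only dispatches to the FlashAttention kernel for mask-free or `is_causal=True` calls (so a causal pattern can use the fast path, but an arbitrary `attn_mask` falls back to the memory-efficient/math kernels), and the dense `N x N` mask costs quadratic memory, so this only pays off on small or dense graphs. It will also not match `TransformerConv` numerically, since the original layer adds a root/skip transformation, optional edge features, and attention dropout on top of the masked softmax.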