Replies: 2 comments 1 reply
-
You also need to change the bounds of the loop so that each query iterates over all the keys.
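The suggestion above can be sketched in plain NumPy (a minimal illustration of the flash-attention-style blocked loop, not the tutorial's actual Triton kernel; the names `BLOCK_M`, `BLOCK_N`, and `N_CTX` mirror the tutorial but are assumptions here). The key point is the inner loop's upper bound: for the non-causal case it must be the full sequence length, not the causal prefix of the current query block.

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: full softmax(Q K^T / sqrt(d)) V, no mask.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def blocked_attention(q, k, v, BLOCK_M=16, BLOCK_N=16):
    # Flash-attention-style online softmax over key blocks.
    # For the NON-causal case the inner loop must cover ALL keys
    # (bound N_CTX), not just the causal prefix of the query block.
    N_CTX, d = q.shape
    out = np.zeros_like(q)
    for start_m in range(0, N_CTX, BLOCK_M):
        qb = q[start_m:start_m + BLOCK_M]
        m_i = np.full(qb.shape[0], -np.inf)   # running row max
        l_i = np.zeros(qb.shape[0])           # running row sum
        acc = np.zeros_like(qb)
        for start_n in range(0, N_CTX, BLOCK_N):  # full bound, no mask
            kb = k[start_n:start_n + BLOCK_N]
            vb = v[start_n:start_n + BLOCK_N]
            s = qb @ kb.T / np.sqrt(d)
            m_new = np.maximum(m_i, s.max(axis=-1))
            alpha = np.exp(m_i - m_new)          # rescale old stats
            p = np.exp(s - m_new[:, None])
            l_i = l_i * alpha + p.sum(axis=-1)
            acc = acc * alpha[:, None] + p @ vb
            m_i = m_new
        out[start_m:start_m + BLOCK_M] = acc / l_i[:, None]
    return out
```

With the full `N_CTX` bound, the blocked result matches the unblocked reference; with a causal bound it cannot, because later key blocks are never visited.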
-
@ptillet Thank you for your response! I changed the for-loop bound to
Thank you so much :)
-
I am interested in implementing a version of flash attention in Triton without masking.
I infer from the pytest in the tutorial that the flash attention example generates outputs masked with a triangular (causal) matrix, and I infer from the code that it is this specific line inside the kernel that applies the triangular mask.
However, when I remove both of these operations, the outputs from Triton and PyTorch do not match. Does anyone know how I can change the code to make this happen? Thank you!
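For context, here is a minimal NumPy sketch of what the triangular mask does in a plain (unblocked) attention computation; this is an illustration, not the tutorial's Triton code, and the `causal` flag is a name introduced here. It shows why removing only the mask line is not sufficient in the blocked kernel: the mask is what hides the keys beyond each query, so once it is gone, the loop itself must actually visit those keys.

```python
import numpy as np

def attention(q, k, v, causal=True):
    # Plain attention: softmax(Q K^T / sqrt(d)) V.
    s = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # The triangular mask: query i only attends to keys j <= i.
        # This is the analogue of the tl.where(...) line in the kernel.
        mask = np.tril(np.ones(s.shape, dtype=bool))
        s = np.where(mask, s, -np.inf)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

Under the causal mask, the first query attends only to the first key, so its output is exactly `v[0]`; without the mask it mixes all values. In the blocked kernel the same change requires both dropping the mask and extending the inner-loop bound to the full sequence length.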