Hi Peter,
Thanks for this nice repo. When I ran it for the first time, the naive attention algorithm was indeed much slower. But on the second run, it was drastically faster than the flash attention kernel. I take it this is the performance after the system is warmed up.
Is it fair to say that naive attention on these small input sizes is faster than minimal flash attention? I think that would make sense intuitively since the gains from Flash Attention should come from long sequence lengths.
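For reference, here is a minimal sketch of how I would measure this with warm-up runs excluded, so that one-time costs (CUDA context init, kernel loading, caches) don't dominate the first measurement. The `benchmark` helper and the `naive_attention` function are my own illustrative code, not part of the repo; the commented-out `flash_attention` call is a placeholder for however the custom kernel is bound on your side.

```python
import torch
from torch.nn import functional as F

def benchmark(fn, *args, warmup=10, iters=100):
    # Run a few discarded warm-up iterations, then time the steady state
    # with CUDA events (results are in milliseconds).
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

def naive_attention(q, k, v):
    # Plain softmax(QK^T / sqrt(d)) V, materializing the full N x N score matrix.
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale
    return F.softmax(scores, dim=-1) @ v

if __name__ == "__main__":
    # Small shapes of the kind I was testing with (assumed, not the repo's exact config).
    B, H, N, d = 16, 12, 64, 64
    q, k, v = (torch.randn(B, H, N, d, device="cuda") for _ in range(3))
    print("naive attention:", benchmark(naive_attention, q, k, v), "ms")
    # print("minimal flash attention:", benchmark(flash_attention, q, k, v), "ms")
```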
