Conversation

@Dewei-Wang-sh (Contributor)

No description provided.

@Dewei-Wang-sh Dewei-Wang-sh linked an issue Oct 21, 2024 that may be closed by this pull request
@Dewei-Wang-sh Dewei-Wang-sh changed the title from "enable autotune for all attn" to "adjust best config for attn" Oct 21, 2024
@Dewei-Wang-sh Dewei-Wang-sh self-assigned this Oct 21, 2024
@Dewei-Wang-sh (Contributor, Author) commented Oct 22, 2024

Attention performance for two block_N choices, as a percentage of the XeTLA baseline (the reference named in the linked issue):

| N | H | N_CTX | Head_Dim | Causal | block_N=32 | block_N=64 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 16 | 16384 | 128 | 0 | 91.14% | 100% |
| 2 | 16 | 8192 | 128 | 0 | 90.17% | 106% |
| 4 | 16 | 4096 | 128 | 0 | 92.72% | 97% |
| 8 | 16 | 2048 | 128 | 0 | 84.78% | 92% |
| 16 | 16 | 1024 | 128 | 0 | 80.09% | 88% |
| 32 | 16 | 512 | 128 | 0 | 64.08% | 67% |
| Geomean | | | | | 83.19% | 90.73% |
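
For illustration only: a minimal, self-contained sketch of the Triton autotuning mechanism these numbers feed into, using a toy row-wise max reduction in place of the real attention kernel. The kernel name, signature, and config list below are assumptions made for the sketch, not the PR's actual code.

```python
import torch
import triton
import triton.language as tl

# Autotune over the two BLOCK_N candidates benchmarked in the table above,
# keyed on N_CTX so the winner is re-measured per sequence length.
@triton.autotune(
    configs=[
        triton.Config({"BLOCK_N": 32}, num_warps=4),
        triton.Config({"BLOCK_N": 64}, num_warps=4),
    ],
    key=["N_CTX"],
)
@triton.jit
def row_max_kernel(x_ptr, out_ptr, N_CTX, BLOCK_N: tl.constexpr):
    # Each program reduces one row of length N_CTX in BLOCK_N-wide tiles,
    # standing in for the inner K/V loop of an attention kernel.
    row = tl.program_id(0)
    acc = tl.full((BLOCK_N,), float("-inf"), dtype=tl.float32)
    for start in range(0, N_CTX, BLOCK_N):
        offs = start + tl.arange(0, BLOCK_N)
        mask = offs < N_CTX
        x = tl.load(x_ptr + row * N_CTX + offs, mask=mask, other=float("-inf"))
        acc = tl.maximum(acc, x)
    tl.store(out_ptr + row, tl.max(acc, axis=0))

x = torch.randn(16, 1024, device="xpu")  # "xpu" on Intel GPUs; "cuda" on NVIDIA
out = torch.empty(16, device=x.device)
# BLOCK_N is omitted at the launch site: the autotuner supplies it.
row_max_kernel[(x.shape[0],)](x, out, x.shape[1])
```

Note that block_N=64 wins at every shape tested, which is consistent with the title change from "enable autotune for all attn" to "adjust best config for attn": fixing the better config avoids re-tuning at run time.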

@Dewei-Wang-sh Dewei-Wang-sh merged commit 24e53d2 into intel:main Oct 22, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the auto_attn branch October 22, 2024 08:46

Linked issue (may be closed by this pull request): [FA2 performance] flashattention with dim=128 get ~90% of xetla
