@quintinwang5
Contributor

Squeeze the Z (batch) and H (head) dimensions into the same grid axis, as XeTLA does. This change gives about a 3% performance benefit for N_CTX = 512 shapes.
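
To illustrate the idea of folding Z and H into one axis, here is a minimal plain-Python sketch of the index arithmetic. It is not the code from this PR: the helper names (`cdiv`, `grid_separate`, `grid_fused`, `decode_fused`), the position of the fused axis, and the `BLOCK_M` value are all assumptions made for illustration.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division, as typically used to size a launch grid."""
    return (a + b - 1) // b


def grid_separate(Z: int, H: int, N_CTX: int, BLOCK_M: int) -> tuple:
    # Hypothetical layout with Z and H on their own grid axes.
    return (cdiv(N_CTX, BLOCK_M), H, Z)


def grid_fused(Z: int, H: int, N_CTX: int, BLOCK_M: int) -> tuple:
    # Hypothetical layout with Z and H squeezed into a single axis of size Z * H.
    return (cdiv(N_CTX, BLOCK_M), Z * H, 1)


def decode_fused(pid_zh: int, H: int) -> tuple:
    # Recover (batch index, head index) from the fused program index.
    return pid_zh // H, pid_zh % H


if __name__ == "__main__":
    Z, H, N_CTX, BLOCK_M = 32, 32, 512, 128  # BLOCK_M is an assumed tile size
    print(grid_separate(Z, H, N_CTX, BLOCK_M))
    print(grid_fused(Z, H, N_CTX, BLOCK_M))
    print(decode_fused(33, H))
```

Both layouts launch the same number of programs; only the mapping from program index to (Z, H) differs, which is why the kernel body needs just one integer division and one modulo to recover the two indices.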

Contributor

@Dewei-Wang-sh Dewei-Wang-sh left a comment

lgtm

@etiotto
Contributor

etiotto commented Nov 4, 2024

Any performance impact from this change?

@quintinwang5
Contributor Author

Any performance impact from this change?

image

@quintinwang5 quintinwang5 merged commit 99778f4 into main Nov 6, 2024
5 checks passed
@quintinwang5 quintinwang5 deleted the quintin/index_align branch November 6, 2024 00:53

Development

Successfully merging this pull request may close these issues.

[FA] Improve performance of shapes <95% on advanced path - 32x32x512, 4x32x4096, 2x32x8192

4 participants