@yudongsi yudongsi commented Nov 27, 2024

Close #2822

After:
90% -> 103%

@whitneywhtsang whitneywhtsang left a comment

Any explanation? Does XeTLA use prefetch distance 4?

@yudongsi
Contributor Author

> Any explanation? Does XeTLA use prefetch distance 4?

XeTLA uses a prefetch distance of 3 for all cases, but tuning showed that 3 is not optimal for this Triton case.
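
For readers unfamiliar with the term, here is a minimal sketch of what a prefetch distance means in a tiled GEMM loop. It assumes the distance is counted in loop iterations along K, and it is a toy schedule model, not the Triton or XeTLA implementation. The names `prefetch_schedule` and `prologue_prefetches` are hypothetical and do not come from this PR.

```python
def prologue_prefetches(num_tiles, distance):
    """Tiles prefetched before the main loop starts.

    With distance D, tiles 0..D-1 are requested up front so that the
    first D iterations find their data already in flight.
    """
    return list(range(min(distance, num_tiles)))


def prefetch_schedule(num_tiles, distance):
    """Return one (computed_tile, prefetched_tile) pair per loop iteration.

    While iteration k computes on tile k, it issues a prefetch for tile
    k + distance. Near the end of the loop there is nothing left to
    prefetch, which is recorded as None.
    """
    schedule = []
    for k in range(num_tiles):
        ahead = k + distance
        schedule.append((k, ahead if ahead < num_tiles else None))
    return schedule


if __name__ == "__main__":
    # Compare the two distances discussed in this thread on a toy loop
    # of 8 K-tiles (the tile count is an arbitrary illustration value).
    for d in (3, 4):
        print("distance", d)
        print("  prologue:", prologue_prefetches(8, d))
        print("  loop:    ", prefetch_schedule(8, d))
```

A larger distance gives each load more iterations to complete before it is needed, at the cost of more data in flight at once. Which value wins depends on the kernel and hardware, so it has to be found by tuning, as was done here.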

@whitneywhtsang whitneywhtsang merged commit 31fe770 into main Nov 28, 2024
6 checks passed
@whitneywhtsang whitneywhtsang deleted the yudong/gemm branch November 28, 2024 03:07
Development

Successfully merging this pull request may close these issues.

[GEMM] Improve performance of shape 1024x1024x1024 out of box
