@yudongsi
Contributor

@yudongsi yudongsi commented Nov 7, 2024

This change (a grid order adjustment to improve the cache hit rate) originates from #2600. It applies to batched GEMM only.
Performance reaches ~99% of XeTLA for shape 4096x8x128x16384.
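
For readers unfamiliar with grid reordering, here is a minimal sketch of one common variant, the grouped tile order used in the Triton matmul tutorial. It is not taken from this PR's diff, and the exact reordering applied here may differ. The function names, the 8x8 tile grid, and `group_size_m=4` are hypothetical, chosen only for illustration. The idea is that programs launched close together should share as many A and B tiles as possible.

```python
def row_major_tile_order(num_pid_m, num_pid_n):
    """Baseline: linear program id walks the output tiles row by row."""
    return [(pid // num_pid_n, pid % num_pid_n)
            for pid in range(num_pid_m * num_pid_n)]


def grouped_tile_order(num_pid_m, num_pid_n, group_size_m):
    """Grouped order (illustrative): walk down group_size_m rows before
    moving to the next column, so neighbouring programs reuse tiles."""
    order = []
    num_pid_in_group = group_size_m * num_pid_n
    for pid in range(num_pid_m * num_pid_n):
        group_id = pid // num_pid_in_group
        first_pid_m = group_id * group_size_m
        # The last group may have fewer rows than group_size_m.
        rows = min(num_pid_m - first_pid_m, group_size_m)
        pid_m = first_pid_m + (pid % num_pid_in_group) % rows
        pid_n = (pid % num_pid_in_group) // rows
        order.append((pid_m, pid_n))
    return order


def distinct_input_tiles(order, window):
    """Count distinct A row-tiles plus B column-tiles that the first
    `window` programs need; fewer means more reuse in cache."""
    first = order[:window]
    return len({m for m, _ in first}) + len({n for _, n in first})


# Hypothetical 8x8 grid of output tiles, 8 programs in flight at once.
baseline = distinct_input_tiles(row_major_tile_order(8, 8), window=8)
grouped = distinct_input_tiles(grouped_tile_order(8, 8, group_size_m=4), window=8)
```

In this toy setting the first eight programs need 9 distinct input tiles in row-major order and 6 in grouped order. For batched GEMM, where the batch index sits in the launch grid relative to the tile indices is a further choice that this sketch does not model.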

@yudongsi yudongsi merged commit ca95a70 into main Nov 11, 2024
5 checks passed
@yudongsi yudongsi deleted the yudong/tune-in-ci branch November 11, 2024 05:17
yudongsi added a commit that referenced this pull request Nov 13, 2024
yudongsi added a commit that referenced this pull request Nov 13, 2024

Development

Successfully merging this pull request may close these issues.

[GEMM] Improve performance of shape 4096x8x128x16384

3 participants