[LoadStoreOpToLLVM] Improve the block io 2d load lowering for the case that maskConstancyVer < tileHeight. #5416
Pull Request Overview
This PR improves the block IO 2D load lowering in the LoadStoreOpToLLVM pass to handle cases where maskConstancyVer < tileHeight. The key improvement is a register base rearrangement strategy that rotates register bases of the adjusted tile height portion to appear after vBlocks, enabling easier adjustment of tile height for block IO operations.
Key Changes:
- Enhanced mask constancy validation to handle vertical mask constancy less than tile height
- Added register base rearrangement logic using `std::rotate` to optimize memory layout
- Added MSVC compatibility for `__builtin_clz` and `__builtin_ctz` intrinsics
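The MSVC compatibility item can be illustrated with a minimal sketch. The helper names `countLeadingZeros32` and `countTrailingZeros32` are hypothetical and are not taken from the PR; the sketch only assumes the documented MSVC intrinsics `_BitScanReverse` and `_BitScanForward` as stand-ins for the GCC/Clang builtins.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h> // _BitScanReverse, _BitScanForward
#endif

// Hypothetical wrapper (not the PR's code): number of leading zero bits.
// The result is undefined for x == 0 on both code paths, so reject it.
inline unsigned countLeadingZeros32(std::uint32_t x) {
  assert(x != 0 && "input must be non-zero");
#if defined(_MSC_VER) && !defined(__clang__)
  unsigned long index = 0;
  _BitScanReverse(&index, x); // index of the highest set bit
  return 31u - static_cast<unsigned>(index);
#else
  return static_cast<unsigned>(__builtin_clz(x));
#endif
}

// Hypothetical wrapper (not the PR's code): number of trailing zero bits.
inline unsigned countTrailingZeros32(std::uint32_t x) {
  assert(x != 0 && "input must be non-zero");
#if defined(_MSC_VER) && !defined(__clang__)
  unsigned long index = 0;
  _BitScanForward(&index, x); // index of the lowest set bit
  return static_cast<unsigned>(index);
#else
  return static_cast<unsigned>(__builtin_ctz(x));
#endif
}
```

For power-of-two sizes such as tile heights, `countTrailingZeros32` doubles as an integer log2, which is presumably why such intrinsics appear in lowering code; that motivation is an assumption here, not something stated in the PR.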
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp | Core implementation: Added MSVC intrinsic compatibility, enhanced mask validation, and register rearrangement logic for tile height adjustment |
| test/TritonIntelGPU/tensor-pointer-load-block-2d.mlir | Added comprehensive test cases for different DPAS configurations and repCluster patterns to validate the improved lowering |
058131c to b1be507
b1be507 to fe492f8
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
fe492f8 to 469dde9
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
…e that maskConstancyVer < tileHeight. Signed-off-by: Lu,Chengjun <[email protected]>
This change improves the block IO 2D load lowering for the case where maskConstancyVer < tileHeight.
It moves the register bases of the adjusted part of the tile height to the position right after the vBlocks,
so that the tile height of the block IO can be adjusted easily.
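The rearrangement described above can be sketched with `std::rotate`. This is a simplified illustration, not the PR's implementation: the function name, the string labels, and the assumed ordering of the bases (tile-height bases first, then vBlocks bases, then everything else) are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's code). Assumed layout of `bases`:
//   [0, keptRowBases)                        tile-height bases that stay put
//   [keptRowBases, tileRowBases)             "adjusted" tile-height bases
//   [tileRowBases, tileRowBases + vBlocks)   vBlocks bases
//   remaining entries                        left untouched
// The adjusted bases are moved so that they follow the vBlocks bases.
std::vector<std::string> moveAdjustedBasesAfterVBlocks(
    std::vector<std::string> bases, std::size_t keptRowBases,
    std::size_t tileRowBases, std::size_t vBlockBases) {
  assert(keptRowBases <= tileRowBases);
  assert(tileRowBases + vBlockBases <= bases.size());

  auto first = bases.begin() + static_cast<std::ptrdiff_t>(keptRowBases);
  auto middle = bases.begin() + static_cast<std::ptrdiff_t>(tileRowBases);
  auto last = middle + static_cast<std::ptrdiff_t>(vBlockBases);

  // std::rotate makes `middle` the new first element of [first, last),
  // so the vBlocks bases come first and the adjusted bases follow them.
  std::rotate(first, middle, last);
  return bases;
}
```

Under these assumptions, the input `{row0, row1, row2, row3, vblk0, vblk1, rep0}` with two kept bases, four tile-height bases, and two vBlocks bases becomes `{row0, row1, vblk0, vblk1, row2, row3, rep0}`. The bases that remain at the front then describe a shorter tile, which matches the stated goal of making the tile height easy to shrink.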