[LoadStoreToLLVM] Refactor the 2D block load lowering. #4615
base: main
5ac88f5 to 719c526
This needs to wait for the reland of the block store code in PR #4646.
719c526 to d725f19
third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp
5689429 to 3cce958
3cce958 to 9308fcf
Pull Request Overview
This PR refactors the 2D block load lowering implementation in the LoadStoreOpToLLVM pass by transitioning from a DPAS-specific approach to using linear layout. The refactoring simplifies the code structure while maintaining functionality for 2D block I/O operations.
Key changes include:
- Replaced complex DPAS-specific calculations with linear layout-based tile size determination
- Simplified load operation generation by using register mapping from linear layout
- Streamlined the code flow and reduced complexity in the load conversion logic
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| LoadStoreOpToLLVM.cpp | Refactored 2D block load lowering to use linear layout instead of DPAS-specific logic |
| test_block_store.py | Updated test to include block load operations and verify their generation |
    auto [tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
          regPackedBases] =
Copilot AI, Jul 24, 2025
[nitpick] Using structured bindings with auto can make the code less readable when the types are not obvious. Consider using explicit variable declarations or adding a comment explaining what getBlockIOTileSize returns.
Suggested change:
    - auto [tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
    -       regPackedBases] =
    + int tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim;
    + std::vector<int> regPackedBases;
    + std::tie(tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
    +          regPackedBases) =
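To illustrate the `std::tie` alternative raised in this comment, here is a minimal sketch. `getTileInfoSketch` and `tileAreaSketch` are hypothetical names invented for this example; the real helper in the PR returns more fields, and its exact return type is not shown in the conversation, so a `std::tuple` return is an assumption.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the helper that returns the tile parameters.
// Assumption: the real helper returns a std::tuple-like value.
std::tuple<int, int, std::vector<int>> getTileInfoSketch() {
  return std::make_tuple(8, 16, std::vector<int>{0, 1});
}

// Unpacks the result into explicitly typed variables with std::tie, so the
// type of every field is visible at the call site.
int tileAreaSketch() {
  int tileHeight = 0, tileWidth = 0;
  std::vector<int> regPackedBases;
  std::tie(tileHeight, tileWidth, regPackedBases) = getTileInfoSketch();
  assert(tileHeight > 0 && tileWidth > 0);
  return tileHeight * tileWidth;
}
```

The trade-off is verbosity: `std::tie` needs every variable declared beforehand, while structured bindings declare and unpack in one statement.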
    unsigned totalBytesPerRowPerMatrix = tileWidth * packedElemSizeInBits / 8;
    vBlocks = std::min(vBlocks, (int)(64 / totalBytesPerRowPerMatrix));
    vBlocks = std::min(4, vBlocks);
    // HW issue for vblock = 4
Copilot AI, Jul 24, 2025
This comment is unclear and doesn't explain what the hardware issue is or why vBlocks is set to 1 when it equals 4. Consider adding more context about the specific hardware limitation.
Suggested change:
    - // HW issue for vblock = 4
    + // Due to a hardware limitation, configurations where vBlocks equals 4
    + // are not supported. This issue arises because the hardware cannot handle
    + // 2D block loads or stores with this specific configuration. To work around
    + // this limitation, vBlocks is set to 1 when it equals 4.
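The clamping steps in the quoted snippet can be checked in isolation with a small sketch. `clampVBlocksSketch` is a hypothetical helper, not part of the PR. The fallback from 4 to 1 is taken from the review comment's description, since the code that performs it is not shown in the quoted diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the quoted clamping logic.
int clampVBlocksSketch(int vBlocks, int tileWidth, int packedElemSizeInBits) {
  assert(tileWidth > 0 && packedElemSizeInBits >= 8);
  // Bytes covered by one row of a single block.
  unsigned bytesPerRow = tileWidth * packedElemSizeInBits / 8;
  // One row across all blocks is limited to 64 bytes in the quoted code.
  vBlocks = std::min(vBlocks, static_cast<int>(64 / bytesPerRow));
  vBlocks = std::min(4, vBlocks);
  // Assumption from the review comment: vBlocks == 4 falls back to 1.
  if (vBlocks == 4)
    vBlocks = 1;
  return vBlocks;
}
```

For example, a 16-element row of 16-bit values is 32 bytes wide, so at most two blocks fit in the 64-byte row limit.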
    unsigned opsPerChannel = dpasLayout.getOpsPerChannel();
    if ((opsPerChannel == 4 && elemSizeInBits == 8) ||
        (opsPerChannel == 2 && elemSizeInBits == 16)) {
      // Use the VNNI packing format for DotOp B layout.
Copilot AI, Jul 24, 2025
Remove commented-out code. If this assignment is needed for future reference, consider adding a TODO comment explaining why it's preserved.
Suggested change:
    - // Use the VNNI packing format for DotOp B layout.
    + // Use the VNNI packing format for DotOp B layout.
    + // TODO: Retain this line for reference in case packedType needs to be explicitly set to i32_ty in future updates.
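The condition guarding the VNNI path in the quoted context can be written as a standalone predicate. `usesVnniPackingSketch` is a hypothetical name for illustration only. The sketch mirrors only the two cases in the quoted `if` and does not reproduce the rest of the lowering.

```cpp
#include <cassert>

// Hypothetical predicate mirroring the quoted condition.
bool usesVnniPackingSketch(unsigned opsPerChannel, unsigned elemSizeInBits) {
  bool vnni = (opsPerChannel == 4 && elemSizeInBits == 8) ||
              (opsPerChannel == 2 && elemSizeInBits == 16);
  // In both accepted cases the packed group is 32 bits wide.
  assert(!vnni || opsPerChannel * elemSizeInBits == 32);
  return vnni;
}
```

In both accepted cases, `opsPerChannel * elemSizeInBits` equals 32, which is the width of the `i32_ty` mentioned in the suggested TODO.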
    assert(maskElems.size() == otherElems.size() &&
           "Invalid size of the masks.");
Copilot AI, Jul 24, 2025
This assertion compares maskElems.size() with otherElems.size(), but otherElems may be empty when there's no 'other' value. This could cause a false assertion failure when mask is provided but other is not.
Suggested change:
    - assert(maskElems.size() == otherElems.size() &&
    -        "Invalid size of the masks.");
    + assert((otherElems.empty() || maskElems.size() == otherElems.size()) &&
    +        "Invalid size of the masks: maskElems and otherElems sizes do not match.");
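The relaxed check proposed in this suggestion can be sketched as a standalone predicate. `maskSizesConsistentSketch` and `checkMaskSizesSketch` are hypothetical names, and `std::vector<int>` stands in for the value containers used in the real pass.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate: an empty otherElems means no 'other' value was
// supplied, so only a non-empty otherElems has to match the mask size.
bool maskSizesConsistentSketch(const std::vector<int> &maskElems,
                               const std::vector<int> &otherElems) {
  return otherElems.empty() || maskElems.size() == otherElems.size();
}

// Assertion wrapper in the style of the suggested change.
void checkMaskSizesSketch(const std::vector<int> &maskElems,
                          const std::vector<int> &otherElems) {
  assert(maskSizesConsistentSketch(maskElems, otherElems) &&
         "maskElems and otherElems sizes do not match.");
}
```

With this form, a masked load without an `other` operand passes the check, while a non-empty `other` of the wrong length still trips the assertion.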
689448e to a562276
…or pointer. Signed-off-by: Lu,Chengjun <[email protected]>
Hi @chengjunlu! Is this no longer blocked?
It is no longer blocked; I will help land this PR.
Refactor the 2D block IO lowering for regular pointers by using the linear layout.