[Draft] [BACKEND] Enhance the remove layout implementation to reduce the duplicated values with different layout in scf.for. #4527
base: main
486ed4a to f42bd66
…d values with different layout in scf.for. Signed-off-by: Lu,Chengjun <[email protected]>
Pull Request Overview
This PR enhances the remove layout implementation to better handle layout propagation across `scf.for` operations, addressing limitations that create performance bottlenecks on Intel GPU. The changes focus on reducing duplicated layout conversion operations by improving support for multi-result operations and nested basic blocks.
- Adds support for propagating layouts through `scf.for` operations with a new `includeForOp` parameter
- Refactors `mappedValues` to handle multiple attribute mappings per value instead of single mappings
- Includes debug output and unreachable code handling for `scf.for` operations
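The `mappedValues` refactor listed above can be pictured as moving from one encoding per value to a set of encodings per value. The following is a minimal sketch built on standard-library stand-ins; `ValueId`, `Encoding`, `MappedValues`, `addMapping`, and `hasMapping` are hypothetical names chosen for illustration and are not the types or helpers used in the PR, which works on MLIR values and attributes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for an MLIR value and a layout encoding attribute.
using ValueId = int;
using Encoding = std::string;

// A value may be needed in several layouts, so each key maps to a set
// of encodings rather than to a single encoding.
using MappedValues = std::map<ValueId, std::set<Encoding>>;

// Records that value `v` is needed with encoding `enc`.
// Returns true if this (value, encoding) pair was not recorded before.
inline bool addMapping(MappedValues &m, ValueId v, const Encoding &enc) {
  return m[v].insert(enc).second;
}

// Returns true if value `v` has already been recorded with encoding `enc`.
inline bool hasMapping(const MappedValues &m, ValueId v, const Encoding &enc) {
  auto it = m.find(v);
  return it != m.end() && it->second.count(enc) > 0;
}
```

With a set per value, asking for the same value in a second layout adds an entry instead of overwriting the first one, and a repeated request for an already recorded pair is detected by the `false` return of `addMapping`.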
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
Utility.h | Adds includeForOp parameter to getConvertBackwardSlice function signature |
Utility.cpp | Implements scf.for layout propagation logic with early return check and debug output |
RemoveLayoutConversions.cpp | Updates data structures to support multiple encodings per value and enables scf.for processing |
return failure();
continue;
}
return failure();
This return statement makes the code below unreachable. The logic for handling initOperand and yieldOperand (lines 243-253) will never execute, which appears to be the main implementation for scf.for support.
return failure();
Copilot uses AI. Check for mistakes.
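To make the review comment concrete, here is a toy model of a backward slice that follows a loop result to its init and yield operands. It is only a sketch under stated assumptions: `Def`, `ValueId`, and `backwardSlice` are hypothetical names, the IR is reduced to integer ids, and the real logic lives in `getConvertBackwardSlice` and operates on MLIR operations. The point it illustrates is that the guard on `includeForOp` has to sit before the init/yield handling without making that handling unreachable.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using ValueId = int; // hypothetical stand-in for an MLIR value

// Hypothetical description of how a value is defined.
struct Def {
  std::vector<ValueId> operands; // operands of an ordinary op
  bool isForResult;              // true if the value is a loop result
  ValueId initOperand;           // loop init operand (for loop results)
  ValueId yieldOperand;          // loop yield operand (for loop results)
};

// Collects every value reachable backwards from `root` into `slice`.
// Returns false (failure) when a loop result is hit and loops are excluded.
inline bool backwardSlice(const std::map<ValueId, Def> &defs, ValueId root,
                          bool includeForOp, std::set<ValueId> &slice) {
  std::vector<ValueId> worklist{root};
  while (!worklist.empty()) {
    ValueId v = worklist.back();
    worklist.pop_back();
    if (!slice.insert(v).second)
      continue; // already visited
    auto it = defs.find(v);
    if (it == defs.end())
      continue; // no recorded definition: treat as a leaf
    const Def &def = it->second;
    if (def.isForResult) {
      if (!includeForOp)
        return false; // early exit only when loops are not supported
      // Reached only when includeForOp is true.
      worklist.push_back(def.initOperand);
      worklist.push_back(def.yieldOperand);
      continue;
    }
    for (ValueId operand : def.operands)
      worklist.push_back(operand);
  }
  return true;
}
```

In this model an unconditional `return false;` placed before the two `push_back` calls would reproduce the problem the comment describes: the loop result would never be traced to its init and yield operands.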
llvm::outs() << "johnlu getBackward slice check scf.for initOperand: "
             << initOperand->get() << "\n";
llvm::outs() << "johnlu getBackward slice check scf.for yieldOperand: "
             << yieldOperand.get() << "\n";
Debug output should not be committed to production code. Consider using LLVM_DEBUG macro or removing these debug statements before merging.
<< yieldOperand.get() << "\n";
LLVM_DEBUG(llvm::dbgs() << "johnlu getBackward slice check scf.for initOperand: "
                        << initOperand->get() << "\n");
LLVM_DEBUG(llvm::dbgs() << "johnlu getBackward slice check scf.for yieldOperand: "
                        << yieldOperand.get() << "\n");
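The idea behind the suggestion is that debug printing should be gated so that it stays silent unless debugging is explicitly enabled. The sketch below imitates that behaviour with standard-library pieces only; `SKETCH_DEBUG`, `debugEnabled`, `debugStream`, and `traceOperand` are hypothetical names, and the real `LLVM_DEBUG` macro additionally relies on a `DEBUG_TYPE` definition and writes to `llvm::dbgs()`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical runtime switch, standing in for LLVM's debug flag.
inline bool &debugEnabled() {
  static bool enabled = false;
  return enabled;
}

// Hypothetical sink, standing in for llvm::dbgs(). A string stream is used
// so that the output can be inspected.
inline std::ostringstream &debugStream() {
  static std::ostringstream os;
  return os;
}

// Runs the statement X only when debugging is switched on.
#define SKETCH_DEBUG(X)                                                        \
  do {                                                                         \
    if (debugEnabled()) {                                                      \
      X;                                                                       \
    }                                                                          \
  } while (false)

// Example call site: the trace is emitted only in debug mode.
inline int traceOperand(int operand) {
  SKETCH_DEBUG(debugStream() << "scf.for initOperand: " << operand << "\n");
  return operand;
}
```

Compared with writing straight to a standard output stream, a gated macro keeps the trace available to developers while leaving normal compiler output untouched.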
@@ -1045,6 +1058,7 @@ void LayoutRematerialization::rewriteSlice(SetVector<Value> &slice,
deadOps.push_back(forOp.getOperation());
Block &loopBody = *newForOp.getBody();
for (auto m : argMapping) {
  mapping.map(newForOp.getResult(m.first), newForOp.getResult(m.second));
This line appears to map a result to itself when m.first equals m.second, which could be problematic. The mapping logic should ensure proper relationships between old and new ForOp results.
The layout propagation across the `scf.for` op in RemoveLayout is not implemented well in these aspects:
… `scf.for` ops.

With these limitations, the `scf.for` operation is the efficiency bottleneck after the remove layout pass.
This is not an issue on NV GPU, because the NV GPU path converts the layout conversion operations to async.cp in the software pipeline.
It is an issue for Intel GPU, however: we rely on the remove layout pass to get a simple program with fewer convert layout operations.

The plan is to enhance the remove layout pass to address these limitations:
… `scf.for` ops on demand.

This is a PR for CI.