-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[CodeGen] Ignore requiresStructuredCFG check in canSplitCriticalEdge if successor is loop header #154063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CodeGen] Ignore requiresStructuredCFG check in canSplitCriticalEdge if successor is loop header #154063
Conversation
…if successor is loop header This addresses a performance issue for our downstream GPU target that sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM pass does not hoist loop invariants because a critical edge is not split.
|
@llvm/pr-subscribers-backend-nvptx Author: Wenju He (wenju-he) ChangesThis addresses a performance issue for our downstream GPU target that Full diff: https://github.com/llvm/llvm-project/pull/154063.diff 3 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
index 9e3d9196cc184..7df34a76912dd 100644
--- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
@@ -1035,7 +1035,9 @@ class MachineBasicBlock
/// Succ, can be split. If this returns true a subsequent call to
/// SplitCriticalEdge is guaranteed to return a valid basic block if
/// no changes occurred in the meantime.
- LLVM_ABI bool canSplitCriticalEdge(const MachineBasicBlock *Succ) const;
+ LLVM_ABI bool
+ canSplitCriticalEdge(const MachineBasicBlock *Succ,
+ const MachineLoopInfo *MLI = nullptr) const;
void pop_front() { Insts.pop_front(); }
void pop_back() { Insts.pop_back(); }
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index c3c5a0f5102d7..8c795f812df09 100644
--- a/llvm/lib/CodeGen/MachineBasicBlock.cpp
+++ b/llvm/lib/CodeGen/MachineBasicBlock.cpp
@@ -1160,7 +1160,7 @@ MachineBasicBlock *MachineBasicBlock::SplitCriticalEdge(
MachineBasicBlock *MachineBasicBlock::SplitCriticalEdge(
MachineBasicBlock *Succ, const SplitCriticalEdgeAnalyses &Analyses,
std::vector<SparseBitVector<>> *LiveInSets, MachineDomTreeUpdater *MDTU) {
- if (!canSplitCriticalEdge(Succ))
+ if (!canSplitCriticalEdge(Succ, Analyses.MLI))
return nullptr;
MachineFunction *MF = getParent();
@@ -1388,8 +1388,8 @@ MachineBasicBlock *MachineBasicBlock::SplitCriticalEdge(
return NMBB;
}
-bool MachineBasicBlock::canSplitCriticalEdge(
- const MachineBasicBlock *Succ) const {
+bool MachineBasicBlock::canSplitCriticalEdge(const MachineBasicBlock *Succ,
+ const MachineLoopInfo *MLI) const {
// Splitting the critical edge to a landing pad block is non-trivial. Don't do
// it in this generic function.
if (Succ->isEHPad())
@@ -1403,7 +1403,15 @@ bool MachineBasicBlock::canSplitCriticalEdge(
const MachineFunction *MF = getParent();
// Performance might be harmed on HW that implements branching using exec mask
// where both sides of the branches are always executed.
- if (MF->getTarget().requiresStructuredCFG())
+ // However, if `Succ` is a loop header, splitting the critical edge will not
+ // break structured CFG.
+ auto SuccIsLoopHeader = [&]() {
+ if (MLI)
+ if (MachineLoop *L = MLI->getLoopFor(Succ); L && L->getHeader() == Succ)
+ return true;
+ return false;
+ };
+ if (MF->getTarget().requiresStructuredCFG() && !SuccIsLoopHeader())
return false;
// Do we have an Indirect jump with a jumptable that we can rewrite?
diff --git a/llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir b/llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir
new file mode 100644
index 0000000000000..f2f0ffdec8094
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir
@@ -0,0 +1,72 @@
+# RUN: llc -mtriple=nvptx64 -mcpu=sm_20 -run-pass=early-machinelicm %s -verify-machineinstrs -o - | FileCheck %s
+
+# This test checks that the early-machineLICM pass successfully creates a new
+# loop preheader by splitting the critical edge and hoisting the loop invariant
+# value `%18` to the preheader.
+# Since the critical edge successor is a loop header, the splitting does not
+# break the structured CFG, which is a requirement for the NVPTX target.
+
+---
+name: test_hoist
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: b64, preferred-register: '', flags: [ ] }
+ - { id: 1, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 2, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 3, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 4, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 5, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 6, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 7, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 8, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 9, class: b64, preferred-register: '', flags: [ ] }
+ - { id: 10, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 11, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 12, class: b32, preferred-register: '', flags: [ ] }
+ - { id: 13, class: b64, preferred-register: '', flags: [ ] }
+ - { id: 14, class: b64, preferred-register: '', flags: [ ] }
+ - { id: 15, class: b64, preferred-register: '', flags: [ ] }
+ - { id: 16, class: b1, preferred-register: '', flags: [ ] }
+ - { id: 17, class: b1, preferred-register: '', flags: [ ] }
+ - { id: 18, class: b32, preferred-register: '', flags: [ ] }
+body: |
+ bb.0.entry:
+ successors: %bb.2(0x30000000), %bb.1(0x50000000)
+
+ %8:b32 = LD_i32 0, 0, 101, 3, 32, &test_hoist_param_2, 0 :: (dereferenceable invariant load (s32), addrspace 101)
+ %7:b32 = LD_i32 0, 0, 101, 3, 32, &test_hoist_param_1, 0 :: (dereferenceable invariant load (s32), addrspace 101)
+ %9:b64 = LD_i64 0, 0, 101, 3, 64, &test_hoist_param_0, 0 :: (dereferenceable invariant load (s64), addrspace 101)
+ %10:b32 = INT_PTX_SREG_CTAID_x
+ %11:b32 = INT_PTX_SREG_NTID_x
+ %12:b32 = INT_PTX_SREG_TID_x
+ %13:b64 = CVT_u64_u32 killed %12, 0
+ %14:b64 = nuw MAD_WIDE_U32rrr killed %11, killed %10, killed %13
+ %15:b64 = nuw nsw SHL64_ri killed %14, 2
+ %0:b64 = nuw ADD64rr killed %9, killed %15
+ %1:b32 = LD_i32 0, 0, 1, 3, 32, %0, 0
+ %16:b1 = SETP_i32ri %8, 0, 0
+ CBranch killed %16, %bb.2
+ GOTO %bb.1
+
+ ; CHECK: bb.3:
+ ; CHECK: successors: %bb.1(0x80000000)
+ ; CHECK: %18:b32 = ADD32ri %7, -1
+ ; CHECK: bb.1:
+
+ bb.1:
+ successors: %bb.2(0x04000000), %bb.1(0x7c000000)
+
+ %2:b32 = PHI %8, %bb.0, %5, %bb.1
+ %3:b32 = PHI %1, %bb.0, %4, %bb.1
+ %18:b32 = ADD32ri %7, -1
+ %4:b32 = SREM32rr %3, %18
+ %5:b32 = ADD32ri %2, -1
+ %17:b1 = SETP_i32ri %5, 0, 1
+ CBranch killed %17, %bb.1
+ GOTO %bb.2
+
+ bb.2:
+ %6:b32 = PHI %1, %bb.0, %4, %bb.1
+ ST_i32 %6, 0, 0, 1, 32, %0, 0
+ Return
+...
|
…resStructuredCFG-loop-header
| if (MLI) | ||
| if (MachineLoop *L = MLI->getLoopFor(Succ); L && L->getHeader() == Succ) | ||
| return true; | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (MLI) | |
| if (MachineLoop *L = MLI->getLoopFor(Succ); L && L->getHeader() == Succ) | |
| return true; | |
| return false; | |
| if (MLI) | |
| const MachineLoop *L = MLI->getLoopFor(Succ); | |
| return L && L->getHeader() == Succ; |
This also doesn't need to be a lambda
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also doesn't need to be a lambda
done, thanks
Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/163/builds/27168 Here is the relevant piece of the build log for the reference |
…if successor is loop header (llvm#154063) This addresses a performance issue for our downstream GPU target that sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM pass does not hoist loop invariants because a critical edge is not split. The critical edge's destination a loop header. Splitting the critical edge will not break structured CFG. Add a nvptx test to demonstrate the issue since the target also requires structured CFG. --------- Co-authored-by: Matt Arsenault <[email protected]>
This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not split.
The critical edge's destination a loop header. Splitting the critical
edge will not break structured CFG.
Add a nvptx test to demonstrate the issue since the target also
requires structured CFG.