Skip to content

Conversation

@wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Aug 18, 2025

This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not split.
The critical edge's destination a loop header. Splitting the critical
edge will not break structured CFG.

Add a nvptx test to demonstrate the issue since the target also
requires structured CFG.

…if successor is loop header

This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not split.
@llvmbot
Copy link
Member

llvmbot commented Sep 1, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Wenju He (wenju-he)

Changes

This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not split.
The critical edge's destination a loop header. Splitting the critical
edge will not break structured CFG.


Full diff: https://github.com/llvm/llvm-project/pull/154063.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineBasicBlock.h (+3-1)
  • (modified) llvm/lib/CodeGen/MachineBasicBlock.cpp (+12-4)
  • (added) llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir (+72)
diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
index 9e3d9196cc184..7df34a76912dd 100644
--- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
@@ -1035,7 +1035,9 @@ class MachineBasicBlock
   /// Succ, can be split. If this returns true a subsequent call to
   /// SplitCriticalEdge is guaranteed to return a valid basic block if
   /// no changes occurred in the meantime.
-  LLVM_ABI bool canSplitCriticalEdge(const MachineBasicBlock *Succ) const;
+  LLVM_ABI bool
+  canSplitCriticalEdge(const MachineBasicBlock *Succ,
+                       const MachineLoopInfo *MLI = nullptr) const;
 
   void pop_front() { Insts.pop_front(); }
   void pop_back() { Insts.pop_back(); }
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index c3c5a0f5102d7..8c795f812df09 100644
--- a/llvm/lib/CodeGen/MachineBasicBlock.cpp
+++ b/llvm/lib/CodeGen/MachineBasicBlock.cpp
@@ -1160,7 +1160,7 @@ MachineBasicBlock *MachineBasicBlock::SplitCriticalEdge(
 MachineBasicBlock *MachineBasicBlock::SplitCriticalEdge(
     MachineBasicBlock *Succ, const SplitCriticalEdgeAnalyses &Analyses,
     std::vector<SparseBitVector<>> *LiveInSets, MachineDomTreeUpdater *MDTU) {
-  if (!canSplitCriticalEdge(Succ))
+  if (!canSplitCriticalEdge(Succ, Analyses.MLI))
     return nullptr;
 
   MachineFunction *MF = getParent();
@@ -1388,8 +1388,8 @@ MachineBasicBlock *MachineBasicBlock::SplitCriticalEdge(
   return NMBB;
 }
 
-bool MachineBasicBlock::canSplitCriticalEdge(
-    const MachineBasicBlock *Succ) const {
+bool MachineBasicBlock::canSplitCriticalEdge(const MachineBasicBlock *Succ,
+                                             const MachineLoopInfo *MLI) const {
   // Splitting the critical edge to a landing pad block is non-trivial. Don't do
   // it in this generic function.
   if (Succ->isEHPad())
@@ -1403,7 +1403,15 @@ bool MachineBasicBlock::canSplitCriticalEdge(
   const MachineFunction *MF = getParent();
   // Performance might be harmed on HW that implements branching using exec mask
   // where both sides of the branches are always executed.
-  if (MF->getTarget().requiresStructuredCFG())
+  // However, if `Succ` is a loop header, splitting the critical edge will not
+  // break structured CFG.
+  auto SuccIsLoopHeader = [&]() {
+    if (MLI)
+      if (MachineLoop *L = MLI->getLoopFor(Succ); L && L->getHeader() == Succ)
+        return true;
+    return false;
+  };
+  if (MF->getTarget().requiresStructuredCFG() && !SuccIsLoopHeader())
     return false;
 
   // Do we have an Indirect jump with a jumptable that we can rewrite?
diff --git a/llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir b/llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir
new file mode 100644
index 0000000000000..f2f0ffdec8094
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir
@@ -0,0 +1,72 @@
+# RUN: llc -mtriple=nvptx64 -mcpu=sm_20 -run-pass=early-machinelicm %s -verify-machineinstrs -o - | FileCheck %s
+
+# This test checks that the early-machineLICM pass successfully creates a new
+# loop preheader by splitting the critical edge and hoisting the loop invariant
+# value `%18` to the preheader.
+# Since the critical edge successor is a loop header, the splitting does not
+# break the structured CFG, which is a requirement for the NVPTX target.
+
+---
+name:            test_hoist
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: b64, preferred-register: '', flags: [  ] }
+  - { id: 1, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 2, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 3, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 4, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 5, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 6, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 7, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 8, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 9, class: b64, preferred-register: '', flags: [  ] }
+  - { id: 10, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 11, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 12, class: b32, preferred-register: '', flags: [  ] }
+  - { id: 13, class: b64, preferred-register: '', flags: [  ] }
+  - { id: 14, class: b64, preferred-register: '', flags: [  ] }
+  - { id: 15, class: b64, preferred-register: '', flags: [  ] }
+  - { id: 16, class: b1, preferred-register: '', flags: [  ] }
+  - { id: 17, class: b1, preferred-register: '', flags: [  ] }
+  - { id: 18, class: b32, preferred-register: '', flags: [  ] }
+body:             |
+  bb.0.entry:
+    successors: %bb.2(0x30000000), %bb.1(0x50000000)
+
+    %8:b32 = LD_i32 0, 0, 101, 3, 32, &test_hoist_param_2, 0 :: (dereferenceable invariant load (s32), addrspace 101)
+    %7:b32 = LD_i32 0, 0, 101, 3, 32, &test_hoist_param_1, 0 :: (dereferenceable invariant load (s32), addrspace 101)
+    %9:b64 = LD_i64 0, 0, 101, 3, 64, &test_hoist_param_0, 0 :: (dereferenceable invariant load (s64), addrspace 101)
+    %10:b32 = INT_PTX_SREG_CTAID_x
+    %11:b32 = INT_PTX_SREG_NTID_x
+    %12:b32 = INT_PTX_SREG_TID_x
+    %13:b64 = CVT_u64_u32 killed %12, 0
+    %14:b64 = nuw MAD_WIDE_U32rrr killed %11, killed %10, killed %13
+    %15:b64 = nuw nsw SHL64_ri killed %14, 2
+    %0:b64 = nuw ADD64rr killed %9, killed %15
+    %1:b32 = LD_i32 0, 0, 1, 3, 32, %0, 0
+    %16:b1 = SETP_i32ri %8, 0, 0
+    CBranch killed %16, %bb.2
+    GOTO %bb.1
+
+  ; CHECK: bb.3:
+  ; CHECK:   successors: %bb.1(0x80000000)
+  ; CHECK:   %18:b32 = ADD32ri %7, -1
+  ; CHECK: bb.1:
+
+  bb.1:
+    successors: %bb.2(0x04000000), %bb.1(0x7c000000)
+
+    %2:b32 = PHI %8, %bb.0, %5, %bb.1
+    %3:b32 = PHI %1, %bb.0, %4, %bb.1
+    %18:b32 = ADD32ri %7, -1
+    %4:b32 = SREM32rr %3, %18
+    %5:b32 = ADD32ri %2, -1
+    %17:b1 = SETP_i32ri %5, 0, 1
+    CBranch killed %17, %bb.1
+    GOTO %bb.2
+
+  bb.2:
+    %6:b32 = PHI %1, %bb.0, %4, %bb.1
+    ST_i32 %6, 0, 0, 1, 32, %0, 0
+    Return
+...

@wenju-he wenju-he requested a review from AlexMaclean September 1, 2025 08:22
Comment on lines 1409 to 1412
if (MLI)
if (MachineLoop *L = MLI->getLoopFor(Succ); L && L->getHeader() == Succ)
return true;
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (MLI)
if (MachineLoop *L = MLI->getLoopFor(Succ); L && L->getHeader() == Succ)
return true;
return false;
if (MLI)
const MachineLoop *L = MLI->getLoopFor(Succ);
return L && L->getHeader() == Succ;

This also doesn't need to be a lambda

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also doesn't need to be a lambda

done, thanks

@wenju-he wenju-he requested a review from arsenm September 24, 2025 05:47
@github-actions
Copy link

github-actions bot commented Sep 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@wenju-he wenju-he merged commit 745e1e6 into llvm:main Sep 26, 2025
7 of 9 checks passed
@wenju-he wenju-he deleted the MachineBasicBlock-canSplitCriticalEdge-requiresStructuredCFG-loop-header branch September 26, 2025 09:25
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2025

LLVM Buildbot has detected a new failure on builder cross-project-tests-sie-ubuntu-dwarf5 running on doug-worker-1b while building llvm at step 6 "test-build-unified-tree-check-cross-project".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/163/builds/27168

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-cross-project) failure: test (failure)
******************** TEST 'cross-project-tests :: debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
clang++ -O0 -glldb -std=gnu++11 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp -o /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/Output/limit_steps_line_mismatch.cpp.tmp # RUN: at line 6
+ clang++ -O0 -glldb -std=gnu++11 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp -o /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/Output/limit_steps_line_mismatch.cpp.tmp
"/usr/bin/python3.10" "/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/dexter.py" test --fail-lt 1.0 -w -v --debugger lldb-dap --lldb-executable "/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/bin/lldb-dap" --dap-message-log=-e --binary /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/Output/limit_steps_line_mismatch.cpp.tmp -- /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp | /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/bin/FileCheck /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp # RUN: at line 7
+ /usr/bin/python3.10 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/dexter.py test --fail-lt 1.0 -w -v --debugger lldb-dap --lldb-executable /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/bin/lldb-dap --dap-message-log=-e --binary /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/Output/limit_steps_line_mismatch.cpp.tmp -- /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp
+ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/bin/FileCheck /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/perfect/limit_steps/limit_steps_line_mismatch.cpp
note: Opening DAP server: /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu-dwarf5/build/bin/lldb-dap
-> {
  "type": "request",
  "command": "initialize",
  "arguments": {
    "clientID": "dexter",
    "adapterID": "lldb-dap",
    "pathFormat": "path",
    "linesStartAt1": true,
    "columnsStartAt1": true,
    "supportsVariableType": true,
    "supportsVariablePaging": true,
    "supportsRunInTerminalRequest": false
  },
  "seq": 1
}
<- {
  "body": {
    "$__lldb_version": "lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 745e1e6ad5d40ff8f1553e62c48554a61611ee76)\n  clang revision 745e1e6ad5d40ff8f1553e62c48554a61611ee76\n  llvm revision 745e1e6ad5d40ff8f1553e62c48554a61611ee76",
    "completionTriggerCharacters": [
      ".",
      " ",
      "\t"
    ],
    "exceptionBreakpointFilters": [
      {
        "description": "C++ Catch",
        "filter": "cpp_catch",
        "label": "C++ Catch",
        "supportsCondition": true
      },
      {
        "description": "C++ Throw",
        "filter": "cpp_throw",
        "label": "C++ Throw",
        "supportsCondition": true
      },
      {
        "description": "Objective-C Catch",
        "filter": "objc_catch",
...

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…if successor is loop header (llvm#154063)

This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not
split.
The critical edge's destination a loop header. Splitting the critical
edge will not break structured CFG.

Add a nvptx test to demonstrate the issue since the target also
requires structured CFG.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants