[LICM] Sink unused l-invariant loads in preheader. #157559

VigneshwarJ · 2025-09-08T21:43:14Z

Unused loop invariant loads were not sunk from preheader to exit block, increasing live range.

This commit moves the sinkUnusedInvariants logic from IndVarSimplify to LICM and also adds the ability to sink unused loads that are not clobbered by the loop body.

Unused loop invariant loads were not sunk from preheader to exit block which increased live range. This commit sinks unused invariant loads from preheader to exit block when the load's memory location is not modified within the loop body.
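As a rough illustration of the rule described above, here is a minimal, self-contained C++ sketch. It is a toy model and not LLVM code: `ToyInst`, `canSinkToExit`, and the string comparison used as an "alias check" are hypothetical stand-ins for the real instruction classes and alias analysis.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an instruction; this is not an LLVM class.
struct ToyInst {
  enum Kind { Load, Store, Other } kind;
  std::string addr; // memory location touched (empty for Other)
  bool usedInLoop;  // is the result used by anything inside the loop?
};

// Decide whether a preheader instruction may move to the exit block.
// Assumption: exact string equality stands in for a real may-alias query.
bool canSinkToExit(const ToyInst &I, const std::vector<ToyInst> &loopBody) {
  if (I.usedInLoop)
    return false; // the loop needs the value, so it must stay put
  if (I.kind == ToyInst::Store)
    return false; // side effects must complete before the loop runs
  if (I.kind == ToyInst::Other)
    return true; // pure computation, unused in the loop
  // Load: only safe if no store in the loop writes the loaded location.
  for (const ToyInst &L : loopBody)
    if (L.kind == ToyInst::Store && L.addr == I.addr)
      return false;
  return true;
}
```

In this model a load of `b` can be sunk past a loop that contains no store to `b`, and stays in the preheader once the loop body stores to `b`, which mirrors the two new test cases in the diff.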

llvmbot · 2025-09-08T21:43:47Z

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-transforms

Author: Vigneshwar Jayakumar (VigneshwarJ)

Changes

Unused loop invariant loads were not sunk from preheader to exit block which increased live range.

This commit sinks unused invariant loads from preheader to exit block when the load's memory location is not modified within the loop body.

Full diff: https://github.com/llvm/llvm-project/pull/157559.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Scalar/IndVarSimplify.cpp (+40-10)
(modified) llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll (+61)
(modified) llvm/test/Transforms/LoopSimplify/ashr-crash.ll (+1-1)

diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index c32731185afd0..61138e379607c 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -130,6 +130,7 @@ class IndVarSimplify {
   const DataLayout &DL;
   TargetLibraryInfo *TLI;
   const TargetTransformInfo *TTI;
+  AliasAnalysis *AA;
   std::unique_ptr<MemorySSAUpdater> MSSAU;
 
   SmallVector<WeakTrackingVH, 16> DeadInsts;
@@ -162,8 +163,9 @@ class IndVarSimplify {
 public:
   IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
                  const DataLayout &DL, TargetLibraryInfo *TLI,
-                 TargetTransformInfo *TTI, MemorySSA *MSSA, bool WidenIndVars)
-      : LI(LI), SE(SE), DT(DT), DL(DL), TLI(TLI), TTI(TTI),
+                 TargetTransformInfo *TTI, AliasAnalysis *AA, MemorySSA *MSSA,
+                 bool WidenIndVars)
+      : LI(LI), SE(SE), DT(DT), DL(DL), TLI(TLI), TTI(TTI), AA(AA),
         WidenIndVars(WidenIndVars) {
     if (MSSA)
       MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);
@@ -1089,6 +1091,16 @@ bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
   if (!Preheader) return false;
 
   bool MadeAnyChanges = false;
+
+  // Collect all store instructions that may modify memory in the loop.
+  SmallPtrSet<Instruction *, 8> Stores;
+  for (BasicBlock *BB : L->blocks()) {
+    for (Instruction &I : *BB) {
+      if (I.mayWriteToMemory())
+        Stores.insert(&I);
+    }
+  }
+
   for (Instruction &I : llvm::make_early_inc_range(llvm::reverse(*Preheader))) {
 
     // Skip BB Terminator.
@@ -1100,14 +1112,32 @@ bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
       break;
 
     // Don't move instructions which might have side effects, since the side
-    // effects need to complete before instructions inside the loop.  Also don't
-    // move instructions which might read memory, since the loop may modify
-    // memory. Note that it's okay if the instruction might have undefined
-    // behavior: LoopSimplify guarantees that the preheader dominates the exit
-    // block.
-    if (I.mayHaveSideEffects() || I.mayReadFromMemory())
+    // effects need to complete before instructions inside the loop. Note that
+    // it's okay if the instruction might have undefined behavior: LoopSimplify
+    // guarantees that the preheader dominates the exit block.
+    if (I.mayHaveSideEffects())
       continue;
 
+    // Don't sink read instruction which's memory might be modified in the loop.
+    if (I.mayReadFromMemory()) {
+      if (LoadInst *Load = dyn_cast<LoadInst>(&I)) {
+        MemoryLocation Loc = MemoryLocation::get(Load);
+        bool isModified = false;
+
+        // Check if any store instruction in the loop modifies the loaded memory
+        // location.
+        for (Instruction *S : Stores) {
+          if (isModSet(AA->getModRefInfo(S, Loc))) {
+            isModified = true;
+            break;
+          }
+        }
+        if (isModified)
+          continue;
+      } else {
+        continue;
+      }
+    }
     // Skip debug or pseudo instructions.
     if (I.isDebugOrPseudoInst())
       continue;
@@ -2042,8 +2072,8 @@ PreservedAnalyses IndVarSimplifyPass::run(Loop &L, LoopAnalysisManager &AM,
   Function *F = L.getHeader()->getParent();
   const DataLayout &DL = F->getDataLayout();
 
-  IndVarSimplify IVS(&AR.LI, &AR.SE, &AR.DT, DL, &AR.TLI, &AR.TTI, AR.MSSA,
-                     WidenIndVars && AllowIVWidening);
+  IndVarSimplify IVS(&AR.LI, &AR.SE, &AR.DT, DL, &AR.TLI, &AR.TTI, &AR.AA,
+                     AR.MSSA, WidenIndVars && AllowIVWidening);
   if (!IVS.run(&L))
     return PreservedAnalyses::all();
 
diff --git a/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll b/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll
index 89583f9131518..8d714c775bde5 100644
--- a/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll
+++ b/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll
@@ -30,3 +30,64 @@ loop:
 exit:
   ret i32 %add
 }
+
+define i32 @test_with_unused_load(i32 %a, ptr %b, i32 %N) {
+; CHECK-LABEL: @test_with_unused_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[B:%.*]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[A:%.*]], [[LOAD]]
+; CHECK-NEXT:    ret i32 [[ADD]]
+;
+entry:
+  %load = load i32, ptr %b
+  %add = add i32 %a, %load
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv.next, %N
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i32 %add
+}
+
+; load's memory location is modified inside the loop, don't sink load.
+define i32 @test_with_unused_load_modified_store(i32 %a, ptr %b, i32 %N) {
+; CHECK-LABEL: @test_with_unused_load_modified_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[B:%.*]], align 4
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    store i32 [[IV_NEXT]], ptr [[B]], align 4
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[A:%.*]], [[LOAD]]
+; CHECK-NEXT:    ret i32 [[ADD]]
+;
+entry:
+  %load = load i32, ptr %b
+  %add = add i32 %a, %load
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+  %iv.next = add i32 %iv, 1
+  store i32 %iv.next, ptr %b
+  %cmp = icmp slt i32 %iv.next, %N
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i32 %add
+}
diff --git a/llvm/test/Transforms/LoopSimplify/ashr-crash.ll b/llvm/test/Transforms/LoopSimplify/ashr-crash.ll
index 85198eb19aad6..ae4f9adf8e16c 100644
--- a/llvm/test/Transforms/LoopSimplify/ashr-crash.ll
+++ b/llvm/test/Transforms/LoopSimplify/ashr-crash.ll
@@ -29,11 +29,11 @@ define void @foo() {
 ; CHECK-LABEL: define void @foo() {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store i32 0, ptr @d, align 4
-; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr @c, align 4
 ; CHECK-NEXT:    br label [[FOR_COND1_PREHEADER:%.*]]
 ; CHECK:       for.cond1.preheader:
 ; CHECK-NEXT:    br label [[FOR_BODY3:%.*]]
 ; CHECK:       for.body3:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr @c, align 4
 ; CHECK-NEXT:    store i32 1, ptr @a, align 4
 ; CHECK-NEXT:    store i32 1, ptr @d, align 4
 ; CHECK-NEXT:    [[CMP4_LE_LE_INV:%.*]] = icmp sgt i32 [[TMP0]], 0

nikic

I don't think IndVarSimplify is a good place to perform this transform. We could probably do this in LICM, which has access to MemorySSA. (LICM already sinks instructions from the loop to the exits, so doing it for the preheader would be a somewhat natural extension of the scope of the pass.)

It looks like the current patch miscompiles clang (stage2 build crashes on llvm-test-suite).

srpande · 2025-09-09T19:57:20Z

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

+        }
+        if (isModified)
+          continue;
+      } else {


Please add a comment for clarity.

srpande · 2025-09-09T20:00:02Z

llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll

 exit:
  ret i32 %add
 }
+


Please add at least one test (two is better) with noalias & alias_info metadata on the loads and stores, showing whether the load is sunk or not depending on the metadata.

I would also put these tests in a separate .ll file.

VigneshwarJ · 2025-09-11T18:13:05Z

I don't think IndVarSimplify is a good place to perform this transform. We could probably do this in LICM, which has access to MemorySSA. (LICM already sinks instructions from the loop to the exits, so doing it for the preheader would be a somewhat natural extension of the scope of the pass.)

It looks like the current patch miscompiles clang (stage2 build crashes on llvm-test-suite).

I agree that the whole logic of sinkUnusedInvariants can be removed from here and moved. I will try doing that in LICM, but LICM also hoists/sinks instructions from the loop to the preheader/exit block, so I would have to add it at the end. I want to know whether LoopSink.cpp would be a better place to move the logic to (it also has access to MemorySSA). It already sinks from the preheader back into loop blocks, so we could do the sinking to the exit block there as well.
indvars runs before loop unrolling, so right now the sinking happens before unrolling. The LoopSink pass runs at a much later stage.
The main reason I started looking at sinking loads is that when the other invariant instructions that use the load are sunk and the loop is unrolled, the live range of the load increases drastically.

srpande · 2025-09-15T19:31:21Z

I don't think IndVarSimplify is a good place to perform this transform. We could probably do this in LICM, which has access to MemorySSA. (LICM already sinks instructions from the loop to the exits, so doing it for the preheader would be a somewhat natural extension of the scope of the pass.)
It looks like the current patch miscompiles clang (stage2 build crashes on llvm-test-suite).

I agree that the whole logic of sinkUnusedInvariants can be removed from here and moved. I will try doing that in LICM, but LICM also hoists/sinks instructions from the loop to the preheader/exit block, so I would have to add it at the end. I want to know whether LoopSink.cpp would be a better place to move the logic to (it also has access to MemorySSA). It already sinks from the preheader back into loop blocks, so we could do the sinking to the exit block there as well. indvars runs before loop unrolling, so right now the sinking happens before unrolling. The LoopSink pass runs at a much later stage. The main reason I started looking at sinking loads is that when the other invariant instructions that use the load are sunk and the loop is unrolled, the live range of the load increases drastically.

LoopSink might be a better place, as the original intent of this code is to reduce the register pressure, and I believe moving it to LoopSink would solve that purpose. Having said that, it might have some (a few) performance impacts on the applications out there.

nikic · 2025-09-15T20:03:59Z

LoopSink is a PGO pass. Assuming you don't want to drive this with profiling information, it's probably not a good fit for that reason.

llvm/lib/Transforms/Scalar/LICM.cpp

nikic · 2025-09-24T20:15:57Z

llvm/lib/Transforms/Scalar/LICM.cpp

+      if (auto *PN = dyn_cast<PHINode>(UserI)) {
+        unsigned OpIdx =
+            PHINode::getIncomingValueNumForOperand(U.getOperandNo());
+        UseBB = PN->getIncomingBlock(OpIdx);


You can directly pass U, no need to call getIncomingValueNumForOperand().

srpande · 2025-09-25

If @VigneshwarJ is to do that, then he should have two patches:
first, one cleaning up what is already here, like what you are mentioning;
second, one moving this sub-pass to an appropriate location.

nikic · 2025-09-24T20:16:39Z

llvm/lib/Transforms/Scalar/LICM.cpp

+  // sink pre-header defs that are unused in-loop into the unique exit to reduce
+  // pressure.
+  Changed |=
+      sinkUnusedInvariantsFromPreheaderToExit(L, AA, &SafetyInfo, MSSAU, SE);


Move this above forgetLoopDispositions() and the verification calls?

jayfoad · 2025-09-25T08:15:53Z

Unused loop invariant loads were not sunk from preheader to exit block, increasing live range.

What live range? If the result is unused, how can it contribute to register pressure in the loop body?

nikic · 2025-09-25T08:19:58Z

Unused loop invariant loads were not sunk from preheader to exit block, increasing live range.

What live range? If the result is unused, how can it contribute to register pressure in the loop body?

I believe "unused" here refers to "unused inside the loop, but used after the loop".
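To make "unused inside the loop, but used after the loop" concrete, here is a small C++ analogue of the `test_with_unused_load` case from the patch's tests. The function name `addAfterLoop` is made up for illustration, and whether a given compiler actually moves the load depends on its pass pipeline; the snippet only shows the shape of the code in question.

```cpp
// Source-level analogue of the IR test case. The load of *b happens before
// the loop, nothing in the loop reads or writes it, and the value derived
// from it is needed only after the loop. Something (`load` or `add`) has to
// stay live across every iteration unless both are moved to the exit.
int addAfterLoop(int a, const int *b, int n) {
  int load = *b;      // preheader: loop-invariant load
  int add = a + load; // preheader: the only user of the load
  int iv = 0;
  do {                // loop body never touches load, add, or *b
    ++iv;
  } while (iv < n);
  return add;         // exit: first point where the value is needed
}
```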

nikic · 2025-09-25T12:51:25Z

Crash reproducer:

; RUN: opt -S -passes="loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,simple-loop-unswitch)" < %s

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define void @_ZN4ncnnL8rnn_int8ERKNS_3MatERS0_iS2_PKfS2_S2_S5_S3_RKNS_6OptionE.omp_outlined(ptr %0) {
  %2 = alloca i32, align 4
  call void @llvm.lifetime.start.p0(ptr %2)
  br label %3

3:                                                ; preds = %5, %1
  %4 = load i32, ptr null, align 4
  br label %.preheader

.preheader:                                       ; preds = %7, %3
  %.049 = phi i32 [ %8, %7 ], [ 0, %3 ]
  br i1 false, label %7, label %5

5:                                                ; preds = %.preheader
  %6 = load ptr, ptr %0, align 8
  store float 0.000000e+00, ptr %6, align 4
  br label %3

7:                                                ; preds = %.preheader
  %8 = add i32 0, 0
  br label %.preheader
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(ptr captures(none)) #0

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }

opt: /home/npopov/repos/llvm-project/llvm/include/llvm/Support/GenericDomTree.h:403: DomTreeNodeBase<NodeT> *llvm::DominatorTreeBase<llvm::BasicBlock, false>::getNode(const NodeT *) const [NodeT = llvm::BasicBlock, IsPostDom = false]: Assertion `(!BB || Parent == NodeTrait::getParent(const_cast<NodeT *>(BB))) && "cannot get DomTreeNode of block with different parent"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.	Program arguments: build/bin/opt -S -passes=loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,simple-loop-unswitch) test.ll
1.	Running pass "function(loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;no-prepare-for-lto>,simple-loop-unswitch<no-nontrivial;trivial>))" on module "test.ll"
2.	Running pass "loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;no-prepare-for-lto>,simple-loop-unswitch<no-nontrivial;trivial>)" on function "_ZN4ncnnL8rnn_int8ERKNS_3MatERS0_iS2_PKfS2_S2_S5_S3_RKNS_6OptionE.omp_outlined"
 #0 0x0000000004a159f8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build/bin/opt+0x4a159f8)
 #1 0x0000000004a12fc5 llvm::sys::RunSignalHandlers() (build/bin/opt+0x4a12fc5)
 #2 0x0000000004a16a91 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f149ea28bb0 __restore_rt (/lib64/libc.so.6+0x19bb0)
 #4 0x00007f149ea8209c __pthread_kill_implementation (/lib64/libc.so.6+0x7309c)
 #5 0x00007f149ea28a7e gsignal (/lib64/libc.so.6+0x19a7e)
 #6 0x00007f149ea106d0 abort (/lib64/libc.so.6+0x16d0)
 #7 0x00007f149ea10639 __assert_perror_fail (/lib64/libc.so.6+0x1639)
 #8 0x000000000501d649 llvm::DominatorTreeBase<llvm::BasicBlock, false>::dominates(llvm::BasicBlock const*, llvm::BasicBlock const*) const (build/bin/opt+0x501d649)
 #9 0x00000000052b1208 (anonymous namespace)::ClobberWalker::findClobber(llvm::BatchAAResults&, llvm::MemoryAccess*, (anonymous namespace)::UpwardsMemoryQuery&, unsigned int&) MemorySSA.cpp:0:0
#10 0x00000000052b2887 llvm::MemorySSA::ClobberWalkerBase::getClobberingMemoryAccessBase(llvm::MemoryAccess*, llvm::BatchAAResults&, unsigned int&, bool, bool) (build/bin/opt+0x52b2887)
#11 0x00000000052b6c05 llvm::MemorySSA::SkipSelfWalker::getClobberingMemoryAccess(llvm::MemoryAccess*, llvm::BatchAAResults&) MemorySSA.cpp:0:0
#12 0x00000000053f55a8 pointerInvalidatedByLoop(llvm::MemorySSA*, llvm::MemoryUse*, llvm::Loop*, llvm::Instruction&, llvm::SinkAndHoistLICMFlags&, bool) LICM.cpp:0:0
#13 0x00000000053ee957 llvm::canSinkOrHoistInst(llvm::Instruction&, llvm::AAResults*, llvm::DominatorTree*, llvm::Loop*, llvm::MemorySSAUpdater&, bool, llvm::SinkAndHoistLICMFlags&, llvm::OptimizationRemarkEmitter*) (build/bin/opt+0x53ee957)

VigneshwarJ · 2025-09-25T15:49:29Z

(edit) I see the crash now that I have updated to the main branch.

VigneshwarJ · 2025-09-26T20:42:34Z

@nikic, that was due to a bad MSSA update; I have fixed the issue.

VigneshwarJ · 2025-10-03T05:06:11Z

Ping for review

VigneshwarJ · 2025-10-14T16:18:14Z

@nikic ping for review

srpande

Just some nitpicks. Otherwise LGTM.

llvm/lib/Transforms/Scalar/LICM.cpp

nikic · 2025-10-27T20:35:32Z

@zyw-bot csmith-fuzz

llvm/lib/Transforms/Scalar/LICM.cpp

nikic

LGTM

Worth noting that this kind of sinking can also increase register pressure in the loop. E.g. if you sink an instruction with two operands and the operand instructions cannot be sunk, then you'll need two registers instead of one in the loop.

Though from what I can tell, this does seem to be beneficial on average.
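The trade-off can be illustrated with a small self-contained C++ sketch that counts how many values are live across a loop. This is a toy model with made-up names (`ToyValue`, `liveAcrossLoop`, `beforeSinking`, `afterSinking`) and arbitrary integer program points; it is not taken from LICM and ignores everything a real register allocator does.

```cpp
#include <vector>

// Toy value: defined at program point `def`, last used at `lastUse`.
struct ToyValue {
  int def;
  int lastUse;
};

// Counts values defined before the loop starts whose last use is after the
// loop ends, i.e. values that occupy a register for the whole loop.
int liveAcrossLoop(const std::vector<ToyValue> &values, int loopBegin,
                   int loopEnd) {
  int count = 0;
  for (const ToyValue &v : values)
    if (v.def < loopBegin && v.lastUse > loopEnd)
      ++count;
  return count;
}

// Assumed layout: points 0..2 are the preheader, 10..19 the loop, 20+ the exit.
// Before sinking: x (point 0) and y (point 1) feed z = x + y at point 2,
// and z is used at point 21.
std::vector<ToyValue> beforeSinking() {
  return {{0, 2}, {1, 2}, {2, 21}};
}

// After sinking z to the exit (point 20): x and y must now survive the loop.
std::vector<ToyValue> afterSinking() {
  return {{0, 20}, {1, 20}, {20, 21}};
}
```

With the loop spanning points 10 to 19, the "before" layout keeps only `z` live across the loop, while the "after" layout keeps both `x` and `y` live, which is the two-registers-instead-of-one case described above.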

nikic · 2025-10-29T11:14:20Z

llvm/test/Transforms/LICM/sink-from-preheader.ll

+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=licm -verify-memoryssa -S | FileCheck %s
+target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
+target triple = "i386-apple-darwin10.0"


Omit the triple if it's not needed. Otherwise the test may need REQUIRES.

nikic · 2025-10-29T11:14:37Z

llvm/test/Transforms/LICM/sink-from-preheader.ll

+target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
+target triple = "i386-apple-darwin10.0"
+
+; We make sinking here, Changed flag should be set properly.


Suggested change

-; We make sinking here, Changed flag should be set properly.
+; We perform sinking here, Changed flag should be set properly.

llvm/test/Transforms/LICM/sink-trapping.ll

Unused loop invariant loads were not sunk from the preheader to the exit block, increasing live range. This commit moves the sinkUnusedInvariants logic from IndVarSimplify to LICM and also adds the ability to sink unused loads that are not clobbered by the loop body.

yzhang93 · 2025-11-06T00:22:41Z

Hi @VigneshwarJ, I'm working on a downstream project and found that GPU performance for some GEMMs regressed after this PR. I've added some useful information in #166671. Could you help take a look?

VigneshwarJ · 2025-11-06T16:48:41Z

@yzhang93 sure, I will look at the issue.

llvmbot added the llvm:transforms label Sep 8, 2025

VigneshwarJ requested a review from srpande September 8, 2025 21:43

fix tests

61ad654

llvmbot added the clang and clang:openmp labels Sep 8, 2025

VigneshwarJ added 3 commits September 8, 2025 19:46

Merge branch 'main' into sink_invariant_loads

68667db

Merge branch 'main' into sink_invariant_loads

905434c

Merge branch 'main' into sink_invariant_loads

26d7e99

nikic requested changes Sep 9, 2025

srpande reviewed Sep 9, 2025

VigneshwarJ added 2 commits September 23, 2025 10:09

Merge remote-tracking branch 'upstream/main' into sink_invariant_loads

7c01671

moved the code to LICM

fe37acb

llvmbot added the backend:AMDGPU label Sep 24, 2025

VigneshwarJ changed the title ~~[IndVarSimplify] Sink unused l-invariant loads in preheader.~~ [LICM] Sink unused l-invariant loads in preheader. Sep 24, 2025

VigneshwarJ added 2 commits September 24, 2025 13:31

Merge branch 'main' into sink_invariant_loads

90404be

update tests

c0ba9e9

llvmbot added the backend:PowerPC label Sep 24, 2025

VigneshwarJ requested review from nikic and srpande September 24, 2025 19:12

nikic mentioned this pull request Sep 24, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

zyw-bot mentioned this pull request Sep 24, 2025

pre-commit: PR157559 dtcxzyw/llvm-opt-benchmark#2858

Closed

nikic reviewed Sep 24, 2025

VigneshwarJ added 2 commits September 25, 2025 10:35

changing to use canSinkOrHoist

bd26a88

changes

c5601ff

VigneshwarJ added 3 commits September 25, 2025 12:09

Merge branch 'main' into sink_invariant_loads

9a7c53b

update tests

b0a8eae

fix mssa updates

5b345e1

VigneshwarJ requested a review from nikic September 26, 2025 20:42

Merge branch 'main' into sink_invariant_loads

5f1fd21

srpande approved these changes Oct 27, 2025

zyw-bot mentioned this pull request Oct 27, 2025

pre-commit: PR157559 dtcxzyw/llvm-opt-benchmark#2987

Closed

zyw-bot mentioned this pull request Oct 27, 2025

Fuzz PR157559 dtcxzyw/llvm-fuzz-service#151

Closed

nikic reviewed Oct 27, 2025

moved sink between sink and hoist

7f8e279

VigneshwarJ requested a review from nikic October 27, 2025 22:48

zyw-bot mentioned this pull request Oct 29, 2025

pre-commit: PR157559 dtcxzyw/llvm-opt-benchmark#2994

Closed

nikic approved these changes Oct 29, 2025

minor fixes

43c1fc6

VigneshwarJ merged commit 469702c into llvm:main Oct 30, 2025
10 checks passed

yzhang93 mentioned this pull request Nov 6, 2025

GEMM performance regressed after sinking unused l-invariant loads #166671

Open

zGoldthorpe added a commit to ROCm/llvm-project that referenced this pull request Nov 12, 2025

Cherry-pick PR llvm#157559

0ce78d7
