Skip to content

Conversation

@VigneshwarJ
Copy link
Contributor

@VigneshwarJ VigneshwarJ commented Sep 8, 2025

Unused loop invariant loads were not sunk from preheader to exit block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to LICM and also adds functionality to sinks unused load that's not clobbered by loop body.

Unused loop invariant loads were not sunk from preheader to exit block
which increased live range.

This commit sinks unused invariant loads from preheader to exit block
when the load's memory location is not modified within the loop body.
@llvmbot
Copy link
Member

llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-transforms

Author: Vigneshwar Jayakumar (VigneshwarJ)

Changes

Unused loop invariant loads were not sunk from preheader to exit block which increased live range.

This commit sinks unused invariant loads from preheader to exit block when the load's memory location is not modified within the loop body.


Full diff: https://github.com/llvm/llvm-project/pull/157559.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/IndVarSimplify.cpp (+40-10)
  • (modified) llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll (+61)
  • (modified) llvm/test/Transforms/LoopSimplify/ashr-crash.ll (+1-1)
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index c32731185afd0..61138e379607c 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -130,6 +130,7 @@ class IndVarSimplify {
   const DataLayout &DL;
   TargetLibraryInfo *TLI;
   const TargetTransformInfo *TTI;
+  AliasAnalysis *AA;
   std::unique_ptr<MemorySSAUpdater> MSSAU;
 
   SmallVector<WeakTrackingVH, 16> DeadInsts;
@@ -162,8 +163,9 @@ class IndVarSimplify {
 public:
   IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
                  const DataLayout &DL, TargetLibraryInfo *TLI,
-                 TargetTransformInfo *TTI, MemorySSA *MSSA, bool WidenIndVars)
-      : LI(LI), SE(SE), DT(DT), DL(DL), TLI(TLI), TTI(TTI),
+                 TargetTransformInfo *TTI, AliasAnalysis *AA, MemorySSA *MSSA,
+                 bool WidenIndVars)
+      : LI(LI), SE(SE), DT(DT), DL(DL), TLI(TLI), TTI(TTI), AA(AA),
         WidenIndVars(WidenIndVars) {
     if (MSSA)
       MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);
@@ -1089,6 +1091,16 @@ bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
   if (!Preheader) return false;
 
   bool MadeAnyChanges = false;
+
+  // Collect all store instructions that may modify memory in the loop.
+  SmallPtrSet<Instruction *, 8> Stores;
+  for (BasicBlock *BB : L->blocks()) {
+    for (Instruction &I : *BB) {
+      if (I.mayWriteToMemory())
+        Stores.insert(&I);
+    }
+  }
+
   for (Instruction &I : llvm::make_early_inc_range(llvm::reverse(*Preheader))) {
 
     // Skip BB Terminator.
@@ -1100,14 +1112,32 @@ bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
       break;
 
     // Don't move instructions which might have side effects, since the side
-    // effects need to complete before instructions inside the loop.  Also don't
-    // move instructions which might read memory, since the loop may modify
-    // memory. Note that it's okay if the instruction might have undefined
-    // behavior: LoopSimplify guarantees that the preheader dominates the exit
-    // block.
-    if (I.mayHaveSideEffects() || I.mayReadFromMemory())
+    // effects need to complete before instructions inside the loop. Note that
+    // it's okay if the instruction might have undefined behavior: LoopSimplify
+    // guarantees that the preheader dominates the exit block.
+    if (I.mayHaveSideEffects())
       continue;
 
+    // Don't sink read instruction which's memory might be modified in the loop.
+    if (I.mayReadFromMemory()) {
+      if (LoadInst *Load = dyn_cast<LoadInst>(&I)) {
+        MemoryLocation Loc = MemoryLocation::get(Load);
+        bool isModified = false;
+
+        // Check if any store instruction in the loop modifies the loaded memory
+        // location.
+        for (Instruction *S : Stores) {
+          if (isModSet(AA->getModRefInfo(S, Loc))) {
+            isModified = true;
+            break;
+          }
+        }
+        if (isModified)
+          continue;
+      } else {
+        continue;
+      }
+    }
     // Skip debug or pseudo instructions.
     if (I.isDebugOrPseudoInst())
       continue;
@@ -2042,8 +2072,8 @@ PreservedAnalyses IndVarSimplifyPass::run(Loop &L, LoopAnalysisManager &AM,
   Function *F = L.getHeader()->getParent();
   const DataLayout &DL = F->getDataLayout();
 
-  IndVarSimplify IVS(&AR.LI, &AR.SE, &AR.DT, DL, &AR.TLI, &AR.TTI, AR.MSSA,
-                     WidenIndVars && AllowIVWidening);
+  IndVarSimplify IVS(&AR.LI, &AR.SE, &AR.DT, DL, &AR.TLI, &AR.TTI, &AR.AA,
+                     AR.MSSA, WidenIndVars && AllowIVWidening);
   if (!IVS.run(&L))
     return PreservedAnalyses::all();
 
diff --git a/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll b/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll
index 89583f9131518..8d714c775bde5 100644
--- a/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll
+++ b/llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll
@@ -30,3 +30,64 @@ loop:
 exit:
   ret i32 %add
 }
+
+define i32 @test_with_unused_load(i32 %a, ptr %b, i32 %N) {
+; CHECK-LABEL: @test_with_unused_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[B:%.*]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[A:%.*]], [[LOAD]]
+; CHECK-NEXT:    ret i32 [[ADD]]
+;
+entry:
+  %load = load i32, ptr %b
+  %add = add i32 %a, %load
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv.next, %N
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i32 %add
+}
+
+; load's memory location is modified inside the loop, don't sink load.
+define i32 @test_with_unused_load_modified_store(i32 %a, ptr %b, i32 %N) {
+; CHECK-LABEL: @test_with_unused_load_modified_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[B:%.*]], align 4
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    store i32 [[IV_NEXT]], ptr [[B]], align 4
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[A:%.*]], [[LOAD]]
+; CHECK-NEXT:    ret i32 [[ADD]]
+;
+entry:
+  %load = load i32, ptr %b
+  %add = add i32 %a, %load
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+  %iv.next = add i32 %iv, 1
+  store i32 %iv.next, ptr %b
+  %cmp = icmp slt i32 %iv.next, %N
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i32 %add
+}
diff --git a/llvm/test/Transforms/LoopSimplify/ashr-crash.ll b/llvm/test/Transforms/LoopSimplify/ashr-crash.ll
index 85198eb19aad6..ae4f9adf8e16c 100644
--- a/llvm/test/Transforms/LoopSimplify/ashr-crash.ll
+++ b/llvm/test/Transforms/LoopSimplify/ashr-crash.ll
@@ -29,11 +29,11 @@ define void @foo() {
 ; CHECK-LABEL: define void @foo() {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store i32 0, ptr @d, align 4
-; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr @c, align 4
 ; CHECK-NEXT:    br label [[FOR_COND1_PREHEADER:%.*]]
 ; CHECK:       for.cond1.preheader:
 ; CHECK-NEXT:    br label [[FOR_BODY3:%.*]]
 ; CHECK:       for.body3:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr @c, align 4
 ; CHECK-NEXT:    store i32 1, ptr @a, align 4
 ; CHECK-NEXT:    store i32 1, ptr @d, align 4
 ; CHECK-NEXT:    [[CMP4_LE_LE_INV:%.*]] = icmp sgt i32 [[TMP0]], 0

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:openmp OpenMP related changes to Clang labels Sep 8, 2025
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think IndVarSimplify is a good place to perform this transform. We could probably do this in LICM, which has access to MemorySSA. (LICM already sinks instructions from the loop to the exits, so doing it fore the preheader would be a somewhat natural extension of the scope of the pass.)

It looks like the current patch miscompiles clang (stage2 build crashes on llvm-test-suite).

}
if (isModified)
continue;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment for clarity.

exit:
ret i32 %add
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add at least one (two is better) test with noalias & alias_info metadata for loads and stores; and show whether load will sink or not depending on metadata.

I would also write these tests on a separate .ll file.

@VigneshwarJ
Copy link
Contributor Author

I don't think IndVarSimplify is a good place to perform this transform. We could probably do this in LICM, which has access to MemorySSA. (LICM already sinks instructions from the loop to the exits, so doing it fore the preheader would be a somewhat natural extension of the scope of the pass.)

It looks like the current patch miscompiles clang (stage2 build crashes on llvm-test-suite).

I agree that the whole logic of sinkUnusedInvariants can be removed from here and moved. I will try doing that in LICM, but LICM also hoists/sinks instructions from loop to preheader/exit block. So I have to add it at the end. I want to know if LoopSink.cpp would be a better place to move the logic to (it also has access to MemorySSA). It already sinks from preheader back to loop blocks. We could do the sinking to exit block there as well.
indvars is called before loop unrolling. So right now, the sinking happens before unrolling. Loopsink pass is called at much later stage.
The main reason I started looking at sinking loads is when the other invariant instructions that uses the load are sunk and the loop is unrolled, this increased the live range of load drastically.

@srpande
Copy link
Contributor

srpande commented Sep 15, 2025

I don't think IndVarSimplify is a good place to perform this transform. We could probably do this in LICM, which has access to MemorySSA. (LICM already sinks instructions from the loop to the exits, so doing it fore the preheader would be a somewhat natural extension of the scope of the pass.)
It looks like the current patch miscompiles clang (stage2 build crashes on llvm-test-suite).

I agree that the whole logic of sinkUnusedInvariants can be removed from here and moved. I will try doing that in LICM, but LICM also hoists/sinks instructions from loop to preheader/exit block. So I have to add it at the end. I want to know if LoopSink.cpp would be a better place to move the logic to (it also has access to MemorySSA). It already sinks from preheader back to loop blocks. We could do the sinking to exit block there as well. indvars is called before loop unrolling. So right now, the sinking happens before unrolling. Loopsink pass is called at much later stage. The main reason I started looking at sinking loads is when the other invariant instructions that uses the load are sunk and the loop is unrolled, this increased the live range of load drastically.

LoopSink might be a better place, as the original intent of this code is to reduce the register pressure, and I believe moving it to LoopSink would solve that purpose. Having said that, it might have some (a few) performance impacts on the applications out there.

@nikic
Copy link
Contributor

nikic commented Sep 15, 2025

LoopSink is a PGO pass. Assuming you don't want to drive this with profiling information, it's probably not a good fit for that reason.

@VigneshwarJ VigneshwarJ changed the title [IndVarSimplify] Sink unused l-invariant loads in preheader. [LICM] Sink unused l-invariant loads in preheader. Sep 24, 2025
if (auto *PN = dyn_cast<PHINode>(UserI)) {
unsigned OpIdx =
PHINode::getIncomingValueNumForOperand(U.getOperandNo());
UseBB = PN->getIncomingBlock(OpIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can directly pass U, no need to call getIncomingValueNumForOperand().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @VigneshwarJ is to do that, then he should have two patches;
first, for cleaning up what is already here like what you are mentioning.
second, move this sub pass to an appropriate location.

// sink pre-header defs that are unused in-loop into the unique exit to reduce
// pressure.
Changed |=
sinkUnusedInvariantsFromPreheaderToExit(L, AA, &SafetyInfo, MSSAU, SE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this above forgetLoopDispositions() and the verification calls?

@jayfoad
Copy link
Contributor

jayfoad commented Sep 25, 2025

Unused loop invariant loads were not sunk from preheader to exit block, increasing live range.

What live range? If the result is unused, how can it contribute to register pressure in the loop body?

@nikic
Copy link
Contributor

nikic commented Sep 25, 2025

Unused loop invariant loads were not sunk from preheader to exit block, increasing live range.

What live range? If the result is unused, how can it contribute to register pressure in the loop body?

I believe "unused" here refers to "unused inside the loop, but used after the loop".

@nikic
Copy link
Contributor

nikic commented Sep 25, 2025

Crash reproducer:

; RUN: opt -S -passes="loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,simple-loop-unswitch)" < %s

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define void @_ZN4ncnnL8rnn_int8ERKNS_3MatERS0_iS2_PKfS2_S2_S5_S3_RKNS_6OptionE.omp_outlined(ptr %0) {
  %2 = alloca i32, align 4
  call void @llvm.lifetime.start.p0(ptr %2)
  br label %3

3:                                                ; preds = %5, %1
  %4 = load i32, ptr null, align 4
  br label %.preheader

.preheader:                                       ; preds = %7, %3
  %.049 = phi i32 [ %8, %7 ], [ 0, %3 ]
  br i1 false, label %7, label %5

5:                                                ; preds = %.preheader
  %6 = load ptr, ptr %0, align 8
  store float 0.000000e+00, ptr %6, align 4
  br label %3

7:                                                ; preds = %.preheader
  %8 = add i32 0, 0
  br label %.preheader
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(ptr captures(none)) #0

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
opt: /home/npopov/repos/llvm-project/llvm/include/llvm/Support/GenericDomTree.h:403: DomTreeNodeBase<NodeT> *llvm::DominatorTreeBase<llvm::BasicBlock, false>::getNode(const NodeT *) const [NodeT = llvm::BasicBlock, IsPostDom = false]: Assertion `(!BB || Parent == NodeTrait::getParent(const_cast<NodeT *>(BB))) && "cannot get DomTreeNode of block with different parent"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.	Program arguments: build/bin/opt -S -passes=loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,simple-loop-unswitch) test.ll
1.	Running pass "function(loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;no-prepare-for-lto>,simple-loop-unswitch<no-nontrivial;trivial>))" on module "test.ll"
2.	Running pass "loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;no-prepare-for-lto>,simple-loop-unswitch<no-nontrivial;trivial>)" on function "_ZN4ncnnL8rnn_int8ERKNS_3MatERS0_iS2_PKfS2_S2_S5_S3_RKNS_6OptionE.omp_outlined"
 #0 0x0000000004a159f8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build/bin/opt+0x4a159f8)
 #1 0x0000000004a12fc5 llvm::sys::RunSignalHandlers() (build/bin/opt+0x4a12fc5)
 #2 0x0000000004a16a91 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f149ea28bb0 __restore_rt (/lib64/libc.so.6+0x19bb0)
 #4 0x00007f149ea8209c __pthread_kill_implementation (/lib64/libc.so.6+0x7309c)
 #5 0x00007f149ea28a7e gsignal (/lib64/libc.so.6+0x19a7e)
 #6 0x00007f149ea106d0 abort (/lib64/libc.so.6+0x16d0)
 #7 0x00007f149ea10639 __assert_perror_fail (/lib64/libc.so.6+0x1639)
 #8 0x000000000501d649 llvm::DominatorTreeBase<llvm::BasicBlock, false>::dominates(llvm::BasicBlock const*, llvm::BasicBlock const*) const (build/bin/opt+0x501d649)
 #9 0x00000000052b1208 (anonymous namespace)::ClobberWalker::findClobber(llvm::BatchAAResults&, llvm::MemoryAccess*, (anonymous namespace)::UpwardsMemoryQuery&, unsigned int&) MemorySSA.cpp:0:0
#10 0x00000000052b2887 llvm::MemorySSA::ClobberWalkerBase::getClobberingMemoryAccessBase(llvm::MemoryAccess*, llvm::BatchAAResults&, unsigned int&, bool, bool) (build/bin/opt+0x52b2887)
#11 0x00000000052b6c05 llvm::MemorySSA::SkipSelfWalker::getClobberingMemoryAccess(llvm::MemoryAccess*, llvm::BatchAAResults&) MemorySSA.cpp:0:0
#12 0x00000000053f55a8 pointerInvalidatedByLoop(llvm::MemorySSA*, llvm::MemoryUse*, llvm::Loop*, llvm::Instruction&, llvm::SinkAndHoistLICMFlags&, bool) LICM.cpp:0:0
#13 0x00000000053ee957 llvm::canSinkOrHoistInst(llvm::Instruction&, llvm::AAResults*, llvm::DominatorTree*, llvm::Loop*, llvm::MemorySSAUpdater&, bool, llvm::SinkAndHoistLICMFlags&, llvm::OptimizationRemarkEmitter*) (build/bin/opt+0x53ee957)

@VigneshwarJ
Copy link
Contributor Author

VigneshwarJ commented Sep 25, 2025

; RUN: opt -S -passes="loop-mssa(loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,simple-loop-unswitch)" < %s

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define void @_ZN4ncnnL8rnn_int8ERKNS_3MatERS0_iS2_PKfS2_S2_S5_S3_RKNS_6OptionE.omp_outlined(ptr %0) {
  %2 = alloca i32, align 4
  call void @llvm.lifetime.start.p0(ptr %2)
  br label %3

3:                                                ; preds = %5, %1
  %4 = load i32, ptr null, align 4
  br label %.preheader

.preheader:                                       ; preds = %7, %3
  %.049 = phi i32 [ %8, %7 ], [ 0, %3 ]
  br i1 false, label %7, label %5

5:                                                ; preds = %.preheader
  %6 = load ptr, ptr %0, align 8
  store float 0.000000e+00, ptr %6, align 4
  br label %3

7:                                                ; preds = %.preheader
  %8 = add i32 0, 0
  br label %.preheader
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(ptr captures(none)) #0

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }

(edit) I see the crash now once I updated with main branch

@VigneshwarJ
Copy link
Contributor Author

@nikic , That was due to a bad MSSA update, fixed that issue.

@VigneshwarJ VigneshwarJ requested a review from nikic September 26, 2025 20:42
@VigneshwarJ
Copy link
Contributor Author

Ping for review

@VigneshwarJ
Copy link
Contributor Author

@nikic ping for review

Copy link
Contributor

@srpande srpande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpick. Otherwise LGTM

@nikic
Copy link
Contributor

nikic commented Oct 27, 2025

@zyw-bot csmith-fuzz

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Worth noting that this kind of sinking can also increase register pressure in the loop. E.g. if you sink an instruction with two operands, and the operand instructions cannot be sunk. Then you'll need two registers instead of one in the loop.

Though from what I can tell, this does seem to be beneficial on average.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=licm -verify-memoryssa -S | FileCheck %s
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
target triple = "i386-apple-darwin10.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit the triple if it's not needed. Otherwise the test may need REQUIRES.

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
target triple = "i386-apple-darwin10.0"

; We make sinking here, Changed flag should be set properly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; We make sinking here, Changed flag should be set properly.
; We perform sinking here, Changed flag should be set properly.

@VigneshwarJ VigneshwarJ merged commit 469702c into llvm:main Oct 30, 2025
10 checks passed
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM also adds functionality to sink unused load that's not
clobbered by the loop body.
luciechoi pushed a commit to luciechoi/llvm-project that referenced this pull request Nov 1, 2025
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM also adds functionality to sink unused load that's not
clobbered by the loop body.
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM also adds functionality to sink unused load that's not
clobbered by the loop body.
VigneshwarJ added a commit to ROCm/llvm-project that referenced this pull request Nov 5, 2025
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM also adds functionality to sink unused load that's not
clobbered by the loop body.
@yzhang93
Copy link
Contributor

yzhang93 commented Nov 6, 2025

Hi @VigneshwarJ, I'm working on the downstream project and found the GPU performance for some GEMMs were regressed after this PR. I've added some useful information here #166671. Could you help take a look?

@VigneshwarJ
Copy link
Contributor Author

VigneshwarJ commented Nov 6, 2025

@yzhang93 sure, I will look at the issue.

zGoldthorpe added a commit to ROCm/llvm-project that referenced this pull request Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU backend:PowerPC clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants