@VigneshwarJ VigneshwarJ commented Dec 1, 2025

Fixes a stack overflow in the MemorySSA update during sinking, and fixes the ScalarEvolution update.
Manually resolve the defining access from the preheader when sinking unused invariants to the exit block. This avoids expensive and potentially deep recursive calls in MemorySSAUpdater::createMemoryAccessInBB that can lead to stack overflows on systems with limited stack size.

This fixes the issue where a stack overflow occurred on large loops when the stack size is small, because getPreviousDefRecursive was called while updating MemorySSA through the updater (MSSAU).
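
The difference between a recursive and a manual (iterative) lookup of the defining access can be illustrated with a toy model. This is only a sketch: `Block`, `LiveOnEntry`, `linkChain`, `lastDefRecursive` and `lastDefIterative` are hypothetical names invented here, not LLVM types or APIs, and the model assumes every block has at most one predecessor, which real MemorySSA does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block; not an LLVM type.
struct Block {
  int LastDef = -1;            // id of the last memory def in the block, -1 if none
  const Block *Pred = nullptr; // single predecessor, nullptr at function entry
};

// Hypothetical id used when no def is found on the way up.
constexpr int LiveOnEntry = 0;

// Recursive lookup: uses one stack frame per visited block, so a long
// chain of def-free blocks can exhaust a small stack.
int lastDefRecursive(const Block *B) {
  if (!B)
    return LiveOnEntry;
  if (B->LastDef >= 0)
    return B->LastDef;
  return lastDefRecursive(B->Pred);
}

// Iterative lookup: same answer, constant stack usage.
int lastDefIterative(const Block *B) {
  for (; B; B = B->Pred)
    if (B->LastDef >= 0)
      return B->LastDef;
  return LiveOnEntry;
}

// Links the blocks into a single-predecessor chain, places one def in the
// first block, and returns the last block of the chain.
const Block *linkChain(std::vector<Block> &Chain, int EntryDef) {
  assert(!Chain.empty() && "need at least one block");
  Chain[0].LastDef = EntryDef;
  for (std::size_t I = 1; I < Chain.size(); ++I)
    Chain[I].Pred = &Chain[I - 1];
  return &Chain.back();
}
```

Calling `lastDefIterative` on the tail of a very long chain visits every block without growing the stack, whereas `lastDefRecursive` needs one frame per block. The loop in the patch is more involved than this model: it first checks for a MemoryPhi in the exit block and steps from the loop header to the preheader.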

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-powerpc

Author: Vigneshwar Jayakumar (VigneshwarJ)

Changes

This reland fixes the issue where stack overflow occurred when the stack size was small for large loops, due to getPreviousDefRecursive being called to update MSSAU, and fixes the improper SE update.


Patch is 81.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170204.diff

36 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/IndVarSimplify.cpp (-85)
  • (modified) llvm/lib/Transforms/Scalar/LICM.cpp (+123)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll (-2)
  • (modified) llvm/test/CodeGen/PowerPC/combine-sext-and-shl-after-isel.ll (+42-58)
  • (modified) llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll (+10-12)
  • (modified) llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll (+2-2)
  • (modified) llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/exit-count-select.ll (+7-7)
  • (modified) llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll (+3-3)
  • (modified) llvm/test/Transforms/IndVarSimplify/pr116483.ll (+4-4)
  • (modified) llvm/test/Transforms/IndVarSimplify/pr24783.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/pr39673.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/pr63763.ll (+3-3)
  • (modified) llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll (+10-11)
  • (modified) llvm/test/Transforms/IndVarSimplify/rewrite-loop-exit-values-phi.ll (+4-4)
  • (modified) llvm/test/Transforms/IndVarSimplify/scev-expander-preserve-lcssa.ll (+7-7)
  • (modified) llvm/test/Transforms/IndVarSimplify/scev-invalidation.ll (+2-2)
  • (modified) llvm/test/Transforms/IndVarSimplify/sentinel.ll (+7-7)
  • (removed) llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll (-32)
  • (removed) llvm/test/Transforms/IndVarSimplify/sink-trapping.ll (-19)
  • (modified) llvm/test/Transforms/IndVarSimplify/zext-nuw.ll (+1-1)
  • (modified) llvm/test/Transforms/LICM/scalar-promote.ll (+3-3)
  • (renamed) llvm/test/Transforms/LICM/sink-alloca.ll (+3-3)
  • (added) llvm/test/Transforms/LICM/sink-from-preheader.ll (+185)
  • (added) llvm/test/Transforms/LICM/sink-trapping.ll (+28)
  • (modified) llvm/test/Transforms/LoopDeletion/invalidate-scev-after-hoisting.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopDistribute/laa-invalidation.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll (+2-2)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/std-find.ll (+5-4)
  • (modified) llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll (+10-10)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/pr48844-br-to-switch-vectorization.ll (+3-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll (+25-24)
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index b46527eb1057b..eab1d4975ac96 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -162,8 +162,6 @@ class IndVarSimplify {
                                  const SCEV *ExitCount,
                                  PHINode *IndVar, SCEVExpander &Rewriter);
 
-  bool sinkUnusedInvariants(Loop *L);
-
 public:
   IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
                  const DataLayout &DL, TargetLibraryInfo *TLI,
@@ -1093,85 +1091,6 @@ linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
   return true;
 }
 
-//===----------------------------------------------------------------------===//
-//  sinkUnusedInvariants. A late subpass to cleanup loop preheaders.
-//===----------------------------------------------------------------------===//
-
-/// If there's a single exit block, sink any loop-invariant values that
-/// were defined in the preheader but not used inside the loop into the
-/// exit block to reduce register pressure in the loop.
-bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
-  BasicBlock *ExitBlock = L->getExitBlock();
-  if (!ExitBlock) return false;
-
-  BasicBlock *Preheader = L->getLoopPreheader();
-  if (!Preheader) return false;
-
-  bool MadeAnyChanges = false;
-  for (Instruction &I : llvm::make_early_inc_range(llvm::reverse(*Preheader))) {
-
-    // Skip BB Terminator.
-    if (Preheader->getTerminator() == &I)
-      continue;
-
-    // New instructions were inserted at the end of the preheader.
-    if (isa<PHINode>(I))
-      break;
-
-    // Don't move instructions which might have side effects, since the side
-    // effects need to complete before instructions inside the loop.  Also don't
-    // move instructions which might read memory, since the loop may modify
-    // memory. Note that it's okay if the instruction might have undefined
-    // behavior: LoopSimplify guarantees that the preheader dominates the exit
-    // block.
-    if (I.mayHaveSideEffects() || I.mayReadFromMemory())
-      continue;
-
-    // Skip debug or pseudo instructions.
-    if (I.isDebugOrPseudoInst())
-      continue;
-
-    // Skip eh pad instructions.
-    if (I.isEHPad())
-      continue;
-
-    // Don't sink alloca: we never want to sink static alloca's out of the
-    // entry block, and correctly sinking dynamic alloca's requires
-    // checks for stacksave/stackrestore intrinsics.
-    // FIXME: Refactor this check somehow?
-    if (isa<AllocaInst>(&I))
-      continue;
-
-    // Determine if there is a use in or before the loop (direct or
-    // otherwise).
-    bool UsedInLoop = false;
-    for (Use &U : I.uses()) {
-      Instruction *User = cast<Instruction>(U.getUser());
-      BasicBlock *UseBB = User->getParent();
-      if (PHINode *P = dyn_cast<PHINode>(User)) {
-        unsigned i =
-          PHINode::getIncomingValueNumForOperand(U.getOperandNo());
-        UseBB = P->getIncomingBlock(i);
-      }
-      if (UseBB == Preheader || L->contains(UseBB)) {
-        UsedInLoop = true;
-        break;
-      }
-    }
-
-    // If there is, the def must remain in the preheader.
-    if (UsedInLoop)
-      continue;
-
-    // Otherwise, sink it to the exit block.
-    I.moveBefore(ExitBlock->getFirstInsertionPt());
-    SE->forgetValue(&I);
-    MadeAnyChanges = true;
-  }
-
-  return MadeAnyChanges;
-}
-
 static void replaceExitCond(BranchInst *BI, Value *NewCond,
                             SmallVectorImpl<WeakTrackingVH> &DeadInsts) {
   auto *OldCond = BI->getCondition();
@@ -2079,10 +1998,6 @@ bool IndVarSimplify::run(Loop *L) {
 
   // The Rewriter may not be used from this point on.
 
-  // Loop-invariant instructions in the preheader that aren't used in the
-  // loop may be sunk below the loop to reduce register pressure.
-  Changed |= sinkUnusedInvariants(L);
-
   // rewriteFirstIterationLoopExitValues does not rely on the computation of
   // trip count and therefore can further simplify exit values in addition to
   // rewriteLoopExitValues.
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index b2c526b41502b..02db4a06cd669 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -215,6 +215,11 @@ static void moveInstructionBefore(Instruction &I, BasicBlock::iterator Dest,
                                   ICFLoopSafetyInfo &SafetyInfo,
                                   MemorySSAUpdater &MSSAU, ScalarEvolution *SE);
 
+static bool sinkUnusedInvariantsFromPreheaderToExit(
+    Loop *L, AAResults *AA, ICFLoopSafetyInfo *SafetyInfo,
+    MemorySSAUpdater &MSSAU, ScalarEvolution *SE, DominatorTree *DT,
+    SinkAndHoistLICMFlags &SinkFlags, OptimizationRemarkEmitter *ORE);
+
 static void foreachMemoryAccess(MemorySSA *MSSA, Loop *L,
                                 function_ref<void(Instruction *)> Fn);
 using PointersAndHasReadsOutsideSet =
@@ -471,6 +476,12 @@ bool LoopInvariantCodeMotion::runOnLoop(Loop *L, AAResults *AA, LoopInfo *LI,
                                     TLI, TTI, L, MSSAU, &SafetyInfo, Flags, ORE)
             : sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, TTI, L,
                          MSSAU, &SafetyInfo, Flags, ORE);
+
+  // sink pre-header defs that are unused in-loop into the unique exit to reduce
+  // pressure.
+  Changed |= sinkUnusedInvariantsFromPreheaderToExit(L, AA, &SafetyInfo, MSSAU,
+                                                     SE, DT, Flags, ORE);
+
   Flags.setIsSink(false);
   if (Preheader)
     Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, AC, TLI, L,
@@ -1469,6 +1480,118 @@ static void moveInstructionBefore(Instruction &I, BasicBlock::iterator Dest,
     SE->forgetBlockAndLoopDispositions(&I);
 }
 
+// If there's a single exit block, sink any loop-invariant values that were
+// defined in the preheader but not used inside the loop into the exit block
+// to reduce register pressure in the loop.
+static bool sinkUnusedInvariantsFromPreheaderToExit(
+    Loop *L, AAResults *AA, ICFLoopSafetyInfo *SafetyInfo,
+    MemorySSAUpdater &MSSAU, ScalarEvolution *SE, DominatorTree *DT,
+    SinkAndHoistLICMFlags &SinkFlags, OptimizationRemarkEmitter *ORE) {
+  BasicBlock *ExitBlock = L->getExitBlock();
+  if (!ExitBlock)
+    return false;
+
+  BasicBlock *Preheader = L->getLoopPreheader();
+  if (!Preheader)
+    return false;
+
+  bool MadeAnyChanges = false;
+  MemoryAccess *ExitDef = nullptr;
+
+  for (Instruction &I : llvm::make_early_inc_range(llvm::reverse(*Preheader))) {
+
+    // Skip terminator.
+    if (Preheader->getTerminator() == &I)
+      continue;
+
+    // New instructions were inserted at the end of the preheader.
+    if (isa<PHINode>(I))
+      break;
+
+    // Don't move instructions which might have side effects, since the side
+    // effects need to complete before instructions inside the loop. Note that
+    // it's okay if the instruction might have undefined behavior: LoopSimplify
+    // guarantees that the preheader dominates the exit block.
+    if (I.mayHaveSideEffects())
+      continue;
+
+    if (!canSinkOrHoistInst(I, AA, DT, L, MSSAU, true, SinkFlags, nullptr))
+      continue;
+
+    // Determine if there is a use in or before the loop (direct or
+    // otherwise).
+    bool UsedInLoopOrPreheader = false;
+    for (Use &U : I.uses()) {
+      auto *UserI = cast<Instruction>(U.getUser());
+      BasicBlock *UseBB = UserI->getParent();
+      if (auto *PN = dyn_cast<PHINode>(UserI)) {
+        UseBB = PN->getIncomingBlock(U);
+      }
+      if (UseBB == Preheader || L->contains(UseBB)) {
+        UsedInLoopOrPreheader = true;
+        break;
+      }
+    }
+    if (UsedInLoopOrPreheader)
+      continue;
+
+    // Move the instruction.
+    SafetyInfo->removeInstruction(&I);
+    SafetyInfo->insertInstructionTo(&I, ExitBlock);
+    I.moveBefore(*ExitBlock, ExitBlock->getFirstInsertionPt());
+    if (SE)
+      SE->forgetValue(&I);
+
+    // Update MemorySSA.
+    if (auto *OldMA = MSSAU.getMemorySSA()->getMemoryAccess(&I)) {
+      // Avoid the expensive getPreviousDefRecursive call by manually
+      // setting the defining access.
+      if (!ExitDef) {
+        if (auto *MPhi = MSSAU.getMemorySSA()->getMemoryAccess(ExitBlock)) {
+          ExitDef = MPhi;
+        } else {
+          BasicBlock *Current = *predecessors(ExitBlock).begin();
+          while (true) {
+            if (auto *Accesses =
+                    MSSAU.getMemorySSA()->getBlockAccesses(Current)) {
+              if (!Accesses->empty()) {
+                MemoryAccess *Back =
+                    const_cast<MemoryAccess *>(&Accesses->back());
+                if (isa<MemoryDef>(Back) || isa<MemoryPhi>(Back))
+                  ExitDef = Back;
+                else
+                  ExitDef = MSSAU.getMemorySSA()
+                                ->getWalker()
+                                ->getClobberingMemoryAccess(Back);
+                break;
+              }
+            }
+
+            if (Current == L->getHeader()) {
+              Current = Preheader;
+              continue;
+            }
+
+            if (pred_empty(Current)) {
+              ExitDef = MSSAU.getMemorySSA()->getLiveOnEntryDef();
+              break;
+            }
+            Current = *pred_begin(Current);
+          }
+        }
+      }
+      MemoryAccess *NewMA = MSSAU.createMemoryAccessInBB(&I, ExitDef, ExitBlock,
+                                                         MemorySSA::Beginning);
+      OldMA->replaceAllUsesWith(NewMA);
+      MSSAU.removeMemoryAccess(OldMA);
+    }
+
+    MadeAnyChanges = true;
+  }
+
+  return MadeAnyChanges;
+}
+
 static Instruction *sinkThroughTriviallyReplaceablePHI(
     PHINode *TPN, Instruction *I, LoopInfo *LI,
     SmallDenseMap<BasicBlock *, Instruction *, 32> &SunkCopies,
diff --git a/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll b/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll
index 71981e3599b87..1c734fa05cdcf 100644
--- a/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll
+++ b/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll
@@ -73,8 +73,8 @@ define amdgpu_kernel void @constant_zextload_v64i16_to_v64i32(ptr addrspace(1) %
 }
 
 ; CHECK-LABEL: {{^}}excess_soft_clause_reg_pressure:
-; GFX908:    NumSgprs: 64
-; GFX908-GCNTRACKERS:    NumSgprs: 64
+; GFX908:    NumSgprs: 56
+; GFX908-GCNTRACKERS:    NumSgprs: 56
 ; GFX908:    NumVgprs: 41
 ; GFX908-GCNTRACKERS:    NumVgprs: 39
 ; GFX908:    Occupancy: 5
diff --git a/llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll b/llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll
index db49339ea1f78..9c16b3c8a3f86 100644
--- a/llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll
+++ b/llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll
@@ -22,8 +22,6 @@
 ; GFX9-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe00000
 
 ; OFFREG is offset system SGPR
-; GCN: buffer_store_dword {{v[0-9]+}}, off, s[[[DESC0]]:[[DESC3]]], 0 offset:{{[0-9]+}} ; 4-byte Folded Spill
-; GCN: buffer_load_dword v{{[0-9]+}}, off, s[[[DESC0]]:[[DESC3]]], 0 offset:{{[0-9]+}} ; 4-byte Folded Reload
 ; GCN: NumVgprs: 256
 ; GCN: ScratchSize: 640
 
diff --git a/llvm/test/CodeGen/PowerPC/combine-sext-and-shl-after-isel.ll b/llvm/test/CodeGen/PowerPC/combine-sext-and-shl-after-isel.ll
index 00a77f92c0413..530169ff09486 100644
--- a/llvm/test/CodeGen/PowerPC/combine-sext-and-shl-after-isel.ll
+++ b/llvm/test/CodeGen/PowerPC/combine-sext-and-shl-after-isel.ll
@@ -212,37 +212,33 @@ define hidden void @testCaller(i1 %incond) local_unnamed_addr align 2 nounwind {
 ; CHECK-NEXT:    std r30, 48(r1) # 8-byte Folded Spill
 ; CHECK-NEXT:    andi. r3, r3, 1
 ; CHECK-NEXT:    li r3, -1
+; CHECK-NEXT:    li r4, 0
 ; CHECK-NEXT:    li r30, 0
 ; CHECK-NEXT:    crmove 4*cr2+lt, gt
 ; CHECK-NEXT:    std r29, 40(r1) # 8-byte Folded Spill
 ; CHECK-NEXT:    b .LBB3_2
-; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB3_1: # %if.end116
 ; CHECK-NEXT:    #
 ; CHECK-NEXT:    bl callee
 ; CHECK-NEXT:    nop
 ; CHECK-NEXT:    mr r3, r29
-; CHECK-NEXT:  .LBB3_2: # %cond.end.i.i
-; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB3_3 Depth 2
-; CHECK-NEXT:    lwz r29, 0(r3)
-; CHECK-NEXT:    li r5, 0
-; CHECK-NEXT:    extsw r4, r29
-; CHECK-NEXT:    .p2align 5
-; CHECK-NEXT:  .LBB3_3: # %while.body5.i
-; CHECK-NEXT:    # Parent Loop BB3_2 Depth=1
-; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
-; CHECK-NEXT:    addi r5, r5, -1
-; CHECK-NEXT:    cmpwi r5, 0
-; CHECK-NEXT:    bgt cr0, .LBB3_3
-; CHECK-NEXT:  # %bb.4: # %while.cond12.preheader.i
+; CHECK-NEXT:    li r4, 0
+; CHECK-NEXT:    .p2align 4
+; CHECK-NEXT:  .LBB3_2: # %while.body5.i
 ; CHECK-NEXT:    #
+; CHECK-NEXT:    addi r4, r4, -1
+; CHECK-NEXT:    cmpwi r4, 0
+; CHECK-NEXT:    bgt cr0, .LBB3_2
+; CHECK-NEXT:  # %bb.3: # %while.cond12.preheader.i
+; CHECK-NEXT:    #
+; CHECK-NEXT:    lwz r29, 0(r3)
 ; CHECK-NEXT:    bc 12, 4*cr2+lt, .LBB3_1
-; CHECK-NEXT:  # %bb.5: # %for.cond99.preheader
+; CHECK-NEXT:  # %bb.4: # %for.cond99.preheader
 ; CHECK-NEXT:    #
+; CHECK-NEXT:    extsw r4, r29
 ; CHECK-NEXT:    ld r5, 0(r3)
-; CHECK-NEXT:    sldi r4, r4, 2
 ; CHECK-NEXT:    stw r3, 0(r3)
+; CHECK-NEXT:    sldi r4, r4, 2
 ; CHECK-NEXT:    stwx r30, r5, r4
 ; CHECK-NEXT:    b .LBB3_1
 ;
@@ -256,37 +252,33 @@ define hidden void @testCaller(i1 %incond) local_unnamed_addr align 2 nounwind {
 ; CHECK-BE-NEXT:    std r30, 64(r1) # 8-byte Folded Spill
 ; CHECK-BE-NEXT:    andi. r3, r3, 1
 ; CHECK-BE-NEXT:    li r3, -1
+; CHECK-BE-NEXT:    li r4, 0
 ; CHECK-BE-NEXT:    li r30, 0
 ; CHECK-BE-NEXT:    crmove 4*cr2+lt, gt
 ; CHECK-BE-NEXT:    std r29, 56(r1) # 8-byte Folded Spill
 ; CHECK-BE-NEXT:    b .LBB3_2
-; CHECK-BE-NEXT:    .p2align 4
 ; CHECK-BE-NEXT:  .LBB3_1: # %if.end116
 ; CHECK-BE-NEXT:    #
 ; CHECK-BE-NEXT:    bl callee
 ; CHECK-BE-NEXT:    nop
 ; CHECK-BE-NEXT:    mr r3, r29
-; CHECK-BE-NEXT:  .LBB3_2: # %cond.end.i.i
-; CHECK-BE-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-BE-NEXT:    # Child Loop BB3_3 Depth 2
-; CHECK-BE-NEXT:    lwz r29, 0(r3)
-; CHECK-BE-NEXT:    li r5, 0
-; CHECK-BE-NEXT:    extsw r4, r29
-; CHECK-BE-NEXT:    .p2align 5
-; CHECK-BE-NEXT:  .LBB3_3: # %while.body5.i
-; CHECK-BE-NEXT:    # Parent Loop BB3_2 Depth=1
-; CHECK-BE-NEXT:    # => This Inner Loop Header: Depth=2
-; CHECK-BE-NEXT:    addi r5, r5, -1
-; CHECK-BE-NEXT:    cmpwi r5, 0
-; CHECK-BE-NEXT:    bgt cr0, .LBB3_3
-; CHECK-BE-NEXT:  # %bb.4: # %while.cond12.preheader.i
+; CHECK-BE-NEXT:    li r4, 0
+; CHECK-BE-NEXT:    .p2align 4
+; CHECK-BE-NEXT:  .LBB3_2: # %while.body5.i
+; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    addi r4, r4, -1
+; CHECK-BE-NEXT:    cmpwi r4, 0
+; CHECK-BE-NEXT:    bgt cr0, .LBB3_2
+; CHECK-BE-NEXT:  # %bb.3: # %while.cond12.preheader.i
 ; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    lwz r29, 0(r3)
 ; CHECK-BE-NEXT:    bc 12, 4*cr2+lt, .LBB3_1
-; CHECK-BE-NEXT:  # %bb.5: # %for.cond99.preheader
+; CHECK-BE-NEXT:  # %bb.4: # %for.cond99.preheader
 ; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    extsw r4, r29
 ; CHECK-BE-NEXT:    ld r5, 0(r3)
-; CHECK-BE-NEXT:    sldi r4, r4, 2
 ; CHECK-BE-NEXT:    stw r3, 0(r3)
+; CHECK-BE-NEXT:    sldi r4, r4, 2
 ; CHECK-BE-NEXT:    stwx r30, r5, r4
 ; CHECK-BE-NEXT:    b .LBB3_1
 ;
@@ -300,32 +292,28 @@ define hidden void @testCaller(i1 %incond) local_unnamed_addr align 2 nounwind {
 ; CHECK-P9-NEXT:    std r0, 80(r1)
 ; CHECK-P9-NEXT:    std r30, 48(r1) # 8-byte Folded Spill
 ; CHECK-P9-NEXT:    li r3, -1
+; CHECK-P9-NEXT:    li r4, 0
 ; CHECK-P9-NEXT:    li r30, 0
 ; CHECK-P9-NEXT:    std r29, 40(r1) # 8-byte Folded Spill
 ; CHECK-P9-NEXT:    crmove 4*cr2+lt, gt
 ; CHECK-P9-NEXT:    b .LBB3_2
-; CHECK-P9-NEXT:    .p2align 4
 ; CHECK-P9-NEXT:  .LBB3_1: # %if.end116
 ; CHECK-P9-NEXT:    #
 ; CHECK-P9-NEXT:    bl callee
 ; CHECK-P9-NEXT:    nop
 ; CHECK-P9-NEXT:    mr r3, r29
-; CHECK-P9-NEXT:  .LBB3_2: # %cond.end.i.i
-; CHECK-P9-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-P9-NEXT:    # Child Loop BB3_3 Depth 2
-; CHECK-P9-NEXT:    lwz r29, 0(r3)
 ; CHECK-P9-NEXT:    li r4, 0
-; CHECK-P9-NEXT:    .p2align 5
-; CHECK-P9-NEXT:  .LBB3_3: # %while.body5.i
-; CHECK-P9-NEXT:    # Parent Loop BB3_2 Depth=1
-; CHECK-P9-NEXT:    # => This Inner Loop Header: Depth=2
+; CHECK-P9-NEXT:    .p2align 4
+; CHECK-P9-NEXT:  .LBB3_2: # %while.body5.i
+; CHECK-P9-NEXT:    #
 ; CHECK-P9-NEXT:    addi r4, r4, -1
 ; CHECK-P9-NEXT:    cmpwi r4, 0
-; CHECK-P9-NEXT:    bgt cr0, .LBB3_3
-; CHECK-P9-NEXT:  # %bb.4: # %while.cond12.preheader.i
+; CHECK-P9-NEXT:    bgt cr0, .LBB3_2
+; CHECK-P9-NEXT:  # %bb.3: # %while.cond12.preheader.i
 ; CHECK-P9-NEXT:    #
+; CHECK-P9-NEXT:    lwz r29, 0(r3)
 ; CHECK-P9-NEXT:    bc 12, 4*cr2+lt, .LBB3_1
-; CHECK-P9-NEXT:  # %bb.5: # %for.cond99.preheader
+; CHECK-P9-NEXT:  # %bb.4: # %for.cond99.preheader
 ; CHECK-P9-NEXT:    #
 ; CHECK-P9-NEXT:    ld r4, 0(r3)
 ; CHECK-P9-NEXT:    extswsli r5, r29, 2
@@ -343,32 +331,28 @@ define hidden void @testCaller(i1 %incond) local_unnamed_addr align 2 nounwind {
 ; CHECK-P9-BE-NEXT:    std r0, 96(r1)
 ; CHECK-P9-BE-NEXT:    std r30, 64(r1) # 8-byte Folded Spill
 ; CHECK-P9-BE-NEXT:    li r3, -1
+; CHECK-P9-BE-NEXT:    li r4, 0
 ; CHECK-P9-BE-NEXT:    li r30, 0
 ; CHECK-P9-BE-NEXT:    std r29, 56(r1) # 8-byte Folded Spill
 ; CHECK-P9-BE-NEXT:    crmove 4*cr2+lt, gt
 ; CHECK-P9-BE-NEXT:    b .LBB3_2
-; CHECK-P9-BE-NEXT:    .p2align 4
 ; CHECK-P9-BE-NEXT:  .LBB3_1: # %if.end116
 ; CHECK-P9-BE-NEXT:    #
 ; CHECK-P9-BE-NEXT:    bl callee
 ; CHECK-P9-BE-NEXT:    nop
 ; CHECK-P9-BE-NEXT:    mr r3, r29
-; CHECK-P9-BE-NEXT:  .LBB3_2: # %cond.end.i.i
-; CHECK-P9-BE-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-P9-BE-NEXT:    # Child Loop BB3_3 Depth 2
-; CHECK-P9-BE-NEXT:    lwz r29, 0(r3)
 ; CHECK-P9-BE-NEXT:    li r4, 0
-; CHECK-P9-BE-NEXT:    .p2align 5
-; CHECK-P9-BE-NEXT:  .LBB3_3: # %while.body5.i
-; CHECK-P9-BE-NEXT:    # Parent Loop BB3_2 Depth=1
-; CHECK-P9-BE-NEXT:    # => This Inner Loop Header: Depth=2
+; CHECK-P9-BE-NEXT:    .p2align 4
+; CHECK-P9-BE-NEXT:  .LBB3_2: # %while.body5.i
+; CHECK-P9-BE-NEXT:    #
 ; CHECK-P9-BE-NEXT:    addi r4, r4, -1
 ; CHECK-P9-BE-NEXT:    cmpwi r4, 0
-; CHECK-P9-BE-NEXT:    bgt cr0, .LBB3_3
-; CHECK-P9-BE-NEXT:  # %bb.4: # %while.cond12.preheader.i
+; CHECK-P9-BE-NEXT:    bgt cr0, .LBB3_2
+; CHECK-P9-BE-NEXT:  # %bb.3: # %while.cond12.preheader.i
 ; CHECK-P9-BE-NEXT:    #
+; CHECK-P9-BE-NEXT:    lwz r29, 0(r3)
 ; CHECK-P9-BE-NEXT:    bc 12, 4*cr2+lt, .LBB3_1
-; CHECK-P9-BE-NEXT:  # %bb.5: # %for.cond99.preheader
+; CHECK-P9-BE-NEXT:  # %bb.4: # %for.cond99.preheader
 ; CHECK-P9-BE-NEXT:    #
 ; CHECK-P9-BE-NEXT:    ld r4, 0(r3)
 ; CHECK-P9-BE-NEXT:    extswsli r5, r29, 2
diff --git a/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll b/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
index 08dcf1d7a0091..8e932e0c00d4f 100644
--- a/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
+++ b/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
@@ -7,11 +7,11 @@ define void @f(ptr addrspace(7) %arg) {
 ; CHECK-LABEL: define void @f
 ; CHECK-SAME: (ptr addrspace(7) [[ARG:%.*]]) {
 ; CHECK-NEXT:  bb:
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
 ; CHECK-NEXT:    br label [[BB1:%.*]]
 ; CHECK:       bb1:
 ; CHECK-NEXT:    br i1 false, label [[BB2:%.*]], label [[BB1]]
 ; CHECK:       bb2:
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
 ; CHECK-NEXT:    br label [[BB3:%.*]]
 ; CHECK:       bb3:
 ; CHECK-NEXT:    [[I4:%.*]] = load i32, ptr addrspace(7) [[SCEVGEP]], align 4
diff --git a/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll b/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
index 2003b1a72206d..3c6535da486aa 100644
--- a/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
+++ b/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
@@ -4,33 +4,31 @@
 
 define i32 @remove_loop(i32 %size) #0 {
 ; CHECK-V8M-LABEL: @remove_loop(
-; CHECK-V8M-SAME: i32 [[SIZE:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-V8M-NEXT:  entry:
-; CHECK-V8M-NEXT:    br label %[[WHILE_COND:.*]]
-; CHECK-V8M:       while.cond:
-; CHECK-V8M-NEXT:    br i1 false, label %[[WHILE_COND]], label %[[WHILE_END:.*...
[truncated]

@arsenm arsenm left a comment

Add test from the regression case?

Comment on lines 1503 to 1505
    // Skip terminator.
    if (Preheader->getTerminator() == &I)
      continue;

There should probably be a helper that returns a range with the terminator trimmed off. Also, rather than comparing against getTerminator(), this can check I.isTerminator().
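
One possible shape for such a helper, sketched on standard containers rather than on LLVM's instruction lists. The name `bodyInReverse` and the use of strings as stand-ins for instructions are assumptions made for illustration. LLVM's STLExtras.h has a `drop_end` utility that may serve a similar purpose, though whether it composes with make_early_inc_range in this loop has not been checked here.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the elements of a block in reverse order,
// skipping the last element (the terminator in this toy model).
std::vector<std::string> bodyInReverse(const std::vector<std::string> &Block) {
  std::vector<std::string> Out;
  if (Block.empty())
    return Out;
  // rbegin() refers to the terminator, so start one element past it.
  for (auto It = std::next(Block.rbegin()); It != Block.rend(); ++It)
    Out.push_back(*It);
  return Out;
}
```

With a helper of this kind the loop body no longer needs its own terminator check, because the terminator is never visited.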

                    const_cast<MemoryAccess *>(&Accesses->back());
                if (isa<MemoryDef>(Back) || isa<MemoryPhi>(Back))
                  ExitDef = Back;
                else

Braces and/or temporary variables to avoid this ugly wrapping

@VigneshwarJ
Contributor Author

Add test from the regression case?

I have added a test for the scalar evolution update. The stack overflow during compilation, however, only happens with a large test case when I limit the stack size with the ulimit -s command. I'm not sure how to add that as a regression test.

@VigneshwarJ
Contributor Author

Added another test to check that caching the defining access is correct. LICM is conservative and does not sink a load if there is any store after it in the preheader (even if they don't alias). Hence all sunk MemoryUse accesses share the same defining access.
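
The caching argument can be sketched as follows. `DefResolver` and `assignDefs` are hypothetical names, and the constant 42 is an arbitrary stand-in for the resolved access; the sketch assumes, as stated above, that no store separates the sunk loads, so one lookup is valid for all of them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the expensive defining-access lookup.
struct DefResolver {
  int Lookups = 0; // counts how often the expensive walk ran
  int resolve() {
    ++Lookups;
    return 42; // arbitrary id standing in for the resolved access
  }
};

// Gives every sunk use the same defining access, resolving it lazily on
// the first use only. Valid only under the assumption that all sunk uses
// really do share one defining access.
std::vector<int> assignDefs(int NumSunk, DefResolver &R) {
  std::vector<int> Defs;
  bool HaveDef = false;
  int CachedDef = 0;
  for (int I = 0; I < NumSunk; ++I) {
    if (!HaveDef) {
      CachedDef = R.resolve();
      HaveDef = true;
    }
    Defs.push_back(CachedDef);
  }
  return Defs;
}
```

The lookup runs at most once no matter how many instructions are sunk, and not at all when nothing is sunk, which mirrors the lazily initialized ExitDef in the patch.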

@VigneshwarJ VigneshwarJ requested a review from arsenm December 4, 2025 22:40