Skip to content

Conversation

@mark-sed
Copy link
Contributor

This MR is a alternative approach to #146540 (@fhahn @annamthomas ).

This patch adds loop rotation to runtime loop unrolling, if this makes the loop computable, which then might enable additional unrolling of the loop. To minimize the possibility of rotation without unrolling this rotation
is done right inside of UnrollRuntimeLoopRemainder.

As was the case in the #146540 patch I am thinking about refactoring the loop rotation function to split legality and profitability checks and will try to submit a NFC for it.

computable, which then might enable additional unrolling of the loop.

To minimize the possibility of rotation without unrolling this rotation
is done right inside of UnrollRuntimeLoopRemainder.
@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Marek Sedláček (mark-sed)

Changes

This MR is a alternative approach to #146540 (@fhahn @annamthomas ).

This patch adds loop rotation to runtime loop unrolling, if this makes the loop computable, which then might enable additional unrolling of the loop. To minimize the possibility of rotation without unrolling this rotation
is done right inside of UnrollRuntimeLoopRemainder.

As was the case in the #146540 patch I am thinking about refactoring the loop rotation function to split legality and profitability checks and will try to submit a NFC for it.


Patch is 56.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148243.diff

10 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h (+9-6)
  • (modified) llvm/include/llvm/Transforms/Utils/UnrollLoop.h (+17-2)
  • (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+3-2)
  • (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+16-12)
  • (modified) llvm/lib/Transforms/Utils/LoopUnroll.cpp (+72-58)
  • (modified) llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp (+8-6)
  • (modified) llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp (+35-13)
  • (modified) llvm/test/Transforms/LoopUnroll/full-unroll-avoid-partial.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopUnroll/runtime-loop-multiexit-dom-verify.ll (+189-83)
  • (added) llvm/test/Transforms/LoopUnroll/runtime-unroll-after-rotate.ll (+106)
diff --git a/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h b/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
index c3643e0f27f94..1c83bd706ee3e 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_TRANSFORMS_UTILS_LOOPROTATIONUTILS_H
 #define LLVM_TRANSFORMS_UTILS_LOOPROTATIONUTILS_H
 
+#include "llvm/ADT/STLExtras.h"
 #include "llvm/Support/Compiler.h"
 
 namespace llvm {
@@ -32,12 +33,14 @@ class TargetTransformInfo;
 /// header. If the loop header's size exceeds the threshold, the loop rotation
 /// will give up. The flag IsUtilMode controls the heuristic used in the
 /// LoopRotation. If it is true, the profitability heuristic will be ignored.
-LLVM_ABI bool LoopRotation(Loop *L, LoopInfo *LI,
-                           const TargetTransformInfo *TTI, AssumptionCache *AC,
-                           DominatorTree *DT, ScalarEvolution *SE,
-                           MemorySSAUpdater *MSSAU, const SimplifyQuery &SQ,
-                           bool RotationOnly, unsigned Threshold,
-                           bool IsUtilMode, bool PrepareForLTO = false);
+/// The ProfitabilityCheck function can override general profitability check.
+LLVM_ABI bool LoopRotation(
+    Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI, AssumptionCache *AC,
+    DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+    const SimplifyQuery &SQ, bool RotationOnly, unsigned Threshold,
+    bool IsUtilMode, bool PrepareForLTO = false,
+    function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck =
+        [](Loop *, ScalarEvolution *) { return false; });
 
 } // namespace llvm
 
diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
index 765c613b04a44..4499b6f43ba19 100644
--- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
+++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
@@ -54,11 +54,14 @@ LLVM_ABI const Loop *addClonedBlockToLoopInfo(BasicBlock *OriginalBB,
                                               LoopInfo *LI,
                                               NewLoopsMap &NewLoops);
 
-/// Represents the result of a \c UnrollLoop invocation.
+/// Represents the result of a \c UnrollLoop and \c UnrollAndJamLoop invocation.
 enum class LoopUnrollResult {
   /// The loop was not modified.
   Unmodified,
 
+  /// The loop was modified, but not unrolled.
+  Modified,
+
   /// The loop was partially unrolled -- we still have a loop, but with a
   /// smaller trip count.  We may also have emitted epilogue loop if the loop
   /// had a non-constant trip count.
@@ -69,6 +72,18 @@ enum class LoopUnrollResult {
   FullyUnrolled
 };
 
+/// Represents the result of a \c UnrollRuntimeLoopRemainder invocation.
+enum class LoopReminderUnrollResult {
+  /// The loop reminder was not modified.
+  Unmodified,
+
+  /// The loop was rotated, but not unrolled.
+  Rotated,
+
+  /// The loop reminder was unrolled.
+  Unrolled
+};
+
 struct UnrollLoopOptions {
   unsigned Count;
   bool Force;
@@ -90,7 +105,7 @@ LLVM_ABI LoopUnrollResult UnrollLoop(Loop *L, UnrollLoopOptions ULO,
                                      Loop **RemainderLoop = nullptr,
                                      AAResults *AA = nullptr);
 
-LLVM_ABI bool UnrollRuntimeLoopRemainder(
+LLVM_ABI LoopReminderUnrollResult UnrollRuntimeLoopRemainder(
     Loop *L, unsigned Count, bool AllowExpensiveTripCount,
     bool UseEpilogRemainder, bool UnrollRemainder, bool ForgetAllSCEV,
     LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index a22d84dcf014d..8b1ab5a9e2181 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1357,8 +1357,9 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
   ULO.RuntimeUnrollMultiExit = UP.RuntimeUnrollMultiExit;
   LoopUnrollResult UnrollResult = UnrollLoop(
       L, ULO, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop, AA);
-  if (UnrollResult == LoopUnrollResult::Unmodified)
-    return LoopUnrollResult::Unmodified;
+  if (UnrollResult == LoopUnrollResult::Unmodified ||
+      UnrollResult == LoopUnrollResult::Modified)
+    return UnrollResult;
 
   if (RemainderLoop) {
     std::optional<MDNode *> RemainderLoopID =
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 66d0573e83f65..3d93d8a1b7d4c 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -69,16 +69,19 @@ class LoopRotate {
   bool RotationOnly;
   bool IsUtilMode;
   bool PrepareForLTO;
+  function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck;
 
 public:
   LoopRotate(unsigned MaxHeaderSize, LoopInfo *LI,
              const TargetTransformInfo *TTI, AssumptionCache *AC,
              DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
              const SimplifyQuery &SQ, bool RotationOnly, bool IsUtilMode,
-             bool PrepareForLTO)
+             bool PrepareForLTO,
+             function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck)
       : MaxHeaderSize(MaxHeaderSize), LI(LI), TTI(TTI), AC(AC), DT(DT), SE(SE),
         MSSAU(MSSAU), SQ(SQ), RotationOnly(RotationOnly),
-        IsUtilMode(IsUtilMode), PrepareForLTO(PrepareForLTO) {}
+        IsUtilMode(IsUtilMode), PrepareForLTO(PrepareForLTO),
+        ProfitabilityCheck(ProfitabilityCheck) {}
   bool processLoop(Loop *L);
 
 private:
@@ -440,9 +443,9 @@ bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
 
     // Rotate if either the loop latch does *not* exit the loop, or if the loop
     // latch was just simplified. Or if we think it will be profitable.
-    if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
-        !profitableToRotateLoopExitingLatch(L) &&
-        !canRotateDeoptimizingLatchExit(L))
+    if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch &&
+        IsUtilMode == false && !profitableToRotateLoopExitingLatch(L) &&
+        !canRotateDeoptimizingLatchExit(L) && !ProfitabilityCheck(L, SE))
       return Rotated;
 
     // Check size of original header and reject loop if it is very big or we can't
@@ -1053,13 +1056,14 @@ bool LoopRotate::processLoop(Loop *L) {
 
 
 /// The utility to convert a loop into a loop with bottom test.
-bool llvm::LoopRotation(Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI,
-                        AssumptionCache *AC, DominatorTree *DT,
-                        ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
-                        const SimplifyQuery &SQ, bool RotationOnly = true,
-                        unsigned Threshold = unsigned(-1),
-                        bool IsUtilMode = true, bool PrepareForLTO) {
+bool llvm::LoopRotation(
+    Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI, AssumptionCache *AC,
+    DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+    const SimplifyQuery &SQ, bool RotationOnly = true,
+    unsigned Threshold = unsigned(-1), bool IsUtilMode = true,
+    bool PrepareForLTO,
+    function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck) {
   LoopRotate LR(Threshold, LI, TTI, AC, DT, SE, MSSAU, SQ, RotationOnly,
-                IsUtilMode, PrepareForLTO);
+                IsUtilMode, PrepareForLTO, ProfitabilityCheck);
   return LR.processLoop(L);
 }
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 86b268de43cf6..74662092a014f 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -486,12 +486,7 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 
   // All these values should be taken only after peeling because they might have
   // changed.
-  BasicBlock *Preheader = L->getLoopPreheader();
-  BasicBlock *Header = L->getHeader();
   BasicBlock *LatchBlock = L->getLoopLatch();
-  SmallVector<BasicBlock *, 4> ExitBlocks;
-  L->getExitBlocks(ExitBlocks);
-  std::vector<BasicBlock *> OriginalLoopBlocks = L->getBlocks();
 
   const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
   const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
@@ -504,42 +499,6 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   if (MaxTripCount && ULO.Count > MaxTripCount)
     ULO.Count = MaxTripCount;
 
-  struct ExitInfo {
-    unsigned TripCount;
-    unsigned TripMultiple;
-    unsigned BreakoutTrip;
-    bool ExitOnTrue;
-    BasicBlock *FirstExitingBlock = nullptr;
-    SmallVector<BasicBlock *> ExitingBlocks;
-  };
-  DenseMap<BasicBlock *, ExitInfo> ExitInfos;
-  SmallVector<BasicBlock *, 4> ExitingBlocks;
-  L->getExitingBlocks(ExitingBlocks);
-  for (auto *ExitingBlock : ExitingBlocks) {
-    // The folding code is not prepared to deal with non-branch instructions
-    // right now.
-    auto *BI = dyn_cast<BranchInst>(ExitingBlock->getTerminator());
-    if (!BI)
-      continue;
-
-    ExitInfo &Info = ExitInfos[ExitingBlock];
-    Info.TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
-    Info.TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
-    if (Info.TripCount != 0) {
-      Info.BreakoutTrip = Info.TripCount % ULO.Count;
-      Info.TripMultiple = 0;
-    } else {
-      Info.BreakoutTrip = Info.TripMultiple =
-          (unsigned)std::gcd(ULO.Count, Info.TripMultiple);
-    }
-    Info.ExitOnTrue = !L->contains(BI->getSuccessor(0));
-    Info.ExitingBlocks.push_back(ExitingBlock);
-    LLVM_DEBUG(dbgs() << "  Exiting block %" << ExitingBlock->getName()
-                      << ": TripCount=" << Info.TripCount
-                      << ", TripMultiple=" << Info.TripMultiple
-                      << ", BreakoutTrip=" << Info.BreakoutTrip << "\n");
-  }
-
   // Are we eliminating the loop control altogether?  Note that we can know
   // we're eliminating the backedge without knowing exactly which iteration
   // of the unrolled body exits.
@@ -552,17 +511,6 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   if (CompletelyUnroll)
     ULO.Runtime = false;
 
-  // Go through all exits of L and see if there are any phi-nodes there. We just
-  // conservatively assume that they're inserted to preserve LCSSA form, which
-  // means that complete unrolling might break this form. We need to either fix
-  // it in-place after the transformation, or entirely rebuild LCSSA. TODO: For
-  // now we just recompute LCSSA for the outer loop, but it should be possible
-  // to fix it in-place.
-  bool NeedToFixLCSSA =
-      PreserveLCSSA && CompletelyUnroll &&
-      any_of(ExitBlocks,
-             [](const BasicBlock *BB) { return isa<PHINode>(BB->begin()); });
-
   // The current loop unroll pass can unroll loops that have
   // (1) single latch; and
   // (2a) latch is unconditional; or
@@ -587,21 +535,87 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
       UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog
                                               : isEpilogProfitable(L);
 
+  LoopReminderUnrollResult UnrollReminderResult =
+      LoopReminderUnrollResult::Unmodified;
+  if (ULO.Runtime) {
+    UnrollReminderResult = UnrollRuntimeLoopRemainder(
+        L, ULO.Count, ULO.AllowExpensiveTripCount, EpilogProfitability,
+        ULO.UnrollRemainder, ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
+        PreserveLCSSA, ULO.SCEVExpansionBudget, ULO.RuntimeUnrollMultiExit,
+        RemainderLoop);
+    LatchBlock = L->getLoopLatch();
+    LatchIsExiting = L->isLoopExiting(LatchBlock);
+  }
+
   if (ULO.Runtime &&
-      !UnrollRuntimeLoopRemainder(L, ULO.Count, ULO.AllowExpensiveTripCount,
-                                  EpilogProfitability, ULO.UnrollRemainder,
-                                  ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
-                                  PreserveLCSSA, ULO.SCEVExpansionBudget,
-                                  ULO.RuntimeUnrollMultiExit, RemainderLoop)) {
+      UnrollReminderResult != LoopReminderUnrollResult::Unrolled) {
     if (ULO.Force)
       ULO.Runtime = false;
     else {
       LLVM_DEBUG(dbgs() << "Won't unroll; remainder loop could not be "
                            "generated when assuming runtime trip count\n");
-      return LoopUnrollResult::Unmodified;
+      // Loop might have been rotated inside of UnrollRuntimeLoopRemainder and
+      // this needs to be propagated.
+      return UnrollReminderResult == LoopReminderUnrollResult::Rotated
+                 ? LoopUnrollResult::Modified
+                 : LoopUnrollResult::Unmodified;
+      ;
     }
   }
 
+  BasicBlock *Preheader = L->getLoopPreheader();
+  BasicBlock *Header = L->getHeader();
+  SmallVector<BasicBlock *, 4> ExitBlocks;
+  L->getExitBlocks(ExitBlocks);
+  std::vector<BasicBlock *> OriginalLoopBlocks = L->getBlocks();
+
+  // Go through all exits of L and see if there are any phi-nodes there. We just
+  // conservatively assume that they're inserted to preserve LCSSA form, which
+  // means that complete unrolling might break this form. We need to either fix
+  // it in-place after the transformation, or entirely rebuild LCSSA. TODO: For
+  // now we just recompute LCSSA for the outer loop, but it should be possible
+  // to fix it in-place.
+  bool NeedToFixLCSSA =
+      PreserveLCSSA && CompletelyUnroll &&
+      any_of(ExitBlocks,
+             [](const BasicBlock *BB) { return isa<PHINode>(BB->begin()); });
+
+  struct ExitInfo {
+    unsigned TripCount;
+    unsigned TripMultiple;
+    unsigned BreakoutTrip;
+    bool ExitOnTrue;
+    BasicBlock *FirstExitingBlock = nullptr;
+    SmallVector<BasicBlock *> ExitingBlocks;
+  };
+  DenseMap<BasicBlock *, ExitInfo> ExitInfos;
+  SmallVector<BasicBlock *, 4> ExitingBlocks;
+  L->getExitingBlocks(ExitingBlocks);
+  for (auto *ExitingBlock : ExitingBlocks) {
+    // The folding code is not prepared to deal with non-branch instructions
+    // right now.
+    auto *BI = dyn_cast<BranchInst>(ExitingBlock->getTerminator());
+    if (!BI)
+      continue;
+
+    ExitInfo &Info = ExitInfos[ExitingBlock];
+    Info.TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
+    Info.TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
+    if (Info.TripCount != 0) {
+      Info.BreakoutTrip = Info.TripCount % ULO.Count;
+      Info.TripMultiple = 0;
+    } else {
+      Info.BreakoutTrip = Info.TripMultiple =
+          (unsigned)std::gcd(ULO.Count, Info.TripMultiple);
+    }
+    Info.ExitOnTrue = !L->contains(BI->getSuccessor(0));
+    Info.ExitingBlocks.push_back(ExitingBlock);
+    LLVM_DEBUG(dbgs() << "  Exiting block %" << ExitingBlock->getName()
+                      << ": TripCount=" << Info.TripCount
+                      << ", TripMultiple=" << Info.TripMultiple
+                      << ", BreakoutTrip=" << Info.BreakoutTrip << "\n");
+  }
+
   using namespace ore;
   // Report the unrolling decision.
   if (CompletelyUnroll) {
diff --git a/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp b/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
index ca90bb65f5708..43189ddea4603 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
@@ -239,14 +239,16 @@ llvm::UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,
 
   // We use the runtime remainder in cases where we don't know trip multiple
   if (TripMultiple % Count != 0) {
-    if (!UnrollRuntimeLoopRemainder(L, Count, /*AllowExpensiveTripCount*/ false,
-                                    /*UseEpilogRemainder*/ true,
-                                    UnrollRemainder, /*ForgetAllSCEV*/ false,
-                                    LI, SE, DT, AC, TTI, true,
-                                    SCEVCheapExpansionBudget, EpilogueLoop)) {
+    auto UnrollReminderResult = UnrollRuntimeLoopRemainder(
+        L, Count, /*AllowExpensiveTripCount*/ false,
+        /*UseEpilogRemainder*/ true, UnrollRemainder, /*ForgetAllSCEV*/ false,
+        LI, SE, DT, AC, TTI, true, SCEVCheapExpansionBudget, EpilogueLoop);
+    if (UnrollReminderResult != LoopReminderUnrollResult::Unrolled) {
       LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; remainder loop could not be "
                            "generated when assuming runtime trip count\n");
-      return LoopUnrollResult::Unmodified;
+      return UnrollReminderResult == LoopReminderUnrollResult::Rotated
+                 ? LoopUnrollResult::Modified
+                 : LoopUnrollResult::Unmodified;
     }
   }
 
diff --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
index bf882d7406853..142f97b5c3a1a 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -37,6 +37,7 @@
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/LoopRotationUtils.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
 #include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
 #include "llvm/Transforms/Utils/UnrollLoop.h"
@@ -574,7 +575,7 @@ static Value *CreateTripRemainder(IRBuilder<> &B, Value *BECount,
 ///        if (extraiters != 0) jump Epil: // Omitted if unroll factor is 2.
 /// EpilExit:
 
-bool llvm::UnrollRuntimeLoopRemainder(
+LoopReminderUnrollResult llvm::UnrollRuntimeLoopRemainder(
     Loop *L, unsigned Count, bool AllowExpensiveTripCount,
     bool UseEpilogRemainder, bool UnrollRemainder, bool ForgetAllSCEV,
     LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
@@ -586,10 +587,31 @@ bool llvm::UnrollRuntimeLoopRemainder(
   LLVM_DEBUG(UseEpilogRemainder ? dbgs() << "Using epilog remainder.\n"
                                 : dbgs() << "Using prolog remainder.\n");
 
+  LoopReminderUnrollResult Result = LoopReminderUnrollResult::Unmodified;
+
+  // Rotate loop if it makes the exit count from the latch computable.
+  BasicBlock *OrigHeader = L->getHeader();
+  BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+  if (BI && !BI->isUnconditional() &&
+      isa<SCEVCouldNotCompute>(SE->getExitCount(L, L->getLoopLatch())) &&
+      !isa<SCEVCouldNotCompute>(SE->getExitCount(L, OrigHeader))) {
+    LLVM_DEBUG(
+        dbgs() << "  Rotating loop to make the exit count computable.\n");
+    SimplifyQuery SQ{OrigHeader->getDataLayout()};
+    SQ.TLI = nullptr;
+    SQ.DT = DT;
+    SQ.AC = AC;
+    if (llvm::LoopRotation(L, LI, TTI, AC, DT, SE, nullptr /*MemorySSAUpdater*/,
+                           SQ, false /*RotationOnly*/, 16 /*Threshold*/,
+                           false /*IsUtilMode*/, false /*PrepareForLTO*/,
+                           [](Loop *, ScalarEvolution *) { return true; }))
+      Result = LoopReminderUnrollResult::Rotated;
+  }
+
   // Make sure the loop is in canonical form.
   if (!L->isLoopSimplifyForm()) {
     LLVM_DEBUG(dbgs() << "Not in simplify form!\n");
-    return false;
+    return Result;
   }
 
   // Guaranteed by LoopSimplifyForm.
@@ -603,7 +625,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
     LLVM_DEBUG(
         dbgs()
         << "Loop latch not terminated by a conditional branch.\n");
-    return false;
+    return Result;
   }
 
   unsigned ExitIndex = LatchBR->getSuccessor(0) == Header ? 1 : 0;
@@ -615,7 +637,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
     LLVM_DEBUG(
         dbgs()
         << "One of the loop latch successors must be the exit block.\n");
-    return false;
+    return Result;
   }
 
   // These are exit blocks other than the target of the latch exiting block.
@@ -627,12 +649,12 @@ bool llvm::UnrollRuntimeLoopRemainder(
     // We rely on LCSSA form being preserved when the exit blocks are transformed.
     // (Note that only an off-by-default mode of the old PM disables PreserveLCCA.)
     if (!PreserveLCSSA)
-      return false;
+      return Result;
 
     // Priority goes to UnrollRuntimeMultiExit if it's supplied.
     if (UnrollRuntimeMultiExit.getNumOccurrences()) {
       if (!UnrollRuntimeMultiExit)
-        return false;
+        return Result;
     } else {
       // Otherwis...
[truncated]

@annamthomas
Copy link
Contributor

Some comments inline.

Thinking more about this, how about a slightly different approach (now that we can correct perform rotation in the UnrollRuntimeLoopRemainder):

  • pass the fact that we need rotation from the callee (UnrollRuntimeLoopRemainder) to the caller.
  • Depending on the caller, let the caller decide if it wants to do rotation and then call UnrollRuntimeLoopRemainder once more after doing the rotation.

This has 2 benefits:

  • Reduces the chances of "rotation without unrolling" even further. We passed all the legality and performance considerations for runtime unrolling, except for this rotation need. You can do what @fhahn previously suggested: assume we rotated and do the remaining checks on the header's exit count, i.e. as if the loop was rotated.
  • The caller can decide what to do with this (for example, we don't want unroll-and-jam to do anything with this). Similarly, if there are downstream users of unrolling, they wont get affected by the "unexpected" rotation.

@annamthomas annamthomas requested review from annamthomas and fhahn July 14, 2025 16:48
@mark-sed
Copy link
Contributor Author

@annamthomas Good suggestions. I have now added an argument to UnrollRuntimeLoopRemainder, which allows to rotate the loop and disabled it for UnrollAndJam. I'll now add the other part, which is a detection if unroll worked without rotation and if not, I'll also rotate the loop.

@mark-sed
Copy link
Contributor Author

@annamthomas When rewriting the code I have realized that it makes more sense to move the rotation after the failure of checks in UnrollRuntimeLoopRemainder and then do the rotation in LoopUnroll and call UnrollRuntimeLoopRemainder again.

This simplifies UnrollRuntimeLoopRemainder (no longer the need to propagate rotation into LoopUnroll) and also this should lower the chances of rotation without unrolling as @fhahn mentioned in the other PR.

Comment on lines 551 to 567
BasicBlock *OrigHeader = L->getHeader();
BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
if (BI && !BI->isUnconditional() &&
isa<SCEVCouldNotCompute>(SE->getExitCount(L, L->getLoopLatch())) &&
!isa<SCEVCouldNotCompute>(SE->getExitCount(L, OrigHeader))) {
LLVM_DEBUG(
dbgs() << " Rotating loop to make the exit count computable.\n");
SimplifyQuery SQ{OrigHeader->getDataLayout()};
SQ.TLI = nullptr;
SQ.DT = DT;
SQ.AC = AC;
LoopRotated =
llvm::LoopRotation(L, LI, TTI, AC, DT, SE,
/*MemorySSAUpdater*/ nullptr, SQ,
/*RotationOnly*/ false, /*Threshold*/ 16,
/*IsUtilMode*/ false, /*PrepareForLTO*/ false,
[](Loop *, ScalarEvolution *) { return true; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be done here or could UnrollRuntimeLoopRemainder do it if needed? This would make it easier to keep checks in sync and could avoid calling UnrollRuntimeLoopRemainder twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnrollRuntimeLoopReminder could do it, but it will still need to do all the checks it does after it does the rotation, this means UnrollRuntimeLoopReminder would have to take a flag if rotation is allowed (so that UnrollAndJam does not trigger rotation) and then it would do all the unroll checks, if they failed it will rotate and then do the checks again (then unrolling).
So doing it inside of reminder unroll, will copy the checks, which is similar to current stare where the call is copied.
But I see your point and it will make the reminder unroll more powerful and could be used later on in other parts, so I can change that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there that many checks? I'm not sure we need to re-run the checks after rotation.

Can't we just do the checks once, using either the latch or header, if latch isn't exiting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(LoopUnroll itself works on both rotated and unrotated loops in a similar way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I have tested it, I think having it this way is cleaner and nicer. When I tried to move the rotation inside of the UnrollRuntimeLoopRemainder I tried separating the legality checks into a function/lambda to call, but it relies no a lot of resources (Latch, Header, LatchBR, LatchExit, TripCountSC...) which are also used later on in the unrolling.
So it currently initializes some values, does a check, intializes other, does a check and so forth and then it unrolls. If I was to add rotation to this, then even with lambda that captures all, I will have to initialize all of them and them do the checks inside of the lambda, rotate and then re-initialize all of the values and run the check again. It becomes a mess and less efficient than just calling UnrollRuntimeLoopRemainder twice, since it currently can do early exit without initializing all the values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking @mark-sed! If you still have it, could you share the diff of how that would look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not, but I did try to quickly put it together now, it will not compile and some stuff could be optimized, but it would be something like this and I much prefer the current implementation. The reason why it is like this is because the unrolling uses those values that are being checked and many of those need to be recalculated after the rotation.

It would be something like this:

diff --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
index 36c976e23eed..d1a41e6e73a5 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -616,106 +616,157 @@ bool llvm::UnrollRuntimeLoopRemainder(
     return false;
   }
 
+  // Use Scalar Evolution to compute the trip count. This allows more loops to
+  // be unrolled than relying on induction var simplification.
+  if (!SE)
+    return false;
+
   // Guaranteed by LoopSimplifyForm.
   BasicBlock *Latch = L->getLoopLatch();
   BasicBlock *Header = L->getHeader();
 
   BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());
-
-  if (!LatchBR || LatchBR->isUnconditional()) {
-    // The loop-rotate pass can be helpful to avoid this in many cases.
-    LLVM_DEBUG(
-        dbgs()
-        << "Loop latch not terminated by a conditional branch.\n");
-    return false;
-  }
-
   unsigned ExitIndex = LatchBR->getSuccessor(0) == Header ? 1 : 0;
   BasicBlock *LatchExit = LatchBR->getSuccessor(ExitIndex);
 
-  if (L->contains(LatchExit)) {
-    // Cloning the loop basic blocks (`CloneLoopBlocks`) requires that one of the
-    // targets of the Latch be an exit block out of the loop.
-    LLVM_DEBUG(
-        dbgs()
-        << "One of the loop latch successors must be the exit block.\n");
-    return false;
-  }
-
   // These are exit blocks other than the target of the latch exiting block.
   SmallVector<BasicBlock *, 4> OtherExits;
   L->getUniqueNonLatchExitBlocks(OtherExits);
-  // Support only single exit and exiting block unless multi-exit loop
-  // unrolling is enabled.
-  if (!L->getExitingBlock() || OtherExits.size()) {
-    // We rely on LCSSA form being preserved when the exit blocks are transformed.
-    // (Note that only an off-by-default mode of the old PM disables PreserveLCCA.)
-    if (!PreserveLCSSA)
+
+  BasicBlock *PreHeader = L->getLoopPreheader();
+  BranchInst *PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
+  const DataLayout &DL = Header->getDataLayout();
+
+  auto canUnrollReminder = [&]() {
+    // Only unroll loops with a computable trip count.
+    // We calculate the backedge count by using getExitCount on the Latch block,
+    // which is proven to be the only exiting block in this loop. This is same as
+    // calculating getBackedgeTakenCount on the loop (which computes SCEV for all
+    // exiting blocks).
+    const SCEV *BECountSC = SE->getExitCount(L, Latch);
+
+    // Add 1 since the backedge count doesn't include the first loop iteration.
+    // (Note that overflow can occur, this is handled explicitly below)
+    SCEV *TripCountSC =
+        SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));
+
+    if (!LatchBR || LatchBR->isUnconditional()) {
+      // The loop-rotate pass can be helpful to avoid this in many cases.
+      LLVM_DEBUG(
+          dbgs()
+          << "Loop latch not terminated by a conditional branch.\n");
       return false;
+    }
 
-    // Priority goes to UnrollRuntimeMultiExit if it's supplied.
-    if (UnrollRuntimeMultiExit.getNumOccurrences()) {
-      if (!UnrollRuntimeMultiExit)
-        return false;
-    } else {
-      // Otherwise perform multi-exit unrolling, if either the target indicates
-      // it is profitable or the general profitability heuristics apply.
-      if (!RuntimeUnrollMultiExit &&
-          !canProfitablyRuntimeUnrollMultiExitLoop(L, BPI, TTI, OtherExits,
-                                                   LatchExit,
-                                                   UseEpilogRemainder)) {
-        LLVM_DEBUG(dbgs() << "Multiple exit/exiting blocks in loop and "
-                             "multi-exit unrolling not enabled!\n");
+    if (L->contains(LatchExit)) {
+      // Cloning the loop basic blocks (`CloneLoopBlocks`) requires that one of the
+      // targets of the Latch be an exit block out of the loop.
+      LLVM_DEBUG(
+          dbgs()
+          << "One of the loop latch successors must be the exit block.\n");
+      return false;
+    }
+
+    // Support only single exit and exiting block unless multi-exit loop
+    // unrolling is enabled.
+    if (!L->getExitingBlock() || OtherExits.size()) {
+      // We rely on LCSSA form being preserved when the exit blocks are transformed.
+      // (Note that only an off-by-default mode of the old PM disables PreserveLCCA.)
+      if (!PreserveLCSSA)
         return false;
+
+      // Priority goes to UnrollRuntimeMultiExit if it's supplied.
+      if (UnrollRuntimeMultiExit.getNumOccurrences()) {
+        if (!UnrollRuntimeMultiExit)
+          return false;
+      } else {
+        // Otherwise perform multi-exit unrolling, if either the target indicates
+        // it is profitable or the general profitability heuristics apply.
+        if (!RuntimeUnrollMultiExit &&
+            !canProfitablyRuntimeUnrollMultiExitLoop(L, BPI, TTI, OtherExits,
+                                                    LatchExit,
+                                                    UseEpilogRemainder)) {
+          LLVM_DEBUG(dbgs() << "Multiple exit/exiting blocks in loop and "
+                              "multi-exit unrolling not enabled!\n");
+          return false;
+        }
       }
     }
-  }
-  // Use Scalar Evolution to compute the trip count. This allows more loops to
-  // be unrolled than relying on induction var simplification.
-  if (!SE)
-    return false;
 
-  // Only unroll loops with a computable trip count.
-  // We calculate the backedge count by using getExitCount on the Latch block,
-  // which is proven to be the only exiting block in this loop. This is same as
-  // calculating getBackedgeTakenCount on the loop (which computes SCEV for all
-  // exiting blocks).
-  const SCEV *BECountSC = SE->getExitCount(L, Latch);
-  if (isa<SCEVCouldNotCompute>(BECountSC)) {
-    LLVM_DEBUG(dbgs() << "Could not compute exit block SCEV\n");
-    return false;
+    if (isa<SCEVCouldNotCompute>(BECountSC)) {
+      LLVM_DEBUG(dbgs() << "Could not compute exit block SCEV\n");
+      return false;
+    }
+
+    unsigned BEWidth = cast<IntegerType>(BECountSC->getType())->getBitWidth();
+
+    if (isa<SCEVCouldNotCompute>(TripCountSC)) {
+      LLVM_DEBUG(dbgs() << "Could not compute trip count SCEV.\n");
+      return false;
+    }
+
+    SCEVExpander Expander(*SE, DL, "loop-unroll");
+    if (!AllowExpensiveTripCount &&
+        Expander.isHighCostExpansion(TripCountSC, L, SCEVExpansionBudget, TTI,
+                                    PreHeaderBR)) {
+      LLVM_DEBUG(dbgs() << "High cost for expanding trip count scev!\n");
+      return false;
+    }
+
+    // This constraint lets us deal with an overflowing trip count easily; see the
+    // comment on ModVal below.
+    if (Log2_32(Count) > BEWidth) {
+      LLVM_DEBUG(
+          dbgs()
+          << "Count failed constraint on overflow trip count calculation.\n");
+      return false;
+    }
+
+    return true;
+  };
+
+  bool LoopRotated = false;
+  if (!canUnrollReminder()) {
+    BasicBlock *OrigHeader = L->getHeader();
+    BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+    if (BI && !BI->isUnconditional() &&
+        isa<SCEVCouldNotCompute>(SE->getExitCount(L, L->getLoopLatch())) &&
+        !isa<SCEVCouldNotCompute>(SE->getExitCount(L, OrigHeader))) {
+      LLVM_DEBUG(
+          dbgs() << "  Rotating loop to make the exit count computable.\n");
+      SimplifyQuery SQ{OrigHeader->getDataLayout()};
+      SQ.TLI = nullptr;
+      SQ.DT = DT;
+      SQ.AC = AC;
+      LoopRotated =
+          llvm::LoopRotation(L, LI, TTI, AC, DT, SE,
+                              /*MemorySSAUpdater*/ nullptr, SQ,
+                              /*RotationOnly*/ false, /*Threshold*/ 16,
+                              /*IsUtilMode*/ false, /*PrepareForLTO*/ false,
+                              [](Loop *, ScalarEvolution *) { return true; });
+    }
   }
+  if (!LoopRotated)
+    return false; 
 
-  unsigned BEWidth = cast<IntegerType>(BECountSC->getType())->getBitWidth();
+  Latch = L->getLoopLatch();
+  Header = L->getHeader();
 
-  // Add 1 since the backedge count doesn't include the first loop iteration.
-  // (Note that overflow can occur, this is handled explicitly below)
-  const SCEV *TripCountSC =
+  LatchBR = cast<BranchInst>(Latch->getTerminator());
+  ExitIndex = LatchBR->getSuccessor(0) == Header ? 1 : 0;
+  LatchExit = LatchBR->getSuccessor(ExitIndex);
+
+  L->getUniqueNonLatchExitBlocks(OtherExits);
+  const SCEV *BECountSC = SE->getExitCount(L, Latch);
+  TripCountSC =
       SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));
-  if (isa<SCEVCouldNotCompute>(TripCountSC)) {
-    LLVM_DEBUG(dbgs() << "Could not compute trip count SCEV.\n");
-    return false;
-  }
 
-  BasicBlock *PreHeader = L->getLoopPreheader();
-  BranchInst *PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
-  const DataLayout &DL = Header->getDataLayout();
+  PreHeader = L->getLoopPreheader();
+  PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
   SCEVExpander Expander(*SE, DL, "loop-unroll");
-  if (!AllowExpensiveTripCount &&
-      Expander.isHighCostExpansion(TripCountSC, L, SCEVExpansionBudget, TTI,
-                                   PreHeaderBR)) {
-    LLVM_DEBUG(dbgs() << "High cost for expanding trip count scev!\n");
-    return false;
-  }
 
-  // This constraint lets us deal with an overflowing trip count easily; see the
-  // comment on ModVal below.
-  if (Log2_32(Count) > BEWidth) {
-    LLVM_DEBUG(
-        dbgs()
-        << "Count failed constraint on overflow trip count calculation.\n");
+  if (!canUnrollReminder())
     return false;
-  }
 
   // Loop structure is the following:
   //

@mark-sed
Copy link
Contributor Author

mark-sed commented Oct 9, 2025

I will be closing this PR as this does not seem like the right way to go. We also found downstream that changing the order of the ExitInfo computation to after runtime unrolling is done, has an unintended side effect.
I have proposed a new profitability check in loop rotation from which runtime unrolling can now also benefit - #162654.

@mark-sed mark-sed closed this Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants