[NFC][LoopVectorize] Cache result of requiresScalarEpilogue #108981

david-arm · 2024-09-17T14:18:05Z

Caching the decision returned by requiresScalarEpilogue means that
we can avoid printing out the same debug many times, and also
avoids repeating the same calculation. This function will get more
complex when we start to reason about more early exit loops, such
as in PR #88385.

llvmbot · 2024-09-17T14:18:42Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: David Sherwood (david-arm)

Full diff: https://github.com/llvm/llvm-project/pull/108981.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+31-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (-10)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f726b171969a30..49c10867abef1f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1388,32 +1388,42 @@ class LoopVectorizationCostModel {
 
   /// Returns true if we're required to use a scalar epilogue for at least
   /// the final iteration of the original loop.
-  bool requiresScalarEpilogue(bool IsVectorizing) const {
-    if (!isScalarEpilogueAllowed()) {
+  bool requiresScalarEpilogue(bool IsVectorizing) {
+    std::optional<bool> &CachedResult = RequiresScalarEpilogue[IsVectorizing];
+    if (CachedResult)
+      return *CachedResult;
+
+    auto NeedsScalarEpilogue = [&](bool IsVectorizing) -> bool {
+      if (!isScalarEpilogueAllowed()) {
+        LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
+        return false;
+      }
+      // If we might exit from anywhere but the latch, must run the exiting
+      // iteration in scalar form.
+      if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+        LLVM_DEBUG(
+            dbgs() << "LV: Loop requires scalar epilogue: multiple exits\n");
+        return true;
+      }
+      if (IsVectorizing && InterleaveInfo.requiresScalarEpilogue()) {
+        LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: "
+                             "interleaved group requires scalar epilogue\n");
+        return true;
+      }
       LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
       return false;
-    }
-    // If we might exit from anywhere but the latch, must run the exiting
-    // iteration in scalar form.
-    if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
-      LLVM_DEBUG(
-          dbgs() << "LV: Loop requires scalar epilogue: multiple exits\n");
-      return true;
-    }
-    if (IsVectorizing && InterleaveInfo.requiresScalarEpilogue()) {
-      LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: "
-                           "interleaved group requires scalar epilogue\n");
-      return true;
-    }
-    LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
-    return false;
+    };
+
+    bool Res = NeedsScalarEpilogue(IsVectorizing);
+    CachedResult = Res;
+    return Res;
   }
 
   /// Returns true if we're required to use a scalar epilogue for at least
   /// the final iteration of the original loop for all VFs in \p Range.
   /// A scalar epilogue must either be required for all VFs in \p Range or for
   /// none.
-  bool requiresScalarEpilogue(VFRange Range) const {
+  bool requiresScalarEpilogue(VFRange Range) {
     auto RequiresScalarEpilogue = [this](ElementCount VF) {
       return requiresScalarEpilogue(VF.isVector());
     };
@@ -1782,6 +1792,9 @@ class LoopVectorizationCostModel {
 
   /// All element types found in the loop.
   SmallPtrSet<Type *, 16> ElementTypesInLoop;
+
+  /// Keeps track of whether we require a scalar epilogue.
+  std::optional<bool> RequiresScalarEpilogue[2];
 };
 } // end namespace llvm
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index 38af580e25c9cc..eb805999bebb0f 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -45,7 +45,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %indvars.iv.next = add nsw i64 %indvars.iv, -1
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
@@ -126,7 +125,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
 ; CHECK-NEXT:  LV: The target has 31 registers of RISCV::GPRRC register class
 ; CHECK-NEXT:  LV: The target has 32 registers of RISCV::VRRC register class
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Loop cost is 32
 ; CHECK-NEXT:  LV: IC is 1
 ; CHECK-NEXT:  LV: VF is vscale x 4
@@ -178,10 +176,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  scalar.ph:
 ; CHECK-NEXT:  No successors
 ; CHECK-NEXT:  }
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Interleaving disabled by the pass manager
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Vectorizing: innermost loop.
 ; CHECK-EMPTY:
 ;
@@ -247,7 +242,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %indvars.iv.next = add nsw i64 %indvars.iv, -1
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
@@ -328,7 +322,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
 ; CHECK-NEXT:  LV: The target has 31 registers of RISCV::GPRRC register class
 ; CHECK-NEXT:  LV: The target has 32 registers of RISCV::VRRC register class
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Loop cost is 34
 ; CHECK-NEXT:  LV: IC is 1
 ; CHECK-NEXT:  LV: VF is vscale x 4
@@ -380,10 +373,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  scalar.ph:
 ; CHECK-NEXT:  No successors
 ; CHECK-NEXT:  }
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Interleaving disabled by the pass manager
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Vectorizing: innermost loop.
 ;
 entry:

huntergr-arm · 2024-09-17T14:55:45Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    if (CachedResult)
+      return *CachedResult;
+
+    auto NeedsScalarEpilogue = [&](bool IsVectorizing) -> bool {


I don't think the closure is necessary here. You could initialize Res to false here, then if isScalarEpilogueAllowed() evaluate the other conditions and set Res to true instead of returning true.
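A minimal sketch of the closure-free shape suggested above, written against a hypothetical stand-in struct rather than the real LoopVectorizationCostModel. The names EpilogueQuery, ScalarEpilogueAllowed, ExitsOnlyFromLatch, InterleaveGroupNeedsEpilogue and NumComputations are assumptions made for illustration; the LLVM_DEBUG output from the patch is omitted.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the cost model. The three flags replace the real
// queries on isScalarEpilogueAllowed(), TheLoop and InterleaveInfo.
struct EpilogueQuery {
  bool ScalarEpilogueAllowed;
  bool ExitsOnlyFromLatch;
  bool InterleaveGroupNeedsEpilogue;
  std::optional<bool> Cached[2]; // Indexed by IsVectorizing.
  unsigned NumComputations = 0;  // Only here to make cache hits observable.

  EpilogueQuery(bool Allowed, bool LatchOnly, bool GroupNeedsEpilogue)
      : ScalarEpilogueAllowed(Allowed), ExitsOnlyFromLatch(LatchOnly),
        InterleaveGroupNeedsEpilogue(GroupNeedsEpilogue) {}

  bool requiresScalarEpilogue(bool IsVectorizing) {
    std::optional<bool> &CachedResult = Cached[IsVectorizing];
    if (CachedResult)
      return *CachedResult;

    ++NumComputations;
    // Start from "not required" and only flip to true, so there is a single
    // exit point that also fills the cache.
    bool Res = false;
    if (ScalarEpilogueAllowed) {
      if (!ExitsOnlyFromLatch)
        Res = true; // The exiting iteration must run in scalar form.
      else if (IsVectorizing && InterleaveGroupNeedsEpilogue)
        Res = true;
    }
    CachedResult = Res;
    return Res;
  }
};

// Usage: the second call with the same flag is answered from the cache.
inline void exampleUsage() {
  EpilogueQuery Q(/*Allowed=*/true, /*LatchOnly=*/false,
                  /*GroupNeedsEpilogue=*/false);
  assert(Q.requiresScalarEpilogue(true));
  assert(Q.requiresScalarEpilogue(true));
  assert(Q.NumComputations == 1);
}
```

Compared with the lambda in the diff, this keeps one return path after the cache check, at the cost of nesting the conditions under the isScalarEpilogueAllowed() test.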

fhahn · 2024-09-17T15:10:48Z

Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

david-arm · 2024-09-17T15:22:48Z

> Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

I'm happy to do it this way, but I couldn't convince myself of the earliest place to put such code that is guaranteed not to crash. I can go off and have a look of course, but if you happen to have any ideas that would be great! It would need to be done for both IsVectorizing false and true, since we can't guess in advance which we'd need.
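The up-front variant discussed here could look roughly like the sketch below. It is an illustration under assumptions, not code from the PR: UpFrontDecision and its boolean members are hypothetical stand-ins for the cost model state, and only the names setRequiresScalarEpilogue and requiresScalarEpilogue come from the discussion. The decision is computed for both values of IsVectorizing, since the caller cannot know in advance which one will be queried.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the cost model state.
struct UpFrontDecision {
  bool ScalarEpilogueAllowed;
  bool ExitsOnlyFromLatch;
  bool InterleaveGroupNeedsEpilogue;
  std::optional<bool> Decision[2]; // Indexed by IsVectorizing.

  UpFrontDecision(bool Allowed, bool LatchOnly, bool GroupNeedsEpilogue)
      : ScalarEpilogueAllowed(Allowed), ExitsOnlyFromLatch(LatchOnly),
        InterleaveGroupNeedsEpilogue(GroupNeedsEpilogue) {}

  // Called once, before any query, for both possible arguments.
  void setRequiresScalarEpilogue() {
    Decision[false] = compute(false);
    Decision[true] = compute(true);
  }

  // The query only reads the stored decision, so it can stay const.
  bool requiresScalarEpilogue(bool IsVectorizing) const {
    assert(Decision[IsVectorizing].has_value() &&
           "setRequiresScalarEpilogue must run before the first query");
    return *Decision[IsVectorizing];
  }

private:
  bool compute(bool IsVectorizing) const {
    if (!ScalarEpilogueAllowed)
      return false;
    if (!ExitsOnlyFromLatch)
      return true;
    return IsVectorizing && InterleaveGroupNeedsEpilogue;
  }
};
```

The open question raised in the reply still applies to such a design: the setter has to run at a point where all the inputs it reads are already valid.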

david-arm · 2024-09-18T14:06:57Z

> Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!

fhahn

> > Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?
>
> OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!

Thanks for checking, does this mean we are missing a new test to cover this case?


github-actions · 2024-09-26T15:41:58Z

✅ With the latest revision this PR passed the C/C++ code formatter.

david-arm · 2024-09-26T15:44:08Z

> > > Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?
> >
> > OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!
>
> Thanks for checking, does this mean we are missing a new test to cover this case?

So for the interleave info case I couldn't find any possible scenario where requiresScalarEpilogue would change value after invalidating the interleave groups. For example, if you have an interleave group with gaps in a block that needs predication and you happen to support masked interleaved accesses, then we'll require a scalar epilogue. But the place that invalidates the interleave info later on requires that you don't support masked interleaved accesses.

However, I did find an issue with my original patch when changing the scalar epilogue lowering status. I realised we have a missing test case where we request predication on a loop that requires a scalar epilogue (due to an early exit). We should correctly refuse to tail-fold and fall back on normal vectorisation and jump to the scalar epilogue for the last iteration. This test was broken with my previous patch, which I've now fixed.
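One way to picture the stale-cache hazard described above is the sketch below. It assumes a hypothetical InvalidatingQuery struct with a setScalarEpilogueAllowed setter, neither of which is taken from the PR; the point is only that any change to an input of the cached decision has to clear the cache.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in: a cached decision whose input can change later.
struct InvalidatingQuery {
  bool ScalarEpilogueAllowed;
  bool ExitsOnlyFromLatch;
  std::optional<bool> Cached[2]; // Indexed by IsVectorizing.

  InvalidatingQuery(bool Allowed, bool LatchOnly)
      : ScalarEpilogueAllowed(Allowed), ExitsOnlyFromLatch(LatchOnly) {}

  // Changing the status must drop both cached answers, otherwise later
  // queries keep returning a decision made under the old status.
  void setScalarEpilogueAllowed(bool Allowed) {
    ScalarEpilogueAllowed = Allowed;
    Cached[0].reset();
    Cached[1].reset();
  }

  bool requiresScalarEpilogue(bool IsVectorizing) {
    std::optional<bool> &CachedResult = Cached[IsVectorizing];
    if (!CachedResult)
      CachedResult = ScalarEpilogueAllowed && !ExitsOnlyFromLatch;
    return *CachedResult;
  }
};

// Usage: an early-exit loop first queried while the epilogue is disallowed,
// then re-queried after the status changes.
inline void exampleInvalidation() {
  InvalidatingQuery Q(/*Allowed=*/false, /*LatchOnly=*/false);
  assert(!Q.requiresScalarEpilogue(true));
  Q.setScalarEpilogueAllowed(true);
  assert(Q.requiresScalarEpilogue(true)); // Would be stale without the reset.
}
```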

david-arm · 2024-10-16T10:28:21Z

Rebase + fix code formatting

…logue There is a flag attached to the loop that requests tail-folding, but this cannot be honoured because the early exit requires a scalar epilogue. So we should fall back on normal vectorisation with a scalar epilogue.

Caching the decision returned by requiresScalarEpilogue means that we can avoid printing out the same debug many times, and also avoids repeating the same calculation. This function will get more complex when we start to reason about more early exit loops, such as in PR llvm#88385. The only problem with this is we sometimes have to invalidate the previous result due to changes in the scalar epilogue status or interleave groups.

david-arm requested review from fhahn, huntergr-arm and preames September 17, 2024 14:18

llvmbot added backend:RISC-V vectorizers llvm:transforms labels Sep 17, 2024

huntergr-arm reviewed Sep 17, 2024


fhahn reviewed Sep 24, 2024


david-arm force-pushed the req_scalar_epi branch from cf3af70 to bf04a43 Compare September 26, 2024 15:38

david-arm force-pushed the req_scalar_epi branch from bf04a43 to 9050b71 Compare October 16, 2024 10:27

david-arm added 2 commits November 13, 2024 17:19

david-arm force-pushed the req_scalar_epi branch from 9050b71 to 0bb1b0a Compare November 13, 2024 17:23

david-arm closed this Nov 28, 2025
