Skip to content

Conversation

@david-arm
Copy link
Contributor

Caching the decision returned by requiresScalarEpilogue means that
we can avoid printing out the same debug many times, and also
avoids repeating the same calculation. This function will get more
complex when we start to reason about more early exit loops, such
as in PR #88385.

@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: David Sherwood (david-arm)

Changes

Caching the decision returned by requiresScalarEpilogue means that
we can avoid printing out the same debug many times, and also
avoids repeating the same calculation. This function will get more
complex when we start to reason about more early exit loops, such
as in PR #88385.


Full diff: https://github.com/llvm/llvm-project/pull/108981.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+31-18)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (-10)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f726b171969a30..49c10867abef1f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1388,32 +1388,42 @@ class LoopVectorizationCostModel {
 
   /// Returns true if we're required to use a scalar epilogue for at least
   /// the final iteration of the original loop.
-  bool requiresScalarEpilogue(bool IsVectorizing) const {
-    if (!isScalarEpilogueAllowed()) {
+  bool requiresScalarEpilogue(bool IsVectorizing) {
+    std::optional<bool> &CachedResult = RequiresScalarEpilogue[IsVectorizing];
+    if (CachedResult)
+      return *CachedResult;
+
+    auto NeedsScalarEpilogue = [&](bool IsVectorizing) -> bool {
+      if (!isScalarEpilogueAllowed()) {
+        LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
+        return false;
+      }
+      // If we might exit from anywhere but the latch, must run the exiting
+      // iteration in scalar form.
+      if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+        LLVM_DEBUG(
+            dbgs() << "LV: Loop requires scalar epilogue: multiple exits\n");
+        return true;
+      }
+      if (IsVectorizing && InterleaveInfo.requiresScalarEpilogue()) {
+        LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: "
+                             "interleaved group requires scalar epilogue\n");
+        return true;
+      }
       LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
       return false;
-    }
-    // If we might exit from anywhere but the latch, must run the exiting
-    // iteration in scalar form.
-    if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
-      LLVM_DEBUG(
-          dbgs() << "LV: Loop requires scalar epilogue: multiple exits\n");
-      return true;
-    }
-    if (IsVectorizing && InterleaveInfo.requiresScalarEpilogue()) {
-      LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: "
-                           "interleaved group requires scalar epilogue\n");
-      return true;
-    }
-    LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
-    return false;
+    };
+
+    bool Res = NeedsScalarEpilogue(IsVectorizing);
+    CachedResult = Res;
+    return Res;
   }
 
   /// Returns true if we're required to use a scalar epilogue for at least
   /// the final iteration of the original loop for all VFs in \p Range.
   /// A scalar epilogue must either be required for all VFs in \p Range or for
   /// none.
-  bool requiresScalarEpilogue(VFRange Range) const {
+  bool requiresScalarEpilogue(VFRange Range) {
     auto RequiresScalarEpilogue = [this](ElementCount VF) {
       return requiresScalarEpilogue(VF.isVector());
     };
@@ -1782,6 +1792,9 @@ class LoopVectorizationCostModel {
 
   /// All element types found in the loop.
   SmallPtrSet<Type *, 16> ElementTypesInLoop;
+
+  /// Keeps track of whether we require a scalar epilogue.
+  std::optional<bool> RequiresScalarEpilogue[2];
 };
 } // end namespace llvm
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index 38af580e25c9cc..eb805999bebb0f 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -45,7 +45,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %indvars.iv.next = add nsw i64 %indvars.iv, -1
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
@@ -126,7 +125,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
 ; CHECK-NEXT:  LV: The target has 31 registers of RISCV::GPRRC register class
 ; CHECK-NEXT:  LV: The target has 32 registers of RISCV::VRRC register class
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Loop cost is 32
 ; CHECK-NEXT:  LV: IC is 1
 ; CHECK-NEXT:  LV: VF is vscale x 4
@@ -178,10 +176,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  scalar.ph:
 ; CHECK-NEXT:  No successors
 ; CHECK-NEXT:  }
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Interleaving disabled by the pass manager
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Vectorizing: innermost loop.
 ; CHECK-EMPTY:
 ;
@@ -247,7 +242,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %indvars.iv.next = add nsw i64 %indvars.iv, -1
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
@@ -328,7 +322,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
 ; CHECK-NEXT:  LV: The target has 31 registers of RISCV::GPRRC register class
 ; CHECK-NEXT:  LV: The target has 32 registers of RISCV::VRRC register class
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Loop cost is 34
 ; CHECK-NEXT:  LV: IC is 1
 ; CHECK-NEXT:  LV: VF is vscale x 4
@@ -380,10 +373,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  scalar.ph:
 ; CHECK-NEXT:  No successors
 ; CHECK-NEXT:  }
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Interleaving disabled by the pass manager
-; CHECK-NEXT:  LV: Loop does not require scalar epilogue
 ; CHECK-NEXT:  LV: Vectorizing: innermost loop.
 ;
 entry:

if (CachedResult)
return *CachedResult;

auto NeedsScalarEpilogue = [&](bool IsVectorizing) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the closure is necessary here. You could initialize Res to false here, then if isScalarEpilogueAllowed() evaluate the other conditions and set Res to true instead of returning true.

@fhahn
Copy link
Contributor

fhahn commented Sep 17, 2024

Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

@david-arm
Copy link
Contributor Author

david-arm commented Sep 17, 2024

Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

I'm happy to do it this way, but I couldn't convince myself of the earliest place to put such code that is guaranteed not to crash. I can go off and have a look of course, but if you happen to have any ideas that would be great! It would need to be done for both IsVectorizing false and true, since we can't guess in advance which we'd need.

@david-arm
Copy link
Contributor Author

Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!

Thanks for checking, does this mean we are missing a new test to cover this case?

@github-actions
Copy link

github-actions bot commented Sep 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@david-arm
Copy link
Contributor Author

Would it be simpler to have a separate helper setRequiresScalarEpilogue that's called once up-front with requiresScalarEpilogue simply returning the decision as done in multiple other places?

OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!

Thanks for checking, does this mean we are missing a new test to cover this case?

So for the interleave info case I couldn't find any possible scenario where requireScalarEpilogue would change value after invalidating the interleave groups. For example if you have an interleave group with gaps in a block that needs predication and you happen to support masked interleaved accesses, then we'll require a scalar epilogue. But the place that invalidates the interleave info later on requires that you don't support masked interleaved accesses.

However, I did find an issue with my original patch when changing the scalar epilogue lowering status. I realised we have a missing test case where we request predication on a loop that requires a scalar epilogue (due to an early exit). We should correctly refuse to tail-fold and fall back on normal vectorisation and jump to the scalar epilogue for the last iteration. This test was broken with my previous patch, which I've now fixed.

@david-arm
Copy link
Contributor Author

Rebase + fix code formatting

…logue

There is a flag attached to the loop that requests tail-folding,
but this cannot be honoured because the early exit requires a
scalar epilogue. So we should fall back on normal vectorisation
with a scalar epilogue.
Caching the decision returned by requiresScalarEpilogue means that
we can avoid printing out the same debug many times, and also
avoids repeating the same calculation. This function will get more
complex when we start to reason about more early exit loops, such
as in PR llvm#88385. The only problem with this is we sometimes have to
invalidate the previous result due to changes in the scalar epilogue
status or interleave groups.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants