-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[NFC][LoopVectorize] Cache result of requiresScalarEpilogue #108981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v Author: David Sherwood (david-arm) ChangesCaching the decision returned by requiresScalarEpilogue means that Full diff: https://github.com/llvm/llvm-project/pull/108981.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f726b171969a30..49c10867abef1f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1388,32 +1388,42 @@ class LoopVectorizationCostModel {
/// Returns true if we're required to use a scalar epilogue for at least
/// the final iteration of the original loop.
- bool requiresScalarEpilogue(bool IsVectorizing) const {
- if (!isScalarEpilogueAllowed()) {
+ bool requiresScalarEpilogue(bool IsVectorizing) {
+ std::optional<bool> &CachedResult = RequiresScalarEpilogue[IsVectorizing];
+ if (CachedResult)
+ return *CachedResult;
+
+ auto NeedsScalarEpilogue = [&](bool IsVectorizing) -> bool {
+ if (!isScalarEpilogueAllowed()) {
+ LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
+ return false;
+ }
+ // If we might exit from anywhere but the latch, must run the exiting
+ // iteration in scalar form.
+ if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+ LLVM_DEBUG(
+ dbgs() << "LV: Loop requires scalar epilogue: multiple exits\n");
+ return true;
+ }
+ if (IsVectorizing && InterleaveInfo.requiresScalarEpilogue()) {
+ LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: "
+ "interleaved group requires scalar epilogue\n");
+ return true;
+ }
LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
return false;
- }
- // If we might exit from anywhere but the latch, must run the exiting
- // iteration in scalar form.
- if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
- LLVM_DEBUG(
- dbgs() << "LV: Loop requires scalar epilogue: multiple exits\n");
- return true;
- }
- if (IsVectorizing && InterleaveInfo.requiresScalarEpilogue()) {
- LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: "
- "interleaved group requires scalar epilogue\n");
- return true;
- }
- LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
- return false;
+ };
+
+ bool Res = NeedsScalarEpilogue(IsVectorizing);
+ CachedResult = Res;
+ return Res;
}
/// Returns true if we're required to use a scalar epilogue for at least
/// the final iteration of the original loop for all VFs in \p Range.
/// A scalar epilogue must either be required for all VFs in \p Range or for
/// none.
- bool requiresScalarEpilogue(VFRange Range) const {
+ bool requiresScalarEpilogue(VFRange Range) {
auto RequiresScalarEpilogue = [this](ElementCount VF) {
return requiresScalarEpilogue(VF.isVector());
};
@@ -1782,6 +1792,9 @@ class LoopVectorizationCostModel {
/// All element types found in the loop.
SmallPtrSet<Type *, 16> ElementTypesInLoop;
+
+ /// Keeps track of whether we require a scalar epilogue.
+ std::optional<bool> RequiresScalarEpilogue[2];
};
} // end namespace llvm
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index 38af580e25c9cc..eb805999bebb0f 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -45,7 +45,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %indvars.iv.next = add nsw i64 %indvars.iv, -1
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
@@ -126,7 +125,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
; CHECK-NEXT: LV: The target has 31 registers of RISCV::GPRRC register class
; CHECK-NEXT: LV: The target has 32 registers of RISCV::VRRC register class
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Loop cost is 32
; CHECK-NEXT: LV: IC is 1
; CHECK-NEXT: LV: VF is vscale x 4
@@ -178,10 +176,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: scalar.ph:
; CHECK-NEXT: No successors
; CHECK-NEXT: }
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Interleaving disabled by the pass manager
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Vectorizing: innermost loop.
; CHECK-EMPTY:
;
@@ -247,7 +242,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %indvars.iv.next = add nsw i64 %indvars.iv, -1
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
@@ -328,7 +322,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
; CHECK-NEXT: LV: The target has 31 registers of RISCV::GPRRC register class
; CHECK-NEXT: LV: The target has 32 registers of RISCV::VRRC register class
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Loop cost is 34
; CHECK-NEXT: LV: IC is 1
; CHECK-NEXT: LV: VF is vscale x 4
@@ -380,10 +373,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: scalar.ph:
; CHECK-NEXT: No successors
; CHECK-NEXT: }
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Interleaving disabled by the pass manager
-; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Vectorizing: innermost loop.
;
entry:
|
| if (CachedResult) | ||
| return *CachedResult; | ||
|
|
||
| auto NeedsScalarEpilogue = [&](bool IsVectorizing) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the closure is necessary here. You could initialize Res to false here, then if isScalarEpilogueAllowed() evaluate the other conditions and set Res to true instead of returning true.
|
Would it be simpler to have a separate helper |
I'm happy to do it this way, but I couldn't convince myself of the earliest place to put such code that is guaranteed not to crash. I can go off and have a look of course, but if you happen to have any ideas that would be great! It would need to be done for both |
OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed! |
fhahn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be simpler to have a separate helper
setRequiresScalarEpiloguethat's called once up-front withrequiresScalarEpiloguesimply returning the decision as done in multiple other places?OK, I've tried doing this, but I realised there are places where we have to invalidate the decision due to changes in the scalar epilogue status or interleave groups. However, that also means my first version was also incorrect even though all tests passed!
Thanks for checking, does this mean we are missing a new test to cover this case?
llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
Outdated
Show resolved
Hide resolved
cf3af70 to
bf04a43
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
So for the interleave info case I couldn't find any possible scenario where requireScalarEpilogue would change value after invalidating the interleave groups. For example if you have an interleave group with gaps in a block that needs predication and you happen to support masked interleaved accesses, then we'll require a scalar epilogue. But the place that invalidates the interleave info later on requires that you don't support masked interleaved accesses. However, I did find an issue with my original patch when changing the scalar epilogue lowering status. I realised we have a missing test case where we request predication on a loop that requires a scalar epilogue (due to an early exit). We should correctly refuse to tail-fold and fall back on normal vectorisation and jump to the scalar epilogue for the last iteration. This test was broken with my previous patch, which I've now fixed. |
bf04a43 to
9050b71
Compare
|
Rebase + fix code formatting |
…logue There is a flag attached to the loop that requests tail-folding, but this cannot be honoured because the early exit requires a scalar epilogue. So we should fall back on normal vectorisation with a scalar epilogue.
Caching the decision returned by requiresScalarEpilogue means that we can avoid printing out the same debug many times, and also avoids repeating the same calculation. This function will get more complex when we start to reason about more early exit loops, such as in PR llvm#88385. The only problem with this is we sometimes have to invalidate the previous result due to changes in the scalar epilogue status or interleave groups.
9050b71 to
0bb1b0a
Compare
Caching the decision returned by requiresScalarEpilogue means that
we can avoid printing out the same debug many times, and also
avoids repeating the same calculation. This function will get more
complex when we start to reason about more early exit loops, such
as in PR #88385.