-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LV] Remove assertions in IV overflow check #115705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 | ||
| ; RUN: opt -passes=loop-vectorize \ | ||
| ; RUN: -force-tail-folding-style=data-with-evl \ | ||
| ; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \ | ||
| ; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s | ||
|
|
||
| ; If we know the IV will never overflow then we can skip the IV overflow check | ||
|
|
||
| define void @f(ptr %p, i64 %tc) vscale_range(2, 1024) { | ||
|
||
| ; CHECK-LABEL: define void @f( | ||
| ; CHECK-SAME: ptr [[P:%.*]], i64 [[TC:%.*]]) #[[ATTR0:[0-9]+]] { | ||
| ; CHECK-NEXT: [[ENTRY:.*:]] | ||
| ; CHECK-NEXT: [[GUARD:%.*]] = icmp ugt i64 [[TC]], 1024 | ||
| ; CHECK-NEXT: br i1 [[GUARD]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]] | ||
| ; CHECK: [[LOOP_PREHEADER]]: | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = sub i64 -1, [[TC]] | ||
| ; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64() | ||
| ; CHECK-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2 | ||
| ; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]] | ||
| ; CHECK: [[VECTOR_PH]]: | ||
| ; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64() | ||
| ; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP3]], 2 | ||
| ; CHECK-NEXT: [[TMP5:%.*]] = sub i64 [[TMP4]], 1 | ||
| ; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TC]], [[TMP5]] | ||
| ; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP4]] | ||
| ; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] | ||
| ; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64() | ||
| ; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP6]], 2 | ||
| ; CHECK-NEXT: br label %[[VECTOR_BODY:.*]] | ||
| ; CHECK: [[VECTOR_BODY]]: | ||
| ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ] | ||
| ; CHECK-NEXT: [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ] | ||
| ; CHECK-NEXT: [[AVL:%.*]] = sub i64 [[TC]], [[EVL_BASED_IV]] | ||
| ; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 2, i1 true) | ||
| ; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[EVL_BASED_IV]], 0 | ||
| ; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i64, ptr [[P]], i64 [[TMP9]] | ||
| ; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i64, ptr [[TMP10]], i32 0 | ||
| ; CHECK-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP11]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]]) | ||
| ; CHECK-NEXT: [[VP_OP:%.*]] = call <vscale x 2 x i64> @llvm.vp.add.nxv2i64(<vscale x 2 x i64> [[VP_OP_LOAD]], <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]]) | ||
| ; CHECK-NEXT: call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[VP_OP]], ptr align 8 [[TMP11]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]]) | ||
| ; CHECK-NEXT: [[TMP12:%.*]] = zext i32 [[TMP8]] to i64 | ||
| ; CHECK-NEXT: [[INDEX_EVL_NEXT]] = add i64 [[TMP12]], [[EVL_BASED_IV]] | ||
| ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP7]] | ||
| ; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] | ||
| ; CHECK-NEXT: br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] | ||
| ; CHECK: [[MIDDLE_BLOCK]]: | ||
| ; CHECK-NEXT: br i1 true, label %[[EXIT_LOOPEXIT:.*]], label %[[SCALAR_PH]] | ||
| ; CHECK: [[SCALAR_PH]]: | ||
| ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[LOOP_PREHEADER]] ] | ||
| ; CHECK-NEXT: br label %[[LOOP:.*]] | ||
| ; CHECK: [[LOOP]]: | ||
| ; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], %[[LOOP]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ] | ||
| ; CHECK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[P]], i64 [[I]] | ||
| ; CHECK-NEXT: [[X:%.*]] = load i64, ptr [[GEP]], align 8 | ||
| ; CHECK-NEXT: [[Y:%.*]] = add i64 [[X]], 1 | ||
| ; CHECK-NEXT: store i64 [[Y]], ptr [[GEP]], align 8 | ||
| ; CHECK-NEXT: [[I_NEXT]] = add i64 [[I]], 1 | ||
| ; CHECK-NEXT: [[DONE:%.*]] = icmp eq i64 [[I_NEXT]], [[TC]] | ||
| ; CHECK-NEXT: br i1 [[DONE]], label %[[EXIT_LOOPEXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]] | ||
| ; CHECK: [[EXIT_LOOPEXIT]]: | ||
| ; CHECK-NEXT: br label %[[EXIT]] | ||
| ; CHECK: [[EXIT]]: | ||
| ; CHECK-NEXT: ret void | ||
| ; | ||
| entry: | ||
| %guard = icmp ugt i64 %tc, 1024 | ||
| br i1 %guard, label %exit, label %loop | ||
| loop: | ||
| %i = phi i64 [%i.next, %loop], [0, %entry] | ||
| %gep = getelementptr i64, ptr %p, i64 %i | ||
| %x = load i64, ptr %gep | ||
| %y = add i64 %x, 1 | ||
| store i64 %y, ptr %gep | ||
| %i.next = add i64 %i, 1 | ||
| %done = icmp eq i64 %i.next, %tc | ||
| br i1 %done, label %exit, label %loop | ||
| exit: | ||
| ret void | ||
| } | ||
| ;. | ||
| ; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]} | ||
| ; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1} | ||
| ; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"} | ||
| ; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]} | ||
| ;. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch to add the NDEBUG code landed only recently - #111310 by @fhahn. It looks like this assert was added for a reason - is it possible that
isIndvarOverflowCheckKnownFalseis returning the incorrect answer for the EVL case?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is the review comment that led to the assert being added: #111310 (comment) cc @ayalz
I'm not sure if that assert always holds though, but I could be missing something.
isIndvarOverflowCheckKnownFalsedoesn't have the information from SCEVs nor the loop guards, which seems to be where the discrepancy between it and the assert lies. We could maybe instead move the SCEV stuff into it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is interesting and thanks for explaining and putting up a fix! So the test you've added shows
isIndvarOverflowCheckKnownFalseis returning the wrong answer, or at least an answer that's inconsistent with theSE.isKnownPredicatecall. Once we've applied the loop guards we know that the check is false. So it looks like there are a few choices:isIndvarOverflowCheckKnownFalsewill be consistent with the call toSE.isKnownPredicate. That might be a much smaller change than this patch I think? Or,isIndvarOverflowCheckKnownFalseto apply loop guards, which I think would require changing ScalarEvolution::getSmallConstantMaxTripCount to apply loop guards. It looks likeScalarEvolution::getSmallConstantTripMultiplealready does something similar. Or,The only problem with 3 is that the code is now more complex and it doesn't look like there are any tests for the
CheckMinIters = Builder.getTrue();case you've added.I personally feel like 2 is the most powerful change because you've highlighted a problem in how we estimate the small constant max trip count. If the vectoriser is always going to apply loop guards anyway, then we should do the same when estimating the trip count. However, I realise it would expand the scope of this patch quite a bit and I perfectly understand if you don't have time to do this right now! What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting case. For the loop in the test case, the induction can overflow if
%tc == 0I think, soisIndvarOverflowCheckKnownFalseshould return false in general?I think
getSmallConstantMaxTripCountalready should make use of loop guards in this case, but we cannot compute a tighter maximum because%tcmay be 0, in which case we would wrap.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and pointers! 2 seems good to me too, I'm happy to give it a stab.
Oh woops, I think the exit condition in the test case should have been
icmp uge.But if we did emit the overflow check would it even have caught it at runtime? If
%tc == 0then I think(UMax - %tc) < (VF * UF)will always return false.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at teaching
isIndvarOverflowCheckKnownFalsebut as @fhahn said I think it's already doing the correct thing.getConstantMaxBackedgeTakenCountwas returning 0xFFFFFFFFFFFFFFFF due to the overflow case at%tc == 0. ButgetConstantTripCountthrows it away because it only considers trip counts that fit into 32 bits.The SCEV assertion uses the full 64 bits of the TC type though, and in combination with what I think is a separate coincidental issue that
(UMax - %tc) < (VF * UF)when%tc == 0, it means that the SCEV is known whilstisIndvarOverflowCheckKnownFalseis unknown.I couldn't think of an easy way of updating the assertion to be in sync with
isIndvarOverflowCheckKnownFalse, so I've reworked this PR to just remove the assert and nothing else for now.I've added some new cases to show things that I think could be split off from this PR:
@overflow_at_0is the case that originally crashed with the assertion, but it also shows how the overflow check seems to let%tc == 0slip through, which maybe is something that needs fixed in a separate PR?@no_overflow_at_0shows a case whereisIndvarOverflowCheckKnownFalseworks and knows that the maximum trip count is 1025, so nothing needs to be done here@trip_count_max_1024shows a case where we currently can't calculate the max trip count, i.e. the debug output doesn't contain:LV: Found maximum trip count: N, and soisIndvarOverflowCheckKnownFalsealso returns false. I think in a separate PR we can teachcomputeMaxBECountForLTto apply the loop guards, which seems to fix this.