[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. #132170
Changes from 4 commits
6fb8134
d189001
9ad5ec3
b06ca2e
76bf30d
```diff
@@ -4025,11 +4025,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
       MaxPowerOf2RuntimeVF = std::nullopt; // Stick with tail-folding for now.
   }
 
-  if (MaxPowerOf2RuntimeVF && *MaxPowerOf2RuntimeVF > 0) {
-    assert((UserVF.isNonZero() || isPowerOf2_32(*MaxPowerOf2RuntimeVF)) &&
-           "MaxFixedVF must be a power of 2");
-    unsigned MaxVFtimesIC =
-        UserIC ? *MaxPowerOf2RuntimeVF * UserIC : *MaxPowerOf2RuntimeVF;
+  auto ScalarEpilogueNeeded = [this, &UserIC](unsigned MaxVF) {
+    unsigned MaxVFtimesIC = UserIC ? MaxVF * UserIC : MaxVF;
     ScalarEvolution *SE = PSE.getSE();
     // Currently only loops with countable exits are vectorized, but calling
     // getSymbolicMaxBackedgeTakenCount allows enablement work for loops with
```
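The test wrapped by the new lambda is ordinary modular arithmetic: no scalar epilogue remains when the trip count (the back-edge-taken count plus one) is a multiple of VF × IC. Below is a minimal sketch, assuming the trip count is known as a plain integer rather than built as a SCEV expression and simplified with `applyLoopGuards`. `noTailRemains` is a hypothetical name, chosen to match what the lambda actually returns (`true` when no tail remains).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the ScalarEpilogueNeeded lambda.
// Assumption: the trip count is a known constant; the real lambda computes
// (ExitCount urem MaxVF*IC) symbolically with ScalarEvolution.
//
// Returns true when TripCount is a multiple of MaxVF * UserIC, i.e. when no
// scalar epilogue (tail) would remain.
bool noTailRemains(uint64_t TripCount, unsigned MaxVF, unsigned UserIC) {
  assert(MaxVF > 0 && "the vectorization factor must be non-zero");
  // Mirrors `UserIC ? MaxVF * UserIC : MaxVF`: a UserIC of 0 is treated as
  // "no interleave count requested", so only the VF is considered.
  uint64_t MaxVFtimesIC = UserIC ? uint64_t(MaxVF) * UserIC : MaxVF;
  return TripCount % MaxVFtimesIC == 0;
}
```

For example, with a VF of 8 and a `UserIC` of 2, a trip count of 64 leaves no tail, whereas a trip count of 60 with a VF of 8 leaves a tail of 4 iterations.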
```diff
@@ -4043,13 +4040,41 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     const SCEV *Rem = SE->getURemExpr(
         SE->applyLoopGuards(ExitCount, TheLoop),
         SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));
-    if (Rem->isZero()) {
+    return Rem->isZero();
+  };
+
+  if (MaxPowerOf2RuntimeVF > 0) {
+    assert((UserVF.isNonZero() || isPowerOf2_32(*MaxPowerOf2RuntimeVF)) &&
+           "MaxFixedVF must be a power of 2");
+    if (ScalarEpilogueNeeded(*MaxPowerOf2RuntimeVF)) {
       // Accept MaxFixedVF if we do not have a tail.
       LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
       return MaxFactors;
     }
   }
 
+  auto ExpectedTC = getSmallBestKnownTC(PSE, TheLoop);
+  if (ExpectedTC && ExpectedTC <= TTI.getMinTripCountTailFoldingThreshold()) {
+    if (MaxPowerOf2RuntimeVF > 0) {
+      // If we have a low-trip-count, and the fixed-width VF is known to divide
+      // the trip count but the scalable factor does not, use the fixed-width
+      // factor in preference to allow the generation of a non-predicated loop.
+      if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedLowTripLoop &&
+          ScalarEpilogueNeeded(MaxFactors.FixedVF.getFixedValue())) {
+        LLVM_DEBUG(dbgs() << "LV: Picking a fixed-width so that no tail will "
+                             "remain for any chosen VF.\n");
+        MaxFactors.ScalableVF = ElementCount::getScalable(0);
+        return MaxFactors;
+      }
+    }
+
```
Contributor: Can you re-add the debug output that we had before, i.e. something like:

Collaborator (Author): `reportVectorizationFailure` will print the message it is reporting to `dbgs()` too. It didn't seem necessary to print the same info twice. It will print

Contributor: OK yeah fair enough, but I think we're still missing the original

Collaborator (Author): It looks like it gets emitted in one of the parent functions. None of the other returns from this function use `reportVectorizationFailure` will emit the `-Rpass=analysis` output. The `-Rpass=missed` remark will be emitted by the VF now being scalar.
```diff
+    reportVectorizationFailure(
+        "The trip count is below the minial threshold value.",
+        "loop trip count is too low, avoiding vectorization", "LowTripCount",
+        ORE, TheLoop);
+    return FixedScalableVFPair::getNone();
+  }
+
   // If we don't know the precise trip count, or if the trip count that we
   // found modulo the vectorization factor is not zero, try to fold the tail
   // by masking.
```
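The reordered checks in `computeMaxVF` can be summarised as a small decision function. This is a simplified model, not the LLVM code: it assumes an exactly known trip count (the real code uses SCEV for the divisibility test and `getSmallBestKnownTC` for the threshold test), and `MaxVFResult`, `decideMaxVF` and `noTailRemains` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Outcomes of the simplified model below (hypothetical names).
enum class MaxVFResult {
  KeepAllFactors,  // `return MaxFactors;`: no tail for any chosen VF
  FixedWidthOnly,  // ScalableVF cleared, fixed-width VF kept
  NoVectorization, // `return FixedScalableVFPair::getNone();`
  TryTailFolding   // fall through to the tail-folding logic
};

// True when TC is a multiple of VF * IC (an IC of 0 is treated as unset).
static bool noTailRemains(uint64_t TC, unsigned VF, unsigned UserIC) {
  uint64_t Step = UserIC ? uint64_t(VF) * UserIC : VF;
  return TC % Step == 0;
}

// Hypothetical model: TC stands in for the trip count, Threshold for
// TTI.getMinTripCountTailFoldingThreshold(), MaxPow2RuntimeVF for
// MaxPowerOf2RuntimeVF (nullopt when vscale is not known to be a power of
// two) and LowTripEpilogueNotAllowed for
// ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedLowTripLoop.
MaxVFResult decideMaxVF(std::optional<uint64_t> TC, unsigned Threshold,
                        std::optional<unsigned> MaxPow2RuntimeVF,
                        unsigned FixedVF, unsigned UserIC,
                        bool LowTripEpilogueNotAllowed) {
  assert(FixedVF > 0 && "model assumes a non-zero fixed-width VF");
  const bool HaveRuntimeVF = MaxPow2RuntimeVF && *MaxPow2RuntimeVF > 0;

  // 1. The trip count is a multiple of the largest runtime VF: no tail.
  if (HaveRuntimeVF && TC && noTailRemains(*TC, *MaxPow2RuntimeVF, UserIC))
    return MaxVFResult::KeepAllFactors;

  // 2. The target's minimum trip-count threshold is now checked here.
  if (TC && *TC <= Threshold) {
    // Prefer the fixed-width VF when it divides the trip count, so that a
    // non-predicated loop can be generated.
    if (HaveRuntimeVF && LowTripEpilogueNotAllowed &&
        noTailRemains(*TC, FixedVF, UserIC))
      return MaxVFResult::FixedWidthOnly;
    return MaxVFResult::NoVectorization;
  }

  // 3. Unknown or larger trip count: try folding the tail by masking.
  return MaxVFResult::TryTailFolding;
}
```

With an illustrative threshold of 5, a fixed VF of 4, a maximum power-of-two runtime VF of 16, `UserIC` of 0 and the low-trip-loop epilogue policy set, a trip count of 4 yields `FixedWidthOnly`, while a trip count of 3 still yields `NoVectorization`.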
```diff
@@ -10597,26 +10622,15 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
       LLVM_DEBUG(dbgs() << " But vectorizing was explicitly forced.\n");
     else {
-      if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) {
-        LLVM_DEBUG(dbgs() << "\n");
-        // Predicate tail-folded loops are efficient even when the loop
-        // iteration count is low. However, setting the epilogue policy to
-        // `CM_ScalarEpilogueNotAllowedLowTripLoop` prevents vectorizing loops
-        // with runtime checks. It's more effective to let
-        // `isOutsideLoopWorkProfitable` determine if vectorization is
-        // beneficial for the loop.
-        if (SEL != CM_ScalarEpilogueNotNeededUsePredicate)
-          SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
-      } else {
-        LLVM_DEBUG(dbgs() << " But the target considers the trip count too "
-                             "small to consider vectorizing.\n");
-        reportVectorizationFailure(
-            "The trip count is below the minial threshold value.",
-            "loop trip count is too low, avoiding vectorization",
-            "LowTripCount", ORE, L);
-        Hints.emitRemarkWithHints();
-        return false;
-      }
+      LLVM_DEBUG(dbgs() << "\n");
+      // Predicate tail-folded loops are efficient even when the loop
+      // iteration count is low. However, setting the epilogue policy to
+      // `CM_ScalarEpilogueNotAllowedLowTripLoop` prevents vectorizing loops
+      // with runtime checks. It's more effective to let
+      // `isOutsideLoopWorkProfitable` determine if vectorization is
+      // beneficial for the loop.
+      if (SEL != CM_ScalarEpilogueNotNeededUsePredicate)
+        SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
     }
   }
 
```
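The effect on `processLoop` for a loop that reaches the `else` branch (vectorization not forced) can be compared in a small model. It is a sketch under stated assumptions: the enum is a reduced, hypothetical stand-in containing only the epilogue-lowering states named in the diff, and `LowTripOutcome`, `beforeChange` and `afterChange` are hypothetical names.

```cpp
#include <cassert>

// Reduced, hypothetical stand-in for the epilogue-lowering states used in
// the diff (the real LoopVectorize.cpp enum has more members).
enum ScalarEpilogueLowering {
  CM_ScalarEpilogueAllowed,
  CM_ScalarEpilogueNotAllowedLowTripLoop,
  CM_ScalarEpilogueNotNeededUsePredicate,
};

struct LowTripOutcome {
  bool Rejected;              // processLoop returned false at this point
  ScalarEpilogueLowering SEL; // policy handed on to the cost model
};

// Model of the removed code: the minimum trip-count threshold was checked
// here, and loops at or below it were rejected outright.
LowTripOutcome beforeChange(unsigned ExpectedTC, unsigned Threshold,
                            ScalarEpilogueLowering SEL) {
  if (ExpectedTC > Threshold) {
    if (SEL != CM_ScalarEpilogueNotNeededUsePredicate)
      SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
    return {false, SEL};
  }
  return {true, SEL}; // "loop trip count is too low, avoiding vectorization"
}

// Model of the new code: no threshold check here; computeMaxVF decides later.
LowTripOutcome afterChange(ScalarEpilogueLowering SEL) {
  if (SEL != CM_ScalarEpilogueNotNeededUsePredicate)
    SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
  return {false, SEL};
}
```

For example, `beforeChange(4, 5, CM_ScalarEpilogueAllowed).Rejected` is `true`, whereas `afterChange(CM_ScalarEpilogueAllowed)` is not rejected and carries `CM_ScalarEpilogueNotAllowedLowTripLoop` forward, which is the state the new fixed-width preference in `computeMaxVF` checks for.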
Review comment (on the `ScalarEpilogueNeeded` lambda): The function returns `true` if no scalar epilogue is needed, so the name should be adjusted?