Commit f14dfc3

fhahn authored and memfrob committed
[Passes] Run peeling as part of simple/full loop unrolling.
Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling.

For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well.

This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example:

Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsVectorized

Program                                        base    patch   diff
 test-suite...ve-susan/automotive-susan.test     8.00    9.00  12.5%
 test-suite...nal/skidmarks10/skidmarks.test    35.00   31.00 -11.4%
 test-suite...lications/sqlite3/sqlite3.test    41.00   43.00   4.9%
 test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test    25.00   26.00   4.0%
 test-suite...006/450.soplex/450.soplex.test    88.00   89.00   1.1%
 test-suite...TimberWolfMC/timberwolfmc.test   120.00  119.00  -0.8%
 test-suite.../CINT2006/403.gcc/403.gcc.test   215.00  216.00   0.5%
 test-suite...006/447.dealII/447.dealII.test   957.00  958.00   0.1%
 test-suite...ternal/HMMER/hmmcalibrate.test    75.00   75.00   0.0%

Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsAnalyzed

Program                                        base    patch   diff
 test-suite...ks/Prolangs-C/agrep/agrep.test   440.00  434.00  -1.4%
 test-suite...nal/skidmarks10/skidmarks.test   312.00  308.00  -1.3%
 test-suite...marks/7zip/7zip-benchmark.test  6399.00 6323.00  -1.2%
 test-suite...lications/minisat/minisat.test   134.00  135.00   0.7%
 test-suite...rks/FreeBench/pifft/pifft.test   295.00  297.00   0.7%
 test-suite...TimberWolfMC/timberwolfmc.test  1879.00 1869.00  -0.5%
 test-suite...pplications/treecc/treecc.test   689.00  691.00   0.3%
 test-suite...T2000/300.twolf/300.twolf.test  1593.00 1597.00   0.3%
 test-suite.../Benchmarks/Bullet/bullet.test  1394.00 1392.00  -0.1%
 test-suite...ications/JM/ldecod/ldecod.test  1431.00 1429.00  -0.1%
 test-suite...6/464.h264ref/464.h264ref.test  2229.00 2230.00   0.0%
 test-suite...lications/sqlite3/sqlite3.test  2590.00 2589.00  -0.0%
 test-suite...ications/JM/lencod/lencod.test  2732.00 2733.00   0.0%
 test-suite...006/453.povray/453.povray.test  3395.00 3394.00  -0.0%

Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88471
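To make the transformation described in the commit message concrete, here is a minimal hand-written sketch of what peeling one iteration does at the source level. The functions `sumOriginal` and `sumPeeled` and the `firstBias` parameter are hypothetical and are not taken from this commit or from PR47671. Whether LLVM actually peels a given loop depends on its cost model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example. `first` is true only in iteration 0, so the
// condition on it becomes invariant (always false) after one iteration.
int sumOriginal(const std::vector<int> &v, int firstBias) {
  int sum = 0;
  bool first = true;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (first)
      sum += firstBias; // only executed in the first iteration
    first = false;
    sum += v[i];
  }
  return sum;
}

// Hand-peeled equivalent: the first iteration is pulled out in front of the
// loop, so the remaining loop body has no compare on `first` and no value
// carried across iterations for it.
int sumPeeled(const std::vector<int> &v, int firstBias) {
  int sum = 0;
  if (!v.empty()) {
    // Peeled iteration 0.
    sum += firstBias;
    sum += v[0];
    // Remaining iterations: a plain reduction loop.
    for (std::size_t i = 1; i < v.size(); ++i)
      sum += v[i];
  }
  return sum;
}
```

Both functions return the same result for any input. The second form leaves a loop body that is a simple reduction, which is the kind of shape later passes such as the loop vectorizer tend to handle more easily.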
1 parent 3eeb90e commit f14dfc3

5 files changed, +10 -7 lines changed

llvm/include/llvm/Transforms/Scalar.h

Lines changed: 2 additions & 1 deletion
@@ -190,7 +190,8 @@ Pass *createLoopUnrollPass(int OptLevel = 2, bool OnlyWhenForced = false,
                            int Count = -1, int AllowPartial = -1,
                            int Runtime = -1, int UpperBound = -1,
                            int AllowPeeling = -1);
-// Create an unrolling pass for full unrolling that uses exact trip count only.
+// Create an unrolling pass for full unrolling that uses exact trip count only
+// and also does peeling.
 Pass *createSimpleLoopUnrollPass(int OptLevel = 2, bool OnlyWhenForced = false,
                                  bool ForgetAllSCEV = false);

llvm/include/llvm/Transforms/Scalar/LoopUnrollPass.h

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ class Function;
 class Loop;
 class LPMUpdater;

-/// Loop unroll pass that only does full loop unrolling.
+/// Loop unroll pass that only does full loop unrolling and peeling.
 class LoopFullUnrollPass : public PassInfoMixin<LoopFullUnrollPass> {
   const int OptLevel;

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

Lines changed: 2 additions & 2 deletions
@@ -458,7 +458,7 @@ void PassManagerBuilder::addFunctionSimplificationPasses(
   if (EnableLoopInterchange)
     MPM.add(createLoopInterchangePass()); // Interchange loops

-  // Unroll small loops
+  // Unroll small loops and perform peeling.
   MPM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                      ForgetAllSCEVInLoopUnroll));
   addExtensionsToPM(EP_LoopOptimizerEnd, MPM);
@@ -1072,7 +1072,7 @@ void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {
   if (EnableConstraintElimination)
     PM.add(createConstraintEliminationPass());

-  // Unroll small loops
+  // Unroll small loops and perform peeling.
   PM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                     ForgetAllSCEVInLoopUnroll));
   PM.add(createLoopDistributePass());

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

Lines changed: 2 additions & 2 deletions
@@ -1301,7 +1301,7 @@ Pass *llvm::createLoopUnrollPass(int OptLevel, bool OnlyWhenForced,
 Pass *llvm::createSimpleLoopUnrollPass(int OptLevel, bool OnlyWhenForced,
                                        bool ForgetAllSCEV) {
   return createLoopUnrollPass(OptLevel, OnlyWhenForced, ForgetAllSCEV, -1, -1,
-                              0, 0, 0, 0);
+                              0, 0, 0, 1);
 }

 PreservedAnalyses LoopFullUnrollPass::run(Loop &L, LoopAnalysisManager &AM,
@@ -1329,7 +1329,7 @@ PreservedAnalyses LoopFullUnrollPass::run(Loop &L, LoopAnalysisManager &AM,
                       OnlyWhenForced, ForgetSCEV, /*Count*/ None,
                       /*Threshold*/ None, /*AllowPartial*/ false,
                       /*Runtime*/ false, /*UpperBound*/ false,
-                      /*AllowPeeling*/ false,
+                      /*AllowPeeling*/ true,
                       /*AllowProfileBasedPeeling*/ false,
                       /*FullUnrollMaxCount*/ None) !=
       LoopUnrollResult::Unmodified;
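The legacy-PM change above flips only the last positional argument of the `createLoopUnrollPass` call from `0` to `1`, which is easy to misread. The sketch below is a hypothetical, self-contained stand-in (not LLVM code) that names the trailing arguments the way the `Scalar.h` declaration does. It assumes that `-1` means "not specified, use the default" and that any other value is read as a boolean; `UnrollOptions`, `fromTriState`, `makeOptions`, and `simpleUnrollOptions` are invented for illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the trailing int parameters of
// createLoopUnrollPass as declared in Scalar.h. Not LLVM's actual types.
struct UnrollOptions {
  std::optional<bool> AllowPartial;
  std::optional<bool> Runtime;
  std::optional<bool> UpperBound;
  std::optional<bool> AllowPeeling;
};

// Assumption: -1 means "unset"; any other value is treated as a boolean.
static std::optional<bool> fromTriState(int V) {
  if (V == -1)
    return std::nullopt;
  return V != 0;
}

// Mirrors the order of the trailing parameters in the declaration:
// AllowPartial, Runtime, UpperBound, AllowPeeling.
UnrollOptions makeOptions(int AllowPartial = -1, int Runtime = -1,
                          int UpperBound = -1, int AllowPeeling = -1) {
  return {fromTriState(AllowPartial), fromTriState(Runtime),
          fromTriState(UpperBound), fromTriState(AllowPeeling)};
}

// Corresponds to the "0, 0, 0, 1" tail after this commit: partial, runtime,
// and upper-bound unrolling stay off, and peeling is switched on.
UnrollOptions simpleUnrollOptions() { return makeOptions(0, 0, 0, 1); }
```

Read this way, the new-PM hunk in `LoopFullUnrollPass::run` makes the same change with named comments: `/*AllowPeeling*/ true` while `/*AllowPartial*/`, `/*Runtime*/`, and `/*UpperBound*/` remain `false`.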

llvm/test/Transforms/PhaseOrdering/X86/peel-before-lv-to-enable-vectorization.ll

Lines changed: 3 additions & 1 deletion
@@ -11,7 +11,9 @@ target triple = "x86_64-apple-macosx"

 define i32 @test(i32* readonly %p, i32* readnone %q) {
 ; CHECK-LABEL: define i32 @test(
-; CHECK-NOT: vector.body
+; CHECK: vector.body:
+; CHECK: %index.next = add i64 %index, 8
+; CHECK: middle.block:
 ;
 entry:
   %cmp.not7 = icmp eq i32* %p, %q
