Commit 41e8b9f

[Pipelines] Additional unrolling in LTO (#536)
Some workloads require specific sequences of events to happen to fully simplify. This adds an extra full unrolling pass to help these cases on the cores with branch predictors. It helps produce simplified loops, which can then be SROA'd allowing further simplification, which can be important for performance. The feature adds extra compile time to get extra performance and is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default). Original patch by David Green ([email protected])
1 parent 557d794 commit 41e8b9f
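
As a rough illustration of the kind of code the commit message describes, here is a minimal sketch of a four-deep loop nest with constant trip counts that updates a small local array. The kernel, the `ParitySums` type and the `sumByIndexParity` name are hypothetical and are not taken from the patch or from any workload it was tuned on. The idea is that once every loop is fully unrolled, each index into `acc` becomes a compile-time constant, which is the form SROA needs to split the array into independent scalars. Whether the extra pass changes anything for a given loop depends on LLVM's unroll thresholds and on what earlier passes have already done.

```cpp
// Hypothetical result type: sums of elements grouped by index parity.
struct ParitySums {
  int even;
  int odd;
};

// Hypothetical kernel: four nested loops, each with a constant trip count of 2.
// While the loops are still rolled, acc[] is indexed by a runtime value, so the
// array has to stay in memory. After full unrolling every index is a constant,
// and the two elements of acc[] can be treated as separate scalars.
ParitySums sumByIndexParity(const int (&a)[2][2][2][2]) {
  int acc[2] = {0, 0};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k)
        for (int l = 0; l < 2; ++l)
          acc[(i + j + k + l) & 1] += a[i][j][k][l];
  return ParitySums{acc[0], acc[1]};
}
```

A translation unit like this would be built with full LTO and `-mllvm -extra-LTO-loop-unroll=true`, as in the `OmaxLTO.cfg` change in this commit, and the resulting code compared against a build without the flag.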

5 files changed: +67 -23 lines changed

OmaxLTO.cfg

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 -flto=full \
 -fvirtual-function-elimination \
--fwhole-program-vtables
+-fwhole-program-vtables \
+-mllvm -extra-LTO-loop-unroll=true

README.md

Lines changed: 1 addition & 0 deletions
@@ -172,6 +172,7 @@ and/or increased memory usage during linking. Some of the options in the config
 corresponding optimisation passes in the [LLVM project](https://github.com/llvm/llvm-project)
 to find out more. Users are also encouraged to create their own configs and tune their own
 flag parameters.
+Information on LLVM Embedded Toolchain for Arm specific optimization flags is available in [Optimization Flags](https://github.com/ARM-software/LLVM-embedded-toolchain-for-Arm/blob/main/docs/optimization-flags.md)
 
 Binary releases of the LLVM Embedded Toolchain for Arm are based on release
 branches of the upstream LLVM Project, thus can safely be used with all tools

docs/optimization-flags.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Additional optimization flags
+=============================
+
+## Additional loop unroll in the LTO pipeline
+In some cases it is benefitial to perform an additional loop unroll pass so that extra information becomes available to later passes, e.g. SROA.
+Use cases where this could be beneficial - multiple (N>=4) nested loops.
+
+### Usage:
+-mllvm -extra-LTO-loop-unroll=true/false

patches/llvm-project-perf/0000-Placeholder-commit.patch

Lines changed: 0 additions & 22 deletions
This file was deleted.
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+From 4adfc5231d2c0182d6278b4aa75eec57648e5dd4 Mon Sep 17 00:00:00 2001
+From: Vladi Krapp <[email protected]>
+Date: Tue, 3 Sep 2024 14:12:48 +0100
+Subject: [Pipelines] Additional unrolling in LTO
+
+Some workloads require specific sequences of events to happen
+to fully simplify. This adds an extra full unrolling pass to help these
+cases on the cores with branch predictors. It helps produce simplified
+loops, which can then be SROA'd allowing further simplification, which
+can be important for performance.
+Feature adds extra compile time to get extra performance and
+is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default).
+
+Original patch by David Green ([email protected])
+---
+ llvm/lib/Passes/PassBuilderPipelines.cpp | 16 ++++++++++++++++
+ 1 file changed, 16 insertions(+)
+
+diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
+index 1184123c7710..6dc45d85927a 100644
+--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
++++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
+@@ -332,6 +332,10 @@ namespace llvm {
+ extern cl::opt<unsigned> MaxDevirtIterations;
+ } // namespace llvm
+ 
++static cl::opt<bool> LTOExtraLoopUnroll(
++    "extra-LTO-loop-unroll", cl::init(false), cl::Hidden,
++    cl::desc("Perform extra loop unrolling pass to assist SROA"));
++
+ void PassBuilder::invokePeepholeEPCallbacks(FunctionPassManager &FPM,
+                                             OptimizationLevel Level) {
+   for (auto &C : PeepholeEPCallbacks)
+@@ -1940,6 +1944,18 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
+   MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(ArgumentPromotionPass()));
+ 
+   FunctionPassManager FPM;
++
++  if (LTOExtraLoopUnroll) {
++    LoopPassManager OmaxLPM;
++    OmaxLPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
++                                       /* OnlyWhenForced= */ !PTO.LoopUnrolling,
++                                       PTO.ForgetAllSCEVInLoopUnroll));
++    FPM.addPass(
++        createFunctionToLoopPassAdaptor(std::move(OmaxLPM),
++                                        /*UseMemorySSA=*/false,
++                                        /*UseBlockFrequencyInfo=*/true));
++  }
++
+   // The IPO Passes may leave cruft around. Clean up after them.
+   FPM.addPass(InstCombinePass());
+   invokePeepholeEPCallbacks(FPM, Level);
+--
+2.34.1
+
