[Clang][CMake] Add CSSPGO support to LLVM_BUILD_INSTRUMENTED (#79942)
Build on the Clang-BOLT infrastructure to collect sample profiles for CSSPGO.
Add CSSPGO.cmake and BOLT-CSSPGO.cmake to automate CSSPGO and CSSPGO+BOLT
Clang builds.
Note that `CLANG_PGO_TRAINING_DATA_SOURCE_DIR` is required, as the built-in
training set is inadequate for collecting a sampled profile.
Hardware compatibility: CSSPGO requires synchronized (0-skid) call
and branch stacks, which are only available with Intel PEBS (Sandy
Bridge+), AMD Zen3 with BRS, Zen4 with LBRv2+LBR_PMC_FREEZE, and Zen5
with LBRv2.
This patch adds support for the Intel `br_inst_retired.near_taken:uppp`
event.
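
For orientation, a sampling command using this event might look roughly like the sketch below. It is an assumption for illustration, not necessarily the invocation this patch generates: the helper name `csspgo_perf_cmd`, the sampling period, and the output file name are hypothetical, and the helper only prints the command rather than running it, so it can be inspected on machines without the required hardware.

```shell
# Hypothetical helper: print (do not run) a perf command that samples
# taken branches with LBR (-b) plus frame-pointer call stacks.
# The period (1000003) and output name (perf.data) are illustrative assumptions.
csspgo_perf_cmd() {
  period="${PERIOD:-1000003}"
  echo "perf record -e br_inst_retired.near_taken:uppp -c ${period} -b --call-graph fp -o perf.data -- $*"
}

# Example: show the command that would profile a training build.
csspgo_perf_cmd ninja -C /path/to/training/build
```

The printed command can be pasted into a shell on supported hardware; the resulting `perf.data` would then typically be converted into a sample profile with `llvm-profgen`.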
Test Plan:
Added BOLT-CSSPGO.cmake, used the same way as BOLT-PGO.cmake,
e.g. for a bootstrapped ThinLTO+CSSPGO+BOLT build, with the CSSPGO profile
collected from an LLVM build and the BOLT profile collected from Hello World
(instrumentation):
```
cmake -B clang-csspgo-bolt -S /path/to/llvm-project/llvm \
-DLLVM_ENABLE_LLD=ON -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DPGO_INSTRUMENT_LTO=Thin \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/path/to/llvm-project/llvm \
-GNinja -C /path/to/llvm-project/clang/cmake/caches/BOLT-CSSPGO.cmake
ninja stage2-clang-bolt
...
warning: Sample PGO is estimated to optimize better with 19.5x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
...
[2800/2801] Optimizing Clang with BOLT
BOLT-INFO: 8189 out of 106942 functions in the binary (7.7%) have non-empty execution profile
1377639 : taken branches (-42.1%)
```
Performance testing with Clang:
- Setup: Clang-BOLT testing harness
  aaupov/llvm-devmtg-2022@9f2b46f
  - CSSPGO training: building LLVM,
  - InstrPGO training: building Hello World,
  - BOLT training: building Hello World, instrumentation,
  - benchmark: building small LLVM tool (not),
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
- Results, wall time, lower is better
  - Baseline (bootstrapped build): 10.36s,
  - InstrPGO + ThinLTO: 9.34s,
  - CSSPGO + ThinLTO: 8.85s.
- BOLT results, for reference:
  - Baseline: 9.09s,
  - InstrPGO + ThinLTO: 9.09s,
  - CSSPGO + ThinLTO: 8.58s.
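
To put the wall times listed above in relative terms, the reduction against each group's own baseline can be computed as `(baseline - time) / baseline`. The snippet below is a small sketch of that arithmetic; the `speedup` helper is a name introduced here for illustration and is not part of the patch.

```shell
# Hypothetical helper: percent wall-time reduction relative to a baseline.
speedup() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

speedup 10.36 9.34   # InstrPGO + ThinLTO vs. baseline        -> 9.8
speedup 10.36 8.85   # CSSPGO + ThinLTO vs. baseline          -> 14.6
speedup 9.09 9.09    # BOLT: InstrPGO + ThinLTO vs. baseline  -> 0.0
speedup 9.09 8.58    # BOLT: CSSPGO + ThinLTO vs. baseline    -> 5.6
```

By this measure, CSSPGO + ThinLTO cuts build time by about 14.6% without BOLT and by about 5.6% with BOLT applied, each relative to its own baseline.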
---------
Co-authored-by: Matthias Braun <[email protected]>
set(CLANG_PGO_TRAINING_DATA_SOURCE_DIR OFF CACHE STRING "Path to source directory containing cmake project with source files to use for generating pgo data")
set(CLANG_PGO_TRAINING_DEPS "" CACHE STRING "Extra dependencies needed to build the PGO training data.")