[clang][perf-training] Fix profiling with -DCLANG_BOLT=perf #119117
This fixes the llvm-support build that generates the profile data. However, I'm wondering whether we should disable llvm-support and only run hello-world with -DCLANG_BOLT=perf. The BOLT optimizations with perf give only about a 3% performance improvement (possibly more with hardware counters), and converting all the perf profiles to the fdata format takes a very long time.
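If llvm-support were disabled for perf mode as suggested, one option is lit's `config.unsupported` flag, which marks every test in a directory as UNSUPPORTED. The sketch below assumes a hypothetical `llvm-support/lit.local.cfg` (this PR adds no such file) and assumes the parent config's `clang_bolt_mode` attribute is visible there. The `SimpleNamespace` object only stands in for the `config` that lit injects, so the fragment runs on its own.

```python
from types import SimpleNamespace

# Stand-in for the `config` object lit injects into a lit.local.cfg.
# Only the attributes this sketch touches are populated.
config = SimpleNamespace(clang_bolt_mode="perf", unsupported=False)

# Hypothetical llvm-support/lit.local.cfg body: skip the LLVMSupport build
# when profiling with plain perf sampling, so only hello-world runs.
if config.clang_bolt_mode.lower() == "perf":
    config.unsupported = True
```

In a real `lit.local.cfg`, only the final `if` block would appear, because lit supplies `config` itself.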
@llvm/pr-subscribers-clang
Author: Tom Stellard (tstellar)
Full diff: https://github.com/llvm/llvm-project/pull/119117.diff
2 Files Affected:
diff --git a/clang/utils/perf-training/bolt.lit.cfg b/clang/utils/perf-training/bolt.lit.cfg
index 1d0cf9a8a17a8e..7687a5a5cd2e68 100644
--- a/clang/utils/perf-training/bolt.lit.cfg
+++ b/clang/utils/perf-training/bolt.lit.cfg
@@ -8,21 +8,32 @@ import subprocess

 clang_bolt_mode = config.clang_bolt_mode.lower()
 clang_binary = "clang"
-perf_wrapper = f"{config.python_exe} {config.perf_helper_dir}/perf-helper.py perf "
+perf_wrapper = f"{config.python_exe} {config.perf_helper_dir}/perf-helper.py perf"

 if clang_bolt_mode == "instrument":
     perf_wrapper = ""
     clang_binary = config.clang_bolt_name
 elif clang_bolt_mode == "lbr":
-    perf_wrapper += " --lbr -- "
+    perf_wrapper += " --lbr --"
 elif clang_bolt_mode == "perf":
-    perf_wrapper += " -- "
+    perf_wrapper += " --"
 else:
     assert 0, "Unsupported CLANG_BOLT_MODE variable"

-config.clang = perf_wrapper + os.path.realpath(
+clang_nowrapper = os.path.realpath(
     lit.util.which(clang_binary, config.clang_tools_dir)
 ).replace("\\", "/")
+config.clang = f'{perf_wrapper} {clang_nowrapper}'
+
+# We need to limit the number of build jobs with perf in order to avoid this
+# error:
+#
+# | Permission error mapping pages.
+# | Consider increasing /proc/sys/kernel/perf_event_mlock_kb,
+# | or try again with a smaller value of -m/--mmap_pages.
+ninja_args = ""
+if ninja_args != "instrument":
+    ninja_args = "-j1"

 config.name = "Clang Perf Training"
 config.suffixes = [
@@ -52,3 +63,6 @@ config.substitutions.append(("%test_root", config.test_exec_root))
 config.substitutions.append(('%cmake_generator', config.cmake_generator))
 config.substitutions.append(('%cmake', config.cmake_exe))
 config.substitutions.append(('%llvm_src_dir', config.llvm_src_dir))
+config.substitutions.append(('%perf_cmake_compiler_launcher', perf_wrapper.replace(' ', ';')))
+config.substitutions.append(('%nowrapper_clang', clang_nowrapper))
+config.substitutions.append(('%ninja_args', ninja_args))
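To make the string handling in the bolt.lit.cfg hunk concrete, here is a standalone sketch that reproduces how the wrapper and the new substitutions are derived. The function names and sample paths are hypothetical. One deliberate difference: the diff as shown tests `ninja_args != "instrument"`, and because `ninja_args` was just set to `""`, that condition is always true, so `-j1` applies in every mode. The sketch keys the job limit on the BOLT mode instead, assuming that was the intent of the comment about limiting jobs "with perf".

```python
def build_wrapper(mode, python_exe, perf_helper_dir):
    """Reproduce the perf_wrapper string that bolt.lit.cfg builds for a mode."""
    wrapper = f"{python_exe} {perf_helper_dir}/perf-helper.py perf"
    if mode == "instrument":
        # The instrumented clang writes its own profile; no perf wrapper.
        wrapper = ""
    elif mode == "lbr":
        wrapper += " --lbr --"
    elif mode == "perf":
        wrapper += " --"
    else:
        raise ValueError(f"Unsupported CLANG_BOLT_MODE: {mode}")
    return wrapper


def substitutions(mode, python_exe, perf_helper_dir, clang_nowrapper):
    """Return the values the diff derives from the wrapper (hypothetical helper)."""
    wrapper = build_wrapper(mode, python_exe, perf_helper_dir)
    return {
        # Equivalent of config.clang: the full wrapped command line.
        "config.clang": f"{wrapper} {clang_nowrapper}",
        # CMake splits lists on ';', so each word becomes one launcher argument.
        "%perf_cmake_compiler_launcher": wrapper.replace(" ", ";"),
        "%nowrapper_clang": clang_nowrapper,
        # Limit parallelism only when perf is involved (assumed intent).
        "%ninja_args": "" if mode == "instrument" else "-j1",
    }


if __name__ == "__main__":
    subs = substitutions("lbr", "python3", "/src/pt", "/build/bin/clang")
    for key, value in subs.items():
        print(f"{key} -> {value}")
```

Turning spaces into `;` is what lets `CMAKE_CXX_COMPILER_LAUNCHER` receive the wrapper as separate arguments; the trade-off is that any path containing a space would be split as well.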
diff --git a/clang/utils/perf-training/llvm-support/build.test b/clang/utils/perf-training/llvm-support/build.test
index f29a594c846869..1f4d76502a3757 100644
--- a/clang/utils/perf-training/llvm-support/build.test
+++ b/clang/utils/perf-training/llvm-support/build.test
@@ -1,2 +1,2 @@
-RUN: %cmake -G %cmake_generator -B %t -S %llvm_src_dir -DCMAKE_C_COMPILER=%clang -DCMAKE_CXX_COMPILER=%clang -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
-RUN: %cmake --build %t -v --target LLVMSupport
+RUN: %cmake -G %cmake_generator -B %t -S %llvm_src_dir -DCMAKE_CXX_COMPILER_LAUNCHER="%perf_cmake_compiler_launcher" -DCMAKE_C_COMPILER="%nowrapper_clang" -DCMAKE_CXX_COMPILER="%nowrapper_clang" -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
+RUN: %cmake --build %t %ninja_args -v --target LLVMSupport
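lit expands the `%...` placeholders in RUN lines from `config.substitutions`, applied in list order. The simplified model below uses plain in-order string replacement rather than lit's own matching logic, and all values are hypothetical. It shows how the new build.test RUN line expands. It also shows that in this model, `%cmake_generator` has to be applied before `%cmake`, as bolt.lit.cfg registers them. The launcher value stays in double quotes on the RUN line because it contains `;`, which the shell would otherwise treat as a command separator. Only C++ compiles go through the launcher, since only `CMAKE_CXX_COMPILER_LAUNCHER` is set.

```python
def apply_substitutions(line, substitutions):
    """Apply (placeholder, value) pairs in order, as a simplified lit would."""
    for placeholder, value in substitutions:
        line = line.replace(placeholder, value)
    return line


# Hypothetical values, in the same relative order as bolt.lit.cfg registers them.
SUBSTITUTIONS = [
    ("%cmake_generator", "Ninja"),
    ("%cmake", "/usr/bin/cmake"),
    ("%llvm_src_dir", "/src/llvm"),
    ("%perf_cmake_compiler_launcher", "python3;/src/pt/perf-helper.py;perf;--"),
    ("%nowrapper_clang", "/build/bin/clang"),
]

# A shortened version of the first RUN line in build.test.
RUN_LINE = (
    "%cmake -G %cmake_generator -S %llvm_src_dir "
    '-DCMAKE_CXX_COMPILER_LAUNCHER="%perf_cmake_compiler_launcher" '
    '-DCMAKE_CXX_COMPILER="%nowrapper_clang"'
)

if __name__ == "__main__":
    print(apply_substitutions(RUN_LINE, SUBSTITUTIONS))
```

If the first two entries were swapped, `%cmake` would match the start of `%cmake_generator` and leave `/usr/bin/cmake_generator` in the command.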
Force-pushed from 1760788 to 5d13b69.
This reverts commit 5d13b69.
aaupov left a comment:
LG
Existing perf training is inadequate for collecting a sampled profile: we simply don't get enough samples, and no-LBR mode reduces the benefit further. If we wanted to pursue perf sampling further, we'd need to extend perf training to build either more LLVM subtargets or llvm-test-suite.
Yes, no-LBR mode has very limited benefit due to missing edge counts.
As discussed on Discord, we may be able to reduce the time by dropping …
@aaupov When we build llvm-support, one perf.data file is generated for each .cpp file compiled, so we end up with about 150 files. Is there some way to merge them before running perf2bolt?
I see. The best way would be to run perf once so that all clang invocations run under it. If llvm-support is configured as a CMake rule for an LLVM external project, the perf wrapper could be set as CMAKE_<LANG>_COMPILER_LAUNCHER.
Even when using CMAKE_CXX_COMPILER_LAUNCHER, we still get one perf invocation per clang invocation. I just tested out …
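For comparison, the route the question above hopes to avoid converts each perf.data to fdata separately and then combines the results. The sketch below only assembles command lines. It follows the usage shown in BOLT's documentation: `perf2bolt -p <perf.data> -o <out.fdata> <executable>`, and `merge-fdata` writing the merged profile to stdout. The output naming is hypothetical, and non-LBR profiles may need extra perf2bolt options not shown here. With about 150 inputs, this still means about 150 perf2bolt runs, which is the slow step. That cost is why running a single perf session is the better fix.

```python
import os


def conversion_commands(perf_files, binary, out_dir):
    """Build one perf2bolt command per perf.data file plus a final merge-fdata.

    Only argument lists are returned; nothing is executed here.
    """
    commands = []
    fdata_files = []
    for index, perf_data in enumerate(perf_files):
        # Hypothetical naming scheme for the per-file profiles.
        fdata = os.path.join(out_dir, f"prof.{index}.fdata")
        fdata_files.append(fdata)
        # Documented BOLT usage: perf2bolt -p <perf.data> -o <out> <executable>
        commands.append(["perf2bolt", "-p", perf_data, "-o", fdata, binary])
    # merge-fdata prints the merged profile to stdout; redirect it when run.
    commands.append(["merge-fdata", *fdata_files])
    return commands
```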
LLVM Buildbot has detected a new failure on builder … Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/11269
Here is the relevant piece of the build log for reference:
This fixes the llvm-support build that generates the profile data, and wraps the whole cmake --build command with perf instead of wrapping each individual clang invocation. This limits the number of profile files generated and reduces the time spent running perf2bolt.
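A minimal sketch of the approach in the final commit message. It assumes the same space-separated wrapper string that bolt.lit.cfg builds (`perf-helper.py perf [--lbr] --`, empty in instrument mode). It also assumes perf-helper.py runs `perf record`, which follows child processes by default. Under those assumptions, one perf session around `cmake --build` covers every clang process the build spawns and yields a single profile. The function name and arguments are hypothetical and do not necessarily match how the change was implemented.

```python
import shlex


def wrapped_build_command(perf_wrapper, cmake, build_dir, target, jobs=None):
    """Put the whole `cmake --build` under a single perf wrapper invocation.

    perf_wrapper is the space-separated prefix bolt.lit.cfg builds, e.g.
    "python3 perf-helper.py perf --lbr --"; it is empty in instrument mode.
    """
    command = shlex.split(perf_wrapper) + [cmake, "--build", build_dir]
    if jobs is not None:
        # Optional job limit, like the -j1 passed via %ninja_args in the diff.
        command += ["-j", str(jobs)]
    command += ["-v", "--target", target]
    return command
```

For example, `wrapped_build_command("python3 /src/pt/perf-helper.py perf --lbr --", "cmake", "/tmp/build", "LLVMSupport", jobs=1)` produces a single argument list that starts with the perf wrapper and ends with `--target LLVMSupport`.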