Make some optimizations to the CMake build system #161981
base: main
@llvm/pr-subscribers-third-party-benchmark
Author: Christopher Bate (christopherbate)
Changes
I profiled the initial CMake configuration and generation (Ninja) steps in LLVM-Project with just LLVM, MLIR, and Clang enabled, using the command shown below. Based on the profile, I then implemented a number of optimizations. All the optimizations are hidden behind extra flags whose defaults leave the existing logic unchanged (except for
Initial time of
-- Configuring done (17.8s)
After all below optimizations:
-- Configuring done (13.4s)
With a "toolchain check cache" (explained below):
-- Configuring done (8.2s)
There's definitely room for more optimizations; I think <10sec end-to-end for this command is doable. Most changes have a small impact. It's the gradual creep of inefficiencies that has added up over time to make the system less efficient than it could be.
Command tested:
To enable optimizations, set
Optimizations:
Optimize
Pull Request Overview
This PR implements several CMake build system optimizations to reduce LLVM configuration and generation time from 17.8s/6.9s to 8.2s/4.7s with the toolchain check cache enabled. The optimizations include reducing expensive CMake checks, making LIT convenience targets optional, and caching toolchain verification results.
Key changes:
- Introduces new optional flags to control expensive operations: LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS, LLVM_ENABLE_LIT_CONVENIENCE_TARGETS, BENCHMARK_ENABLE_EXPENSIVE_CMAKE_CHECKS, and LLVM_TOOLCHAIN_CHECK_CACHE
- Optimizes linker flag checks by moving them out of per-library call paths
- Adds toolchain check caching mechanism to avoid repeated compilation tests
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
third-party/benchmark/CMakeLists.txt | Adds optimization to skip expensive compiler flag checks when using GCC-compatible compilers |
llvm/cmake/modules/LLVMProcessSources.cmake | Guards expensive file globbing operations behind new configuration options |
llvm/cmake/modules/LLVMCacheSnapshot.cmake | New module for capturing and managing CMake cache snapshots for toolchain checks |
llvm/cmake/modules/AddLLVM.cmake | Moves expensive linker flag checks out of per-library functions and adds option to disable LIT convenience targets |
llvm/CMakeLists.txt | Adds new configuration options and implements toolchain check caching logic |
# This adds .td and .h files to the Visual Studio solution:
Copilot
AI
Oct 4, 2025
The condition logic has changed from always executing to conditional execution based on MSVC OR LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS. This could break builds on non-MSVC platforms where headers are needed but LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS is OFF. Consider documenting this behavioral change or adding a comment explaining when this optimization is safe.
# This adds .td and .h files to the Visual Studio solution:
# This adds .td and .h files to the Visual Studio solution.
# NOTE: The following conditional logic only adds these files for MSVC or when
# LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS is enabled. On non-MSVC platforms, if
# LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS is OFF, these files will NOT be added.
# This optimization is intended to reduce solution clutter for non-MSVC builds,
# but may break IDE integration or developer workflows on platforms that expect
# these files to be present. If you encounter issues with missing headers or
# .td files in your IDE or build system, consider enabling
# LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS or revisiting this logic.
# Pack "NAME=VALUE (TYPE=...)" for each new cache entry
set(_pairs "")
foreach(_k IN LISTS _new)
  if(NOT "${_k}" MATCHES "^((C|CXX)_SUPPORTS|HAVE_|GLIBCXX_USE|SUPPORTS_FVISI)")
Copilot
AI
Oct 4, 2025
The regex pattern SUPPORTS_FVISI appears to be incomplete or a typo. This likely should be SUPPORTS_FVISIBILITY or similar. The incomplete pattern may not match intended cache variables.
if(NOT "${_k}" MATCHES "^((C|CXX)_SUPPORTS|HAVE_|GLIBCXX_USE|SUPPORTS_FVISI)")
if(NOT "${_k}" MATCHES "^((C|CXX)_SUPPORTS|HAVE_|GLIBCXX_USE|SUPPORTS_FVISIBILITY)")
I profiled the initial CMake configuration and generation (Ninja) steps in LLVM-Project with just LLVM, MLIR, and Clang enabled, using the command shown below. Based on the profile, I then implemented a number of optimizations.

Initial time of `cmake` command @ 679d2b2:

-- Configuring done (17.8s)
-- Generating done (6.9s)

After all below optimizations:

-- Configuring done (12.8s)
-- Generating done (4.7s)

With a "toolchain check cache" (explained below):

-- Configuring done (6.9s)
-- Generating done (4.3s)

There's definitely room for more optimizations, another 20% at least.

Most changes have a small impact. It's the gradual creep of inefficiencies that has added up over time to make the system less efficient than it could be.

Command tested:

```
cmake -G Ninja -S llvm -B ${buildDir} \
  -DLLVM_ENABLE_PROJECTS="mlir;clang" \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DBUILD_SHARED_LIBS=ON \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_LINKER=lld \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
  -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
  -DLLVM_TOOLCHAIN_CHECK_CACHE=${PWD}/.toolchain-check-cache.cmake \
  --fresh
```

## Optimizations:

### Optimize `check_linker_flag` calls

In `AddLLVM.cmake`, there were a couple of places where we call `check_linker_flag` every time `llvm_add_library` is called. Even in non-initial cmake configuration runs, this carries unreasonable overhead.

Change: Hoist `include(CheckLinkerFlag)` in AddLLVM.cmake and optimize placement of `check_linker_flag` calls so that they are only made once.

Impact: <1 sec

### Make `add_lit_testsuites` optional

The function `add_lit_testsuites` is used to recursively populate a set of convenience targets that run a filtered portion of a LIT test suite. So instead of running `check-mlir` you can run `check-mlir-dialect`. These targets are built recursively for each subdirectory (e.g. `check-mlir-dialect-tensor`, `check-mlir-dialect-math`, etc.).

This call has quite a bit of overhead, especially for the main LLVM LIT test suite.

Personally I use a combination of `ninja -C build check-mlir-build-only` and `llvm-lit` directly to run filtered portions of the MLIR LIT test suite, but I can imagine that others depend on these filtered targets.

Change: Introduce a new option `LLVM_ENABLE_LIT_CONVENIENCE_TARGETS` which defaults to `ON`. When set to `OFF`, the function `add_lit_testsuites` just becomes a no-op. It's possible that we could also just improve the performance of `add_lit_testsuites` directly, but I didn't pursue this.

Impact: ~1-2sec

### Reduce `file(GLOB)` calls in `LLVMProcessSources.cmake`

The `llvm_process_sources` call is made whenever the `llvm_add_library` function is called. It makes several `file(GLOB)` calls, which can be expensive depending on the underlying filesystem/storage. The function globs for headers and TD files to add as sources to the target, but the comments suggest that this is only necessary for MSVC. In addition, it calls `llvm_check_source_file_list` to check that no source files in the directory are unused unless `PARTIAL_SOURCES_INTENDED` is set, which incurs another `file(GLOB)` call.

Changes: Guard the `file(GLOB)` calls for populating header sources behind `if(MSVC)`. Only do the `llvm_check_source_file_list` check if a new option `LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS` is set to `ON`.

Impact: depends on system. On my local workstation, impact is minimal. On another remote server I use, impact is much larger.

### Optimize initial symbol/flag checks made in `config-ix.cmake` and `HandleLLVMOptions.cmake`

The `config-ix.cmake` and `HandleLLVMOptions.cmake` files make a number of calls to compile C/C++ programs in order to verify the presence of certain symbols or whether certain compiler flags are supported.

These checks have the biggest impact on initial `cmake` configuration time.

I propose an "opt in" approach for amortizing these checks using a special generated CMake cache file as directed by the developer.

An option `LLVM_TOOLCHAIN_CHECK_CACHE` is introduced. It should be set to a path like `-DLLVM_TOOLCHAIN_CHECK_CACHE=$PWD/.toolchain-check-cache.cmake`.

Before entering the `config-ix.cmake` and `HandleLLVMOptions.cmake` files, if the `LLVM_TOOLCHAIN_CHECK_CACHE` option is set and the file exists, include that file to pre-populate cache variables.

Otherwise, we save the current set of CMake cache variable names. After calling the `config-ix|HandleLLVMOptions` files, if the `LLVM_TOOLCHAIN_CHECK_CACHE` option is set but the file does not exist, check what new CMake cache variables were set by those scripts. Filter these variables by whether they are likely cache variables supporting symbol/flag checks (e.g. `CXX_SUPPORTS_.*|HAVE_.*` etc) and write the file to set all these cache variables to their current values.

This allows a developer to obviate any subsequent checks, even in initial `cmake` configuration runs. The correctness depends on the developer knowing when the cache is invalid (e.g. they change toolchains or platforms) and on us not suddenly changing the meaning of `CXX_SUPPORTS_SOME_FLAG` to correspond to a different flag.

The cache file could be extended to store a key used to check whether to regenerate the cache, but I didn't go there.

Impact: Trivial overhead for cache generation, ~5sec reduction in initial config time.

### Reduce overhead of embedded Google Benchmark configuration

Note: technically this could be lumped in with the above if we expanded the scope of the before/after change that the `LLVM_TOOLCHAIN_CHECK_CACHE` covers.

GoogleBenchmark is embedded under the `third-party/benchmark` directory. Its CMake script does a compilation check for each flag that it wants to populate (even for `-Wall`). In comparison, LLVM's HandleLLVMOptions.cmake script takes a faster approach by skipping as many compilation checks as possible if the cache variable `LLVM_COMPILER_IS_GCC_COMPATIBLE` is true.

Changes: Use `LLVM_COMPILER_IS_GCC_COMPATIBLE` to skip as many compilation checks as possible in GoogleBenchmark.

Impact: ~1-2sec
27197d8 to 18197ae
Please separate the individual changes here into separate PRs.
will do
https://github.com/google/benchmark is a third-party project, and we prefer to minimize differences with upstream. Please submit changes there first.
I profiled the initial CMake configuration and generation (Ninja) steps in LLVM-Project with just LLVM, MLIR, and Clang enabled, using the command shown below. Based on the profile, I then implemented a number of optimizations.

All the optimizations are hidden behind extra flags whose defaults leave the existing logic unchanged (except for the `check_linker_flag` optimization below and the optimization of GoogleBenchmark flags).

Initial time of the `cmake` command @ 679d2b2 on my workstation:

-- Configuring done (17.8s)
-- Generating done (6.9s)

After all below optimizations:

-- Configuring done (12.8s)
-- Generating done (4.7s)

With a "toolchain check cache" (explained below):

-- Configuring done (6.9s)
-- Generating done (4.3s)

There's definitely room for more optimizations; I think <10sec end-to-end for this command is doable.

Most changes have a small impact. It's the gradual creep of inefficiencies that has added up over time to make the system less efficient than it could be.

Command tested:

To enable the new optimizations, set
## Optimizations:

### Optimize `check_linker_flag` calls

In `AddLLVM.cmake`, there were a couple of places where we call `check_linker_flag` every time `llvm_add_library` is called. Even in non-initial cmake configuration runs, this carries unreasonable overhead.

Change: Hoist `include(CheckLinkerFlag)` in AddLLVM.cmake and optimize placement of `check_linker_flag` calls so that they are only made once.

Impact: <1 sec
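The hoisting idea can be illustrated with a minimal sketch. This is not the code from the PR: the function name `example_add_library`, the result variable `EXAMPLE_LINKER_SUPPORTS_Z_DEFS`, and the probed flag are all hypothetical, chosen only to show a probe moving from a per-library call path to module scope.

```cmake
# Illustrative sketch only; names prefixed with "example"/"EXAMPLE" are hypothetical.
include(CheckLinkerFlag)  # module available since CMake 3.18

# Probe once, when this module is included, instead of once per library.
# The result is stored in a cache variable.
check_linker_flag(CXX "-Wl,-z,defs" EXAMPLE_LINKER_SUPPORTS_Z_DEFS)

function(example_add_library name)
  add_library(${name} ${ARGN})
  # Only a cheap variable test remains on the per-library path.
  if(EXAMPLE_LINKER_SUPPORTS_Z_DEFS)
    target_link_options(${name} PRIVATE "-Wl,-z,defs")
  endif()
endfunction()
```

The trade-off is that the probe now runs even if no library is ever added, which seems acceptable for a module whose main purpose is adding libraries.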
### Make `add_lit_testsuites` optional

The function `add_lit_testsuites` is used to recursively populate a set of convenience targets that run a filtered portion of a LIT test suite. So instead of running `check-mlir` you can run `check-mlir-dialect`. These targets are built recursively for each subdirectory (e.g. `check-mlir-dialect-tensor`, `check-mlir-dialect-math`, etc.).

This call has quite a bit of overhead, especially for the main LLVM LIT test suite.

Personally I use a combination of `ninja -C build check-mlir-build-only` and `llvm-lit` directly to run filtered portions of the MLIR LIT test suite, but I can imagine that others depend on these filtered targets.

Change: Introduce a new option `LLVM_ENABLE_LIT_CONVENIENCE_TARGETS` which defaults to `ON`. When set to `OFF`, the function `add_lit_testsuites` just becomes a no-op. It's possible that we could also just improve the performance of `add_lit_testsuites` directly, but I didn't pursue this.

Impact: ~1-2sec
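A rough sketch of how such an opt-out could look is below. The option name comes from the description above; the help string, the function name `example_add_lit_testsuites`, and its parameters are assumptions for illustration and do not reflect the real signature in `AddLLVM.cmake`.

```cmake
# Illustrative sketch only; the help text and function name are hypothetical.
option(LLVM_ENABLE_LIT_CONVENIENCE_TARGETS
  "Generate per-directory check-* convenience targets" ON)

function(example_add_lit_testsuites project directory)
  # Early return turns the whole function into a no-op when the option is OFF,
  # so no directory walk and no target creation happens.
  if(NOT LLVM_ENABLE_LIT_CONVENIENCE_TARGETS)
    return()
  endif()
  # ... recursive creation of filtered check-* targets would go here ...
endfunction()
```

Defaulting to `ON` keeps existing workflows intact; only developers who pass `-DLLVM_ENABLE_LIT_CONVENIENCE_TARGETS=OFF` lose the filtered targets.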
### Reduce `file(GLOB)` calls in `LLVMProcessSources.cmake`

The `llvm_process_sources` call is made whenever the `llvm_add_library` function is called. It makes several `file(GLOB)` calls, which can be expensive depending on the underlying filesystem/storage. The function globs for headers and TD files to add as sources to the target, but the comments suggest that this is only necessary for MSVC. In addition, it calls `llvm_check_source_file_list` to check that no source files in the directory are unused unless `PARTIAL_SOURCES_INTENDED` is set, which incurs another `file(GLOB)` call.

Changes: Guard the `file(GLOB)` calls for populating header sources behind `if(MSVC)`. Only do the `llvm_check_source_file_list` check if a new option `LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS` is set to `ON`.

Impact: depends on system. On my local workstation, impact is minimal. On another remote server I use, impact is much larger.
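As a sketch of the two guards described here, assuming a simplified stand-in for `llvm_process_sources`: the function `example_process_sources`, the help string, and the `ON` default are assumptions, and the unused-file check is reduced to a flat `*.cpp` comparison rather than the real `llvm_check_source_file_list` logic.

```cmake
# Illustrative sketch only; function name, help text, and default are assumptions.
option(LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS
  "Run configure-time checks that glob the source tree" ON)

function(example_process_sources out_var)
  set(listed ${ARGN})
  set(sources ${listed})

  # Headers and .td files are only added so they show up in IDE solutions.
  if(MSVC)
    file(GLOB hdrs "${CMAKE_CURRENT_SOURCE_DIR}/*.h")
    file(GLOB tds "${CMAKE_CURRENT_SOURCE_DIR}/*.td")
    list(APPEND sources ${hdrs} ${tds})
  endif()

  # The "no unused source files" check costs another glob, so make it optional.
  if(LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS)
    file(GLOB on_disk RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
         "${CMAKE_CURRENT_SOURCE_DIR}/*.cpp")
    foreach(f IN LISTS on_disk)
      if(NOT f IN_LIST listed)
        message(SEND_ERROR "Source file ${f} is not listed in any target")
      endif()
    endforeach()
  endif()

  set(${out_var} ${sources} PARENT_SCOPE)
endfunction()
```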
### Optimize initial symbol/flag checks made in `config-ix.cmake` and `HandleLLVMOptions.cmake`

The `config-ix.cmake` and `HandleLLVMOptions.cmake` files make a number of calls to compile C/C++ programs in order to verify the presence of certain symbols or whether certain compiler flags are supported.

These checks have the biggest impact on initial `cmake` configuration time.

I propose an "opt in" approach for amortizing these checks using a special generated CMake cache file as directed by the developer.

An option `LLVM_TOOLCHAIN_CHECK_CACHE` is introduced. It should be set to a path like `-DLLVM_TOOLCHAIN_CHECK_CACHE=$PWD/.toolchain-check-cache.cmake`.

Before entering the `config-ix.cmake` and `HandleLLVMOptions.cmake` files, if the `LLVM_TOOLCHAIN_CHECK_CACHE` option is set and the file exists, include that file to pre-populate cache variables.

Otherwise, we save the current set of CMake cache variable names. After calling the `config-ix|HandleLLVMOptions` files, if the `LLVM_TOOLCHAIN_CHECK_CACHE` option is set but the file does not exist, check what new CMake cache variables were set by those scripts. Filter these variables by whether they are likely cache variables supporting symbol/flag checks (e.g. `CXX_SUPPORTS_.*|HAVE_.*` etc) and write the file to set all these cache variables to their current values.

This allows a developer to obviate any subsequent checks, even in initial `cmake` configuration runs. The correctness depends on the developer knowing when the cache is invalid (e.g. they change toolchains or platforms) and on us not suddenly changing the meaning of `CXX_SUPPORTS_SOME_FLAG` to correspond to a different flag.

The cache file could be extended to store a key used to check whether to regenerate the cache, but I didn't go there.

Impact: Trivial overhead for cache generation, ~5sec reduction in initial config time.
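The snapshot-and-diff flow described above might be sketched roughly as follows. Only `LLVM_TOOLCHAIN_CHECK_CACHE` is taken from the description; every `_example_*` variable, the help string, and the shortened filter regex are hypothetical, and the real `LLVMCacheSnapshot.cmake` module may be organized quite differently. The sketch also assumes cached values contain no quotes or semicolons that would need escaping.

```cmake
# Illustrative sketch only; all _example_* names are hypothetical.
set(LLVM_TOOLCHAIN_CHECK_CACHE "" CACHE FILEPATH
  "Optional file used to persist toolchain check results")

if(LLVM_TOOLCHAIN_CHECK_CACHE AND EXISTS "${LLVM_TOOLCHAIN_CHECK_CACHE}")
  # Pre-populate the check results so the probes below are skipped.
  include("${LLVM_TOOLCHAIN_CHECK_CACHE}")
  set(_example_write_cache OFF)
else()
  # Remember which cache variables existed before the checks ran.
  get_cmake_property(_example_before CACHE_VARIABLES)
  set(_example_write_cache ON)
endif()

# ... config-ix.cmake and HandleLLVMOptions.cmake would be included here ...

if(LLVM_TOOLCHAIN_CHECK_CACHE AND _example_write_cache)
  get_cmake_property(_example_after CACHE_VARIABLES)
  set(_example_body "")
  foreach(_k IN LISTS _example_after)
    # Keep only new entries that look like symbol/flag check results.
    if(NOT _k IN_LIST _example_before
       AND "${_k}" MATCHES "^((C|CXX)_SUPPORTS|HAVE_)")
      get_property(_example_val CACHE "${_k}" PROPERTY VALUE)
      get_property(_example_type CACHE "${_k}" PROPERTY TYPE)
      string(APPEND _example_body
        "set(${_k} \"${_example_val}\" CACHE ${_example_type} \"\")\n")
    endif()
  endforeach()
  file(WRITE "${LLVM_TOOLCHAIN_CHECK_CACHE}" "${_example_body}")
endif()
```

Because the generated file is plain `set(... CACHE ...)` calls, deleting it is enough to force the checks to run again after a toolchain change.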
### Reduce overhead of embedded Google Benchmark configuration

Note: technically this could be lumped in with the above if we expanded the scope of the before/after change that the `LLVM_TOOLCHAIN_CHECK_CACHE` covers.

You can disable Google Benchmark entirely with `LLVM_INCLUDE_BENCHMARK=OFF`, but most CI systems don't set that.

GoogleBenchmark is embedded under the `third-party/benchmark` directory. Its CMake script does a compilation check for each flag that it wants to populate (even for `-Wall`). In comparison, LLVM's HandleLLVMOptions.cmake script takes a faster approach by skipping as many compilation checks as possible if the cache variable `LLVM_COMPILER_IS_GCC_COMPATIBLE` is true.

Changes: Use `LLVM_COMPILER_IS_GCC_COMPATIBLE` to skip as many compilation checks as possible in GoogleBenchmark.

Impact: ~1-2sec
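One way to picture the shortcut, as a sketch under assumptions: the helper `example_add_cxx_flag` and the `EXAMPLE_HAVE_` prefix are hypothetical and are not the helper used in `third-party/benchmark/CMakeLists.txt`. Only `LLVM_COMPILER_IS_GCC_COMPATIBLE` comes from the description.

```cmake
# Illustrative sketch only; the helper and its result-variable prefix are hypothetical.
include(CheckCXXCompilerFlag)

function(example_add_cxx_flag flag)
  # GCC-compatible compilers are assumed to accept common flags such as -Wall,
  # so the compile probe is skipped entirely.
  if(LLVM_COMPILER_IS_GCC_COMPATIBLE)
    add_compile_options("${flag}")
    return()
  endif()

  # Fallback: probe the compiler, caching the result under a derived name.
  string(MAKE_C_IDENTIFIER "EXAMPLE_HAVE_${flag}" result_var)
  check_cxx_compiler_flag("${flag}" ${result_var})
  if(${result_var})
    add_compile_options("${flag}")
  endif()
endfunction()

example_add_cxx_flag(-Wall)
```

Skipping the probe is only safe for flags that every supported GCC-compatible compiler version accepts; newer or vendor-specific flags would still need the fallback path.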