Make some optimizations to the CMake build system #161981

christopherbate · 2025-10-04T22:52:11Z

I profiled initial CMake configuration and generation (Ninja) steps in LLVM-Project with just LLVM, MLIR, and Clang enabled using the command shown below. Based on the profile, I then implemented a number of optimizations.

All the optimizations are hidden behind extra flags whose defaults preserve the existing behavior (except for the check_linker_flag optimization below and the optimization of GoogleBenchmark flags).

LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS
LLVM_ENABLE_LIT_CONVENIENCE_TARGETS
LLVM_TOOLCHAIN_CHECK_CACHE

Initial time of cmake command @ 679d2b2 on my workstation:

-- Configuring done (17.8s)
-- Generating done (6.9s)

After all below optimizations:

-- Configuring done (12.8s)
-- Generating done (4.7s)

With a "toolchain check cache" (explained below):

-- Configuring done (6.9s)
-- Generating done (4.3s)

There's definitely room for more optimizations; I think <10sec end-to-end for this command is doable.

Most changes have a small impact. It's the gradual creep of inefficiencies that has added up over time to make the system less efficient than it could be.

Command tested:

cmake -G Ninja -S llvm -B ${buildDir} \
		-DLLVM_ENABLE_PROJECTS="mlir;clang" \
		-DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
		-DCMAKE_BUILD_TYPE=RelWithDebInfo \
		-DLLVM_ENABLE_ASSERTIONS=ON \
		-DLLVM_CCACHE_BUILD=ON \
		-DBUILD_SHARED_LIBS=ON \
		-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_LINKER=lld \
		-DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
		-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
		--fresh

To enable the new optional optimizations, set

-DLLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS=OFF
-DLLVM_ENABLE_LIT_CONVENIENCE_TARGETS=OFF
-DLLVM_TOOLCHAIN_CHECK_CACHE=$(pwd)/toolchain-check-cache.cmake

Optimizations:

Optimize `check_linker_flag` calls

In AddLLVM.cmake, there were a couple places where we call check_linker_flag every time llvm_add_library is called. Even in non-initial cmake configuration runs, this carries unreasonable overhead.

Change: Hoist `include(CheckLinkerFlag)` in AddLLVM.cmake and optimize placement of `check_linker_flag` calls so that they are only made once.

Impact: <1 sec
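
As a rough illustration of the hoisting idea (not the actual diff), the sketch below runs a linker probe once at file scope and only reads the result inside a per-library helper. The names `example_add_library` and `EXAMPLE_LINKER_SUPPORTS_Z_DEFS`, and the choice of `-Wl,-z,defs` as the probed flag, are assumptions made for the example.

```cmake
# Included once at file scope instead of inside the per-library function.
include(CheckLinkerFlag)

# Probe the linker a single time. The flag and the result variable name
# are illustrative only; they are not taken from AddLLVM.cmake.
check_linker_flag(CXX "-Wl,-z,defs" EXAMPLE_LINKER_SUPPORTS_Z_DEFS)

# Hypothetical stand-in for llvm_add_library: it consults the cached
# result and never calls check_linker_flag itself.
function(example_add_library name)
  add_library(${name} ${ARGN})
  if(EXAMPLE_LINKER_SUPPORTS_Z_DEFS)
    target_link_options(${name} PRIVATE "-Wl,-z,defs")
  endif()
endfunction()
```

The result of `check_linker_flag` is already stored in the CMake cache, so the saving here comes from avoiding the per-call setup work rather than from skipping repeated link attempts.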

Make `add_lit_testsuites` optional

The function add_lit_testsuites is used to recursively populate a set of convenience targets that run a filtered portion of a LIT test suite. So instead of running check-mlir you can run check-mlir-dialect. These targets are built recursively for each subdirectory (e.g. check-mlir-dialect-tensor, check-mlir-dialect-math, etc.).

This call has quite a bit of overhead, especially for the main LLVM LIT test suite.

Personally I use a combination of ninja -C build check-mlir-build-only and llvm-lit directly to run filtered portions of the MLIR LIT test suite, but I can imagine that others depend on these filtered targets.

Change: Introduce a new option LLVM_ENABLE_LIT_CONVENIENCE_TARGETS which defaults to ON. When set to OFF, the function add_lit_testsuites just becomes a no-op. It's possible that we could also just improve the performance of add_lit_testsuites directly, but I didn't pursue this.

Impact: ~1-2sec
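
A minimal sketch of the opt-out described above, assuming a simplified function body. `example_add_lit_testsuites` is a hypothetical stand-in for the real add_lit_testsuites, and the option's help string is invented for illustration.

```cmake
# Defaults to ON so existing workflows keep their per-directory targets.
option(LLVM_ENABLE_LIT_CONVENIENCE_TARGETS
  "Generate per-subdirectory check-* convenience targets (illustrative help text)"
  ON)

# Hypothetical stand-in for add_lit_testsuites.
function(example_add_lit_testsuites project directory)
  if(NOT LLVM_ENABLE_LIT_CONVENIENCE_TARGETS)
    # Early return turns the whole function into a no-op.
    return()
  endif()
  # ... the existing recursive directory walk and target creation
  # would follow here ...
endfunction()
```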

Reduce `file(GLOB)` calls in `LLVMProcessSources.cmake`

The llvm_process_sources call is made whenever the llvm_add_library function is called. It makes several file(GLOB) calls, which can be expensive depending on the underlying filesystem/storage. The function globs for headers and TD files to add as sources to the target, but the comments suggest that this is only necessary for MSVC. In addition, it calls llvm_check_source_file_list to check that no source files in the directory are unused unless PARTIAL_SOURCES_INTENDED is set, which incurs another file(GLOB) call.

Changes: Guard the file(GLOB) calls for populating header sources behind if(MSVC). Only do the llvm_check_source_file_list check if a new option LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS is set to ON.

Impact: depends on system. On my local workstation, impact is minimal. On another remote server I use, impact is much larger.
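
The two guards can be sketched roughly as follows. Both `example_*` functions are simplified, hypothetical stand-ins for llvm_process_sources and llvm_check_source_file_list, the option's help string is invented, and the sketch assumes sources are listed by name relative to the current source directory.

```cmake
option(LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS
  "Run slow sanity checks during configuration (illustrative help text)" ON)

# Hypothetical stand-in for llvm_check_source_file_list: report any
# .c/.cpp file in the current directory that was not listed explicitly.
function(example_check_source_file_list)
  set(listed ${ARGN})
  file(GLOB globbed RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" *.c *.cpp)
  foreach(src IN LISTS globbed)
    if(NOT src IN_LIST listed)
      message(SEND_ERROR "Found unused source file: ${src}")
    endif()
  endforeach()
endfunction()

# Hypothetical stand-in for llvm_process_sources.
function(example_process_sources out_var)
  set(sources ${ARGN})
  if(MSVC)
    # Header and TableGen files are only globbed where the IDE needs them.
    file(GLOB headers *.h)
    file(GLOB tds *.td)
    list(APPEND sources ${headers} ${tds})
  endif()
  if(LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS)
    # The extra glob happens only when the developer asks for the check.
    example_check_source_file_list(${ARGN})
  endif()
  set(${out_var} ${sources} PARENT_SCOPE)
endfunction()
```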

Optimize initial symbol/flag checks made in `config-ix.cmake` and `HandleLLVMOptions.cmake`

The config-ix.cmake and HandleLLVMOptions.cmake files make a number of calls to compile C/C++ programs in order to verify the presence of certain symbols or whether certain compiler flags are supported.

These checks have the biggest impact on an initial cmake configuration time.

I propose an "opt in" approach for amortizing these checks using a special generated CMake cache file as directed by the developer.

An option LLVM_TOOLCHAIN_CHECK_CACHE is introduced. It should be set to a path like -DLLVM_TOOLCHAIN_CHECK_CACHE=$PWD/.toolchain-check-cache.cmake.

Before entering the config-ix.cmake and HandleLLVMOptions.cmake files, if the LLVM_TOOLCHAIN_CHECK_CACHE option is set and the file exists, include that file to pre-populate cache variables.
Otherwise, we save the current set of CMake cache variable names. After calling the config-ix|HandleLLVMOptions files, if the LLVM_TOOLCHAIN_CHECK_CACHE option is set but the file does not exist, check what new CMake cache variables were set by those scripts. Filter these variables by whether they are likely cache variables supporting symbol/flag checks (e.g. CXX_SUPPORTS_.*|HAVE_.* etc) and write the file to set all these cache variables to their current values.

This allows a developer to obviate any subsequent checks, even in initial cmake configuration runs. The correctness depends on the developer knowing when the cache is invalid (e.g. they change toolchains or platforms) and on us not suddenly changing the meaning of CXX_SUPPORTS_SOME_FLAG to correspond to a different flag.

The cache file could be extended to store a key used to check whether to regenerate the cache, but I didn't go there.

Impact: Trivial overhead for cache generation, ~5sec reduction in initial config time.
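
The snapshot-and-diff flow described above might look roughly like the sketch below. It is an approximation under stated assumptions, not the code in this PR: the `example_*` variable names are hypothetical, the filter regex is a guess based on the patterns mentioned above, and `include(config-ix)` / `include(HandleLLVMOptions)` assume those modules are reachable through CMAKE_MODULE_PATH.

```cmake
set(LLVM_TOOLCHAIN_CHECK_CACHE "" CACHE FILEPATH
  "Path to a generated file of toolchain check results (illustrative help text)")

if(LLVM_TOOLCHAIN_CHECK_CACHE AND EXISTS "${LLVM_TOOLCHAIN_CHECK_CACHE}")
  # Pre-populate the cache so the probes below find their answers.
  include("${LLVM_TOOLCHAIN_CHECK_CACHE}")
elseif(LLVM_TOOLCHAIN_CHECK_CACHE)
  # Snapshot the cache variable names before the probes run.
  get_cmake_property(example_vars_before CACHE_VARIABLES)
endif()

include(config-ix)
include(HandleLLVMOptions)

if(LLVM_TOOLCHAIN_CHECK_CACHE AND NOT EXISTS "${LLVM_TOOLCHAIN_CHECK_CACHE}")
  get_cmake_property(example_vars_after CACHE_VARIABLES)
  if(example_vars_before)
    # Keep only the variables that the two scripts introduced.
    list(REMOVE_ITEM example_vars_after ${example_vars_before})
  endif()
  set(example_content "")
  foreach(var IN LISTS example_vars_after)
    # Assumed filter: names that look like symbol/flag check results.
    if(var MATCHES "^(C_SUPPORTS_|CXX_SUPPORTS_|HAVE_)")
      # Assumes values contain no quotes or backslashes (check results
      # are typically 1 or empty).
      string(APPEND example_content
        "set(${var} \"${${var}}\" CACHE INTERNAL \"\")\n")
    endif()
  endforeach()
  file(WRITE "${LLVM_TOOLCHAIN_CHECK_CACHE}" "${example_content}")
endif()
```

Because the generated file only sets cache entries, deleting it is enough to force the checks to run again on the next fresh configuration.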

Reduce overhead of embedded Google Benchmark configuration

Note: technically this could be lumped in with the above if we expanded the scope of the before/after comparison that LLVM_TOOLCHAIN_CHECK_CACHE covers.

You can disable Google Benchmark entirely with LLVM_INCLUDE_BENCHMARKS=OFF, but most CI systems don't set that.

GoogleBenchmark is embedded under the third-party/benchmark directory. Its CMake script does a compilation check for each flag that it wants to populate (even for -Wall). In comparison, LLVM's HandleLLVMOptions.cmake script takes a faster approach by skipping as many compilation checks as possible if the cache variable LLVM_COMPILER_IS_GCC_COMPATIBLE is true.

Changes: Use LLVM_COMPILER_IS_GCC_COMPATIBLE to skip as many compilation checks as possible in GoogleBenchmark.

Impact: ~1-2sec
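
One way to picture the shortcut is a flag-adding helper that trusts a GCC-compatible compiler and falls back to a compile check otherwise. The macro name `example_add_cxx_flag` and the `EXAMPLE_HAVE_CXX_FLAG_` prefix are hypothetical and do not come from the benchmark sources; the shortcut is only safe for flags that every GCC-compatible compiler is known to accept, such as -Wall.

```cmake
include(CheckCXXCompilerFlag)

# Hypothetical helper: append a flag, probing only when necessary.
macro(example_add_cxx_flag flag)
  if(LLVM_COMPILER_IS_GCC_COMPATIBLE)
    # Trust GCC/Clang-style drivers for well-known flags; no compile check.
    string(APPEND CMAKE_CXX_FLAGS " ${flag}")
  else()
    # Derive a valid cache variable name from the flag text.
    string(MAKE_C_IDENTIFIER "EXAMPLE_HAVE_CXX_FLAG_${flag}" example_result_var)
    check_cxx_compiler_flag("${flag}" ${example_result_var})
    if(${example_result_var})
      string(APPEND CMAKE_CXX_FLAGS " ${flag}")
    endif()
  endif()
endmacro()

example_add_cxx_flag(-Wall)
```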

llvmbot · 2025-10-04T22:52:44Z

@llvm/pr-subscribers-third-party-benchmark

Author: Christopher Bate (christopherbate)

Changes

I profiled initial CMake configuration and generation (Ninja) steps in LLVM-Project with just LLVM, MLIR, and Clang enabled using the command shown below. Based on the profile, I then implemented a number of optimizations.

All the optimizations are hidden behind extra flags whose defaults preserve the existing behavior (except for the check_linker_flag optimization below):

LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS
LLVM_ENABLE_LIT_CONVENIENCE_TARGETS
BENCHMARK_ENABLE_EXPENSIVE_CMAKE_CHECKS
LLVM_TOOLCHAIN_CHECK_CACHE