I profiled initial CMake configuration and generation (Ninja)
steps in LLVM-Project with just LLVM, MLIR, and Clang enabled using the
command shown below. Based on the profile, I then implemented
a number of optimizations.
Initial time of `cmake` command @ 679d2b2:
-- Configuring done (17.8s)
-- Generating done (6.9s)
After all below optimizations:
-- Configuring done (12.8s)
-- Generating done (4.7s)
With a "toolchain check cache" (explained below):
-- Configuring done (6.9s)
-- Generating done (4.3s)
There's definitely room for more optimizations -- another 20% at least.
Most changes have a small impact individually. It's the gradual creep of
inefficiencies that has added up over time to make the system slower than it
could be.
Command tested:
```
cmake -G Ninja -S llvm -B ${buildDir} \
-DLLVM_ENABLE_PROJECTS="mlir;clang" \
-DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_LINKER=lld \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-DLLVM_TOOLCHAIN_CHECK_CACHE=${PWD}/.toolchain-check-cache.cmake \
--fresh
```
## Optimizations:
### Optimize `check_linker_flag` calls
In `AddLLVM.cmake`, there were a couple of places where we called `check_linker_flag`
every time `llvm_add_library` is called. Even in non-initial cmake configuration
runs, this carries unreasonable overhead.
Change: Hoist `include(CheckLinkerFlag)` in `AddLLVM.cmake` and optimize placement of `check_linker_flag` calls
so that they are only made once.
Impact: <1 sec
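The hoisting pattern can be sketched roughly as follows. This is not the actual `AddLLVM.cmake` code: the flag, the result variable `EXAMPLE_LINKER_SUPPORTS_COLOR`, and the function `example_add_library` are hypothetical names chosen for illustration.

```cmake
# Illustrative sketch; every name containing "example"/"EXAMPLE" is hypothetical.
include(CheckLinkerFlag)  # module available since CMake 3.18

# Run the check once, at file scope, when this module is first included.
check_linker_flag(CXX "-Wl,--color-diagnostics" EXAMPLE_LINKER_SUPPORTS_COLOR)

function(example_add_library name)
  add_library(${name} ${ARGN})
  # Only read the stored result here; no check runs per library.
  if(EXAMPLE_LINKER_SUPPORTS_COLOR)
    target_link_options(${name} PRIVATE "-Wl,--color-diagnostics")
  endif()
endfunction()
```

The point of the design is that the per-library function only reads a variable, so its cost no longer scales with the number of check calls.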
### Make `add_lit_testsuites` optional
The function `add_lit_testsuites` is used to
recursively populate a set of convenience targets that run a
filtered portion of a LIT test suite. So instead of running `check-mlir`
you can run `check-mlir-dialect`. These targets are built recursively
for each subdirectory (e.g. `check-mlir-dialect-tensor`, `check-mlir-dialect-math`, etc.).
This call has quite a bit of overhead, especially for the main LLVM LIT test suite.
Personally I use a combination of `ninja -C build check-mlir-build-only` and
`llvm-lit` directly to run filtered portions of the MLIR LIT test suite, but
I can imagine that others depend on these filtered targets.
Change: Introduce a new option `LLVM_ENABLE_LIT_CONVENIENCE_TARGETS`
which defaults to `ON`. When set to `OFF`, the function `add_lit_testsuites`
just becomes a no-op. It's possible that we could also just improve the performance
of `add_lit_testsuites` directly, but I didn't pursue this.
Impact: ~1-2sec
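A minimal sketch of the opt-out, assuming the function takes `project` and `directory` parameters and using a made-up help string; the existing body is elided rather than reproduced:

```cmake
# Sketch only; the help string and the parameter list are assumptions.
option(LLVM_ENABLE_LIT_CONVENIENCE_TARGETS
  "Generate per-directory check-* convenience targets" ON)

function(add_lit_testsuites project directory)
  if(NOT LLVM_ENABLE_LIT_CONVENIENCE_TARGETS)
    # Skip the recursive directory walk and target creation entirely.
    return()
  endif()
  # ... existing logic that creates one target per test subdirectory ...
endfunction()
```

Developers who run `llvm-lit` directly would then pass `-DLLVM_ENABLE_LIT_CONVENIENCE_TARGETS=OFF` at configure time.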
### Reduce `file(GLOB)` calls in `LLVMProcessSources.cmake`
The `llvm_process_sources` call is made whenever the `llvm_add_library`
function is called. It makes several `file(GLOB)` calls, which can
be expensive depending on the underlying filesystem/storage. The
function globs for headers and TD files to add as sources to the target,
but the comments suggest that this is only necessary for MSVC. In addition,
it calls `llvm_check_source_file_list` to check that no source files in
the directory are unused unless `PARTIAL_SOURCES_INTENDED` is set, which
incurs another `file(GLOB)` call.
Changes: Guard the `file(GLOB)` calls for populating header sources
behind `if(MSVC)`. Only do the `llvm_check_source_file_list` check
if a new option `LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS` is set to `ON`.
Impact: depends on system. On my local workstation, impact is minimal.
On another remote server I use, impact is much larger.
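The guards might look roughly like the sketch below. The function `example_process_sources`, its variable names, the help string, and the `OFF` default are assumptions for illustration; `llvm_check_source_file_list` is the existing helper from `LLVMProcessSources.cmake` and is not defined here.

```cmake
# Sketch only; example_process_sources and its locals are hypothetical.
option(LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS
  "Run slow consistency checks during configuration" OFF)  # default is assumed

function(example_process_sources out_var)
  set(sources ${ARGN})
  if(MSVC)
    # Only the MSVC case needs headers and TableGen files listed as sources.
    file(GLOB headers *.h)
    file(GLOB tds *.td)
    list(APPEND sources ${headers} ${tds})
  endif()
  if(LLVM_ENABLE_EXPENSIVE_CMAKE_CHECKS)
    # This helper globs the directory to find source files missing from the list.
    llvm_check_source_file_list(${sources})
  endif()
  set(${out_var} ${sources} PARENT_SCOPE)
endfunction()
```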
### Optimize initial symbol/flag checks made in `config-ix.cmake` and `HandleLLVMOptions.cmake`
The `config-ix.cmake` and `HandleLLVMOptions.cmake` files make a number of calls to
compile C/C++ programs in order to verify the presence of certain symbols or
whether certain compiler flags are supported.
These checks have the biggest impact on initial `cmake` configuration time.
I propose an "opt in" approach for amortizing these checks using a special generated
CMake cache file as directed by the developer.
An option `LLVM_TOOLCHAIN_CHECK_CACHE` is introduced. It should be set to
a path like `-DLLVM_TOOLCHAIN_CHECK_CACHE=$PWD/.toolchain-check-cache.cmake`.
Before entering the `config-ix.cmake` and `HandleLLVMOptions.cmake` files,
if the `LLVM_TOOLCHAIN_CHECK_CACHE` option is set and the file exists, include
that file to pre-populate cache variables.
Otherwise, we save the current set of CMake cache variable names.
After calling the `config-ix|HandleLLVMOptions` files,
if the `LLVM_TOOLCHAIN_CHECK_CACHE` option is set but the file does not exist,
check what new CMake cache variables were set by those scripts. Filter these variables by
whether they are likely cache variables
supporting symbol/flag checks (e.g. `CXX_SUPPORTS_.*|HAVE_.*` etc)
and write a file that sets all these cache variables to their current values.
This allows a developer to skip any subsequent checks, even in initial `cmake`
configuration runs. Correctness depends on the developer knowing
when the cache is invalid (e.g. they change toolchains or platforms) and on us not
suddenly changing the meaning of `CXX_SUPPORTS_SOME_FLAG` to correspond to a different flag.
The cache file could be extended to store a key used to decide whether to regenerate
the cache, but I didn't go there.
Impact: Trivial overhead for cache generation, ~5sec reduction in initial config time.
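The flow described above can be sketched as follows. This is an approximation under stated assumptions, not the committed code: the `_`-prefixed variable names, the help string, and the exact filter regex are hypothetical, and values containing quotes or semicolons would need extra escaping.

```cmake
# Sketch only; helper variable names and the regex are assumptions.
set(LLVM_TOOLCHAIN_CHECK_CACHE "" CACHE FILEPATH
  "File used to save and restore toolchain check results")

set(_write_check_cache OFF)
if(LLVM_TOOLCHAIN_CHECK_CACHE)
  if(EXISTS "${LLVM_TOOLCHAIN_CHECK_CACHE}")
    # Pre-populate results so the later check_* calls find them already defined.
    include("${LLVM_TOOLCHAIN_CHECK_CACHE}")
  else()
    # Snapshot the cache variable names before running the checks.
    get_cmake_property(_vars_before CACHE_VARIABLES)
    set(_write_check_cache ON)
  endif()
endif()

include(config-ix)
include(HandleLLVMOptions)

if(_write_check_cache)
  get_cmake_property(_vars_after CACHE_VARIABLES)
  if(_vars_before)
    # Keep only the variables introduced by the two scripts above.
    list(REMOVE_ITEM _vars_after ${_vars_before})
  endif()
  # Keep only names that look like symbol/flag check results.
  list(FILTER _vars_after INCLUDE REGEX "^(C_SUPPORTS_|CXX_SUPPORTS_|HAVE_)")
  set(_content "")
  foreach(_var IN LISTS _vars_after)
    string(APPEND _content
      "set(${_var} \"${${_var}}\" CACHE INTERNAL \"\")\n")
  endforeach()
  file(WRITE "${LLVM_TOOLCHAIN_CHECK_CACHE}" "${_content}")
endif()
```

Failed checks are written with an empty value on purpose: CMake's check modules skip work when the result variable is already defined, so negative results are reused as well as positive ones.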
### Reduce overhead of embedded Google Benchmark configuration
Note: technically this could be lumped in with the above if we expanded the scope of the
before/after comparison that `LLVM_TOOLCHAIN_CHECK_CACHE` covers.
GoogleBenchmark is embedded under the `third-party/benchmark` directory.
Its CMake script does a compilation check for each flag that it wants to
populate (even for `-Wall`). In comparison, LLVM's HandleLLVMOptions.cmake script takes
a faster approach by skipping as many compilation checks as possible
if the cache variable `LLVM_COMPILER_IS_GCC_COMPATIBLE` is true.
Changes: Use `LLVM_COMPILER_IS_GCC_COMPATIBLE` to skip as many compilation
checks as possible in GoogleBenchmark.
Impact: ~1-2sec
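A rough sketch of the shortcut, using a hypothetical wrapper `example_add_cxx_flag` and result variable `EXAMPLE_HAVE_WALL` rather than GoogleBenchmark's actual helper macros:

```cmake
# Sketch only; example_add_cxx_flag and EXAMPLE_HAVE_WALL are hypothetical names.
include(CheckCXXCompilerFlag)

function(example_add_cxx_flag flag result_var)
  if(LLVM_COMPILER_IS_GCC_COMPATIBLE)
    # Assume GCC-compatible compilers accept the flag; skip the try-compile.
    set(${result_var} ON)
  else()
    check_cxx_compiler_flag("${flag}" ${result_var})
  endif()
  if(${result_var})
    add_compile_options("${flag}")
  endif()
endfunction()

example_add_cxx_flag("-Wall" EXAMPLE_HAVE_WALL)
```

The shortcut is only safe for flags that every GCC-compatible compiler is known to accept; anything newer or vendor-specific should still go through the real check.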