I've been testing a small change to the scheduler's edge weight heuristic, and I've measured significant wall-clock time improvements in several real-world projects.
The change is here: PowellNGL@67e093c
Currently, the scheduler uses only the depth of a subtree as the heuristic; this change instead uses the total estimated build time of the subtree, as measured in previous runs. I found several past discussions about using the estimated build time of a single edge as a heuristic (e.g. #2177, #2019, #376), but I haven't seen anything about summing over the entire subtree.
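To make the difference concrete, here is a minimal sketch of the two heuristics side by side. The `BuildNode` type and function names are illustrative only, not ninja's actual `Node`/`Edge` classes; the real change is in the commit linked above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative node type -- not ninja's actual Node/Edge classes.
struct BuildNode {
  int64_t own_time_ms = 1;             // estimated build time from a previous run
  std::vector<BuildNode*> dependents;  // edges that consume this node's output
  int64_t cached_weight = -1;          // memoized subtree total
};

// Current heuristic (roughly): depth of the dependent subtree.
int64_t SubtreeDepth(BuildNode* n) {
  int64_t deepest = 0;
  for (BuildNode* d : n->dependents)
    deepest = std::max(deepest, SubtreeDepth(d));
  return deepest + 1;
}

// Proposed heuristic: total estimated build time of everything downstream,
// memoized so each node's subtree is summed only once.
int64_t SubtreeTime(BuildNode* n) {
  if (n->cached_weight >= 0) return n->cached_weight;
  int64_t total = n->own_time_ms;
  for (BuildNode* d : n->dependents) total += SubtreeTime(d);
  return n->cached_weight = total;
}
```

With depth alone, a cheap object file feeding a 60-second link step looks no more urgent than one feeding a trivial archive step; with the summed time, the link's estimated cost propagates down to every input that blocks it.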
## Test Results (various open source C++ projects)
Each measurement is the average of 3 builds on an AMD Ryzen 7 7800X3D with `-j 12`. 'old' means the latest version of ninja from master; 'new' means ninja built from my fork.
| Repo | Build Type | Rebuild Type | Avg time (old) | Avg time (new) | Change |
|---|---|---|---|---|---|
| boost | Debug (ccache) | clean | 0.56s | 0.44s | -21.01% |
| boost | Release (ccache) | clean | 0.50s | 0.39s | -20.79% |
| protobuf | Release | incremental | 80.51s | 71.73s | -10.91% |
| ninja | Debug (ccache) | clean | 0.45s | 0.42s | -5.22% |
| ninja | Release | clean | 9.71s | 9.21s | -5.12% |
| boost | Release | incremental | 9.28s | 8.86s | -4.46% |
| protobuf | Release | clean | 140.13s | 134.26s | -4.18% |
| protobuf | Debug | incremental | 64.24s | 61.96s | -3.56% |
| boost | Debug | incremental | 8.10s | 7.84s | -3.17% |
| protobuf | Debug | clean | 118.72s | 115.64s | -2.59% |
| boost | Release | clean | 31.13s | 30.34s | -2.56% |
| boost | Debug | clean | 26.38s | 25.90s | -1.81% |
| ninja | Debug | clean | 5.01s | 4.95s | -1.33% |
| ninja | Release (ccache) | incremental | 4.46s | 4.43s | -0.82% |
| protobuf | Release (ccache) | clean | 1.97s | 1.96s | -0.51% |
| ninja | Release | incremental | 5.27s | 5.24s | -0.51% |
| protobuf | Debug (ccache) | incremental | 14.83s | 14.81s | -0.11% |
| boost | Release (ccache) | incremental | 0.11s | 0.11s | -0.00% |
| ninja | Debug (ccache) | incremental | 0.39s | 0.39s | -0.00% |
| ninja | Release (ccache) | clean | 4.44s | 4.45s | 0.15% |
| ninja | Debug | incremental | 1.63s | 1.63s | 0.20% |
| protobuf | Release (ccache) | incremental | 1.62s | 1.63s | 0.41% |
| protobuf | Debug (ccache) | clean | 15.07s | 15.19s | 0.80% |
| boost | Debug (ccache) | incremental | 0.14s | 0.14s | 2.86% |
## Test Results (closed source repos I build regularly)
These repos were the original motivation for the change: the critical path in both is dominated by a small number of very slow linker invocations, so the most important thing for these builds is to start linking those binaries as early as possible.
Each measurement is the average of 5 builds on an AMD EPYC 7532 with `-j 100` (repo A) and `-j 32` (repo B).
| Repo | Build Type | Rebuild Type | Avg time (old) | Avg time (new) | Change |
|---|---|---|---|---|---|
| repo A | Debug | clean | 203.26s | 166.65s | -18.01% |
| repo A | Release | clean | 384.26s | 324.62s | -15.52% |
| repo B | Debug | incremental | 18.19s | 15.40s | -15.36% |
| repo B | Debug | clean | 49.19s | 41.89s | -14.84% |
| repo A | Debug | incremental | 86.77s | 75.50s | -12.99% |
| repo B | Release | incremental | 22.16s | 19.30s | -12.90% |
| repo B | Release | clean | 53.35s | 46.47s | -12.90% |
| repo A | Release | incremental | 255.00s | 240.63s | -5.63% |
| repo B | Debug (ccache) | incremental | 2.13s | 2.06s | -3.19% |
| repo B | Release (ccache) | incremental | 8.03s | 7.86s | -2.02% |
| repo A | Release (ccache) | clean | 211.79s | 209.74s | -0.97% |
| repo A | Debug (ccache) | clean | 19.47s | 19.35s | -0.60% |
| repo B | Release (ccache) | clean | 26.67s | 26.62s | -0.19% |
| repo A | Release (ccache) | incremental | 154.03s | 154.04s | 0.01% |
| repo A | Debug (ccache) | incremental | 4.35s | 4.36s | 0.15% |
| repo B | Debug (ccache) | clean | 20.78s | 20.82s | 0.21% |
## More test details

- For clean builds, I ran `ninja clean` before each rebuild, and for incremental builds I ran `touch <file>` before each rebuild, where `<file>` is a header file I chose for each project that several targets depend on
- I ran every test with and without ccache enabled, to see how the build time changes when only the link time matters
- For repo A and repo B I chose thread counts that reflect how we typically build them in practice
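The timing loop can be sketched roughly like this. This is a hypothetical helper, not the exact script I used, and `date +%s.%N` assumes GNU coreutils:

```shell
# Hypothetical helper: average wall-clock time of a command over N runs.
# Usage: avg_time 3 ./ninja-fork -j 12
avg_time() {
  n=$1; shift
  total=0
  i=0
  while [ "$i" -lt "$n" ]; do
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    total=$(awk -v t="$total" -v s="$start" -v e="$end" 'BEGIN { printf "%.6f", t + (e - s) }')
    i=$((i + 1))
  done
  awk -v t="$total" -v n="$n" 'BEGIN { printf "%.3f\n", t / n }'
}
```

A real benchmark script would also run `ninja clean` (or `touch` the chosen header) between timed runs, as described above.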
## Potential issues with this approach
First, some projects might see minor build time regressions. Perhaps this heuristic should be optional?

Second, the build order can change significantly between runs, especially when build steps are cached (e.g. with ccache). That could expose bugs in build scripts that currently only work because the build order happens to be stable. It might also make builds slower than before when a cache miss changes to a hit (or vice versa) and the heuristic is badly wrong, but I haven't seen that in my testing so far.

Third, when edges are added to the graph, they will always be scheduled last, because they default to a weight of 1, corresponding to an estimated build time of 1 ms. It might be better to schedule new edges earlier so that new bugs are discovered sooner. Perhaps the default could be a very high constant value instead, so they always run first?
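One way to express that alternative as a policy switch. The constant and function names here are hypothetical, not ninja's actual API:

```cpp
#include <cstdint>

// Hypothetical policy for edges with no recorded build time; the constants
// and the function are illustrative, not ninja's actual API.
constexpr int64_t kScheduleLastWeightMs = 1;                  // current default in my fork
constexpr int64_t kScheduleFirstWeightMs = INT64_C(1) << 40;  // "run new edges first"

int64_t EdgeWeightMs(int64_t recorded_ms, bool run_new_edges_first) {
  if (recorded_ms > 0) return recorded_ms;  // use the previous run's measured time
  // No history: either sink to the back of the queue or float to the front.
  return run_new_edges_first ? kScheduleFirstWeightMs : kScheduleLastWeightMs;
}
```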
## Conclusion
That was a lot of words and numbers to ask one simple question: would you be open to a pull request for this change?