Skip to content

Use estimated build time of subtree as scheduler edge weight heuristic #2682

@PowellNGL

Description

@PowellNGL

I've been testing this small change to the scheduler edge weight heuristic, and I've measured significant improvements in wall clock time in several real-world projects

The change is here: PowellNGL@67e093c

Currently, the scheduler only uses the depth of a subtree as the heuristic, this change instead uses the total estimated build time of the subtree as measured in previous runs. I've found several past discussions about using the estimated build time of the edge as a heuristic (e.g. #2177, #2019, #376) but I haven't seen anything about summing the entire subtree

Test Results (various open source C++ projects)

Each measurement is the average of 3 builds on an AMD Ryzen 7 7800X3D with -j 12. 'old' means the latest version of ninja from master, 'new' means ninja built from my fork

Repo Build Type Rebuild Type Avg time (old) Avg time (new) Change
boost Debug (ccache) clean 0.56s 0.44s -21.01%
boost Release (ccache) clean 0.50s 0.39s -20.79%
protobuf Release incremental 80.51s 71.73s -10.91%
ninja Debug (ccache) clean 0.45s 0.42s -5.22%
ninja Release clean 9.71s 9.21s -5.12%
boost Release incremental 9.28s 8.86s -4.46%
protobuf Release clean 140.13s 134.26s -4.18%
protobuf Debug incremental 64.24s 61.96s -3.56%
boost Debug incremental 8.10s 7.84s -3.17%
protobuf Debug clean 118.72s 115.64s -2.59%
boost Release clean 31.13s 30.34s -2.56%
boost Debug clean 26.38s 25.90s -1.81%
ninja Debug clean 5.01s 4.95s -1.33%
ninja Release (ccache) incremental 4.46s 4.43s -0.82%
protobuf Release (ccache) clean 1.97s 1.96s -0.51%
ninja Release incremental 5.27s 5.24s -0.51%
protobuf Debug (ccache) incremental 14.83s 14.81s -0.11%
boost Release (ccache) incremental 0.11s 0.11s -0.00%
ninja Debug (ccache) incremental 0.39s 0.39s -0.00%
ninja Release (ccache) clean 4.44s 4.45s 0.15%
ninja Debug incremental 1.63s 1.63s 0.20%
protobuf Release (ccache) incremental 1.62s 1.63s 0.41%
protobuf Debug (ccache) clean 15.07s 15.19s 0.80%
boost Debug (ccache) incremental 0.14s 0.14s 2.86%

Test Results (closed source repos I build regularly)

These were the original motivation for the change, because the critical path for both of these repos is defined by a small number of very slow linker invocations, so the most important thing for these builds is to start linking those binaries as early as possible

Each measurement is the average of 5 builds on an AMD EPYC 7532 with -j 100 (repo A) and -j 32 (repo B)

Repo Build Type Rebuild Type Avg time (old) Avg time (new) Change
repo A Debug clean 203.26s 166.65s -18.01%
repo A Release clean 384.26s 324.62s -15.52%
repo B Debug incremental 18.19s 15.40s -15.36%
repo B Debug clean 49.19s 41.89s -14.84%
repo A Debug incremental 86.77s 75.50s -12.99%
repo B Release incremental 22.16s 19.30s -12.90%
repo B Release clean 53.35s 46.47s -12.90%
repo A Release incremental 255.00s 240.63s -5.63%
repo B Debug (ccache) incremental 2.13s 2.06s -3.19%
repo B Release (ccache) incremental 8.03s 7.86s -2.02%
repo A Release (ccache) clean 211.79s 209.74s -0.97%
repo A Debug (ccache) clean 19.47s 19.35s -0.60%
repo B Release (ccache) clean 26.67s 26.62s -0.19%
repo A Release (ccache) incremental 154.03s 154.04s 0.01%
repo A Debug (ccache) incremental 4.35s 4.36s 0.15%
repo B Debug (ccache) clean 20.78s 20.82s 0.21%

More test details:

  • For clean builds, I ran ninja clean before each rebuild, and for incremental builds I ran touch <file> before each rebuild, where is a header file I chose for each project that several targets depend on
  • I ran every test with and without ccache enabled, to see how the build time changes when only the link time matters
  • For repo A and B I chose thread counts that reflected how we typically build them in practice

Potential issues with this approach

First, some projects might see minor build time regressions. Perhaps this heuristic should be optional?

This change would mean that the build order can change significantly between runs, especially when build steps are cached e.g. with ccache. This would expose bugs in build scripts that currently only work because the build order is stable. It might also make builds slower than before when a cache miss changes to a hit or vice versa and the heuristic is totally wrong, but I haven't seen that in my testing so far

When edges are added to the graph, they will always be scheduled last, because they default to a weighting of 1, corresponding to an estimated build time of 1ms. It might be better to schedule new edges earlier so that new bugs are discovered sooner. Perhaps the default could be a very high constant value instead, so they always run first?

Conclusion

That was a lot of words and numbers to essentially say: Would you be open to a pull request for this change?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions