I've been testing a small change to the scheduler's edge weight heuristic, and I've measured significant wall-clock time improvements in several real-world projects.
The change is here: PowellNGL@67e093c
Currently, the scheduler uses only the depth of a subtree as the heuristic; this change instead uses the total estimated build time of the subtree, as measured in previous runs. I found several past discussions about using the estimated build time of a single edge as a heuristic (e.g. #2177, #2019, #376), but I haven't seen anything about summing over the entire subtree.
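To make the difference concrete, here is a minimal sketch of the two heuristics side by side. The `BuildNode` type and function names are illustrative only, not ninja's actual `Node`/`Edge` classes; the real change is in the commit linked above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative node type -- not ninja's actual Node/Edge classes.
struct BuildNode {
  int64_t own_time_ms = 1;             // estimated build time from a previous run
  std::vector<BuildNode*> dependents;  // edges that consume this node's output
  int64_t cached_weight = -1;          // memoized subtree total
};

// Current heuristic (roughly): depth of the dependent subtree.
int64_t SubtreeDepth(BuildNode* n) {
  int64_t deepest = 0;
  for (BuildNode* d : n->dependents)
    deepest = std::max(deepest, SubtreeDepth(d));
  return deepest + 1;
}

// Proposed heuristic: total estimated build time of everything downstream,
// memoized so each node's subtree is summed only once.
int64_t SubtreeTime(BuildNode* n) {
  if (n->cached_weight >= 0) return n->cached_weight;
  int64_t total = n->own_time_ms;
  for (BuildNode* d : n->dependents) total += SubtreeTime(d);
  return n->cached_weight = total;
}
```

With depth alone, a cheap object file feeding a 60-second link step looks no more urgent than one feeding a trivial archive step; with the summed time, the link's estimated cost propagates down to every input that blocks it.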
## Test Results (various open source C++ projects)
Each measurement is the average of 3 builds on an AMD Ryzen 7 7800X3D with `-j 12`. 'old' means the latest version of ninja from master; 'new' means ninja built from my fork.
| Repo | Build Type | Rebuild Type | Avg time (old) | Avg time (new) | Change |
|---|---|---|---|---|---|
| boost | Debug (ccache) | clean | 0.56s | 0.44s | -21.01% |
| boost | Release (ccache) | clean | 0.50s | 0.39s | -20.79% |
| protobuf | Release | incremental | 80.51s | 71.73s | -10.91% |
| ninja | Debug (ccache) | clean | 0.45s | 0.42s | -5.22% |
| ninja | Release | clean | 9.71s | 9.21s | -5.12% |
| boost | Release | incremental | 9.28s | 8.86s | -4.46% |
| protobuf | Release | clean | 140.13s | 134.26s | -4.18% |
| protobuf | Debug | incremental | 64.24s | 61.96s | -3.56% |
| boost | Debug | incremental | 8.10s | 7.84s | -3.17% |
| protobuf | Debug | clean | 118.72s | 115.64s | -2.59% |
| boost | Release | clean | 31.13s | 30.34s | -2.56% |
| boost | Debug | clean | 26.38s | 25.90s | -1.81% |
| ninja | Debug | clean | 5.01s | 4.95s | -1.33% |
| ninja | Release (ccache) | incremental | 4.46s | 4.43s | -0.82% |
| protobuf | Release (ccache) | clean | 1.97s | 1.96s | -0.51% |
| ninja | Release | incremental | 5.27s | 5.24s | -0.51% |
| protobuf | Debug (ccache) | incremental | 14.83s | 14.81s | -0.11% |
| boost | Release (ccache) | incremental | 0.11s | 0.11s | -0.00% |
| ninja | Debug (ccache) | incremental | 0.39s | 0.39s | -0.00% |
| ninja | Release (ccache) | clean | 4.44s | 4.45s | 0.15% |
| ninja | Debug | incremental | 1.63s | 1.63s | 0.20% |
| protobuf | Release (ccache) | incremental | 1.62s | 1.63s | 0.41% |
| protobuf | Debug (ccache) | clean | 15.07s | 15.19s | 0.80% |
| boost | Debug (ccache) | incremental | 0.14s | 0.14s | 2.86% |
## Test Results (closed source repos I build regularly)
These repos were the original motivation for the change: the critical path in both is dominated by a small number of very slow linker invocations, so the most important thing for these builds is to start linking those binaries as early as possible.
Each measurement is the average of 5 builds on an AMD EPYC 7532 with `-j 100` (repo A) and `-j 32` (repo B).
| Repo | Build Type | Rebuild Type | Avg time (old) | Avg time (new) | Change |
|---|---|---|---|---|---|
| repo A | Debug | clean | 203.26s | 166.65s | -18.01% |
| repo A | Release | clean | 384.26s | 324.62s | -15.52% |
| repo B | Debug | incremental | 18.19s | 15.40s | -15.36% |
| repo B | Debug | clean | 49.19s | 41.89s | -14.84% |
| repo A | Debug | incremental | 86.77s | 75.50s | -12.99% |
| repo B | Release | incremental | 22.16s | 19.30s | -12.90% |
| repo B | Release | clean | 53.35s | 46.47s | -12.90% |
| repo A | Release | incremental | 255.00s | 240.63s | -5.63% |
| repo B | Debug (ccache) | incremental | 2.13s | 2.06s | -3.19% |
| repo B | Release (ccache) | incremental | 8.03s | 7.86s | -2.02% |
| repo A | Release (ccache) | clean | 211.79s | 209.74s | -0.97% |
| repo A | Debug (ccache) | clean | 19.47s | 19.35s | -0.60% |
| repo B | Release (ccache) | clean | 26.67s | 26.62s | -0.19% |
| repo A | Release (ccache) | incremental | 154.03s | 154.04s | 0.01% |
| repo A | Debug (ccache) | incremental | 4.35s | 4.36s | 0.15% |
| repo B | Debug (ccache) | clean | 20.78s | 20.82s | 0.21% |
## More test details

- For clean builds, I ran `ninja clean` before each rebuild, and for incremental builds I ran `touch <file>` before each rebuild, where `<file>` is a header file I chose for each project that several targets depend on
- I ran every test with and without ccache enabled, to see how the build time changes when only the link time matters
- For repo A and repo B I chose thread counts that reflect how we typically build them in practice
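The timing loop can be sketched roughly like this. This is a hypothetical helper, not the exact script I used, and `date +%s.%N` assumes GNU coreutils:

```shell
# Hypothetical helper: average wall-clock time of a command over N runs.
# Usage: avg_time 3 ./ninja-fork -j 12
avg_time() {
  n=$1; shift
  total=0
  i=0
  while [ "$i" -lt "$n" ]; do
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    total=$(awk -v t="$total" -v s="$start" -v e="$end" 'BEGIN { printf "%.6f", t + (e - s) }')
    i=$((i + 1))
  done
  awk -v t="$total" -v n="$n" 'BEGIN { printf "%.3f\n", t / n }'
}
```

A real benchmark script would also run `ninja clean` (or `touch` the chosen header) between timed runs, as described above.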
## Potential issues with this approach
First, some projects might see minor build time regressions. Perhaps this heuristic should be optional?

Second, the build order can change significantly between runs, especially when build steps are cached (e.g. with ccache). That could expose bugs in build scripts that currently only work because the build order happens to be stable. It might also make builds slower than before when a cache miss changes to a hit (or vice versa) and the heuristic is badly wrong, but I haven't seen that in my testing so far.

Third, when edges are added to the graph, they will always be scheduled last, because they default to a weight of 1, corresponding to an estimated build time of 1 ms. It might be better to schedule new edges earlier so that new bugs are discovered sooner. Perhaps the default could be a very high constant value instead, so they always run first?
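One way to express that alternative as a policy switch. The constant and function names here are hypothetical, not ninja's actual API:

```cpp
#include <cstdint>

// Hypothetical policy for edges with no recorded build time; the constants
// and the function are illustrative, not ninja's actual API.
constexpr int64_t kScheduleLastWeightMs = 1;                  // current default in my fork
constexpr int64_t kScheduleFirstWeightMs = INT64_C(1) << 40;  // "run new edges first"

int64_t EdgeWeightMs(int64_t recorded_ms, bool run_new_edges_first) {
  if (recorded_ms > 0) return recorded_ms;  // use the previous run's measured time
  // No history: either sink to the back of the queue or float to the front.
  return run_new_edges_first ? kScheduleFirstWeightMs : kScheduleLastWeightMs;
}
```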
## Conclusion
That was a lot of words and numbers to ask one simple question: would you be open to a pull request for this change?