
Conversation

Collaborator

@tiran tiran commented Sep 7, 2025

Our custom dependency tracker for the build-parallel code has a bug. It does not track and handle installation requirements of build requirements correctly. This causes build failures when the installation requirement tree of a build dependency is deep.

This PR completely rewrites and replaces the core logic of the build-parallel command with stdlib's graphlib module. The TopologicalSorter detects cycles and tracks when a new node becomes ready. Nodes with a dependency on an exclusive build node now become build-ready earlier.

I also changed the code to use a single instance of ThreadPoolExecutor. It's a tiny bit more efficient and saves one level of indentation.
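
At a high level, the new build loop looks roughly like this. This is a minimal sketch rather than the actual command code: build_requirements and build_one are hypothetical stand-ins for the real graph data and build step.

import graphlib
from concurrent.futures import ThreadPoolExecutor, as_completed

def build_all(build_requirements: dict[str, set[str]], build_one) -> None:
    # build_requirements maps each package to its entire build dependency set,
    # i.e. everything that must be installed into its build environment.
    sorter = graphlib.TopologicalSorter(build_requirements)
    sorter.prepare()  # freezes the graph and raises CycleError on cycles

    # A single executor for the whole run instead of one pool per round.
    with ThreadPoolExecutor() as executor:
        while sorter.is_active():
            futures = {
                executor.submit(build_one, name): name
                for name in sorter.get_ready()
            }
            for future in as_completed(futures):
                future.result()               # propagate build errors
                sorter.done(futures[future])  # may unblock further packages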

See: #755

@tiran tiran requested a review from a team as a code owner September 7, 2025 06:14
@tiran tiran force-pushed the build-parallel-topo branch from dc4b529 to ee18657 Compare September 7, 2025 06:34
@mergify mergify bot added the ci label Sep 7, 2025
Member

@dhellmann dhellmann left a comment

This looks much easier to follow. I have one question about the difference in ordering inline.

@tiran tiran marked this pull request as draft September 15, 2025 12:21
Collaborator Author

tiran commented Sep 15, 2025

I'm going to rewrite this PR on top of #763 once the PR is approved.

continue
visited.add(edge.key)
yield edge.destination_node
for install_edge in self._traverse_install_requirements(
Member

This needs to include the build requirements of the edge.destination_node, too.

  • A needs B to build.
  • B needs C to build.
  • B needs D to install.

To build A you need B, C, and D.

Collaborator Author

That's the beauty of graphlib and my iter_build_requirements approach -- I can handle each package in the package set independently. There is no need to track build requirements of build requirements here.

  • iter_build_requirements() gives me a set of packages that must be installed in the build environment of a package.
  • then I feed all packages and their build requirements into graphlib. TopologicalSorter.prepare freezes the graph and checks for cycles.
  • each iteration over the graph returns nodes that have all their build requirements satisfied. The first round contains self-bootstrapping packages like setuptools and flit, the second round everything that only needs setuptools or flit, and so on.

In your example:

  • A.iter_build_requirements() returns B (direct build requirement) and D (via traverse_install_requirements(B.children)).
  • B.iter_build_requirements() returns C (direct build requirement)
  • C.iter_build_requirements() is empty
  • D.iter_build_requirements() is empty
>>> import graphlib
>>> topo = graphlib.TopologicalSorter()
>>> topo.add("A", "B", "D")
>>> topo.add("B", "C")
>>> topo.add("C")
>>> topo.add("D")
>>> topo.prepare()
>>> while topo.is_active():
...     nodes = topo.get_ready()
...     print(nodes)
...     topo.done(*nodes)
...     
('D', 'C')
('B',)
('A',)

Member

OK, I don't understand how this code ensures that both build and installation requirements are included, though.

I see on line 141 that build requirements are skipped when iterating over installation requirements and I see on line 118 that installation requirements are skipped when iterating over build requirements. So as far as I can tell, only nodes where the edge has a type of build are included. That's the bug we have today: we aren't building everything we need in the right order.

This is the same issue we had in the bootstrap code where D needs to be treated as a build requirement of A. There we had the why chain to look at to realize that D was a build dependency, even though the edge type was install. Here we don't have that why chain.

Rather than being very specific with the algorithm, we can just build all of the dependencies of A, regardless of their type, before we build A. That will fix the issue AND give us simpler code, which is a big benefit in this logic since it's been generally buggy.
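
Something along these lines is what I mean (just a sketch; node.children, edge.destination_node, and key follow the names visible in the diff, everything else is illustrative):

def all_prerequisites(node, visited=None):
    # Collect every transitive dependency of ``node``, build and install
    # edges alike, so all of them can be built before ``node`` itself.
    if visited is None:
        visited = set()
    for edge in node.children:
        dest = edge.destination_node
        if dest.key in visited:
            continue
        visited.add(dest.key)
        yield dest
        yield from all_prerequisites(dest, visited)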

Collaborator Author

I have added additional comments and test examples. The line numbers do not match the code any more.

I think you are misunderstanding the code of iter_build_requirements. The method has two loops and two yields. The outer loop is a shallow iteration over the direct build requirements of the package. The inner loop is a recursive, depth-first algorithm that yields the installation requirements for each build requirement.

  1. all direct build dependencies (packages in [build-system].requires)
  2. the children of packages in (1) that are installation requirements
  3. recursive dependencies of (2), depth first
  4. ...

iter_build_requirements() returns the entire build dependency set -- all packages that have to be installed in the build environment of a package.
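
As a self-contained sketch of that shape (Node, Edge, and req_type here are simplified stand-ins for the actual DependencyNode/DependencyEdge classes, not the real code):

from dataclasses import dataclass, field

@dataclass
class Edge:
    req_type: str       # "build" or "install"
    destination: "Node"

@dataclass
class Node:
    key: str
    children: list[Edge] = field(default_factory=list)

    def iter_build_requirements(self):
        seen: set[str] = set()
        # Outer loop: shallow pass over the direct build requirements,
        # i.e. the packages listed in [build-system].requires.
        for edge in self.children:
            if edge.req_type != "build":
                continue
            if edge.destination.key not in seen:
                seen.add(edge.destination.key)
                yield edge.destination
            # Inner loop: recursive, depth-first walk over the install
            # requirements of each build requirement.
            yield from _iter_install_requirements(edge.destination, seen)

def _iter_install_requirements(node: Node, seen: set[str]):
    for edge in node.children:
        if edge.req_type != "install" or edge.destination.key in seen:
            continue
        seen.add(edge.destination.key)
        yield edge.destination
        yield from _iter_install_requirements(edge.destination, seen)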

The DependencyGraph.get_build_topology method feeds all nodes plus their entire build dependency sets into the topological sorter. On every iteration, the topological sorter returns all nodes that have no unsatisfied dependencies. In other words, the new algorithm starts to build a package as soon as its entire build dependency set is satisfied.

The algorithm may build a package even though one of its installation requirements is not satisfied yet. That's okay, because we don't have to install packages immediately. The algorithm ensures that all packages are processed, because all packages are in the graph.
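
And get_build_topology is then essentially this (again only a sketch built on the Node stand-in above; it yields whole rounds and calls done() per round for simplicity):

import graphlib

def get_build_topology(nodes):
    # Feed every node plus its entire build dependency set into the sorter.
    sorter = graphlib.TopologicalSorter()
    for node in nodes:
        sorter.add(node.key, *(dep.key for dep in node.iter_build_requirements()))
    sorter.prepare()  # freezes the graph and detects cycles
    while sorter.is_active():
        ready = sorter.get_ready()  # every node whose build deps are satisfied
        yield ready
        sorter.done(*ready)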

Convert the `DependencyNode` and `DependencyEdge` into data classes. The
classes are frozen (immutable) and support comparison, sorting, and
hashing.

Signed-off-by: Christian Heimes <[email protected]>
Extend `DependencyNode` to get all install dependencies and build
requirements. The new methods return unique dependencies by recursively
walking the dependency graph. The build requirements include all
recursive installation requirements of build requirements.

Signed-off-by: Christian Heimes <[email protected]>
The `DependencyGraph` can now construct a `graphlib.TopologicalSorter`
of dependencies and their build dependencies. The graph sorter returns
packages according to their build order.

Signed-off-by: Christian Heimes <[email protected]>
Our custom dependency tracker for the build-parallel code has a bug. It
does not track and handle installation requirements of build
requirements correctly. This causes build failures when the installation
requirement tree of a build dependency is deep.

This PR completely rewrites and replaces the core logic of the
build-parallel command with stdlib's `graphlib` module. The
`TopologicalSorter` detects cycles and tracks when a new node becomes
ready. Nodes with a dependency on an exclusive build node now become
build-ready earlier.

I also changed the code to use a single instance of
`ThreadPoolExecutor`. It's a tiny bit more efficient and saves one level
of indentation.

See: python-wheel-build#755
Signed-off-by: Christian Heimes <[email protected]>
@tiran tiran force-pushed the build-parallel-topo branch from 1a8583c to 59c500f Compare September 23, 2025 05:45
Collaborator Author

tiran commented Sep 29, 2025

Closing in favor of #796
