
Conversation

Collaborator

@tiran tiran commented Sep 7, 2025

Our custom dependency tracker for the build-parallel code has a bug. It does not track and handle installation requirements of build requirements correctly. This causes build failures when the installation requirement tree of a build dependency is deep.

This PR completely rewrites and replaces the core logic of the build-parallel command with stdlib's graphlib module. The TopologicalSorter detects cycles and tracks when a new node becomes ready. Nodes with a dependency on an exclusive build node now become build-ready earlier.

I also changed the code to use a single instance of ThreadPoolExecutor. It's a tiny bit more efficient and saves one level of indentation.
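
At a high level, the new build loop looks roughly like this. This is a minimal sketch rather than the actual command code: build_requirements and build_one are hypothetical stand-ins for the real graph data and build step.

import graphlib
from concurrent.futures import ThreadPoolExecutor, as_completed

def build_all(build_requirements: dict[str, set[str]], build_one) -> None:
    # build_requirements maps each package to its entire build dependency set,
    # i.e. everything that must be installed into its build environment.
    sorter = graphlib.TopologicalSorter(build_requirements)
    sorter.prepare()  # freezes the graph and raises CycleError on cycles

    # A single executor for the whole run instead of one pool per round.
    with ThreadPoolExecutor() as executor:
        while sorter.is_active():
            futures = {
                executor.submit(build_one, name): name
                for name in sorter.get_ready()
            }
            for future in as_completed(futures):
                future.result()               # propagate build errors
                sorter.done(futures[future])  # may unblock further packages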

See: #755

@tiran tiran requested a review from a team as a code owner September 7, 2025 06:14
@tiran tiran force-pushed the build-parallel-topo branch from dc4b529 to ee18657 Compare September 7, 2025 06:34
@mergify mergify bot added the ci label Sep 7, 2025
Member

@dhellmann dhellmann left a comment

This looks much easier to follow. I have one question about the difference in ordering inline.

@tiran tiran marked this pull request as draft September 15, 2025 12:21
Collaborator Author

tiran commented Sep 15, 2025

I'm going to rewrite this PR on top of #763 once the PR is approved.

continue
visited.add(edge.key)
yield edge.destination_node
for install_edge in self._traverse_install_requirements(
Member

This needs to include the build requirements of the edge.destination_node, too.

  • A needs B to build.
  • B needs C to build.
  • B needs D to install.

To build A you need B, C, and D.

Collaborator Author

That's the beauty of graphlib and my iter_build_requirements approach -- I can handle each package in the package set independently. There is no need to track build requirements of build requirements here.

  • iter_build_requirements() gives me a set of packages that must be installed in the build environment of a package.
  • then I feed all packages and their build requirements into graphlib. TopologicalSorter.prepare freezes the graph and checks for cycles.
  • each iteration over the graph returns nodes that have all their build requirements satisfied. The first round contains self-bootstrapping packages like setuptools and flit, the second round everything that only needs setuptools or flit, and so on.

In your example:

  • A.iter_build_requirements() returns B (direct build requirement) and D (via traverse_install_requirements(B.children)).
  • B.iter_build_requirements() returns C (direct build requirement)
  • C.iter_build_requirements() is empty
  • D.iter_build_requirements() is empty
>>> import graphlib
>>> topo = graphlib.TopologicalSorter()
>>> topo.add("A", "B", "D")
>>> topo.add("B", "C")
>>> topo.add("C")
>>> topo.add("D")
>>> topo.prepare()
>>> while topo.is_active():
...     nodes = topo.get_ready()
...     print(nodes)
...     topo.done(*nodes)
...     
('D', 'C')
('B',)
('A',)

Member

OK, I don't understand how this code ensures that both build and installation requirements are included, though.

I see on line 141 that build requirements are skipped when iterating over installation requirements and I see on line 118 that installation requirements are skipped when iterating over build requirements. So as far as I can tell, only nodes where the edge has a type of build are included. That's the bug we have today: we aren't building everything we need in the right order.

This is the same issue we had in the bootstrap code where D needs to be treated as a build requirement of A. There we had the why chain to look at to realize that D was a build dependency, even though the edge type was install. Here we don't have that why chain.

Rather than being very specific with the algorithm, we can just build all of the dependencies of A, regardless of their type, before we build A. That will fix the issue AND give us simpler code, which is a big benefit in this logic since it's been generally buggy.
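
Something along these lines is what I mean (just a sketch; node.children, edge.destination_node, and key follow the names visible in the diff, everything else is illustrative):

def all_prerequisites(node, visited=None):
    # Collect every transitive dependency of ``node``, build and install
    # edges alike, so all of them can be built before ``node`` itself.
    if visited is None:
        visited = set()
    for edge in node.children:
        dest = edge.destination_node
        if dest.key in visited:
            continue
        visited.add(dest.key)
        yield dest
        yield from all_prerequisites(dest, visited)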

Collaborator Author

I have added additional comments and test examples. The line numbers do not match the code any more.

I think you are misunderstanding the code of iter_build_requirements. The method has two loops and two yields. The outer loop is a shallow iteration over the direct build requirements of the package. The inner loop is a recursive, depth-first algorithm that yields the installation requirements for each build requirement.

  1. all direct build dependencies (packages in [build-system].requires)
  2. the children of packages in (1) that are installation requirements
  3. recursive dependencies of (2), depth first
  4. ...

iter_build_requirements() returns the entire build dependency set -- all packages that have to be installed in the build environment of a package.
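
As a self-contained sketch of that shape (Node, Edge, and req_type here are simplified stand-ins for the actual DependencyNode/DependencyEdge classes, not the real code):

from dataclasses import dataclass, field

@dataclass
class Edge:
    req_type: str       # "build" or "install"
    destination: "Node"

@dataclass
class Node:
    key: str
    children: list[Edge] = field(default_factory=list)

    def iter_build_requirements(self):
        seen: set[str] = set()
        # Outer loop: shallow pass over the direct build requirements,
        # i.e. the packages listed in [build-system].requires.
        for edge in self.children:
            if edge.req_type != "build":
                continue
            if edge.destination.key not in seen:
                seen.add(edge.destination.key)
                yield edge.destination
            # Inner loop: recursive, depth-first walk over the install
            # requirements of each build requirement.
            yield from _iter_install_requirements(edge.destination, seen)

def _iter_install_requirements(node: Node, seen: set[str]):
    for edge in node.children:
        if edge.req_type != "install" or edge.destination.key in seen:
            continue
        seen.add(edge.destination.key)
        yield edge.destination
        yield from _iter_install_requirements(edge.destination, seen)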

The DependencyGraph.get_build_topology method feeds all nodes plus their entire build dependency sets into the topological sorter. On every iteration, the topological sorter returns all nodes that have no unsatisfied dependencies. In other words, the new algorithm starts to build a package as soon as its entire build dependency set is satisfied.

The algorithm may build a package even though one of its installation requirements is not satisfied yet. That's okay, because we don't have to install packages immediately. The algorithm ensures that all packages are processed, because all packages are in the graph.
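
And get_build_topology is then essentially this (again only a sketch built on the Node stand-in above; it yields whole rounds and calls done() per round for simplicity):

import graphlib

def get_build_topology(nodes):
    # Feed every node plus its entire build dependency set into the sorter.
    sorter = graphlib.TopologicalSorter()
    for node in nodes:
        sorter.add(node.key, *(dep.key for dep in node.iter_build_requirements()))
    sorter.prepare()  # freezes the graph and detects cycles
    while sorter.is_active():
        ready = sorter.get_ready()  # every node whose build deps are satisfied
        yield ready
        sorter.done(*ready)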

Convert the `DependencyNode` and `DependencyEdge` into data classes. The
classes are frozen (immutable) and support comparison, sorting, and
hashing.

Signed-off-by: Christian Heimes <[email protected]>
Extend `DependencyNode` to get all install dependencies and build
requirements. The new methods return unique dependencies by recursively
walking the dependency graph. The build requirements include all
recursive installation requirements of build requirements.

Signed-off-by: Christian Heimes <[email protected]>
The `DependencyGraph` can now construct a `graphlib.TopologicalSorter`
of dependencies and their build dependencies. The graph sorter returns
packages according to their build order.

Signed-off-by: Christian Heimes <[email protected]>
Our custom dependency tracker for the build-parallel code has a bug. It
does not track and handle installation requirements of build
requirements correctly. This causes build failures when the installation
requirement tree of a build dependency is deep.

This PR completely rewrites and replaces the core logic of the
build-parallel command with stdlib's `graphlib` module. The
`TopologicalSorter` detects cycles and tracks when a new node becomes
ready. Nodes with a dependency on an exclusive build node now become
build-ready earlier.

I also changed the code to use a single instance of
`ThreadPoolExecutor`. It's a tiny bit more efficient and saves one level
of indentation.

See: python-wheel-build#755
Signed-off-by: Christian Heimes <[email protected]>
@tiran tiran force-pushed the build-parallel-topo branch from 1a8583c to 59c500f Compare September 23, 2025 05:45
Collaborator Author

tiran commented Sep 29, 2025

Closing in favor of #796
