Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -34,3 +34,6 @@ __pycache__/

/.mypy_cache/
.skip-coverage

.vscode/
.idea/
1 change: 0 additions & 1 deletion src/fromager/bootstrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -951,7 +951,6 @@ def _add_to_graph(
download_url=download_url,
pre_built=pbi.pre_built,
)
self.ctx.write_to_graph_to_file()

def _sort_requirements(
self,
Expand Down
14 changes: 9 additions & 5 deletions src/fromager/commands/bootstrap.py
Original file line number Diff line number Diff line change
Expand Up @@ -179,11 +179,15 @@ def bootstrap(
)
requirement_ctxvar.reset(token)

for req in to_build:
token = requirement_ctxvar.set(req)
bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
progressbar.update()
requirement_ctxvar.reset(token)
try:
for req in to_build:
token = requirement_ctxvar.set(req)
bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
progressbar.update()
requirement_ctxvar.reset(token)

finally:
wkctx.write_to_graph_to_file()
Comment on lines +182 to +190
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about you write the graph file every time Fromager processes a top level requirement? A typical product has between one and a handful of top level requirement.

        for req in to_build:
            token = requirement_ctxvar.set(req)
            try:
                bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
            finally:
                wkctx.write_to_graph_to_file()
            progressbar.update()
            requirement_ctxvar.reset(token)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code you made will still write the file for all dependencies, as to my understanding, we call bootstrap on every dependency recursively, so this loop runs for every dependency down the tree and not just the top level requirements. (this loop you quoted is part of the bootstrap function, and we call bootstrap on every requirement).

About the idea itself - it's possible to only write after each top level dependency is completed, and will likely still be a big performance boost (slightly lesser since we still do more writes, but it's marginal), but what benefit does this approach have over the approach of writing once at the end, inside a finally block (which means we'll write the file even if there's an error)? I don't see any benefits to it.

Copy link
Member

@dhellmann dhellmann Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The graph file is critical for debugging build failures. It describes the entire dependency tree up to the point of the failure and helps us understand why packages are built in the order they are built. We MUST retain the ability to produce that file. Writing the file frequently is not overhead, it is a feature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhellmann I get your point that it's an important file, but fail to understand how the proposed code lessens the ability to produce this file.

We still write the file whether it succeeds or there's an error, we just do it once instead of doing it repeatedly.

I don't really see how an edge case of a full disk is relevant... if the disk is full, you won't be able to write to the graph.json file with the existing code either, and Fromager will break regardless of the graph.json file, since it won't be able to download sdists nor build packages.

In the very rare case that there's only a few KB / MB available, you might get a partial graph file before it breaks using the current code, compared to no file at all with the proposed changes. But I don't see how this file would be useful as Fromager (any maybe even other parts of the system like Python) will break, and the user will want to rerun it properly anyways (where a full, proper graph.json file will be created) after resolving the space issue.

If you still strongly believe it's necessary to rewrite the file following every change let me know and I'll close the PR. I made this change mainly because I think it's a better practice and cleaner code, but I agree it's probably not a meaningful performance improvement.


constraints_filename = wkctx.work_dir / "constraints.txt"
if skip_constraints:
Expand Down
Loading