MichaelYochpaz

Before this change, graph.json was rewritten for every single dependency added while the dependency graph was being built.

This PR changes that behavior so the file is written only once, after all dependencies have been added.

This PR resolves #735

@MichaelYochpaz MichaelYochpaz requested a review from a team as a code owner September 1, 2025 11:16
Comment on lines 188 to 191
wkctx.write_to_graph_to_file()

Collaborator

Have you measured how large the performance improvement is going to be?

Bootstrapping can fail for multiple reasons, such as a missing sdist, a conflict, or a build error. We need the graph file to debug those problems. Your change may speed up bootstrapping by a few seconds, maybe less.

Think about how you can write the file less often but still write it in case of an error.

Author

@MichaelYochpaz MichaelYochpaz Sep 1, 2025

I haven't tested the performance difference, although I'd like to. Need to figure out how, and once I do I'll post my findings here.

We're also calling DependencyGraph.serialize() every time we write, which may or may not add a noticeable cost on top of the file writes themselves.
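A rough way to measure this could look like the sketch below. It is not Fromager code: `write_graph` is a hypothetical stand-in for the serialize-and-write step, the graph is a plain dict, and the package names are made up. It only compares "serialize and rewrite on every added dependency" against "serialize and write once at the end".

```python
import json
import tempfile
import time
from pathlib import Path


def write_graph(path: Path, graph: dict) -> None:
    # Hypothetical stand-in for the real write: serialize the whole graph
    # and rewrite the file from scratch.
    path.write_text(json.dumps(graph, indent=2))


def time_writes(num_deps: int, write_every_time: bool) -> tuple[float, int]:
    """Return (elapsed seconds, number of file writes) for one strategy."""
    graph: dict[str, dict] = {}
    writes = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "graph.json"
        start = time.perf_counter()
        for i in range(num_deps):
            # Add one made-up node with edges to a few earlier nodes.
            edges = [f"pkg-{j}==1.0" for j in range(max(0, i - 3), i)]
            graph[f"pkg-{i}==1.0"] = {"edges": edges}
            if write_every_time:
                write_graph(path, graph)
                writes += 1
        if not write_every_time:
            # Single write after the whole graph has been built.
            write_graph(path, graph)
            writes += 1
        elapsed = time.perf_counter() - start
    return elapsed, writes


if __name__ == "__main__":
    for mode in (True, False):
        elapsed, writes = time_writes(500, write_every_time=mode)
        print(f"write_every_time={mode}: {writes} writes, {elapsed:.3f}s")
```

The absolute numbers depend on disk and graph size, so this would only give a ballpark figure; timing a real bootstrap run before and after the change would be more convincing.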

As for the error case - good point, didn't know the graph file is used for debugging.

My thought is to wrap the section where we loop over the packages in a try block and put wkctx.write_to_graph_to_file() in a finally block, so that we export the dependency graph whether there's an error or not.
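A minimal sketch of that idea, using a hypothetical `FakeContext` in place of the real work context and a made-up failure condition (only the method name `write_to_graph_to_file` comes from the PR):

```python
class FakeContext:
    """Hypothetical stand-in for the work context (wkctx)."""

    def __init__(self) -> None:
        self.graph: list[str] = []
        self.writes = 0

    def write_to_graph_to_file(self) -> None:
        # The real method writes graph.json; this one only counts calls.
        self.writes += 1


def bootstrap_all(wkctx: FakeContext, to_build: list[str]) -> None:
    try:
        for req in to_build:
            # Made-up failure condition to simulate a build error.
            if req.startswith("broken"):
                raise RuntimeError(f"failed to build {req}")
            wkctx.graph.append(req)
    finally:
        # Runs on success and on failure, so a partial graph is still exported.
        wkctx.write_to_graph_to_file()


if __name__ == "__main__":
    ctx = FakeContext()
    try:
        bootstrap_all(ctx, ["a", "broken-b", "c"])
    except RuntimeError as err:
        print(f"bootstrap failed: {err}")
    print(ctx.graph, ctx.writes)  # ['a'] 1
```

The exception still propagates to the caller; the finally block only guarantees that the single write happens first.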

Author

@tiran I've updated the code. It should save the graph.json file even if an error is encountered and it's incomplete.

Let me know what you think.

Before this change, we would write the `graph.json` over and over again for every single dependency that's added while the dependency graph is being created.
This commit changes this behavior to only write to the file once, only when all dependencies have been already added.

Signed-off-by: Michael Yochpaz <[email protected]>
@MichaelYochpaz MichaelYochpaz force-pushed the fix-frequent-graph-writes branch from 7c46f4e to ff39fb7 Compare September 2, 2025 09:07
@MichaelYochpaz MichaelYochpaz force-pushed the fix-frequent-graph-writes branch from ff39fb7 to 2670fb7 Compare September 2, 2025 09:09
@LalatenduMohanty
Member

The code looks good to me but @tiran needs to review it as well.

@dhellmann
Member

This introduces some risk of losing the graph if the reason for the bootstrap failure is a full disk, but I think we can live with that if it improves performance. I'll leave it to @tiran to approve, since he has been thinking about these performance-related changes.

When you measured the time, how much difference did this make?

Comment on lines +182 to +190
        try:
            for req in to_build:
                token = requirement_ctxvar.set(req)
                bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
                progressbar.update()
                requirement_ctxvar.reset(token)

        finally:
            wkctx.write_to_graph_to_file()
Collaborator

How about writing the graph file every time Fromager finishes processing a top-level requirement? A typical product has between one and a handful of top-level requirements.

        for req in to_build:
            token = requirement_ctxvar.set(req)
            try:
                bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
            finally:
                wkctx.write_to_graph_to_file()
            progressbar.update()
            requirement_ctxvar.reset(token)

Author

I think the code you suggested would still write the file for every dependency. As I understand it, bootstrap is called recursively on every dependency, so this loop runs for each dependency down the tree and not just for the top-level requirements (the loop you quoted is part of the bootstrap function, and we call bootstrap on every requirement).

About the idea itself: it's possible to write only after each top-level dependency is completed, and that would likely still be a big performance boost (slightly smaller, since we'd do a few more writes, but the difference is marginal). But what benefit does that approach have over writing once at the end, inside a finally block, which also writes the file when there's an error? I don't see one.
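To make the comparison concrete, here is a toy sketch (not Fromager code) that counts writes under the three strategies discussed in this thread. The dependency tree, the `walk` helper, and the strategy names are all hypothetical, and it assumes the top-level loop runs once per top-level requirement while dependencies are handled by recursion.

```python
from typing import Callable

# Hypothetical tree: two top-level requirements and their dependencies.
TREE: dict[str, list[str]] = {
    "app-a": ["lib-x", "lib-y"],
    "app-b": ["lib-y", "lib-z"],
    "lib-x": ["lib-w"],
    "lib-y": [],
    "lib-z": [],
    "lib-w": [],
}
TOP_LEVEL = ["app-a", "app-b"]


def walk(req: str, seen: set[str], on_added: Callable[[], None]) -> None:
    """Recursively add req and its dependencies, calling on_added per new node."""
    if req in seen:
        return
    seen.add(req)
    on_added()
    for dep in TREE[req]:
        walk(dep, seen, on_added)


def count_writes(strategy: str) -> int:
    """Count graph writes for 'per_dependency', 'per_top_level', or 'once'."""
    writes = 0

    def write() -> None:
        nonlocal writes
        writes += 1

    def no_write() -> None:
        pass

    seen: set[str] = set()
    try:
        for req in TOP_LEVEL:
            try:
                walk(req, seen, write if strategy == "per_dependency" else no_write)
            finally:
                if strategy == "per_top_level":
                    write()
    finally:
        if strategy == "once":
            write()
    return writes


if __name__ == "__main__":
    for name in ("per_dependency", "per_top_level", "once"):
        print(name, count_writes(name))
```

With this toy tree the counts are 6, 2, and 1 respectively, so under the stated assumption the per-top-level variant scales with the number of top-level requirements rather than with the size of the tree.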

Successfully merging this pull request may close these issues.

Avoid frequent writes to graph.json during bootstrap