fix: Avoid frequent writes to `graph.json` in a loop #738

MichaelYochpaz · 2025-09-01T11:16:36Z

Before this change, we would write the graph.json over and over again for every single dependency that's added while the dependency graph is being created.

This PR changes this behavior to only write to the file once, only when all dependencies have been already added.

This PR resolves #735

tiran · 2025-09-01T11:43:48Z

src/fromager/commands/bootstrap.py

+    wkctx.write_to_graph_to_file()
+


Have you measured how large the performance improvement is going to be?

bootstrapping can fail for multiple reasons like missing sdist, conflict, or build error. We require to graph file to debug problems. Your change may speed up bootstrapping by a few seconds, maybe less.

Think about how you can write the file less often but still write it in case of an error.

I haven't tested the performance difference, although I'd like to. Need to figure out how, and once I do I'll post my findings here.

We're also calling DependencyGraph.serialize() for every time we write, which may or may not make an additional performance difference to the file writes.

As for the error case - good point, didn't know the graph file is used for debugging.

My thought is to wrap the section where we loop over the packages with a try block, and then put wkctx.write_to_graph_to_file() under a finally block, so that we export the dependency graph whether there's an error or not.

@tiran I've updated the code. It should save the graph.json file even if an error is encountered and it's incomplete.

Let me know what you think.

Before this change, we would write the `graph.json` over and over again for every single dependency that's added while the dependency graph is being created. This commit changes this behavior to only write to the file once, only when all dependencies have been already added. Signed-off-by: Michael Yochpaz <[email protected]>

Signed-off-by: Michael Yochpaz <[email protected]>

LalatenduMohanty · 2025-09-02T22:11:16Z

The code looks good to me but @tiran needs to review it as well.

dhellmann · 2025-09-02T22:47:30Z

This introduces some risk of losing the graph if the reason for the bootstrap failure is a full disk, but I think we can live with that if it improves performance. I'll leave it to @tiran to approve, since he has been thinking about these performance-related changes.

When you measured the time, how much difference did this make?

tiran · 2025-09-03T09:34:22Z

src/fromager/commands/bootstrap.py

+        try:
+            for req in to_build:
+                token = requirement_ctxvar.set(req)
+                bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
+                progressbar.update()
+                requirement_ctxvar.reset(token)
+
+        finally:
+            wkctx.write_to_graph_to_file()


How about you write the graph file every time Fromager processes a top level requirement? A typical product has between one and a handful of top level requirement.

for req in to_build: token = requirement_ctxvar.set(req) try: bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL) finally: wkctx.write_to_graph_to_file() progressbar.update() requirement_ctxvar.reset(token)

I think the code you made will still write the file for all dependencies, as to my understanding, we call bootstrap on every dependency recursively, so this loop runs for every dependency down the tree and not just the top level requirements. (this loop you quoted is part of the bootstrap function, and we call bootstrap on every requirement).

About the idea itself - it's possible to only write after each top level dependency is completed, and will likely still be a big performance boost (slightly lesser since we still do more writes, but it's marginal), but what benefit does this approach have over the approach of writing once at the end, inside a finally block (which means we'll write the file even if there's an error)? I don't see any benefits to it.

The graph file is critical for debugging build failures. It describes the entire dependency tree up to the point of the failure and helps us understand why packages are built in the order they are built. We MUST retain the ability to produce that file. Writing the file frequently is not overhead, it is a feature.

@dhellmann I get your point that it's an important file, but fail to understand how the proposed code lessens the ability to produce this file.

We still write the file whether it succeeds or there's an error, we just do it once instead of doing it repeatedly.

I don't really see how an edge case of a full disk is relevant... if the disk is full, you won't be able to write to the graph.json file with the existing code either, and Fromager will break regardless of the graph.json file, since it won't be able to download sdists nor build packages.

In the very rare case that there's only a few KB / MB available, you might get a partial graph file before it breaks using the current code, compared to no file at all with the proposed changes. But I don't see how this file would be useful as Fromager (any maybe even other parts of the system like Python) will break, and the user will want to rerun it properly anyways (where a full, proper graph.json file will be created) after resolving the space issue.

If you still strongly believe it's necessary to rewrite the file following every change let me know and I'll close the PR. I made this change mainly because I think it's a better practice and cleaner code, but I agree it's probably not a meaningful performance improvement.

MichaelYochpaz requested a review from a team as a code owner September 1, 2025 11:16

tiran reviewed Sep 1, 2025

View reviewed changes

MichaelYochpaz force-pushed the fix-frequent-graph-writes branch from 7c46f4e to ff39fb7 Compare September 2, 2025 09:07

Add .idea and .vscode directories to .gitignore

2670fb7

Signed-off-by: Michael Yochpaz <[email protected]>

MichaelYochpaz force-pushed the fix-frequent-graph-writes branch from ff39fb7 to 2670fb7 Compare September 2, 2025 09:09

tiran reviewed Sep 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix: Avoid frequent writes to `graph.json` in a loop #738

fix: Avoid frequent writes to `graph.json` in a loop #738

MichaelYochpaz commented Sep 1, 2025

Uh oh!

tiran Sep 1, 2025

Uh oh!

MichaelYochpaz Sep 1, 2025 •

edited

Loading

Uh oh!

MichaelYochpaz Sep 2, 2025

Uh oh!

LalatenduMohanty commented Sep 2, 2025

Uh oh!

dhellmann commented Sep 2, 2025

Uh oh!

tiran Sep 3, 2025

Uh oh!

MichaelYochpaz Sep 3, 2025

Uh oh!

dhellmann Sep 5, 2025 •

edited

Loading

Uh oh!

MichaelYochpaz Sep 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

fix: Avoid frequent writes to graph.json in a loop #738

Are you sure you want to change the base?

fix: Avoid frequent writes to graph.json in a loop #738

Conversation

MichaelYochpaz commented Sep 1, 2025

Uh oh!

tiran Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

MichaelYochpaz Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaelYochpaz Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

LalatenduMohanty commented Sep 2, 2025

Uh oh!

dhellmann commented Sep 2, 2025

Uh oh!

tiran Sep 3, 2025

Choose a reason for hiding this comment

Uh oh!

MichaelYochpaz Sep 3, 2025

Choose a reason for hiding this comment

Uh oh!

dhellmann Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaelYochpaz Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fix: Avoid frequent writes to `graph.json` in a loop #738

fix: Avoid frequent writes to `graph.json` in a loop #738

MichaelYochpaz Sep 1, 2025 •

edited

Loading

dhellmann Sep 5, 2025 •

edited

Loading