fix: Avoid frequent writes to graph.json in a loop #738
Open
MichaelYochpaz wants to merge 2 commits into `python-wheel-build:main` from `MichaelYochpaz:fix-frequent-graph-writes`.
```diff
@@ -34,3 +34,6 @@ __pycache__/
 
 /.mypy_cache/
 .skip-coverage
+
+.vscode/
+.idea/
```
How about you write the graph file every time Fromager processes a top-level requirement? A typical product has between one and a handful of top-level requirements.
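A rough sketch of this suggestion, assuming a recursive `bootstrap` that fills an in-memory graph and a driver that loops over the top-level requirements. The toy `DEPS` tree and the `bootstrap`, `write_graph`, and `bootstrap_all` signatures are hypothetical; they only model where the write happens, not Fromager's actual code.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical toy dependency tree; Fromager resolves real requirements.
DEPS: dict[str, list[str]] = {
    "app": ["requests", "click"],
    "requests": ["urllib3", "idna"],
    "click": [],
    "urllib3": [],
    "idna": [],
}


def bootstrap(req: str, graph: dict[str, list[str]]) -> None:
    """Hypothetical stand-in: recursively add req and its deps to graph."""
    if req in graph:  # already processed
        return
    graph[req] = list(DEPS.get(req, []))
    for dep in graph[req]:
        bootstrap(dep, graph)


def write_graph(graph: dict[str, list[str]], path: Path) -> None:
    """Serialize the whole graph, replacing any previous file."""
    path.write_text(json.dumps(graph, indent=2, sort_keys=True))


def bootstrap_all(top_level: list[str], path: Path) -> int:
    """Write the graph after each top-level requirement; return the write count."""
    graph: dict[str, list[str]] = {}
    writes = 0
    for req in top_level:
        bootstrap(req, graph)     # recursion covers the whole subtree
        write_graph(graph, path)  # one write per top-level requirement
        writes += 1
    return writes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "graph.json"
        print(bootstrap_all(["app"], out))  # prints 1
```

In this model the number of writes equals the number of top-level requirements, however deep the tree below them is, because the write sits in the driver loop rather than inside the recursive function.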
I think the code you suggested will still write the file for every dependency. As I understand it, we call `bootstrap` on every dependency recursively, so this loop runs for every dependency down the tree, not just for the top-level requirements. (The loop you quoted is part of the `bootstrap` function, and we call `bootstrap` on every requirement.)

As for the idea itself: it's possible to write only after each top-level dependency is completed, and that would likely still be a big performance boost (slightly smaller, since we'd still do more writes, but the difference is marginal). But what benefit does this approach have over writing once at the end, inside a `finally` block (which means we'll write the file even if there's an error)? I don't see any benefits to it.
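A minimal sketch of the two patterns being compared, assuming a simplified recursive `bootstrap` that records dependency edges in an in-memory dict. `DEPS`, `GraphWriter`, `bootstrap_in_loop`, `bootstrap_once`, `run_once`, and the `fail_on` failure hook are hypothetical names for illustration, not Fromager's API.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

Graph = dict[str, list[str]]

# Hypothetical toy dependency tree, not Fromager's data model.
DEPS: Graph = {
    "app": ["requests", "click"],
    "requests": ["urllib3", "idna"],
    "click": [],
    "urllib3": [],
    "idna": [],
}


class GraphWriter:
    """Writes graph.json and counts how often it is rewritten."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.writes = 0

    def write(self, graph: Graph) -> None:
        self.path.write_text(json.dumps(graph, indent=2, sort_keys=True))
        self.writes += 1


def bootstrap_in_loop(req: str, graph: Graph, writer: GraphWriter,
                      fail_on: Optional[str] = None) -> None:
    """Existing pattern: the recursive function writes inside its loop."""
    if req == fail_on:
        raise RuntimeError(f"build failed for {req}")
    graph.setdefault(req, [])
    for dep in DEPS.get(req, []):
        graph[req].append(dep)
        writer.write(graph)  # runs once per edge, at every depth
        bootstrap_in_loop(dep, graph, writer, fail_on)


def bootstrap_once(req: str, graph: Graph,
                   fail_on: Optional[str] = None) -> None:
    """Proposed pattern: the recursion only mutates the in-memory graph."""
    if req == fail_on:
        raise RuntimeError(f"build failed for {req}")
    graph.setdefault(req, [])
    for dep in DEPS.get(req, []):
        graph[req].append(dep)
        bootstrap_once(dep, graph, fail_on)


def run_once(top_level: list[str], writer: GraphWriter,
             fail_on: Optional[str] = None) -> Graph:
    """Bootstrap everything, then write the graph exactly once."""
    graph: Graph = {}
    try:
        for req in top_level:
            bootstrap_once(req, graph, fail_on)
    finally:
        writer.write(graph)  # runs on success and when an exception escapes
    return graph


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "graph.json"
        looped = GraphWriter(path)
        bootstrap_in_loop("app", {}, looped)
        once = GraphWriter(path)
        try:
            run_once(["app"], once, fail_on="idna")
        except RuntimeError:
            pass
        print(looped.writes, once.writes)  # prints 4 1
        print(json.loads(path.read_text()))  # partial graph up to the failure
```

With this toy tree, the in-loop variant rewrites the file four times for a single top-level requirement (once per edge), while `run_once` writes it once. When the simulated build of `idna` fails, the file written from the `finally` block still holds the graph up to that point, including the edge to the failing package.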
The graph file is critical for debugging build failures. It describes the entire dependency tree up to the point of the failure and helps us understand why packages are built in the order they are built. We MUST retain the ability to produce that file. Writing the file frequently is not overhead; it is a feature.
@dhellmann I get your point that it's an important file, but I fail to understand how the proposed code lessens the ability to produce it.
We still write the file whether the run succeeds or fails; we just do it once instead of repeatedly.
I don't really see how the edge case of a full disk is relevant. If the disk is full, you won't be able to write the `graph.json` file with the existing code either, and Fromager will fail regardless of the `graph.json` file, since it won't be able to download sdists or build packages.

In the very rare case that only a few KB or MB are available, the current code might leave a partial graph file before failing, whereas the proposed change might leave no file at all. But I don't see how that file would be useful: Fromager (and maybe even other parts of the system, like Python) will break, and after resolving the space issue the user will want to rerun it properly anyway, which will create a complete `graph.json` file.

If you still strongly believe it's necessary to rewrite the file after every change, let me know and I'll close the PR. I made this change mainly because I think it's better practice and cleaner code, but I agree it's probably not a meaningful performance improvement.