-
Notifications
You must be signed in to change notification settings - Fork 157
Commitgraph: add new config for changed-paths and recommend it in scalar #1983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Commitgraph: add new config for changed-paths and recommend it in scalar #1983
Conversation
Changed-path bloom filters has been a stable feature for a few years and it significantly improves performance of file history computation for large repos. Currently it can be turned on by 'git commit-graph write --changed-paths'. As one of the large repos, Microsoft Office MonoRepo would like to get this feature on for repo developers with minimal effort. In this commit, we're proposing a new config option 'commitGraph.changedPaths', which acts like '--changed-paths' and always respects command line precedence. We then add this new config as recommended in scalar to benefit large repos. This commit also adds corresponding doc and unit tests for the new config. Signed-off-by: Emily Yang <[email protected]> Mentored-by: Derrick Stolee <[email protected]>
Welcome to GitGitGadgetHi @emilyyang-ms, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that either:
You can CC potential reviewers by adding a footer to the PR description with the following syntax:
NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description, Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents: Need help?New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
/allow |
User emilyyang-ms is now allowed to use GitGitGadget. WARNING: emilyyang-ms has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use. |
# --no-changed-paths should take precedence over the config and erase bloom filters | ||
GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error && | ||
! grep -i "bloom filters" error && | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently there is some white-space on this otherwise empty line:
--- 61c1ad542c Commitgraph: add new config for changed-paths and recommend it in scalar
t/t5318-commit-graph.sh:965: trailing whitespace.
+
Would you mind deleting that, amending the commit, and force-pushing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or: git rebase --whitespace=fix HEAD~1
should work for this single patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments in the file diff, and here is a copy of your commit message:
Commitgraph: add new config for changed-paths and recommend it in scalar
Changed-path bloom filters has been a stable feature for a few years and
it significantly improves performance of file history computation for
large repos. Currently it can be turned on by 'git commit-graph write
--changed-paths'. As one of the large repos, Microsoft Office MonoRepo
would like to get this feature on for repo developers with minimal
effort.
In this commit, we're proposing a new config option
'commitGraph.changedPaths', which acts like '--changed-paths' and always
respects command line precedence. We then add this new config as
recommended in scalar to benefit large repos.
This commit also adds corresponding doc and unit tests for the new
config.
Signed-off-by: Emily Yang <[email protected]>
Mentored-by: Derrick Stolee <[email protected]>
A few things about this:
- Start your subject line with
commit-graph: add new...
to better match the commit-graph builtin. - Your sign-off should be the very last line, signaling "I attest to everything before this line".
- Helped-by is more common than Mentored-by, but this is flexible and your call.
...filters has been
is a little awkward. Maybe "The changed-path Bloom filters feature has been stable for a few years..."- "As one of the large repos, Microsoft Office MonoRepo would like to get this feature on for repo developers with minimal effort." This won't be an effective appeal. Instead, a more generic "Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation."
- Your config has a true/false/unset logic and it's similar to how the
--changed-paths
option has a--changed-paths
/--no-changed-paths
/not-present tri-state. It may be good to discuss how these work together (false
config disables a previoustrue
config but doesn't imply--no-changed-paths
). You can reference the change0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01)
(exactly that format, asgit log -1 --pretty=reference <oid>
would output) for how this tri-state situation happened. It's good context for how the filters persist once they exist from a foreground write with--changed-paths
. - "This commit also adds corresponding doc and unit tests for the new
config." This is a requirement of such a change, so doesn't need to be included in the message.
After another round of edits, we'll talk about getting your cover letter into shape as it will be part of your introduction to the mailing list and we'll want to CC the right folks.
if (!strcmp(var, "commitgraph.maxnewfilters")) | ||
write_opts.max_new_filters = git_config_int(var, value, ctx->kvi); | ||
else if (!strcmp(var, "commitgraph.changedpaths")) { | ||
/* Only set to 1 if config is true and not already overridden by command line */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this config is loaded before the command-line arguments, we don't want to do this check for opts.enable_changed_paths == -1
. As we parse the config, we may have duplicate key-value pairs in system, global, or local config and we need to use a "last one wins" approach.
We should consider the ability for global config to have it set to true
and the local config set to false
. Right now, the false
value is ignored. Would opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
give us the right behavior (false
disables the previous true
but doesn't actually erase changed-path filters that may otherwise exist).
# --no-changed-paths should take precedence over the config and erase bloom filters | ||
GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error && | ||
! grep -i "bloom filters" error && | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or: git rebase --whitespace=fix HEAD~1
should work for this single patch.
# set commitGraph.changedPaths to false and it should not write bloom filters | ||
git config commitGraph.changedPaths false && | ||
GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && | ||
! grep -i "bloom filters" error && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a test_grep
helper that will help with this, I think:
test_grep ! "bloom filters" error &&
or drop the !
when you expect the match.
cd config-changed-paths-true && | ||
test_commit initial && | ||
test_commit second && | ||
test_commit third && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a case that we should add:
Global config has
true
, local config hasfalse
. (Alternatively: usegit config --add
to have multiple values in the same config file,true
thenfalse
.)
These tests seem complete and correct, and now I'm providing feedback that is more a matter of "taste" and a preference for succinct tests (especially around config options).
Test setup ideas: we only need a single "new" commit to trigger new changed-path filter computations in git commit-graph write
. We can use one test to toggle config values (and the state of the commit-graph having filters or not).
I'm thinking through a sequence of events that will help (consider a test_commit
before each case):
- Write commit-graph without
--changed-paths
. See no progress for them. - Config to
true
. Write without--changed-paths
. See progress. - Add config as
false
on top oftrue
. Write without--changed-paths
. See progress (due to existing filters). - Keep config as in previous test, but delete commit-graph and write without
--changed-paths
. Don't see progress. - Clear config and set to
false
. Write with--changed-paths
and see progress (due to CLI). - Keep
false
config, write without--changed-paths
and see progress (due to existing filters).
No description provided.