Skip to content

Conversation

emilyyang-ms
Copy link

No description provided.

Changed-path bloom filters has been a stable feature for a few years and
it significantly improves performance of file history computation for
large repos. Currently it can be turned on by 'git commit-graph write
--changed-paths'. As one of the large repos, Microsoft Office MonoRepo
would like to get this feature on for repo developers with minimal
effort.

In this commit, we're proposing a new config option
'commitGraph.changedPaths', which acts like '--changed-paths' and always
respects command line precedence. We then add this new config as
recommended in scalar to benefit large repos.

This commit also adds corresponding doc and unit tests for the new
config.

Signed-off-by: Emily Yang <[email protected]>
Mentored-by: Derrick Stolee <[email protected]>
Copy link

gitgitgadget bot commented Oct 7, 2025

Welcome to GitGitGadget

Hi @emilyyang-ms, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <[email protected]>, Ill Takalook <[email protected]>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@gitgitgadget gitgitgadget bot added the new user label Oct 7, 2025
@dscho
Copy link
Member

dscho commented Oct 7, 2025

/allow

Copy link

gitgitgadget bot commented Oct 7, 2025

User emilyyang-ms is now allowed to use GitGitGadget.

WARNING: emilyyang-ms has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

# --no-changed-paths should take precedence over the config and erase bloom filters
GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
! grep -i "bloom filters" error &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently there is some white-space on this otherwise empty line:

--- 61c1ad542c Commitgraph: add new config for changed-paths and recommend it in scalar
t/t5318-commit-graph.sh:965: trailing whitespace.
+  

Would you mind deleting that, amending the commit, and force-pushing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or: git rebase --whitespace=fix HEAD~1 should work for this single patch.

Copy link

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments in the file diff, and here is a copy of your commit message:

Commitgraph: add new config for changed-paths and recommend it in scalar

Changed-path bloom filters has been a stable feature for a few years and
it significantly improves performance of file history computation for
large repos. Currently it can be turned on by 'git commit-graph write
--changed-paths'. As one of the large repos, Microsoft Office MonoRepo
would like to get this feature on for repo developers with minimal
effort.

In this commit, we're proposing a new config option
'commitGraph.changedPaths', which acts like '--changed-paths' and always
respects command line precedence. We then add this new config as
recommended in scalar to benefit large repos.

This commit also adds corresponding doc and unit tests for the new
config.

Signed-off-by: Emily Yang <[email protected]>
Mentored-by: Derrick Stolee <[email protected]>

A few things about this:

  • Start your subject line with commit-graph: add new... to better match the commit-graph builtin.
  • Your sign-off should be the very last line, signaling "I attest to everything before this line".
  • Helped-by is more common than Mentored-by, but this is flexible and your call.
  • ...filters has been is a little awkward. Maybe "The changed-path Bloom filters feature has been stable for a few years..."
  • "As one of the large repos, Microsoft Office MonoRepo would like to get this feature on for repo developers with minimal effort." This won't be an effective appeal. Instead, a more generic "Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation."
  • Your config has a true/false/unset logic and it's similar to how the --changed-paths option has a --changed-paths/--no-changed-paths/not-present tri-state. It may be good to discuss how these work together (false config disables a previous true config but doesn't imply --no-changed-paths). You can reference the change 0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01) (exactly that format, as git log -1 --pretty=reference <oid> would output) for how this tri-state situation happened. It's good context for how the filters persist once they exist from a foreground write with --changed-paths.
  • "This commit also adds corresponding doc and unit tests for the new
    config." This is a requirement of such a change, so doesn't need to be included in the message.

After another round of edits, we'll talk about getting your cover letter into shape as it will be part of your introduction to the mailing list and we'll want to CC the right folks.

if (!strcmp(var, "commitgraph.maxnewfilters"))
write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
else if (!strcmp(var, "commitgraph.changedpaths")) {
/* Only set to 1 if config is true and not already overridden by command line */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this config is loaded before the command-line arguments, we don't want to do this check for opts.enable_changed_paths == -1. As we parse the config, we may have duplicate key-value pairs in system, global, or local config and we need to use a "last one wins" approach.

We should consider the ability for global config to have it set to true and the local config set to false. Right now, the false value is ignored. Would opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1; give us the right behavior (false disables the previous true but doesn't actually erase changed-path filters that may otherwise exist).

# --no-changed-paths should take precedence over the config and erase bloom filters
GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
! grep -i "bloom filters" error &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or: git rebase --whitespace=fix HEAD~1 should work for this single patch.

# set commitGraph.changedPaths to false and it should not write bloom filters
git config commitGraph.changedPaths false &&
GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
! grep -i "bloom filters" error &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test_grep helper that will help with this, I think:

test_grep ! "bloom filters" error &&

or drop the ! when you expect the match.

cd config-changed-paths-true &&
test_commit initial &&
test_commit second &&
test_commit third &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a case that we should add:

Global config has true, local config has false. (Alternatively: use git config --add to have multiple values in the same config file, true then false.)

These tests seem complete and correct, and now I'm providing feedback that is more a matter of "taste" and a preference for succinct tests (especially around config options).

Test setup ideas: we only need a single "new" commit to trigger new changed-path filter computations in git commit-graph write. We can use one test to toggle config values (and the state of the commit-graph having filters or not).

I'm thinking through a sequence of events that will help (consider a test_commit before each case):

  1. Write commit-graph without --changed-paths. See no progress for them.
  2. Config to true. Write without --changed-paths. See progress.
  3. Add config as false on top of true. Write without --changed-paths. See progress (due to existing filters).
  4. Keep config as in previous test, but delete commit-graph and write without --changed-paths. Don't see progress.
  5. Clear config and set to false. Write with --changed-paths and see progress (due to CLI).
  6. Keep false config, write without --changed-paths and see progress (due to existing filters).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants