Skip to content

Commit b1df3b3

Browse files
pks-tgitster
authored andcommitted
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
In 7a5d604 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <[email protected]> Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 564d025 commit b1df3b3

File tree

6 files changed

+16
-10
lines changed

6 files changed

+16
-10
lines changed

Documentation/git.txt

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -917,9 +917,9 @@ for full details.
917917
avoid issues with stale commit-graphs that contain references to
918918
already-deleted commits, but comes with a performance penalty.
919919
+
920-
The default is "true", which enables the aforementioned behavior.
921-
Setting this to "false" disables the existence check. This can lead to
922-
a performance improvement at the cost of consistency.
920+
The default is "false", which disables the aforementioned behavior.
921+
Setting this to "true" enables the existence check so that stale commits
922+
will never be returned from the commit-graph at the cost of performance.
923923

924924
`GIT_ALLOW_PROTOCOL`::
925925
If set to a colon-separated list of protocols, behave as if

commit-graph.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1029,7 +1029,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
10291029
uint32_t pos;
10301030

10311031
if (commit_graph_paranoia == -1)
1032-
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
1032+
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 0);
10331033

10341034
if (!prepare_commit_graph(repo))
10351035
return NULL;

commit.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -577,7 +577,7 @@ int repo_parse_commit_internal(struct repository *r,
577577
static int commit_graph_paranoia = -1;
578578

579579
if (commit_graph_paranoia == -1)
580-
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
580+
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 0);
581581

582582
if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
583583
unparse_commit(r, &item->object.oid);

t/t5318-commit-graph.sh

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -909,10 +909,10 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
909909
910910
# Verify that it is possible to read the commit from the
911911
# commit graph when not being paranoid, ...
912-
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
912+
git rev-list B &&
913913
# ... but parsing the commit when double checking that
914914
# it actually exists in the object database should fail.
915-
test_must_fail git rev-list -1 B
915+
test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git rev-list -1 B
916916
)
917917
'
918918

@@ -936,9 +936,9 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
936936
937937
# Again, we should be able to parse the commit when not
938938
# being paranoid about commit graph staleness...
939-
GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
939+
git rev-parse HEAD~2 &&
940940
# ... but fail when we are paranoid.
941-
test_must_fail git rev-parse HEAD~2 2>error &&
941+
test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git rev-parse HEAD~2 2>error &&
942942
grep "error: commit $oid exists in commit-graph but not in the object database" error
943943
)
944944
'

t/t6022-rev-list-missing.sh

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,12 @@ test_expect_success 'create repository and alternate directory' '
1313
test_commit 3
1414
'
1515

16+
# We manually corrupt the repository, which means that the commit-graph may
17+
# contain references to already-deleted objects. We thus need to enable
18+
# commit-graph paranoia to not returned these deleted commits from the graph.
19+
GIT_COMMIT_GRAPH_PARANOIA=true
20+
export GIT_COMMIT_GRAPH_PARANOIA
21+
1622
for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
1723
do
1824
test_expect_success "rev-list --missing=error fails with missing object $obj" '

t/t7700-repack.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -271,7 +271,7 @@ test_expect_success 'repacking fails when missing .pack actually means missing o
271271
ls .git/objects/pack/*.pack >before-pack-dir &&
272272
273273
test_must_fail git fsck &&
274-
test_must_fail git repack --cruft -d 2>err &&
274+
test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git repack --cruft -d 2>err &&
275275
grep "bad object" err &&
276276
277277
# Before failing, the repack did not modify the

0 commit comments

Comments
 (0)