Skip to content

Commit e04838e

Browse files
pks-tgitster
authored andcommitted
commit-graph: introduce envvar to disable commit existence checks
Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 43c8a30 commit e04838e

File tree

4 files changed

+42
-1
lines changed

4 files changed

+42
-1
lines changed

Documentation/git.txt

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -911,6 +911,16 @@ for full details.
911911
should not normally need to set this to `0`, but it may be
912912
useful when trying to salvage data from a corrupted repository.
913913

914+
`GIT_COMMIT_GRAPH_PARANOIA`::
915+
When loading a commit object from the commit-graph, Git performs an
916+
existence check on the object in the object database. This is done to
917+
avoid issues with stale commit-graphs that contain references to
918+
already-deleted commits, but comes with a performance penalty.
919+
+
920+
The default is "true", which enables the aforementioned behavior.
921+
Setting this to "false" disables the existence check. This can lead to
922+
a performance improvement at the cost of consistency.
923+
914924
`GIT_ALLOW_PROTOCOL`::
915925
If set to a colon-separated list of protocols, behave as if
916926
`protocol.allow` is set to `never`, and each of the listed

commit-graph.c

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -907,14 +907,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
907907

908908
struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
909909
{
910+
static int commit_graph_paranoia = -1;
910911
struct commit *commit;
911912
uint32_t pos;
912913

914+
if (commit_graph_paranoia == -1)
915+
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
916+
913917
if (!prepare_commit_graph(repo))
914918
return NULL;
915919
if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
916920
return NULL;
917-
if (!has_object(repo, id, 0))
921+
if (commit_graph_paranoia && !has_object(repo, id, 0))
918922
return NULL;
919923

920924
commit = lookup_commit(repo, id);

commit-graph.h

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,12 @@
88
#define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE"
99
#define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
1010

11+
/*
12+
* This environment variable controls whether commits looked up via the
13+
* commit graph will be double checked to exist in the object database.
14+
*/
15+
#define GIT_COMMIT_GRAPH_PARANOIA "GIT_COMMIT_GRAPH_PARANOIA"
16+
1117
/*
1218
* This method is only used to enhance coverage of the commit-graph
1319
* feature in the test suite with the GIT_TEST_COMMIT_GRAPH and

t/t5318-commit-graph.sh

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -815,4 +815,25 @@ test_expect_success 'overflow during generation version upgrade' '
815815
)
816816
'
817817

818+
test_expect_success 'stale commit cannot be parsed when given directly' '
819+
test_when_finished "rm -rf repo" &&
820+
git init repo &&
821+
(
822+
cd repo &&
823+
test_commit A &&
824+
test_commit B &&
825+
git commit-graph write --reachable &&
826+
827+
oid=$(git rev-parse B) &&
828+
rm .git/objects/"$(test_oid_to_path "$oid")" &&
829+
830+
# Verify that it is possible to read the commit from the
831+
# commit graph when not being paranoid, ...
832+
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
833+
# ... but parsing the commit when double checking that
834+
# it actually exists in the object database should fail.
835+
test_must_fail git rev-list -1 B
836+
)
837+
'
838+
818839
test_done

0 commit comments

Comments
 (0)