Skip to content

Commit 7a5d604

Browse files
pks-tgitster
authored andcommitted
commit: detect commits that exist in commit-graph but not in the ODB
Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent e04838e commit 7a5d604

File tree

2 files changed

+42
-1
lines changed

2 files changed

+42
-1
lines changed

commit.c

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
#include "shallow.h"
2929
#include "tree.h"
3030
#include "hook.h"
31+
#include "parse.h"
3132

3233
static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
3334

@@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
572573
return -1;
573574
if (item->object.parsed)
574575
return 0;
575-
if (use_commit_graph && parse_commit_in_graph(r, item))
576+
if (use_commit_graph && parse_commit_in_graph(r, item)) {
577+
static int commit_graph_paranoia = -1;
578+
579+
if (commit_graph_paranoia == -1)
580+
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
581+
582+
if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
583+
unparse_commit(r, &item->object.oid);
584+
return quiet_on_missing ? -1 :
585+
error(_("commit %s exists in commit-graph but not in the object database"),
586+
oid_to_hex(&item->object.oid));
587+
}
588+
576589
return 0;
590+
}
577591

578592
if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
579593
return quiet_on_missing ? -1 :

t/t5318-commit-graph.sh

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -836,4 +836,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
836836
)
837837
'
838838

839+
test_expect_success 'stale commit cannot be parsed when traversing graph' '
840+
test_when_finished "rm -rf repo" &&
841+
git init repo &&
842+
(
843+
cd repo &&
844+
845+
test_commit A &&
846+
test_commit B &&
847+
test_commit C &&
848+
git commit-graph write --reachable &&
849+
850+
# Corrupt the repository by deleting the intermediate commit
851+
# object. Commands should notice that this object is absent and
852+
# thus that the repository is corrupt even if the commit graph
853+
# exists.
854+
oid=$(git rev-parse B) &&
855+
rm .git/objects/"$(test_oid_to_path "$oid")" &&
856+
857+
# Again, we should be able to parse the commit when not
858+
# being paranoid about commit graph staleness...
859+
GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
860+
# ... but fail when we are paranoid.
861+
test_must_fail git rev-parse HEAD~2 2>error &&
862+
grep "error: commit $oid exists in commit-graph but not in the object database" error
863+
)
864+
'
865+
839866
test_done

0 commit comments

Comments
 (0)