Commit 8295296

Merge branch 'ds/commit-graph-fsck' into jt/commit-graph-per-object-store
* ds/commit-graph-fsck: (23 commits)
  coccinelle: update commit.cocci
  commit-graph: update design document
  gc: automatically write commit-graph files
  commit-graph: add '--reachable' option
  commit-graph: use string-list API for input
  fsck: verify commit-graph
  commit-graph: verify contents match checksum
  commit-graph: test for corrupted octopus edge
  commit-graph: verify commit date
  commit-graph: verify generation number
  commit-graph: verify parent list
  commit-graph: verify root tree OIDs
  commit-graph: verify objects exist
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify required chunks are present
  commit-graph: verify catches corrupt signature
  commit-graph: add 'verify' subcommand
  commit-graph: load a root tree from specific graph
  commit: force commit to parse from object database
  commit-graph: parse commit from chosen graph
  ...
2 parents 1f6c72f + b18ef13 commit 8295296

14 files changed, +575 -83 lines changed

Documentation/config.txt

Lines changed: 6 additions & 3 deletions
@@ -904,9 +904,12 @@ core.notesRef::
 	This setting defaults to "refs/notes/commits", and it can be overridden by
 	the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].
 
-core.commitGraph::
-	Enable git commit graph feature. Allows reading from the
-	commit-graph file.
+gc.commitGraph::
+	If true, then gc will rewrite the commit-graph file when
+	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+	'--auto' the commit-graph will be updated if housekeeping is
+	required. Default is false. See linkgit:git-commit-graph[1]
+	for details.
 
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in

Documentation/git-commit-graph.txt

Lines changed: 12 additions & 2 deletions
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -37,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
@@ -52,6 +57,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 --------

Documentation/git-fsck.txt

Lines changed: 3 additions & 0 deletions
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 ---------------------

Documentation/git-gc.txt

Lines changed: 4 additions & 0 deletions
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified. The larger
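
Review note: Git compares the section and variable parts of a configuration key case-insensitively, which is why builtin/gc.c in this commit can look up the all-lowercase key "gc.writecommitgraph". The sketch below compares two-level keys (no subsection) after lowercasing them. `normalize_key()` and `same_key()` are hypothetical helpers written for this note, not Git functions. Under that comparison, the `gc.commitGraph` name used in the documentation hunks does not match the key that builtin/gc.c reads.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: copy a two-level key such as "gc.commitGraph"
 * into `out` in lowercase, truncating if `outlen` is too small.
 * Keys with a subsection are out of scope for this sketch, because
 * the subsection part is case-sensitive.
 */
static void normalize_key(const char *key, char *out, size_t outlen)
{
	size_t i;

	if (!outlen)
		return;
	for (i = 0; key[i] && i + 1 < outlen; i++)
		out[i] = (char)tolower((unsigned char)key[i]);
	out[i] = '\0';
}

/* Hypothetical helper: return 1 if both keys name the same variable. */
static int same_key(const char *a, const char *b)
{
	char na[128], nb[128];

	normalize_key(a, na, sizeof(na));
	normalize_key(b, nb, sizeof(nb));
	return !strcmp(na, nb);
}
```

For example, `same_key("gc.writeCommitGraph", "gc.writecommitgraph")` is true, while `same_key("gc.commitGraph", "gc.writecommitgraph")` is false.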

Documentation/technical/commit-graph.txt

Lines changed: 0 additions & 22 deletions
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
     - 'log --topo-order'
     - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct

builtin/commit-graph.c

Lines changed: 68 additions & 31 deletions
@@ -3,12 +3,19 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph verify [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	NULL
+};
+
+static const char * const builtin_commit_graph_verify_usage[] = {
+	N_("git commit-graph verify [--object-dir <objdir>]"),
 	NULL
 };
 
@@ -18,17 +25,48 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int reachable;
 	int stdin_packs;
 	int stdin_commits;
 	int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_verify_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			   N_("dir"),
+			   N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_verify_options,
+			     builtin_commit_graph_verify_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+	FREE_AND_NULL(graph_name);
+
+	if (!graph)
+		return 0;
+
+	return verify_commit_graph(the_repository, graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
@@ -51,8 +89,11 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph)
+	if (!graph) {
+		UNLEAK(graph_name);
 		die("graph file %s does not exist", graph_name);
+	}
+
 	FREE_AND_NULL(graph_name);
 
 	printf("header: %08x %d %d %d %d\n",
@@ -79,18 +120,16 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-	const char **pack_indexes = NULL;
-	int packs_nr = 0;
-	const char **commit_hex = NULL;
-	int commits_nr = 0;
-	const char **lines = NULL;
-	int lines_nr = 0;
-	int lines_alloc = 0;
+	struct string_list *pack_indexes = NULL;
+	struct string_list *commit_hex = NULL;
+	struct string_list lines;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			   N_("dir"),
 			   N_("The object directory to store the graph")),
+		OPT_BOOL(0, "reachable", &opts.reachable,
+			 N_("start walk at all refs")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			 N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -104,39 +143,35 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
-	if (opts.stdin_packs && opts.stdin_commits)
-		die(_("cannot use both --stdin-commits and --stdin-packs"));
+	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	if (opts.reachable) {
+		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		return 0;
+	}
+
+	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-		lines_nr = 0;
-		lines_alloc = 128;
-		ALLOC_ARRAY(lines, lines_alloc);
-
-		while (strbuf_getline(&buf, stdin) != EOF) {
-			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-			lines[lines_nr++] = strbuf_detach(&buf, NULL);
-		}
-
-		if (opts.stdin_packs) {
-			pack_indexes = lines;
-			packs_nr = lines_nr;
-		}
-		if (opts.stdin_commits) {
-			commit_hex = lines;
-			commits_nr = lines_nr;
-		}
+
+		while (strbuf_getline(&buf, stdin) != EOF)
+			string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+		if (opts.stdin_packs)
+			pack_indexes = &lines;
+		if (opts.stdin_commits)
+			commit_hex = &lines;
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr,
 			   commit_hex,
-			   commits_nr,
 			   opts.append);
 
+	string_list_clear(&lines, 0);
 	return 0;
 }
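
Review note: the new option check in graph_write() adds the three mode flags together and rejects any total above one. A minimal standalone sketch of that idea follows, assuming each flag is a plain int set from the command line. `struct write_modes` and `modes_are_compatible()` are hypothetical names used only for illustration and are not part of Git.

```c
/* Hypothetical stand-in for the three mutually exclusive write modes. */
struct write_modes {
	int reachable;
	int stdin_packs;
	int stdin_commits;
};

/*
 * Return 1 when at most one mode is selected, 0 otherwise.
 * The `!!` normalizes any nonzero flag to 1 so the sum counts
 * selected modes even if a flag holds a value other than 0 or 1.
 */
static int modes_are_compatible(const struct write_modes *m)
{
	int selected = !!m->reachable + !!m->stdin_packs + !!m->stdin_commits;

	return selected <= 1;
}
```

Compared with the pairwise `a && b` test that the old code used for two flags, the sum scales to three or more options without listing every pair.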

@@ -162,6 +197,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
+		if (!strcmp(argv[0], "verify"))
+			return graph_verify(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
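
Review note: graph_write() in this file now collects stdin lines through the string-list API instead of a hand-grown array with separate count and capacity variables. The sketch below illustrates the general pattern of bundling the array, its length, and its capacity into one structure. `struct line_list` and its helpers are hypothetical simplifications written for this note; they copy each string and do not reproduce the behavior or signatures of Git's string_list.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified growable list of owned strings. */
struct line_list {
	char **items;
	size_t nr;    /* number of entries in use */
	size_t alloc; /* number of entries allocated */
};

/* Append a copy of `s`; return 0 on success, -1 on allocation failure. */
static int line_list_append(struct line_list *list, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy;

	if (list->nr == list->alloc) {
		size_t new_alloc = list->alloc ? list->alloc * 2 : 16;
		char **grown = realloc(list->items, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		list->items = grown;
		list->alloc = new_alloc;
	}

	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, s, len);
	list->items[list->nr++] = copy;
	return 0;
}

/* Free every stored string and reset the list to its empty state. */
static void line_list_clear(struct line_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		free(list->items[i]);
	free(list->items);
	list->items = NULL;
	list->nr = 0;
	list->alloc = 0;
}
```

Callers pass a single pointer to the list, which mirrors how the new write_commit_graph() call drops the separate `packs_nr` and `commits_nr` arguments.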

builtin/fsck.c

Lines changed: 21 additions & 0 deletions
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -827,5 +829,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	check_connectivity();
+
+	if (core_commit_graph) {
+		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
+
+		commit_graph_verify.argv = verify_argv;
+		commit_graph_verify.git_cmd = 1;
+		if (run_command(&commit_graph_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb(the_repository);
+		for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			verify_argv[2] = "--object-dir";
+			verify_argv[3] = alt->path;
+			if (run_command(&commit_graph_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
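
Review note: the ERROR_* constants in builtin/fsck.c are octal literals with a single bit set each, so `020` is decimal 16 and occupies the next bit after `010`. The sketch below shows how such flags accumulate in one bitmask, using only the values visible in this diff. `has_commit_graph_error()` is a hypothetical helper added for illustration.

```c
/* Values copied from the hunks above; the leading 0 makes them octal. */
#define ERROR_REACHABLE    02
#define ERROR_PACK         04
#define ERROR_REFS         010
#define ERROR_COMMIT_GRAPH 020

/* Hypothetical helper: test whether the commit-graph bit is set. */
static int has_commit_graph_error(int errors_found)
{
	return (errors_found & ERROR_COMMIT_GRAPH) != 0;
}

/* Hypothetical helper: record a failure without clearing earlier ones. */
static int record_error(int errors_found, int flag)
{
	return errors_found | flag;
}
```

Because every flag uses a distinct bit, a failed 'git commit-graph verify' run can be recorded with `|=` without disturbing errors that earlier fsck phases already set.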

builtin/gc.c

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_write_commit_graph;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
 	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+	git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (pack_garbage.nr > 0)
 		clean_pack_garbage();
 
+	if (gc_write_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 0);
+
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
