Skip to content

Commit 3b0199d

Browse files
derrickstoleegitster
authored andcommitted
commit-graph: start parsing generation v2 (again)
The 'read_generation_data' member of 'struct commit_graph' was introduced by 1fdc383 (commit-graph: use generation v2 only if entire chain does, 2021-01-16). The intention was to avoid using corrected commit dates if not all layers of a commit-graph had that data stored. The logic in validate_mixed_generation_chain() at that point incorrectly initialized read_generation_data to 1 if and only if the tip commit-graph contained the Corrected Commit Date chunk. This was "fixed" in 448a39e (commit-graph: validate layers for generation data, 2021-02-02) to validate that read_generation_data was either non-zero for all layers, or it would set read_generation_data to zero for all layers. The problem here is that read_generation_data is not initialized to be non-zero anywhere! This change initializes read_generation_data immediately after the chunk is parsed, so each layer will have its value present as soon as possible. The read_generation_data member is used in fill_commit_graph_info() to determine if we should use the corrected commit date or the topological levels stored in the Commit Data chunk. Due to this bug, all previous versions of Git were defaulting to topological levels in all cases! This can be measured with some performance tests. Using the Linux kernel as a testbed, I generated a complete commit-graph containing corrected commit dates and tested the 'new' version against the previous, 'old' version. First, rev-list with --topo-order demonstrates a 26% improvement using corrected commit dates: hyperfine \ -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \ -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 57.1 ms ± 3.1 ms Range (min … max): 52.9 ms … 62.0 ms 55 runs Benchmark 2: new Time (mean ± σ): 45.5 ms ± 3.3 ms Range (min … max): 39.9 ms … 51.7 ms 59 runs Summary 'new' ran 1.26 ± 0.11 times faster than 'old' These performance improvements are due to the algorithmic improvements given by walking fewer commits due to the higher cutoffs from corrected commit dates. However, this comes at a cost. The additional I/O cost of parsing the corrected commit dates is visible in case of merge-base commands that do not reduce the overall number of walked commits. hyperfine \ -n "old" "$OLD_GIT merge-base v4.8 v4.9" \ -n "new" "$NEW_GIT merge-base v4.8 v4.9" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 110.4 ms ± 6.4 ms Range (min … max): 96.0 ms … 118.3 ms 25 runs Benchmark 2: new Time (mean ± σ): 150.7 ms ± 1.1 ms Range (min … max): 149.3 ms … 153.4 ms 19 runs Summary 'old' ran 1.36 ± 0.08 times faster than 'new' Performance issues like this are what motivated 702110a (commit-graph: use config to specify generation type, 2021-02-25). In the future, we could fix this performance problem by inserting the corrected commit date offsets into the Commit Date chunk instead of having that data in an extra chunk. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 75979d9 commit 3b0199d

File tree

5 files changed

+23
-5
lines changed

5 files changed

+23
-5
lines changed

commit-graph.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r,
407407
&graph->chunk_generation_data);
408408
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
409409
&graph->chunk_generation_data_overflow);
410+
411+
if (graph->chunk_generation_data)
412+
graph->read_generation_data = 1;
410413
}
411414

412415
if (r->settings.commit_graph_read_changed_paths) {

t/lib-commit-graph.sh

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,11 +37,21 @@ graph_read_expect() {
3737
OPTIONAL=" $2"
3838
NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
3939
fi
40+
GENERATION_VERSION=2
41+
if test -n "$3"
42+
then
43+
GENERATION_VERSION=$3
44+
fi
45+
OPTIONS=
46+
if test $GENERATION_VERSION -gt 1
47+
then
48+
OPTIONS=" read_generation_data"
49+
fi
4050
cat >expect <<- EOF
4151
header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
4252
num_commits: $1
4353
chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
44-
options:
54+
options:$OPTIONS
4555
EOF
4656
test-tool read-graph >output &&
4757
test_cmp expect output

t/t4216-log-bloom.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ graph_read_expect () {
4848
header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
4949
num_commits: $1
5050
chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data
51-
options: bloom(1,10,7)
51+
options: bloom(1,10,7) read_generation_data
5252
EOF
5353
test-tool read-graph >actual &&
5454
test_cmp expect actual

t/t5318-commit-graph.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -456,7 +456,7 @@ test_expect_success 'git commit-graph verify' '
456456
cd "$TRASH_DIRECTORY/full" &&
457457
git rev-parse commits/8 | git -c commitGraph.generationVersion=1 commit-graph write --stdin-commits &&
458458
git commit-graph verify >output &&
459-
graph_read_expect 9 extra_edges
459+
graph_read_expect 9 extra_edges 1
460460
'
461461

462462
NUM_COMMITS=9

t/t5324-split-commit-graph.sh

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -30,11 +30,16 @@ graph_read_expect() {
3030
then
3131
NUM_BASE=$2
3232
fi
33+
OPTIONS=
34+
if test -z "$3"
35+
then
36+
OPTIONS=" read_generation_data"
37+
fi
3338
cat >expect <<- EOF
3439
header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE
3540
num_commits: $1
3641
chunks: oid_fanout oid_lookup commit_metadata generation_data
37-
options:
42+
options:$OPTIONS
3843
EOF
3944
test-tool read-graph >output &&
4045
test_cmp expect output
@@ -624,7 +629,7 @@ test_expect_success 'write generation data chunk if topmost remaining layer has
624629
header: 43475048 1 $(test_oid oid_version) 5 1
625630
num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS))
626631
chunks: oid_fanout oid_lookup commit_metadata generation_data
627-
options:
632+
options: read_generation_data
628633
EOF
629634
test_cmp expect output
630635
)

0 commit comments

Comments
 (0)