Skip to content

Commit cf73936

Browse files
ttaylorrgitster
authored andcommitted
commit-graph: ensure Bloom filters are read with consistent settings
The changed-path Bloom filter mechanism is parameterized by a couple of variables, notably the number of bits per hash (typically "m" in Bloom filter literature) and the number of hashes themselves (typically "k"). It is critically important that filters are read with the Bloom filter settings that they were written with. Failing to do so would mean that each query is liable to compute different fingerprints, meaning that the filter itself could return a false negative. This goes against a basic assumption of using Bloom filters (that they may return false positives, but never false negatives) and can lead to incorrect results. We have some existing logic to carry forward existing Bloom filter settings from one layer to the next. In `write_commit_graph()`, we have something like: if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { struct commit_graph *g = ctx->r->objects->commit_graph; /* We have changed-paths already. Keep them in the next graph */ if (g && g->chunk_bloom_data) { ctx->changed_paths = 1; ctx->bloom_settings = g->bloom_filter_settings; } } , which drags forward Bloom filter settings across adjacent layers. This doesn't quite address all cases, however, since it is possible for intermediate layers to contain no Bloom filters at all. For example, suppose we have two layers in a commit-graph chain, say, {G1, G2}. If G1 contains Bloom filters, but G2 doesn't, a new G3 (whose base graph is G2) may be written with arbitrary Bloom filter settings, because we only check the immediately adjacent layer's settings for compatibility. This behavior has existed since the introduction of changed-path Bloom filters. But in practice, this is not such a big deal, since the only way up until this point to modify the Bloom filter settings at write time is with the undocumented environment variables: - GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY - GIT_TEST_BLOOM_SETTINGS_NUM_HASHES - GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS (it is still possible to tweak MAX_CHANGED_PATHS between layers, but this does not affect reads, so is allowed to differ across multiple graph layers). But in future commits, we will introduce another parameter to change the hash algorithm used to compute Bloom fingerprints itself. This will be exposed via a configuration setting, making this foot-gun easier to use. To prevent this potential issue, validate that all layers of a split commit-graph have compatible settings with the newest layer which contains Bloom filters. Reported-by: SZEDER Gábor <[email protected]> Original-test-by: SZEDER Gábor <[email protected]> Signed-off-by: Taylor Blau <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 1343c89 commit cf73936

File tree

2 files changed

+92
-1
lines changed

2 files changed

+92
-1
lines changed

commit-graph.c

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -543,6 +543,30 @@ static int validate_mixed_generation_chain(struct commit_graph *g)
543543
return 0;
544544
}
545545

546+
static void validate_mixed_bloom_settings(struct commit_graph *g)
547+
{
548+
struct bloom_filter_settings *settings = NULL;
549+
for (; g; g = g->base_graph) {
550+
if (!g->bloom_filter_settings)
551+
continue;
552+
if (!settings) {
553+
settings = g->bloom_filter_settings;
554+
continue;
555+
}
556+
557+
if (g->bloom_filter_settings->bits_per_entry != settings->bits_per_entry ||
558+
g->bloom_filter_settings->num_hashes != settings->num_hashes) {
559+
g->chunk_bloom_indexes = NULL;
560+
g->chunk_bloom_data = NULL;
561+
FREE_AND_NULL(g->bloom_filter_settings);
562+
563+
warning(_("disabling Bloom filters for commit-graph "
564+
"layer '%s' due to incompatible settings"),
565+
oid_to_hex(&g->oid));
566+
}
567+
}
568+
}
569+
546570
static int add_graph_to_chain(struct commit_graph *g,
547571
struct commit_graph *chain,
548572
struct object_id *oids,
@@ -666,6 +690,7 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r,
666690
}
667691

668692
validate_mixed_generation_chain(graph_chain);
693+
validate_mixed_bloom_settings(graph_chain);
669694

670695
free(oids);
671696
fclose(fp);

t/t4216-log-bloom.sh

Lines changed: 67 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -421,8 +421,74 @@ test_expect_success 'Bloom generation backfills empty commits' '
421421
)
422422
'
423423

424+
graph=.git/objects/info/commit-graph
425+
graphdir=.git/objects/info/commit-graphs
426+
chain=$graphdir/commit-graph-chain
427+
428+
test_expect_success 'setup for mixed Bloom setting tests' '
429+
repo=mixed-bloom-settings &&
430+
431+
git init $repo &&
432+
for i in one two three
433+
do
434+
test_commit -C $repo $i file || return 1
435+
done
436+
'
437+
438+
test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
439+
# Compute Bloom filters with "unusual" settings.
440+
git -C $repo rev-parse one >in &&
441+
GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
442+
--stdin-commits --changed-paths --split <in &&
443+
layer=$(head -n 1 $repo/$chain) &&
444+
445+
# A commit-graph layer without Bloom filters "hides" the layers
446+
# below ...
447+
git -C $repo rev-parse two >in &&
448+
git -C $repo commit-graph write --stdin-commits --no-changed-paths \
449+
--split=no-merge <in &&
450+
451+
# Another commit-graph layer that has Bloom filters, but with
452+
# standard settings, and is thus incompatible with the base
453+
# layer written above.
454+
git -C $repo rev-parse HEAD >in &&
455+
git -C $repo commit-graph write --stdin-commits --changed-paths \
456+
--split=no-merge <in &&
457+
458+
test_line_count = 3 $repo/$chain &&
459+
460+
# Ensure that incompatible Bloom filters are ignored.
461+
git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- file \
462+
>expect 2>err &&
463+
git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
464+
test_cmp expect actual &&
465+
grep "disabling Bloom filters for commit-graph layer .$layer." err
466+
'
467+
468+
test_expect_success 'merge graph layers with incompatible Bloom settings' '
469+
# Ensure that incompatible Bloom filters are ignored when
470+
# merging existing layers.
471+
>trace2.txt &&
472+
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
473+
git -C $repo commit-graph write --reachable --changed-paths 2>err &&
474+
grep "disabling Bloom filters for commit-graph layer .$layer." err &&
475+
grep "{\"hash_version\":1,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt &&
476+
477+
test_path_is_file $repo/$graph &&
478+
test_dir_is_empty $repo/$graphdir &&
479+
480+
git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- \
481+
file >expect &&
482+
trace_out="$(pwd)/trace.perf" &&
483+
GIT_TRACE2_PERF="$trace_out" \
484+
git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
485+
486+
test_cmp expect actual &&
487+
grep "statistics:{\"filter_not_present\":0," trace.perf &&
488+
test_must_be_empty err
489+
'
490+
424491
corrupt_graph () {
425-
graph=.git/objects/info/commit-graph &&
426492
test_when_finished "rm -rf $graph" &&
427493
git commit-graph write --reachable --changed-paths &&
428494
corrupt_chunk_file $graph "$@"

0 commit comments

Comments
 (0)