Skip to content

Commit a8db000

Browse files
brandb97gitster
authored andcommitted
bloom: enable bloom filter with wildcard pathspec in revision traversal
When traversing commits, a pathspec item can be used to limit the traversal to commits that modify the specified paths. And the commit-graph includes a Bloom filter to exclude commits that definitely did not modify a given pathspec item. During commit traversal, the Bloom filter can significantly improve performance. However, it is disabled if the specified pathspec item contains wildcard characters or magic signatures. For performance reason, enable Bloom filter even if a pathspec item contains wildcard characters by filtering only the non-wildcard part of the pathspec item. The function of pathspec magic signature is generally to narrow down the path specified by the pathspecs. So, enable Bloom filter when the magic signature is "top", "glob", "attr", "--depth" or "literal". "exclude" is used to select paths other than the specified path, rather than serving as a filtering function, so it cannot be used together with the Bloom filter. Since Bloom filter is not case insensitive even in case insensitive system (e.g. MacOS), it cannot be used together with "icase" magic. With this optimization, we get some improvements for pathspecs with wildcards or magic signatures. First, in the Git repository we see these modest results: git log -100 -- "t/*" Benchmark 1: new Time (mean ± σ): 20.4 ms ± 0.6 ms Range (min … max): 19.3 ms … 24.4 ms Benchmark 2: old Time (mean ± σ): 23.4 ms ± 0.5 ms Range (min … max): 22.5 ms … 24.7 ms git log -100 -- ":(top)t" Benchmark 1: new Time (mean ± σ): 16.2 ms ± 0.4 ms Range (min … max): 15.3 ms … 17.2 ms Benchmark 2: old Time (mean ± σ): 18.6 ms ± 0.5 ms Range (min … max): 17.6 ms … 20.4 ms But in a larger repo, such as the LLVM project repo below, we get even better results: git log -100 -- "libc/*" Benchmark 1: new Time (mean ± σ): 16.0 ms ± 0.6 ms Range (min … max): 14.7 ms … 17.8 ms Benchmark 2: old Time (mean ± σ): 26.7 ms ± 0.5 ms Range (min … max): 25.4 ms … 27.8 ms git log -100 -- ":(top)libc" Benchmark 1: new Time (mean ± σ): 15.6 ms ± 0.6 ms Range (min … max): 14.4 ms … 17.7 ms Benchmark 2: old Time (mean ± σ): 19.6 ms ± 0.5 ms Range (min … max): 18.6 ms … 20.6 ms Signed-off-by: Lidong Yan <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 2c2ba49 commit a8db000

File tree

2 files changed

+51
-10
lines changed

2 files changed

+51
-10
lines changed

revision.c

Lines changed: 24 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
671671

672672
static int forbid_bloom_filters(struct pathspec *spec)
673673
{
674-
if (spec->has_wildcard)
675-
return 1;
676-
if (spec->magic & ~PATHSPEC_LITERAL)
674+
unsigned int allowed_magic =
675+
PATHSPEC_FROMTOP |
676+
PATHSPEC_MAXDEPTH |
677+
PATHSPEC_LITERAL |
678+
PATHSPEC_GLOB |
679+
PATHSPEC_ATTR;
680+
681+
if (spec->magic & ~allowed_magic)
677682
return 1;
678683
for (size_t nr = 0; nr < spec->nr; nr++)
679-
if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
684+
if (spec->items[nr].magic & ~allowed_magic)
680685
return 1;
681686

682687
return 0;
@@ -693,9 +698,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
693698
size_t len;
694699
int res = 0;
695700

701+
len = pi->nowildcard_len;
702+
if (len != pi->len) {
703+
/*
704+
* for path like "/dir/file*", nowildcard part would be
705+
* "/dir/file", but only "/dir" should be used for the
706+
* bloom filter
707+
*/
708+
while (len > 0 && pi->match[len - 1] != '/')
709+
len--;
710+
}
696711
/* remove single trailing slash from path, if needed */
697-
if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
698-
path_alloc = xmemdupz(pi->match, pi->len - 1);
712+
if (len > 0 && pi->match[len - 1] == '/')
713+
len--;
714+
715+
if (len != pi->len) {
716+
path_alloc = xmemdupz(pi->match, len);
699717
path = path_alloc;
700718
} else
701719
path = pi->match;

t/t4216-log-bloom.sh

Lines changed: 27 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
154154
test_bloom_filters_used "-- file*"
155155
'
156156

157-
test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
157+
test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
158+
test_bloom_filters_used "-- A/\* file4" &&
159+
test_bloom_filters_used "-- A/file\*" &&
160+
test_bloom_filters_used "-- * A/\*"
161+
'
162+
163+
test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
158164
test_bloom_filters_not_used "-- file\*" &&
159-
test_bloom_filters_not_used "-- A/\* file4" &&
160-
test_bloom_filters_not_used "-- file4 A/\*" &&
161-
test_bloom_filters_not_used "-- * A/\*"
165+
test_bloom_filters_not_used "-- file\* A/\*" &&
166+
test_bloom_filters_not_used "-- file\* *" &&
167+
test_bloom_filters_not_used "-- \*"
168+
'
169+
170+
test_expect_success 'git log with path contains various magic signatures' '
171+
cd A &&
172+
test_bloom_filters_used "-- \:\(top\)B" &&
173+
cd .. &&
174+
175+
test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
176+
test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
177+
test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
178+
179+
test_when_finished "rm -f .gitattributes" &&
180+
cat >.gitattributes <<-EOF &&
181+
A/file1 text
182+
A/B/file2 -text
183+
EOF
184+
test_bloom_filters_used "-- \:\(attr\:text\)A"
162185
'
163186

164187
test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '

0 commit comments

Comments
 (0)