Commit b0afdce

ttaylorr authored and gitster committed
pack-bitmap.c: use commit boundary during bitmap traversal
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.

Consider a set of positive and negative tips (which we'll refer to with
their standard bitmap parlance by "wants", and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in `prepare_bitmap_walk()` does something like:

  1. Consider if we can even compute the set of objects with bitmaps,
     and fall back to the usual traversal if we cannot. For example,
     pathspec limiting traversals can't be computed using bitmaps (since
     they don't know which objects are at which paths). The same is true
     of certain kinds of non-trivial object filters.

  2. If we can compute the traversal with bitmaps, partition the
     (dereferenced) tips into two object lists, "haves", and "wants",
     based on whether or not the objects have the UNINTERESTING flag,
     respectively.

  3. Fall back to the ordinary object traversal if either (a) there are
     more than zero haves, none of which are in the bitmapped pack or
     MIDX, or (b) there are no wants.

  4. Construct a reachability bitmap for the "haves" side by walking
     from the revision tips down to any existing bitmaps, OR-ing in any
     bitmaps as they are found.

  5. Then do the same for the "wants" side, stopping at any objects that
     appear in the "haves" bitmap.

  6. Filter the results if any object filter (that can be easily
     computed with bitmaps alone) was given, and then return back to the
     caller.

When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.

But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.

One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.

But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.

Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.

The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.

The new algorithm works as follows:

  1. Build a (partial) bitmap of the haves side by first OR-ing any
     bitmap(s) that already exist for UNINTERESTING commits between the
     haves and the boundary.

  2. For each commit along the boundary, add it as a fill-in traversal
     tip (where the traversal terminates once an existing bitmap is
     found), and perform fill-in traversal.

  3. Build up a complete bitmap of the wants side as usual, stopping any
     time we intersect the (partial) haves side.

  4. Return the results.

This is more-or-less equivalent to using the *old* algorithm with this
invocation:

    $ git rev-list --objects --use-bitmap-index $WANTS --not \
        $(git rev-list --objects --boundary $WANTS --not $HAVES |
          perl -lne 'print $1 if /^-(.*)/')

The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.

Note that when using the old bitmap traversal algorithm, the results can
be *slower* than without bitmaps! Under the new algorithm, the result is
computed faster with bitmaps than without (at the cost of over-counting
the true number of objects in a similar fashion as the non-bitmap
traversal):

    # (Computing the number of tagged objects not on any branches
    # without bitmaps).
    $ time git rev-list --count --objects --tags --not --branches
    20

    real	0m1.388s
    user	0m1.092s
    sys	0m0.296s

    # (Computing the same query using the old bitmap traversal).
    $ time git rev-list --count --objects --tags --not --branches --use-bitmap-index
    19

    real	0m22.709s
    user	0m21.628s
    sys	0m1.076s

    # (this commit)
    $ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index
    19

    real	0m1.518s
    user	0m1.234s
    sys	0m0.284s

The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 15-fold improvement over the existing traversal.

In a more realistic setting (using my local copy of git.git), I can
observe a similar (if more modest) speed-up:

    $ argv="--count --objects --branches --not --tags"
    hyperfine \
      -n 'no bitmaps' "git.compile rev-list $argv" \
      -n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \
      -n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv"
    Benchmark 1: no bitmaps
      Time (mean ± σ):     124.6 ms ±   2.1 ms    [User: 103.7 ms, System: 20.8 ms]
      Range (min … max):   122.6 ms … 133.1 ms    22 runs

    Benchmark 2: existing traversal
      Time (mean ± σ):     368.6 ms ±   3.0 ms    [User: 325.3 ms, System: 43.1 ms]
      Range (min … max):   365.1 ms … 374.8 ms    10 runs

    Benchmark 3: boundary traversal
      Time (mean ± σ):     167.6 ms ±   0.9 ms    [User: 139.5 ms, System: 27.9 ms]
      Range (min … max):   166.1 ms … 169.2 ms    17 runs

    Summary
      'no bitmaps' ran
        1.34 ± 0.02 times faster than 'boundary traversal'
        2.96 ± 0.05 times faster than 'existing traversal'

Here, the new algorithm is also still slower than not using bitmaps, but
represents a more than 2-fold improvement over the existing traversal in
a more modest example.

Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:

  - Decrease the cost to decompress and OR together many bitmaps
    (particularly when enumerating the uninteresting side of the walk).
    Here we could explore more efficient bitmap storage techniques, like
    Roaring+Run and/or use SIMD instructions to speed up ORing them
    together.

  - Store pseudo-merge bitmaps, which could allow us to OR together
    fewer "summary" bitmaps (which would also help with the above).

Helped-by: Jeff King <[email protected]>
Helped-by: Derrick Stolee <[email protected]>
Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
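To make the cost argument concrete, here is a minimal C sketch on a hypothetical toy history. It is an illustration only: the names `toy_parent`, `toy_has_bitmap`, and `toy_fill_in_cost` are invented here and are not part of Git, every commit has at most one parent, and only the commit walk is modeled (no trees or blobs). The boundary commit is worked out by hand rather than computed.

```c
#include <assert.h>

/*
 * Hypothetical toy history (not Git's data structures):
 *
 *   0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9          <- want tip is 9
 *                        \
 *                         10 - 11 - 12 - 13 - 14   <- have tip is 14
 *
 * toy_parent[c] is the single parent of commit c, or -1 for the root.
 */
#define TOY_NR_COMMITS 15

static const int toy_parent[TOY_NR_COMMITS] = {
	-1, 0, 1, 2, 3, 4, 5, 6, 7, 8,	/* main line, commits 0..9 */
	5, 10, 11, 12, 13,		/* side branch forking from commit 5 */
};

/* Assumption: only commit 4 has a stored reachability bitmap. */
static const int toy_has_bitmap[TOY_NR_COMMITS] = { [4] = 1 };

/*
 * Count how many commits a fill-in walk has to visit when it starts at
 * `tip` and stops at the first commit that already has a stored bitmap.
 */
static int toy_fill_in_cost(int tip)
{
	int visited = 0;
	int c;

	assert(tip >= 0 && tip < TOY_NR_COMMITS);
	for (c = tip; c >= 0 && !toy_has_bitmap[c]; c = toy_parent[c])
		visited++;
	return visited;
}
```

For the query `9 --not 14` in this toy history, the only boundary commit is 5 (the uninteresting parent of commit 6). Filling in from the have tip, `toy_fill_in_cost(14)`, visits 6 commits before reaching the bitmap at commit 4, while filling in from the boundary, `toy_fill_in_cost(5)`, visits 1.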
1 parent 47ff853 commit b0afdce

9 files changed, 243 insertions(+), 14 deletions(-)
Documentation/config/feature.txt

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,9 @@ feature.experimental::
 +
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
++
+* `pack.useBitmapBoundaryTraversal=true` may improve bitmap traversal times by
+walking fewer objects.
 
 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the

Documentation/config/pack.txt

Lines changed: 17 additions & 0 deletions
@@ -123,6 +123,23 @@ pack.useBitmaps::
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
 
+pack.useBitmapBoundaryTraversal::
+	When true, Git will use an experimental algorithm for computing
+	reachability queries with bitmaps. Instead of building up
+	complete bitmaps for all of the negated tips and then OR-ing
+	them together, consider negated tips with existing bitmaps as
+	additive (i.e. OR-ing them into the result if they exist,
+	ignoring them otherwise), and build up a bitmap at the boundary
+	instead.
++
+When using this algorithm, Git may include too many objects as a result
+of not opening up trees belonging to certain UNINTERESTING commits. This
+inexactness matches the non-bitmap traversal algorithm.
++
+In many cases, this can provide a speed-up over the exact algorithm,
+particularly when there is poor bitmap coverage of the negated side of
+the query.
+
 pack.useSparse::
 	When true, git will default to using the '--sparse' option in
 	'git pack-objects' when the '--revs' option is present. This
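The inexactness described for `pack.useBitmapBoundaryTraversal` comes down to set arithmetic: the result is the wants bitmap with the haves bitmap masked out, so a haves bitmap that is missing some bits can only add objects to the result, never drop them. The following is a minimal sketch of that property using a single 64-bit word as a toy bitmap. All names (`toy_bitmap`, `toy_walk_result`, `toy_is_superset`, `toy_count`, `toy_demo`) and the sample bit patterns are assumptions made for illustration; Git's real bitmaps are EWAH-compressed and far larger.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bitmap: bit i set means "object i is in the set" (at most 64 objects). */
typedef uint64_t toy_bitmap;

/* Objects to report: reachable from the wants, minus anything marked as a have. */
static toy_bitmap toy_walk_result(toy_bitmap wants, toy_bitmap haves)
{
	return wants & ~haves;
}

/* Nonzero if every object in `exact` is also present in `relaxed`. */
static int toy_is_superset(toy_bitmap relaxed, toy_bitmap exact)
{
	return (exact & ~relaxed) == 0;
}

/* Number of set bits, written portably instead of using a compiler builtin. */
static int toy_count(toy_bitmap b)
{
	int n = 0;

	for (; b; b &= b - 1)
		n++;
	return n;
}

/* Usage example: returns how many extra objects the relaxed walk reports. */
static int toy_demo(void)
{
	toy_bitmap wants = 0xFF;		/* objects 0..7 reachable from the wants */
	toy_bitmap exact_haves = 0x3F;		/* objects 0..5 reachable from the haves */
	toy_bitmap partial_haves = 0x0F;	/* assume only objects 0..3 got marked */

	toy_bitmap exact = toy_walk_result(wants, exact_haves);
	toy_bitmap relaxed = toy_walk_result(wants, partial_haves);

	/* The relaxed answer may over-count, but must never under-count. */
	assert(toy_is_superset(relaxed, exact));

	return toy_count(relaxed) - toy_count(exact);
}
```

With these sample values the exact walk reports objects 6 and 7, the relaxed walk reports objects 4 through 7, and `toy_demo()` returns 2.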

ci/run-build-and-tests.sh

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ linux-TEST-vars)
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
+	export GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1
 	;;
 linux-clang)
 	export GIT_TEST_DEFAULT_HASH=sha1

pack-bitmap.c

Lines changed: 169 additions & 13 deletions
@@ -1077,6 +1077,126 @@ static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
 	return base;
 }
 
+struct bitmap_boundary_cb {
+	struct bitmap_index *bitmap_git;
+	struct bitmap *base;
+
+	struct object_array boundary;
+};
+
+static void show_boundary_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_boundary_cb *data = _data;
+
+	if (commit->object.flags & BOUNDARY)
+		add_object_array(&commit->object, "", &data->boundary);
+
+	if (commit->object.flags & UNINTERESTING) {
+		if (bitmap_walk_contains(data->bitmap_git, data->base,
+					 &commit->object.oid))
+			return;
+
+		add_commit_to_bitmap(data->bitmap_git, &data->base, commit);
+	}
+}
+
+static void show_boundary_object(struct object *object,
+				 const char *name, void *data)
+{
+	BUG("should not be called");
+}
+
+static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
+					    struct rev_info *revs,
+					    struct object_list *roots)
+{
+	struct bitmap_boundary_cb cb;
+	struct object_list *root;
+	unsigned int i;
+	unsigned int tmp_blobs, tmp_trees, tmp_tags;
+	int any_missing = 0;
+
+	cb.bitmap_git = bitmap_git;
+	cb.base = bitmap_new();
+	object_array_init(&cb.boundary);
+
+	revs->ignore_missing_links = 1;
+
+	/*
+	 * OR in any existing reachability bitmaps among `roots` into
+	 * `cb.base`.
+	 */
+	for (root = roots; root; root = root->next) {
+		struct object *object = root->item;
+		if (object->type != OBJ_COMMIT ||
+		    bitmap_walk_contains(bitmap_git, cb.base, &object->oid))
+			continue;
+
+		if (add_commit_to_bitmap(bitmap_git, &cb.base,
+					 (struct commit *)object))
+			continue;
+
+		any_missing = 1;
+	}
+
+	if (!any_missing)
+		goto cleanup;
+
+	tmp_blobs = revs->blob_objects;
+	tmp_trees = revs->tree_objects;
+	tmp_tags = revs->tag_objects;
+	revs->blob_objects = 0;
+	revs->tree_objects = 0;
+	revs->tag_objects = 0;
+
+	/*
+	 * We didn't have complete coverage of the roots. First setup a
+	 * revision walk to (a) OR in any bitmaps that are UNINTERESTING
+	 * between the tips and boundary, and (b) record the boundary.
+	 */
+	trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository);
+	if (prepare_revision_walk(revs))
+		die("revision walk setup failed");
+	trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository);
+
+	trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository);
+	revs->boundary = 1;
+	traverse_commit_list_filtered(revs,
+				      show_boundary_commit,
+				      show_boundary_object,
+				      &cb, NULL);
+	revs->boundary = 0;
+	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
+
+	revs->blob_objects = tmp_blobs;
+	revs->tree_objects = tmp_trees;
+	revs->tag_objects = tmp_tags;
+
+	reset_revision_walk();
+	clear_object_flags(UNINTERESTING);
+
+	/*
+	 * Then add the boundary commit(s) as fill-in traversal tips.
+	 */
+	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
+	for (i = 0; i < cb.boundary.nr; i++) {
+		struct object *obj = cb.boundary.objects[i].item;
+		if (bitmap_walk_contains(bitmap_git, cb.base, &obj->oid))
+			obj->flags |= SEEN;
+		else
+			add_pending_object(revs, obj, "");
+	}
+	if (revs->pending.nr)
+		cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
+	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);
+
+cleanup:
+	object_array_clear(&cb.boundary);
+	revs->ignore_missing_links = 0;
+
+	return cb.base;
+}
+
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
@@ -1142,8 +1262,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		}
 	}
 
-	if (needs_walk)
+	if (needs_walk) {
+		/*
+		 * This fill-in traversal may walk over some objects
+		 * again, since we have already traversed in order to
+		 * find the boundary.
+		 *
+		 * But this extra walk should be extremely cheap, since
+		 * all commit objects are loaded into memory, and
+		 * because we skip walking to parents that are
+		 * UNINTERESTING, since it will be marked in the haves
+		 * bitmap already (or it has an on-disk bitmap, since
+		 * OR-ing it in covers all of its ancestors).
+		 */
 		base = fill_in_bitmap(bitmap_git, revs, base, seen);
+	}
 
 	return base;
 }
@@ -1535,6 +1668,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 					 int filter_provided_objects)
 {
 	unsigned int i;
+	int use_boundary_traversal;
 
 	struct object_list *wants = NULL;
 	struct object_list *haves = NULL;
@@ -1585,13 +1719,21 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			object_list_insert(object, &wants);
 	}
 
-	/*
-	 * if we have a HAVES list, but none of those haves is contained
-	 * in the packfile that has a bitmap, we don't have anything to
-	 * optimize here
-	 */
-	if (haves && !in_bitmapped_pack(bitmap_git, haves))
-		goto cleanup;
+	use_boundary_traversal = git_env_bool(GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL, -1);
+	if (use_boundary_traversal < 0) {
+		prepare_repo_settings(revs->repo);
+		use_boundary_traversal = revs->repo->settings.pack_use_bitmap_boundary_traversal;
+	}
+
+	if (!use_boundary_traversal) {
+		/*
+		 * if we have a HAVES list, but none of those haves is contained
+		 * in the packfile that has a bitmap, we don't have anything to
+		 * optimize here
+		 */
+		if (haves && !in_bitmapped_pack(bitmap_git, haves))
+			goto cleanup;
+	}
 
 	/* if we don't want anything, we're done here */
 	if (!wants)
@@ -1605,18 +1747,32 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (load_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
-	object_array_clear(&revs->pending);
+	if (!use_boundary_traversal)
+		object_array_clear(&revs->pending);
 
 	if (haves) {
-		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
-		reset_revision_walk();
-		revs->ignore_missing_links = 0;
+		if (use_boundary_traversal) {
+			trace2_region_enter("pack-bitmap", "haves/boundary", the_repository);
+			haves_bitmap = find_boundary_objects(bitmap_git, revs, haves);
+			trace2_region_leave("pack-bitmap", "haves/boundary", the_repository);
+		} else {
+			trace2_region_enter("pack-bitmap", "haves/classic", the_repository);
+			revs->ignore_missing_links = 1;
+			haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
+			reset_revision_walk();
+			revs->ignore_missing_links = 0;
+			trace2_region_leave("pack-bitmap", "haves/classic", the_repository);
+		}
 
 		if (!haves_bitmap)
 			BUG("failed to perform bitmap walk");
 	}
 
+	if (use_boundary_traversal) {
+		object_array_clear(&revs->pending);
+		reset_revision_walk();
+	}
+
 	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
 
 	if (!wants_bitmap)
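`find_boundary_objects()` temporarily turns off blob, tree, and tag output so that the boundary walk only sees commits, then puts the caller's settings back. A minimal sketch of that save, clear, and restore pattern is below. It uses a hypothetical `struct toy_revs` standing in for the three `rev_info` fields involved; `toy_commit_only_walk` and `toy_walk_commits_then_restore` are invented names, and the "walk" is just a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for the three rev_info fields touched by the walk. */
struct toy_revs {
	unsigned int blob_objects:1,
		     tree_objects:1,
		     tag_objects:1;
	int walks;	/* number of commit-only walks performed, for illustration */
};

/* Placeholder for the real traversal: it must not be asked for non-commits. */
static void toy_commit_only_walk(struct toy_revs *revs)
{
	assert(!revs->blob_objects && !revs->tree_objects && !revs->tag_objects);
	revs->walks++;
}

static void toy_walk_commits_then_restore(struct toy_revs *revs)
{
	/* Save each flag from its own field. */
	unsigned int tmp_blobs = revs->blob_objects;
	unsigned int tmp_trees = revs->tree_objects;
	unsigned int tmp_tags = revs->tag_objects;

	/* Restrict the walk to commits only. */
	revs->blob_objects = 0;
	revs->tree_objects = 0;
	revs->tag_objects = 0;

	toy_commit_only_walk(revs);

	/* Put every flag back exactly as the caller left it. */
	revs->blob_objects = tmp_blobs;
	revs->tree_objects = tmp_trees;
	revs->tag_objects = tmp_tags;
}
```

The case worth checking is a caller whose flags differ from one another, for example blobs and trees requested but tags not. After the call each flag should still hold its original value.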

pack-bitmap.h

Lines changed: 4 additions & 0 deletions
@@ -62,6 +62,10 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
+
+#define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \
+	"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL"
+
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 					 int filter_provided_objects);
 uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git);

repo-settings.c

Lines changed: 6 additions & 1 deletion
@@ -41,8 +41,10 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
 
 	/* Defaults modified by feature.* */
-	if (experimental)
+	if (experimental) {
 		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
+		r->settings.pack_use_bitmap_boundary_traversal = 1;
+	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
 		r->settings.index_skip_hash = 1;
@@ -62,6 +64,9 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
 	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
 	repo_cfg_bool(r, "pack.readreverseindex", &r->settings.pack_read_reverse_index, 1);
+	repo_cfg_bool(r, "pack.usebitmapboundarytraversal",
+		      &r->settings.pack_use_bitmap_boundary_traversal,
+		      r->settings.pack_use_bitmap_boundary_traversal);
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that

repository.h

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ struct repo_settings {
 	int command_requires_full_index;
 	int sparse_index;
 	int pack_read_reverse_index;
+	int pack_use_bitmap_boundary_traversal;
 
 	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
 

t/README

Lines changed: 4 additions & 0 deletions
@@ -442,6 +442,10 @@ GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
 for the index version specified. Can be set to any valid version
 (currently 2, 3, or 4).
 
+GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=<boolean> if enabled will
+use the boundary-based bitmap traversal algorithm. See the documentation
+of `pack.useBitmapBoundaryTraversal` for more details.
+
 GIT_TEST_PACK_SPARSE=<boolean> if disabled will default the pack-objects
 builtin to use the non-sparse object walk. This can still be overridden by
 the --sparse command-line argument.

t/t5310-pack-bitmaps.sh

Lines changed: 38 additions & 0 deletions
@@ -9,6 +9,10 @@ test_description='exercise basic bitmap functionality'
 # their place.
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
+# Likewise, allow individual tests to control whether or not they use
+# the boundary-based traversal.
+sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
@@ -457,6 +461,13 @@ test_bitmap_cases () {
 
 test_bitmap_cases
 
+GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1
+export GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
+
+test_bitmap_cases
+
+sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
+
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
 	test_must_fail git repack -d 2>err &&
@@ -468,6 +479,33 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
+test_expect_success 'boundary-based traversal is used when requested' '
+	git repack -a -d --write-bitmap-index &&
+
+	for argv in \
+		"git -c pack.useBitmapBoundaryTraversal=true" \
+		"git -c feature.experimental=true" \
+		"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1 git"
+	do
+		eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \
+			--use-bitmap-index second..other 2>perf" &&
+		grep "\"region_enter\".*\"label\":\"haves/boundary\"" perf ||
+			return 1
+	done &&
+
+	for argv in \
+		"git -c pack.useBitmapBoundaryTraversal=false" \
+		"git -c feature.experimental=true -c pack.useBitmapBoundaryTraversal=false" \
+		"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c pack.useBitmapBoundaryTraversal=true" \
+		"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c feature.experimental=true"
+	do
+		eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \
+			--use-bitmap-index second..other 2>perf" &&
+		grep "\"region_enter\".*\"label\":\"haves/classic\"" perf ||
+			return 1
+	done
+'
+
 test_bitmap_cases "pack.writeBitmapLookupTable"
 
 test_expect_success 'verify writing bitmap lookup table when enabled' '
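The seven invocations in the 'boundary-based traversal is used when requested' test imply a precedence order: the `GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL` environment variable wins when set, otherwise an explicit `pack.useBitmapBoundaryTraversal` wins, otherwise `feature.experimental` supplies the default. The sketch below restates that order as a single pure function. It is an illustration under stated assumptions, not Git's code: `toy_use_boundary_traversal` is a made-up name, and the tri-state encoding (-1 for "not set") is borrowed from the `git_env_bool(..., -1)` call in the diff and applied to the config value as well for simplicity.

```c
#include <assert.h>

/* Tri-state inputs for env and config: -1 = not set, 0 = false, 1 = true. */
static int toy_use_boundary_traversal(int env, int config, int experimental)
{
	int value;

	assert(env >= -1 && env <= 1);
	assert(config >= -1 && config <= 1);

	/* The test environment variable wins whenever it is set. */
	if (env >= 0)
		return env;

	/* Otherwise start from the feature.experimental default ... */
	value = experimental ? 1 : 0;

	/* ... and let an explicit pack.useBitmapBoundaryTraversal override it. */
	if (config >= 0)
		value = config;

	return value;
}
```

Each row of the test maps onto one call: for example, `feature.experimental=true` combined with `pack.useBitmapBoundaryTraversal=false` is `toy_use_boundary_traversal(-1, 0, 1)` and selects the classic traversal, while the environment variable set to 0 overrides a true config value.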
