
Commit 0fea6b7

Merge branch 'tb/multi-pack-verbatim-reuse'
Streaming spans of packfile data used to be done only from a single,
primary, pack in a repository with multiple packfiles. It has been
extended to allow reuse from other packfiles, too.

* tb/multi-pack-verbatim-reuse: (26 commits)
  t/perf: add performance tests for multi-pack reuse
  pack-bitmap: enable reuse from all bitmapped packs
  pack-objects: allow setting `pack.allowPackReuse` to "single"
  t/test-lib-functions.sh: implement `test_trace2_data` helper
  pack-objects: add tracing for various packfile metrics
  pack-bitmap: prepare to mark objects from multiple packs for reuse
  pack-revindex: implement `midx_pair_to_pack_pos()`
  pack-revindex: factor out `midx_key_to_pack_pos()` helper
  midx: implement `midx_preferred_pack()`
  git-compat-util.h: implement checked size_t to uint32_t conversion
  pack-objects: include number of packs reused in output
  pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
  pack-objects: prepare `write_reused_pack()` for multi-pack reuse
  pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
  pack-objects: keep track of `pack_start` for each reuse pack
  pack-objects: parameterize pack-reuse routines over a single pack
  pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
  pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
  ewah: implement `bitmap_is_empty()`
  pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
  ...
2 parents 0ebbaa0 + ba47d88 commit 0fea6b7

21 files changed: 1033 additions and 192 deletions

Documentation/config/pack.txt

Lines changed: 11 additions & 5 deletions
@@ -28,11 +28,17 @@ all existing objects. You can force recompression by passing the -F option
 	to linkgit:git-repack[1].

 pack.allowPackReuse::
-	When true, and when reachability bitmaps are enabled,
-	pack-objects will try to send parts of the bitmapped packfile
-	verbatim. This can reduce memory and CPU usage to serve fetches,
-	but might result in sending a slightly larger pack. Defaults to
-	true.
+	When true or "single", and when reachability bitmaps are
+	enabled, pack-objects will try to send parts of the bitmapped
+	packfile verbatim. When "multi", and when a multi-pack
+	reachability bitmap is available, pack-objects will try to send
+	parts of all packs in the MIDX.
++
+	If only a single pack bitmap is available, and
+	`pack.allowPackReuse` is set to "multi", reuse parts of just the
+	bitmapped packfile. This can reduce memory and CPU usage to
+	serve fetches, but might result in sending a slightly larger
+	pack. Defaults to true.

 pack.island::
 	An extended regular expression configuring a set of delta
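
The documented values of `pack.allowPackReuse` reduce to three reuse modes. The following is a minimal standalone sketch of that mapping, not Git's own code: `parse_bool_text()` is a simplified, hypothetical stand-in for Git's `git_parse_maybe_bool_text()`, `parse_allow_pack_reuse()` is an illustrative name, and an unrecognized value is reported with a `-1` return where the real code calls `die()`.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp() (POSIX) */

enum pack_reuse {
	NO_PACK_REUSE = 0,
	SINGLE_PACK_REUSE,
	MULTI_PACK_REUSE,
};

/*
 * Hypothetical, simplified boolean parser: returns 1 for true-ish
 * text, 0 for false-ish text, and -1 when the text is not a boolean.
 */
static int parse_bool_text(const char *v)
{
	if (!v)
		return 1; /* a key given without a value counts as true */
	if (!*v)
		return 0;
	if (!strcasecmp(v, "true") || !strcasecmp(v, "yes") ||
	    !strcasecmp(v, "on"))
		return 1;
	if (!strcasecmp(v, "false") || !strcasecmp(v, "no") ||
	    !strcasecmp(v, "off"))
		return 0;
	return -1;
}

/*
 * Map a configuration value to a reuse mode. Returns 0 on success and
 * -1 for a value that is neither boolean, "single", nor "multi".
 */
static int parse_allow_pack_reuse(const char *v, enum pack_reuse *out)
{
	int res = parse_bool_text(v);

	assert(out != NULL);
	if (res < 0) {
		/* v is non-NULL here: a NULL value parses as true above */
		if (!strcasecmp(v, "single"))
			*out = SINGLE_PACK_REUSE;
		else if (!strcasecmp(v, "multi"))
			*out = MULTI_PACK_REUSE;
		else
			return -1;
	} else {
		/* plain "true" keeps the single-pack behavior */
		*out = res ? SINGLE_PACK_REUSE : NO_PACK_REUSE;
	}
	return 0;
}
```

In this sketch a boolean true selects single-pack reuse, which matches the documented default; multi-pack reuse has to be requested by name.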

Documentation/gitformat-pack.txt

Lines changed: 76 additions & 0 deletions
@@ -396,6 +396,15 @@ CHUNK DATA:
 	    is padded at the end with between 0 and 3 NUL bytes to make the
 	    chunk size a multiple of 4 bytes.

+	Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
+	    Stores a table of two 4-byte unsigned integers in network order.
+	    Each table entry corresponds to a single pack (in the order that
+	    they appear above in the `PNAM` chunk). The values for each table
+	    entry are as follows:
+	    - The first bit position (in pseudo-pack order, see below) to
+	      contain an object from that pack.
+	    - The number of bits whose objects are selected from that pack.
+
 	OID Fanout (ID: {'O', 'I', 'D', 'F'})
 	    The ith entry, F[i], stores the number of OIDs with first
 	    byte at most i. Thus F[255] stores the total
@@ -509,6 +518,73 @@ packs arranged in MIDX order (with the preferred pack coming first).
 The MIDX's reverse index is stored in the optional 'RIDX' chunk within
 the MIDX itself.

+=== `BTMP` chunk
+
+The Bitmapped Packfiles (`BTMP`) chunk encodes additional information
+about the objects in the multi-pack index's reachability bitmap. Recall
+that objects from the MIDX are arranged in "pseudo-pack" order (see
+above) for reachability bitmaps.
+
+From the example above, suppose we have packs "a", "b", and "c", with
+10, 15, and 20 objects, respectively. In pseudo-pack order, those would
+be arranged as follows:
+
+    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+When working with single-pack bitmaps (or, equivalently, multi-pack
+reachability bitmaps with a preferred pack), linkgit:git-pack-objects[1]
+performs ``verbatim'' reuse, attempting to reuse chunks of the bitmapped
+or preferred packfile instead of adding objects to the packing list.
+
+When a chunk of bytes is reused from an existing pack, any objects
+contained therein do not need to be added to the packing list, saving
+memory and CPU time. But a chunk from an existing packfile can only be
+reused when the following conditions are met:
+
+  - The chunk contains only objects which were requested by the caller
+    (i.e. does not contain any objects which the caller didn't ask for
+    explicitly or implicitly).
+
+  - All objects stored in non-thin packs as offset- or reference-deltas
+    also include their base object in the resulting pack.
+
+The `BTMP` chunk encodes the necessary information in order to implement
+multi-pack reuse over a set of packfiles as described above.
+Specifically, the `BTMP` chunk encodes three pieces of information (all
+32-bit unsigned integers in network byte-order) for each packfile `p`
+that is stored in the MIDX, as follows:
+
+`bitmap_pos`:: The first bit position (in pseudo-pack order) in the
+  multi-pack index's reachability bitmap occupied by an object from `p`.
+
+`bitmap_nr`:: The number of bit positions (including the one at
+  `bitmap_pos`) that encode objects from that pack `p`.
+
+For example, the `BTMP` chunk corresponding to the above example (with
+packs ``a'', ``b'', and ``c'') would look like:
+
+[cols="1,2,2"]
+|===
+| |`bitmap_pos` |`bitmap_nr`
+
+|packfile ``a''
+|`0`
+|`10`
+
+|packfile ``b''
+|`10`
+|`15`
+
+|packfile ``c''
+|`25`
+|`20`
+|===
+
+With this information in place, we can treat each packfile as
+individually reusable in the same fashion as verbatim pack reuse is
+performed on individual packs prior to the implementation of the `BTMP`
+chunk.
+
 == cruft packs

 The cruft packs feature offer an alternative to Git's traditional mechanism of
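
The `BTMP` table described in the documentation hunks above can be illustrated with a small standalone sketch. It assumes, as in the documentation's example, that every object of each pack is selected from that pack, so the packs sit back to back in pseudo-pack order. The struct `btmp_entry` and the helpers `btmp_layout()`, `btmp_encode()`, and `btmp_decode()` are hypothetical names chosen for illustration; they are not Git's own API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one BTMP table entry. */
struct btmp_entry {
	uint32_t bitmap_pos; /* first bit (pseudo-pack order) of the pack */
	uint32_t bitmap_nr;  /* number of bits that belong to the pack */
};

/* Read a 32-bit unsigned integer stored in network (big-endian) order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit unsigned integer in network (big-endian) order. */
static void write_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Each pack starts at the bit where the previous pack ended. */
static void btmp_layout(const uint32_t *nr_objects, size_t nr_packs,
			struct btmp_entry *out)
{
	uint32_t pos = 0;
	size_t i;

	assert(nr_objects != NULL && out != NULL);
	for (i = 0; i < nr_packs; i++) {
		out[i].bitmap_pos = pos;
		out[i].bitmap_nr = nr_objects[i];
		pos += nr_objects[i];
	}
}

/* Serialize entries as pairs of big-endian integers, 8 bytes per pack. */
static void btmp_encode(const struct btmp_entry *in, size_t nr_packs,
			unsigned char *buf)
{
	size_t i;

	for (i = 0; i < nr_packs; i++) {
		write_be32(buf + 8 * i, in[i].bitmap_pos);
		write_be32(buf + 8 * i + 4, in[i].bitmap_nr);
	}
}

/* Inverse of btmp_encode(). */
static void btmp_decode(const unsigned char *buf, size_t nr_packs,
			struct btmp_entry *out)
{
	size_t i;

	for (i = 0; i < nr_packs; i++) {
		out[i].bitmap_pos = read_be32(buf + 8 * i);
		out[i].bitmap_nr = read_be32(buf + 8 * i + 4);
	}
}
```

Calling `btmp_layout()` with the object counts 10, 15, and 20 yields the pairs (0, 10), (10, 15), and (25, 20), which are the rows of the table for packs "a", "b", and "c".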

builtin/pack-objects.c

Lines changed: 134 additions & 35 deletions
@@ -218,13 +218,19 @@ static int thin;
 static int num_preferred_base;
 static struct progress *progress_state;

-static struct packed_git *reuse_packfile;
+static struct bitmapped_pack *reuse_packfiles;
+static size_t reuse_packfiles_nr;
+static size_t reuse_packfiles_used_nr;
 static uint32_t reuse_packfile_objects;
 static struct bitmap *reuse_packfile_bitmap;

 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
-static int allow_pack_reuse = 1;
+static enum {
+	NO_PACK_REUSE = 0,
+	SINGLE_PACK_REUSE,
+	MULTI_PACK_REUSE,
+} allow_pack_reuse = SINGLE_PACK_REUSE;
 static enum {
 	WRITE_BITMAP_FALSE = 0,
 	WRITE_BITMAP_QUIET,
@@ -1010,7 +1016,9 @@ static off_t find_reused_offset(off_t where)
 	return reused_chunks[lo-1].difference;
 }

-static void write_reused_pack_one(size_t pos, struct hashfile *out,
+static void write_reused_pack_one(struct packed_git *reuse_packfile,
+				  size_t pos, struct hashfile *out,
+				  off_t pack_start,
 				  struct pack_window **w_curs)
 {
 	off_t offset, next, cur;
@@ -1020,7 +1028,8 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out,
 	offset = pack_pos_to_offset(reuse_packfile, pos);
 	next = pack_pos_to_offset(reuse_packfile, pos + 1);

-	record_reused_object(offset, offset - hashfile_total(out));
+	record_reused_object(offset,
+			     offset - (hashfile_total(out) - pack_start));

 	cur = offset;
 	type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
@@ -1088,41 +1097,93 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out,
 	copy_pack_data(out, reuse_packfile, w_curs, offset, next - offset);
 }

-static size_t write_reused_pack_verbatim(struct hashfile *out,
+static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
+					 struct hashfile *out,
+					 off_t pack_start,
 					 struct pack_window **w_curs)
 {
-	size_t pos = 0;
+	size_t pos = reuse_packfile->bitmap_pos;
+	size_t end;
+
+	if (pos % BITS_IN_EWORD) {
+		size_t word_pos = (pos / BITS_IN_EWORD);
+		size_t offset = pos % BITS_IN_EWORD;
+		size_t last;
+		eword_t word = reuse_packfile_bitmap->words[word_pos];
+
+		if (offset + reuse_packfile->bitmap_nr < BITS_IN_EWORD)
+			last = offset + reuse_packfile->bitmap_nr;
+		else
+			last = BITS_IN_EWORD;
+
+		for (; offset < last; offset++) {
+			if (word >> offset == 0)
+				return word_pos;
+			if (!bitmap_get(reuse_packfile_bitmap,
+					word_pos * BITS_IN_EWORD + offset))
+				return word_pos;
+		}

-	while (pos < reuse_packfile_bitmap->word_alloc &&
-			reuse_packfile_bitmap->words[pos] == (eword_t)~0)
-		pos++;
+		pos += BITS_IN_EWORD - (pos % BITS_IN_EWORD);
+	}
+
+	/*
+	 * Now we're going to copy as many whole eword_t's as possible.
+	 * "end" is the index of the last whole eword_t we copy, but
+	 * there may be additional bits to process. Those are handled
+	 * individually by write_reused_pack().
+	 *
+	 * Begin by advancing to the first word boundary in range of the
+	 * bit positions occupied by objects in "reuse_packfile". Then
+	 * pick the last word boundary in the same range. If we have at
+	 * least one word's worth of bits to process, continue on.
+	 */
+	end = reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr;
+	if (end % BITS_IN_EWORD)
+		end -= end % BITS_IN_EWORD;
+	if (pos >= end)
+		return reuse_packfile->bitmap_pos / BITS_IN_EWORD;

-	if (pos) {
-		off_t to_write;
+	while (pos < end &&
+	       reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0)
+		pos += BITS_IN_EWORD;

-		written = (pos * BITS_IN_EWORD);
-		to_write = pack_pos_to_offset(reuse_packfile, written)
-			- sizeof(struct pack_header);
+	if (pos > end)
+		pos = end;
+
+	if (reuse_packfile->bitmap_pos < pos) {
+		off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0);
+		off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p,
+							pos - reuse_packfile->bitmap_pos);
+
+		written += pos - reuse_packfile->bitmap_pos;

 		/* We're recording one chunk, not one object. */
-		record_reused_object(sizeof(struct pack_header), 0);
+		record_reused_object(pack_start_off,
+				     pack_start_off - (hashfile_total(out) - pack_start));
 		hashflush(out);
-		copy_pack_data(out, reuse_packfile, w_curs,
-			sizeof(struct pack_header), to_write);
+		copy_pack_data(out, reuse_packfile->p, w_curs,
+			pack_start_off, pack_end_off - pack_start_off);

 		display_progress(progress_state, written);
 	}
-	return pos;
+	if (pos % BITS_IN_EWORD)
+		BUG("attempted to jump past a word boundary to %"PRIuMAX,
+		    (uintmax_t)pos);
+	return pos / BITS_IN_EWORD;
 }

-static void write_reused_pack(struct hashfile *f)
+static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
+			      struct hashfile *f)
 {
-	size_t i = 0;
+	size_t i = reuse_packfile->bitmap_pos / BITS_IN_EWORD;
 	uint32_t offset;
+	off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);
 	struct pack_window *w_curs = NULL;

 	if (allow_ofs_delta)
-		i = write_reused_pack_verbatim(f, &w_curs);
+		i = write_reused_pack_verbatim(reuse_packfile, f, pack_start,
+					       &w_curs);

 	for (; i < reuse_packfile_bitmap->word_alloc; ++i) {
 		eword_t word = reuse_packfile_bitmap->words[i];
@@ -1133,16 +1194,23 @@ static void write_reused_pack(struct hashfile *f)
 				break;

 			offset += ewah_bit_ctz64(word >> offset);
+			if (pos + offset < reuse_packfile->bitmap_pos)
+				continue;
+			if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr)
+				goto done;
 			/*
 			 * Can use bit positions directly, even for MIDX
 			 * bitmaps. See comment in try_partial_reuse()
 			 * for why.
 			 */
-			write_reused_pack_one(pos + offset, f, &w_curs);
+			write_reused_pack_one(reuse_packfile->p,
+					      pos + offset - reuse_packfile->bitmap_pos,
+					      f, pack_start, &w_curs);
 			display_progress(progress_state, ++written);
 		}
 	}

+done:
 	unuse_pack(&w_curs);
 }

@@ -1194,9 +1262,14 @@ static void write_pack_file(void)

 		offset = write_pack_header(f, nr_remaining);

-		if (reuse_packfile) {
+		if (reuse_packfiles_nr) {
 			assert(pack_to_stdout);
-			write_reused_pack(f);
+			for (j = 0; j < reuse_packfiles_nr; j++) {
+				reused_chunks_nr = 0;
+				write_reused_pack(&reuse_packfiles[j], f);
+				if (reused_chunks_nr)
+					reuse_packfiles_used_nr++;
+			}
 			offset = hashfile_total(f);
 		}

@@ -3172,7 +3245,19 @@ static int git_pack_config(const char *k, const char *v,
 		return 0;
 	}
 	if (!strcmp(k, "pack.allowpackreuse")) {
-		allow_pack_reuse = git_config_bool(k, v);
+		int res = git_parse_maybe_bool_text(v);
+		if (res < 0) {
+			if (!strcasecmp(v, "single"))
+				allow_pack_reuse = SINGLE_PACK_REUSE;
+			else if (!strcasecmp(v, "multi"))
+				allow_pack_reuse = MULTI_PACK_REUSE;
+			else
+				die(_("invalid pack.allowPackReuse value: '%s'"), v);
+		} else if (res) {
+			allow_pack_reuse = SINGLE_PACK_REUSE;
+		} else {
+			allow_pack_reuse = NO_PACK_REUSE;
+		}
 		return 0;
 	}
 	if (!strcmp(k, "pack.threads")) {
@@ -3931,7 +4016,7 @@ static void loosen_unused_packed_objects(void)
  */
 static int pack_options_allow_reuse(void)
 {
-	return allow_pack_reuse &&
+	return allow_pack_reuse != NO_PACK_REUSE &&
 	       pack_to_stdout &&
 	       !ignore_packed_keep_on_disk &&
 	       !ignore_packed_keep_in_core &&
@@ -3944,13 +4029,18 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
 		return -1;

-	if (pack_options_allow_reuse() &&
-	    !reuse_partial_packfile_from_bitmap(
-			bitmap_git,
-			&reuse_packfile,
-			&reuse_packfile_objects,
-			&reuse_packfile_bitmap)) {
-		assert(reuse_packfile_objects);
+	if (pack_options_allow_reuse())
+		reuse_partial_packfile_from_bitmap(bitmap_git,
+						   &reuse_packfiles,
+						   &reuse_packfiles_nr,
+						   &reuse_packfile_bitmap,
+						   allow_pack_reuse == MULTI_PACK_REUSE);
+
+	if (reuse_packfiles) {
+		reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
+		if (!reuse_packfile_objects)
+			BUG("expected non-empty reuse bitmap");
+
 		nr_result += reuse_packfile_objects;
 		nr_seen += reuse_packfile_objects;
 		display_progress(progress_state, nr_seen);
@@ -4518,11 +4608,20 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		fprintf_ln(stderr,
 			   _("Total %"PRIu32" (delta %"PRIu32"),"
 			     " reused %"PRIu32" (delta %"PRIu32"),"
-			     " pack-reused %"PRIu32),
+			     " pack-reused %"PRIu32" (from %"PRIuMAX")"),
 			   written, written_delta, reused, reused_delta,
-			   reuse_packfile_objects);
+			   reuse_packfile_objects,
+			   (uintmax_t)reuse_packfiles_used_nr);
+
+	trace2_data_intmax("pack-objects", the_repository, "written", written);
+	trace2_data_intmax("pack-objects", the_repository, "written/delta", written_delta);
+	trace2_data_intmax("pack-objects", the_repository, "reused", reused);
+	trace2_data_intmax("pack-objects", the_repository, "reused/delta", reused_delta);
+	trace2_data_intmax("pack-objects", the_repository, "pack-reused", reuse_packfile_objects);
+	trace2_data_intmax("pack-objects", the_repository, "packs-reused", reuse_packfiles_used_nr);

 cleanup:
+	clear_packing_data(&to_pack);
 	list_objects_filter_release(&filter_options);
 	strvec_clear(&rp);
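
The word-boundary arithmetic that `write_reused_pack_verbatim()` applies to each pack's bit range can be sketched in isolation. This is a simplified illustration under stated assumptions: bitmap words are taken to be 64 bits wide, the names `word_span`, `whole_word_span()`, and `count_full_words()` are hypothetical, and the check of the leading partial word that the real function performs before rounding up is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_IN_EWORD 64 /* assumption: 64-bit bitmap words */

/* Bit positions of the first and one-past-last whole word in a range. */
struct word_span {
	size_t begin; /* bitmap_pos rounded up to a word boundary */
	size_t end;   /* bitmap_pos + bitmap_nr rounded down to one */
};

/*
 * Compute the whole words lying inside the half-open bit range
 * [bitmap_pos, bitmap_pos + bitmap_nr). When begin >= end the range
 * contains no whole word.
 */
static struct word_span whole_word_span(size_t bitmap_pos, size_t bitmap_nr)
{
	struct word_span s;
	size_t limit = bitmap_pos + bitmap_nr;

	s.begin = (bitmap_pos + BITS_IN_EWORD - 1) / BITS_IN_EWORD
		  * BITS_IN_EWORD;
	s.end = limit - limit % BITS_IN_EWORD;
	return s;
}

/*
 * Count the leading words of the span whose bits are all set; only
 * such a run can be copied as one contiguous chunk.
 */
static size_t count_full_words(const uint64_t *words, struct word_span s)
{
	size_t pos = s.begin;

	assert(words != NULL);
	while (pos < s.end && words[pos / BITS_IN_EWORD] == ~(uint64_t)0)
		pos += BITS_IN_EWORD;
	return pos > s.begin ? (pos - s.begin) / BITS_IN_EWORD : 0;
}
```

In this sketch, a pack occupying bits 10 through 24 (the documentation's pack "b") has no whole word in its range, so none of it would qualify for the word-at-a-time copy and each bit would be examined individually. A pack starting at bit 10 with 200 objects gives `begin` 64 and `end` 192, that is, two candidate words.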
