Commit 5f5ccd9

ttaylorr authored and gitster committed
midx: implement BTMP chunk
When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse.

This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the number of objects that are reused verbatim (and consequently, decreases the amount of time it takes to generate many packs). But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse.

As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly expensive to maintain as the pack grows larger and larger.

It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse is disjoint with respect to the subset of their objects being sent).

If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to any of their objects we may output, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position.

Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases).

This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Bitmapped Packs" (`BTMP`) chunk of the multi-pack index. The `BTMP` chunk's contents are described in detail in the documentation this patch adds to Documentation/gitformat-pack.txt. Importantly, the `BTMP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent.

For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk.

[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim.

Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent fba6818 commit 5f5ccd9

6 files changed, +226 -4 lines changed

Documentation/gitformat-pack.txt

Lines changed: 76 additions & 0 deletions
@@ -396,6 +396,15 @@ CHUNK DATA:
 	    is padded at the end with between 0 and 3 NUL bytes to make the
 	    chunk size a multiple of 4 bytes.
 
+	Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
+	    Stores a table of two 4-byte unsigned integers in network order.
+	    Each table entry corresponds to a single pack (in the order that
+	    they appear above in the `PNAM` chunk). The values for each table
+	    entry are as follows:
+	    - The first bit position (in pseudo-pack order, see below) to
+	      contain an object from that pack.
+	    - The number of bits whose objects are selected from that pack.
+
 	OID Fanout (ID: {'O', 'I', 'D', 'F'})
 	    The ith entry, F[i], stores the number of OIDs with first
 	    byte at most i. Thus F[255] stores the total
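
A minimal sketch, in standalone C, of how a reader could decode one entry of the table described above, assuming the chunk's bytes are already in memory. The names `struct btmp_entry`, `read_be32()` and `btmp_decode()` are hypothetical and are not part of Git; the bounds check against the chunk length is also an assumption of this sketch, not something this patch performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each pack owns one entry of two 4-byte integers (8 bytes in total). */
#define BTMP_ENTRY_WIDTH 8

/* Hypothetical stand-in for the two values stored per pack. */
struct btmp_entry {
	uint32_t bitmap_pos; /* first bit occupied by an object from the pack */
	uint32_t bitmap_nr;  /* number of bits belonging to the pack */
};

/* Read a 32-bit unsigned integer stored in network (big-endian) order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the entry for pack number `pack_int_id`.
 * Returns 0 on success, or -1 if the chunk is too short to hold that entry.
 */
static int btmp_decode(const unsigned char *chunk, size_t chunk_len,
		       uint32_t pack_int_id, struct btmp_entry *out)
{
	size_t off;

	assert(chunk && out);
	if (pack_int_id >= chunk_len / BTMP_ENTRY_WIDTH)
		return -1;

	off = (size_t)pack_int_id * BTMP_ENTRY_WIDTH;
	out->bitmap_pos = read_be32(chunk + off);
	out->bitmap_nr = read_be32(chunk + off + 4);
	return 0;
}
```

For a 24-byte chunk holding the pairs (0, 10), (10, 15) and (25, 20), calling `btmp_decode(chunk, 24, 2, &e)` fills `e` with position 25 and count 20, while asking for entry 3 returns -1.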
@@ -509,6 +518,73 @@ packs arranged in MIDX order (with the preferred pack coming first).
 The MIDX's reverse index is stored in the optional 'RIDX' chunk within
 the MIDX itself.
 
+=== `BTMP` chunk
+
+The Bitmapped Packfiles (`BTMP`) chunk encodes additional information
+about the objects in the multi-pack index's reachability bitmap. Recall
+that objects from the MIDX are arranged in "pseudo-pack" order (see
+above) for reachability bitmaps.
+
+From the example above, suppose we have packs "a", "b", and "c", with
+10, 15, and 20 objects, respectively. In pseudo-pack order, those would
+be arranged as follows:
+
+    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+When working with single-pack bitmaps (or, equivalently, multi-pack
+reachability bitmaps with a preferred pack), linkgit:git-pack-objects[1]
+performs ``verbatim'' reuse, attempting to reuse chunks of the bitmapped
+or preferred packfile instead of adding objects to the packing list.
+
+When a chunk of bytes is reused from an existing pack, any objects
+contained therein do not need to be added to the packing list, saving
+memory and CPU time. But a chunk from an existing packfile can only be
+reused when the following conditions are met:
+
+  - The chunk contains only objects which were requested by the caller
+    (i.e. does not contain any objects which the caller didn't ask for
+    explicitly or implicitly).
+
+  - All objects stored in non-thin packs as offset- or reference-deltas
+    also include their base object in the resulting pack.
+
+The `BTMP` chunk encodes the necessary information in order to implement
+multi-pack reuse over a set of packfiles as described above.
+Specifically, the `BTMP` chunk encodes two pieces of information (all
+32-bit unsigned integers in network byte-order) for each packfile `p`
+that is stored in the MIDX, as follows:
+
+`bitmap_pos`:: The first bit position (in pseudo-pack order) in the
+  multi-pack index's reachability bitmap occupied by an object from `p`.
+
+`bitmap_nr`:: The number of bit positions (including the one at
+  `bitmap_pos`) that encode objects from that pack `p`.
+
+For example, the `BTMP` chunk corresponding to the above example (with
+packs ``a'', ``b'', and ``c'') would look like:
+
+[cols="1,2,2"]
+|===
+| |`bitmap_pos` |`bitmap_nr`
+
+|packfile ``a''
+|`0`
+|`10`
+
+|packfile ``b''
+|`10`
+|`15`
+
+|packfile ``c''
+|`25`
+|`20`
+|===
+
+With this information in place, we can treat each packfile as
+individually reusable in the same fashion as verbatim pack reuse is
+performed on individual packs prior to the implementation of the `BTMP`
+chunk.
+
 == cruft packs
 
 The cruft packs feature offer an alternative to Git's traditional mechanism of
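
As a rough illustration of what the `bitmap_pos`/`bitmap_nr` table makes possible, the following standalone C sketch maps a bit position in the reachability bitmap back to the pack that owns it. The names `struct btmp_entry` and `btmp_pack_for_bit()` are hypothetical and do not appear in Git; this patch only writes the chunk, so how later code consumes it is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one BTMP table entry. */
struct btmp_entry {
	uint32_t bitmap_pos; /* first bit occupied by the pack */
	uint32_t bitmap_nr;  /* number of bits occupied by the pack */
};

/*
 * Return the index of the pack whose range
 * [bitmap_pos, bitmap_pos + bitmap_nr) contains `bit`,
 * or -1 if no pack claims that bit.
 */
static int btmp_pack_for_bit(const struct btmp_entry *table, size_t nr_packs,
			     uint32_t bit)
{
	size_t i;

	assert(table || !nr_packs);
	for (i = 0; i < nr_packs; i++) {
		/* Subtract instead of adding so the comparison cannot overflow. */
		if (bit >= table[i].bitmap_pos &&
		    bit - table[i].bitmap_pos < table[i].bitmap_nr)
			return (int)i;
	}
	return -1;
}
```

With the table from the example (packs "a", "b" and "c" at positions 0, 10 and 25), bit 12 resolves to index 1 (pack "b") and bit 45 resolves to -1, since the last pack ends at bit 44. A linear scan keeps the sketch short; because the ranges are sorted and disjoint, a binary search would also work.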

midx.c

Lines changed: 72 additions & 3 deletions
@@ -33,6 +33,7 @@
 
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */
 #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
@@ -41,6 +42,7 @@
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define PACK_EXPIRED UINT_MAX
@@ -193,6 +195,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
 		   &m->chunk_large_offsets_len);
+	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+		   (const unsigned char **)&m->chunk_bitmapped_packs,
+		   &m->chunk_bitmapped_packs_len);
 
 	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
 		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex,
@@ -286,6 +291,26 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 	return 0;
 }
 
+int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
+		       struct bitmapped_pack *bp, uint32_t pack_int_id)
+{
+	if (!m->chunk_bitmapped_packs)
+		return error(_("MIDX does not contain the BTMP chunk"));
+
+	if (prepare_midx_pack(r, m, pack_int_id))
+		return error(_("could not load bitmapped pack %"PRIu32), pack_int_id);
+
+	bp->p = m->packs[pack_int_id];
+	bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs +
+				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id);
+	bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs +
+				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id +
+				 sizeof(uint32_t));
+	bp->pack_int_id = pack_int_id;
+
+	return 0;
+}
+
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result)
 {
 	return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
@@ -468,10 +493,16 @@ static size_t write_midx_header(struct hashfile *f,
 	return MIDX_HEADER_SIZE;
 }
 
+#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
+
 struct pack_info {
 	uint32_t orig_pack_int_id;
 	char *pack_name;
 	struct packed_git *p;
+
+	uint32_t bitmap_pos;
+	uint32_t bitmap_nr;
+
 	unsigned expired : 1;
 };
 
@@ -484,6 +515,7 @@ static void fill_pack_info(struct pack_info *info,
 	info->orig_pack_int_id = orig_pack_int_id;
 	info->pack_name = xstrdup(pack_name);
 	info->p = p;
+	info->bitmap_pos = BITMAP_POS_UNKNOWN;
 }
 
 static int pack_info_compare(const void *_a, const void *_b)
@@ -824,6 +856,26 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
 	return 0;
 }
 
+static int write_midx_bitmapped_packs(struct hashfile *f, void *data)
+{
+	struct write_midx_context *ctx = data;
+	size_t i;
+
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *pack = &ctx->info[i];
+		if (pack->expired)
+			continue;
+
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr)
+			BUG("pack '%s' has no bitmap position, but has %"PRIu32" bitmapped object(s)",
+			    pack->pack_name, pack->bitmap_nr);
+
+		hashwrite_be32(f, pack->bitmap_pos);
+		hashwrite_be32(f, pack->bitmap_nr);
+	}
+	return 0;
+}
+
 static int write_midx_oid_fanout(struct hashfile *f,
 				 void *data)
 {
@@ -991,8 +1043,19 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
 
 	ALLOC_ARRAY(pack_order, ctx->entries_nr);
-	for (i = 0; i < ctx->entries_nr; i++)
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
+		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+			pack->bitmap_pos = i;
+		pack->bitmap_nr++;
 		pack_order[i] = data[i].nr;
+	}
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+			pack->bitmap_pos = 0;
+	}
 	free(data);
 
 	trace2_region_leave("midx", "midx_pack_order", the_repository);
@@ -1293,6 +1356,7 @@ static int write_midx_internal(const char *object_dir,
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
+	int bitmapped_packs_concat_len = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -1505,8 +1569,10 @@ static int write_midx_internal(const char *object_dir,
 	}
 
 	for (i = 0; i < ctx.nr; i++) {
-		if (!ctx.info[i].expired)
-			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
+		if (ctx.info[i].expired)
+			continue;
+		pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
+		bitmapped_packs_concat_len += 2 * sizeof(uint32_t);
 	}
 
 	/* Check that the preferred pack wasn't expired (if given). */
@@ -1566,6 +1632,9 @@ static int write_midx_internal(const char *object_dir,
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
 			  write_midx_revindex);
+		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+			  bitmapped_packs_concat_len,
+			  write_midx_bitmapped_packs);
 	}
 
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
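
The writer-side bookkeeping in `midx_pack_order()` can be hard to follow from the hunks alone, so here is a simplified standalone C sketch of the same idea: walk the objects in pseudo-pack order, record the first position seen for each pack, count its objects, and finally reset packs that contributed nothing to position 0. The names `struct pack_range`, `POS_UNKNOWN` and `assign_ranges()` are hypothetical; the sketch takes a plain array of pack ids in place of Git's sorted MIDX entries, and assumes (as pseudo-pack order implies) that each pack's objects are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "no object from this pack has been seen yet". */
#define POS_UNKNOWN (~((uint32_t)0))

/* Hypothetical per-pack result, mirroring the two BTMP values. */
struct pack_range {
	uint32_t bitmap_pos;
	uint32_t bitmap_nr;
};

/*
 * `pack_of[i]` is the id of the pack that supplies the object at bit
 * position `i` in pseudo-pack order. Fills `ranges[0..nr_packs)`.
 */
static void assign_ranges(const uint32_t *pack_of, uint32_t nr_objects,
			  struct pack_range *ranges, uint32_t nr_packs)
{
	uint32_t i;

	for (i = 0; i < nr_packs; i++) {
		ranges[i].bitmap_pos = POS_UNKNOWN;
		ranges[i].bitmap_nr = 0;
	}

	for (i = 0; i < nr_objects; i++) {
		struct pack_range *r;

		assert(pack_of[i] < nr_packs);
		r = &ranges[pack_of[i]];
		if (r->bitmap_pos == POS_UNKNOWN)
			r->bitmap_pos = i; /* first bit owned by this pack */
		r->bitmap_nr++;
	}

	/* Packs with no selected objects are written as (0, 0). */
	for (i = 0; i < nr_packs; i++)
		if (ranges[i].bitmap_pos == POS_UNKNOWN)
			ranges[i].bitmap_pos = 0;
}
```

Given five objects drawn from packs 0, 0, 0, 1, 1 and three packs in total, the result is (0, 3) for pack 0, (3, 2) for pack 1, and (0, 0) for pack 2, which supplied no objects.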

midx.h

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,7 @@
 struct object_id;
 struct pack_entry;
 struct repository;
+struct bitmapped_pack;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
 #define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
@@ -33,6 +34,8 @@ struct multi_pack_index {
 
 	const unsigned char *chunk_pack_names;
 	size_t chunk_pack_names_len;
+	const uint32_t *chunk_bitmapped_packs;
+	size_t chunk_bitmapped_packs_len;
 	const uint32_t *chunk_oid_fanout;
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_object_offsets;
@@ -58,6 +61,8 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
+int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
+		       struct bitmapped_pack *bp, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);

pack-bitmap.h

Lines changed: 9 additions & 0 deletions
@@ -52,6 +52,15 @@ typedef int (*show_reachable_fn)(
 
 struct bitmap_index;
 
+struct bitmapped_pack {
+	struct packed_git *p;
+
+	uint32_t bitmap_pos;
+	uint32_t bitmap_nr;
+
+	uint32_t pack_int_id; /* MIDX only */
+};
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r);
 struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx);
 void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,

t/helper/test-read-midx.c

Lines changed: 29 additions & 1 deletion
@@ -100,16 +100,44 @@ static int read_midx_preferred_pack(const char *object_dir)
 	return 0;
 }
 
+static int read_midx_bitmapped_packs(const char *object_dir)
+{
+	struct multi_pack_index *midx = NULL;
+	struct bitmapped_pack pack;
+	uint32_t i;
+
+	setup_git_directory();
+
+	midx = load_multi_pack_index(object_dir, 1);
+	if (!midx)
+		return 1;
+
+	for (i = 0; i < midx->num_packs; i++) {
+		if (nth_bitmapped_pack(the_repository, midx, &pack, i) < 0)
+			return 1;
+
+		printf("%s\n", pack_basename(pack.p));
+		printf(" bitmap_pos: %"PRIuMAX"\n", (uintmax_t)pack.bitmap_pos);
+		printf(" bitmap_nr: %"PRIuMAX"\n", (uintmax_t)pack.bitmap_nr);
+	}
+
+	close_midx(midx);
+
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects|--checksum|--preferred-pack] <object-dir>");
+		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
 	else if (!strcmp(argv[1], "--checksum"))
 		return read_midx_checksum(argv[2]);
 	else if (!strcmp(argv[1], "--preferred-pack"))
 		return read_midx_preferred_pack(argv[2]);
+	else if (!strcmp(argv[1], "--bitmap"))
+		return read_midx_bitmapped_packs(argv[2]);
 	return read_midx_file(argv[1], 0);
 }

t/t5319-multi-pack-index.sh

Lines changed: 35 additions & 0 deletions
@@ -1171,4 +1171,39 @@ test_expect_success 'reader notices out-of-bounds fanout' '
 	test_cmp expect err
 '
 
+test_expect_success 'bitmapped packs are stored via the BTMP chunk' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		for i in 1 2 3 4 5
+		do
+			test_commit "$i" &&
+			git repack -d || return 1
+		done &&
+
+		find $objdir/pack -type f -name "*.idx" | xargs -n 1 basename |
+		sort >packs &&
+
+		git multi-pack-index write --stdin-packs <packs &&
+		test_must_fail test-tool read-midx --bitmap $objdir 2>err &&
+		cat >expect <<-\EOF &&
+		error: MIDX does not contain the BTMP chunk
+		EOF
+		test_cmp expect err &&
+
+		git multi-pack-index write --stdin-packs --bitmap \
+			--preferred-pack="$(head -n1 <packs)" <packs &&
+		test-tool read-midx --bitmap $objdir >actual &&
+		for i in $(test_seq $(wc -l <packs))
+		do
+			sed -ne "${i}s/\.idx$/\.pack/p" packs &&
+			echo " bitmap_pos: $((($i - 1) * 3))" &&
+			echo " bitmap_nr: 3" || return 1
+		done >expect &&
+		test_cmp expect actual
+	)
+'
+
 test_done
