Commit c5b0a12

Merge branch 'tb/incremental-midx-part-2' into seen
Incremental updates of multi-pack index files.

* tb/incremental-midx-part-2:
  fixup! midx: implement writing incremental MIDX bitmaps
  midx: implement writing incremental MIDX bitmaps
  pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
  pack-bitmap.c: keep track of each layer's type bitmaps
  ewah: implement `struct ewah_or_iterator`
  pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
  pack-bitmap.c: compute disk-usage with incremental MIDXs
  pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
  pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
  pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
  pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
  pack-bitmap.c: open and store incremental bitmap layers
  pack-revindex: prepare for incremental MIDX bitmaps
  Documentation: describe incremental MIDX bitmaps
2 parents 5a0141b + 9d49c75 commit c5b0a12

10 files changed: +548 -112 lines changed

Documentation/technical/multi-pack-index.txt

Lines changed: 64 additions & 0 deletions
@@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
 (in the C implementation, this is often computed as `i +
 m->num_objects_in_base`).
 
+=== Pseudo-pack order for incremental MIDXs
+
+The original implementation of multi-pack reachability bitmaps defined
+the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
+titled "multi-pack-index reverse indexes") roughly as follows:
+
+____
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+____
+
+In the incremental MIDX design, we extend this definition to include
+objects from multiple layers of the MIDX chain. The pseudo-pack order
+for incremental MIDXs is determined by concatenating the pseudo-pack
+ordering for each layer of the MIDX chain in order. Formally, two objects
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+   `o1` is considered less than `o2`.
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+   MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
+   preferred and the other is not, then the preferred one sorts first. If
+   there is a base layer (i.e. the MIDX layer is not the first layer in
+   the chain), then if `pack(o1)` appears earlier in that MIDX layer's
+   pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
+   appears earlier, then the opposite is true.
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+   same MIDX layer. Sort `o1` and `o2` by their offset within their
+   containing packfile.
+
+=== Reachability bitmaps and incremental MIDXs
+
+Each layer of an incremental MIDX chain may have its objects (and the
+objects from any previous layer in the same MIDX chain) represented in
+its own `*.bitmap` file.
+
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
+incremental MIDX's pseudo-pack order (see above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is likewise possible to compress a contiguous sequence of MIDX
+incremental layers, and their `*.bitmap`(s) into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
+subsequent layers will have, for example, `m->num_objects_in_base`
+number of `0` bits in each of their four type bitmaps. This follows from
+the fact that we only write type bitmap entries for objects present in
+the layer immediately corresponding to the bitmap.
+
+Note also that only the bitmap pertaining to the most recent layer in an
+incremental MIDX chain is used to store reachability information about
+the interesting and uninteresting objects in a reachability query.
+Earlier bitmap layers are only used to look up commit and pseudo-merge
+bitmaps from that layer, as well as the type-level bitmaps for objects
+in that layer.
+
+To simplify the implementation, type-level bitmaps are iterated
+simultaneously, and their results are OR'd together to avoid recursively
+calling internal bitmap functions.
+
 Future Work
 -----------
 
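Review note: the three comparison rules in the "Pseudo-pack order for incremental MIDXs" section above can be read as a single comparator. The sketch below is one possible reading, not code from this series. `struct obj_loc` and `pseudo_pack_cmp()` are hypothetical names, and the sketch assumes that layer 0 is the only layer without a base and that pack order breaks ties when neither (or both) packs are preferred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of where an object lives. */
struct obj_loc {
	uint32_t layer;      /* 0 = oldest layer of the MIDX chain (no base) */
	uint32_t pack_pos;   /* position of the pack in that layer's pack order */
	int pack_preferred;  /* nonzero if the pack is the layer's preferred pack */
	uint64_t offset;     /* byte offset of the object within its pack */
};

/* Returns <0, 0 or >0, in the style of a qsort() comparator. */
static int pseudo_pack_cmp(const struct obj_loc *a, const struct obj_loc *b)
{
	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	/* Rule 2: the preferred pack only matters in a layer with no base. */
	if (a->layer == 0 && !a->pack_preferred != !b->pack_preferred)
		return a->pack_preferred ? -1 : 1;
	if (a->pack_pos != b->pack_pos)
		return a->pack_pos < b->pack_pos ? -1 : 1;

	/* Rule 3: same pack, so order by offset within the packfile. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Under this reading, an object in the base layer always sorts before any object in a later layer, regardless of pack order or offset.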

builtin/pack-objects.c

Lines changed: 2 additions & 1 deletion
@@ -1370,7 +1370,8 @@ static void write_pack_file(void)
 
 			if (write_bitmap_index) {
 				bitmap_writer_init(&bitmap_writer,
-						   the_repository, &to_pack);
+						   the_repository, &to_pack,
+						   NULL);
 				bitmap_writer_set_checksum(&bitmap_writer, hash);
 				bitmap_writer_build_type_index(&bitmap_writer,
 							       written_list);

ewah/ewah_bitmap.c

Lines changed: 33 additions & 0 deletions
@@ -372,6 +372,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
 	read_new_rlw(it);
 }
 
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+			   struct ewah_bitmap **parents, size_t nr)
+{
+	size_t i;
+
+	memset(it, 0, sizeof(*it));
+
+	ALLOC_ARRAY(it->its, nr);
+	for (i = 0; i < nr; i++)
+		ewah_iterator_init(&it->its[it->nr++], parents[i]);
+}
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+{
+	eword_t buf, out = 0;
+	size_t i;
+	int ret = 0;
+
+	for (i = 0; i < it->nr; i++)
+		if (ewah_iterator_next(&buf, &it->its[i])) {
+			out |= buf;
+			ret = 1;
+		}
+
+	*next = out;
+	return ret;
+}
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it)
+{
+	free(it->its);
+}
+
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
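
Review note: to see what `ewah_or_iterator_next()` yields when the parent bitmaps have different lengths, here is a standalone model of the same loop. It is only a sketch: `struct word_iter` and the `or_iter_*` functions are hypothetical stand-ins that walk plain, uncompressed word arrays rather than EWAH-compressed bitmaps, so it does not exercise the real `ewah_iterator` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an EWAH iterator: walks a plain word array. */
struct word_iter {
	const uint64_t *words;
	size_t nr, pos;
};

static int word_iter_next(uint64_t *next, struct word_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

struct or_iter {
	struct word_iter *its;
	size_t nr;
};

static void or_iter_init(struct or_iter *it, const uint64_t **arrays,
			 const size_t *lens, size_t nr)
{
	size_t i;

	it->its = calloc(nr ? nr : 1, sizeof(*it->its));
	if (!it->its)
		abort();
	it->nr = nr;
	for (i = 0; i < nr; i++) {
		it->its[i].words = arrays[i];
		it->its[i].nr = lens[i];
	}
}

/* Same shape as the patch: OR one word from every parent that has one left. */
static int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	for (i = 0; i < it->nr; i++)
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}

	*next = out;
	return ret;
}

static void or_iter_free(struct or_iter *it)
{
	free(it->its);
}
```

In this model the combined iterator keeps producing words until the longest parent is exhausted, and shorter parents simply stop contributing bits.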

ewah/ewok.h

Lines changed: 12 additions & 0 deletions
@@ -148,6 +148,18 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
  */
 int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
 
+struct ewah_or_iterator {
+	struct ewah_iterator *its;
+	size_t nr;
+};
+
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+			   struct ewah_bitmap **parents, size_t nr);
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it);
+
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,

midx-write.c

Lines changed: 23 additions & 12 deletions
@@ -826,20 +826,26 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
 	return cb.commits;
 }
 
-static int write_midx_bitmap(const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+			     const char *object_dir,
 			     const unsigned char *midx_hash,
 			     struct packing_data *pdata,
 			     struct commit **commits,
 			     uint32_t commits_nr,
-			     uint32_t *pack_order,
 			     unsigned flags)
 {
 	int ret, i;
 	uint16_t options = 0;
 	struct bitmap_writer writer;
 	struct pack_idx_entry **index;
-	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
-				    hash_to_hex(midx_hash));
+	struct strbuf bitmap_name = STRBUF_INIT;
+
+	if (ctx->incremental)
+		get_split_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+					    MIDX_EXT_BITMAP);
+	else
+		get_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+				      MIDX_EXT_BITMAP);
 
 	trace2_region_enter("midx", "write_midx_bitmap", the_repository);
 
@@ -858,7 +864,8 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[i] = &pdata->objects[i].idx;
 
-	bitmap_writer_init(&writer, the_repository, pdata);
+	bitmap_writer_init(&writer, the_repository, pdata,
+			   ctx->incremental ? ctx->base_midx : NULL);
 	bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
 	bitmap_writer_build_type_index(&writer, index);
 
@@ -876,19 +883,19 @@ static int write_midx_bitmap(const char *midx_name,
 	 * bitmap_writer_finish().
 	 */
 	for (i = 0; i < pdata->nr_objects; i++)
-		index[pack_order[i]] = &pdata->objects[i].idx;
+		index[ctx->pack_order[i]] = &pdata->objects[i].idx;
 
 	bitmap_writer_select_commits(&writer, commits, commits_nr);
 	ret = bitmap_writer_build(&writer);
 	if (ret < 0)
 		goto cleanup;
 
 	bitmap_writer_set_checksum(&writer, midx_hash);
-	bitmap_writer_finish(&writer, index, bitmap_name, options);
+	bitmap_writer_finish(&writer, index, bitmap_name.buf, options);
 
 cleanup:
 	free(index);
-	free(bitmap_name);
+	strbuf_release(&bitmap_name);
 	bitmap_writer_free(&writer);
 
 	trace2_region_leave("midx", "write_midx_bitmap", the_repository);
@@ -1072,8 +1079,6 @@ static int write_midx_internal(const char *object_dir,
 	trace2_region_enter("midx", "write_midx_internal", the_repository);
 
 	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
-	if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
-		die(_("cannot write incremental MIDX with bitmap"));
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
@@ -1115,6 +1120,12 @@ static int write_midx_internal(const char *object_dir,
 	if (ctx.incremental) {
 		struct multi_pack_index *m = ctx.base_midx;
 		while (m) {
+			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+				error(_("could not load reverse index for MIDX %s"),
+				      hash_to_hex(get_midx_checksum(m)));
+				result = 1;
+				goto cleanup;
+			}
 			ctx.num_multi_pack_indexes_before++;
 			m = m->base_midx;
 		}
@@ -1404,8 +1415,8 @@ static int write_midx_internal(const char *object_dir,
 		FREE_AND_NULL(ctx.entries);
 		ctx.entries_nr = 0;
 
-		if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
-				      commits, commits_nr, ctx.pack_order,
+		if (write_midx_bitmap(&ctx, object_dir,
+				      midx_hash, &pdata, commits, commits_nr,
 				      flags) < 0) {
 			error(_("could not write multi-pack bitmap"));
 			result = 1;

pack-bitmap-write.c

Lines changed: 49 additions & 16 deletions
@@ -25,6 +25,8 @@
 #include "alloc.h"
 #include "refs.h"
 #include "strmap.h"
+#include "midx.h"
+#include "pack-revindex.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -42,14 +44,16 @@ static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer
 }
 
 void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
-			struct packing_data *pdata)
+			struct packing_data *pdata,
+			struct multi_pack_index *midx)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
 	if (writer->bitmaps)
 		BUG("bitmap writer already initialized");
 	writer->bitmaps = kh_init_oid_map();
 	writer->pseudo_merge_commits = kh_init_oid_map();
 	writer->to_pack = pdata;
+	writer->midx = midx;
 
 	string_list_init_dup(&writer->pseudo_merge_groups);
 
@@ -104,6 +108,11 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 				    struct pack_idx_entry **index)
 {
 	uint32_t i;
+	uint32_t base_objects = 0;
+
+	if (writer->midx)
+		base_objects = writer->midx->num_objects +
+			writer->midx->num_objects_in_base;
 
 	writer->commits = ewah_new();
 	writer->trees = ewah_new();
@@ -133,19 +142,19 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 
 		switch (real_type) {
 		case OBJ_COMMIT:
-			ewah_set(writer->commits, i);
+			ewah_set(writer->commits, i + base_objects);
 			break;
 
 		case OBJ_TREE:
-			ewah_set(writer->trees, i);
+			ewah_set(writer->trees, i + base_objects);
 			break;
 
 		case OBJ_BLOB:
-			ewah_set(writer->blobs, i);
+			ewah_set(writer->blobs, i + base_objects);
 			break;
 
 		case OBJ_TAG:
-			ewah_set(writer->tags, i);
+			ewah_set(writer->tags, i + base_objects);
 			break;
 
 		default:
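
Review note: the `i + base_objects` offset in this hunk is what makes type-bitmap positions global across the chain. The following is a minimal sketch of that arithmetic, not the writer's actual data structures. `struct base_layer`, `global_pos()` and `set_bit()` are hypothetical, and the bitmap is modelled as uncompressed 64-bit words instead of an EWAH bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the layer a new bitmap is stacked on. */
struct base_layer {
	uint32_t num_objects;          /* objects in the base layer itself */
	uint32_t num_objects_in_base;  /* objects in all layers below it */
};

/* Global pseudo-pack position of the i-th object of the new layer. */
static uint32_t global_pos(const struct base_layer *base, uint32_t i)
{
	uint32_t base_objects = 0;

	if (base)
		base_objects = base->num_objects + base->num_objects_in_base;
	return i + base_objects;
}

/* Set bit `pos` in an uncompressed bitmap made of 64-bit words. */
static void set_bit(uint64_t *words, uint32_t pos)
{
	words[pos / 64] |= (uint64_t)1 << (pos % 64);
}
```

With a base of 40 + 60 objects, the first object of the new layer lands at bit 100, so the first 100 bits of each type bitmap in that layer stay zero, which matches the documentation's description of leading `0` bits.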
@@ -198,19 +207,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
 static uint32_t find_object_pos(struct bitmap_writer *writer,
 				const struct object_id *oid, int *found)
 {
-	struct object_entry *entry = packlist_find(writer->to_pack, oid);
+	struct object_entry *entry;
+
+	entry = packlist_find(writer->to_pack, oid);
+	if (entry) {
+		uint32_t base_objects = 0;
+		if (writer->midx)
+			base_objects = writer->midx->num_objects +
+				writer->midx->num_objects_in_base;
 
-	if (!entry) {
 		if (found)
-			*found = 0;
-		warning("Failed to write bitmap index. Packfile doesn't have full closure "
-			"(object %s is missing)", oid_to_hex(oid));
-		return 0;
+			*found = 1;
+		return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
+	} else if (writer->midx) {
+		uint32_t at, pos;
+
+		if (!bsearch_midx(oid, writer->midx, &at))
+			goto missing;
+		if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
+			goto missing;
+
+		if (found)
+			*found = 1;
+		return pos;
 	}
 
+missing:
 	if (found)
-		*found = 1;
-	return oe_in_pack_pos(writer->to_pack, entry);
+		*found = 0;
+	warning("Failed to write bitmap index. Packfile doesn't have full closure "
+		"(object %s is missing)", oid_to_hex(oid));
+	return 0;
 }
 
 static void compute_xor_offsets(struct bitmap_writer *writer)
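
Review note: the reworked `find_object_pos()` is a two-tier lookup. It first checks the objects being written (offset by the base object count) and then falls back to the base MIDX. The toy model below sketches only that control flow. `struct toy_layer`, `toy_find()` and `toy_object_pos()` are hypothetical, object IDs are plain integers, and the whole base chain is assumed to be flattened into one array already in global pseudo-pack order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy layer: object IDs are small integers in bitmap order. */
struct toy_layer {
	const uint32_t *ids;
	uint32_t nr;
};

/* Linear search; returns 1 and stores the index in *pos when found. */
static int toy_find(const struct toy_layer *l, uint32_t id, uint32_t *pos)
{
	uint32_t i;

	for (i = 0; i < l->nr; i++)
		if (l->ids[i] == id) {
			*pos = i;
			return 1;
		}
	return 0;
}

/*
 * Look in the layer being written first (shifted past the base objects),
 * then fall back to the base, whose positions are already global.
 */
static uint32_t toy_object_pos(const struct toy_layer *new_layer,
			       const struct toy_layer *base,
			       uint32_t id, int *found)
{
	uint32_t pos;

	if (toy_find(new_layer, id, &pos)) {
		*found = 1;
		return pos + (base ? base->nr : 0);
	}
	if (base && toy_find(base, id, &pos)) {
		*found = 1;
		return pos;
	}

	/* Missing from both tiers: the caller would report a closure failure. */
	*found = 0;
	return 0;
}
```

The point of the fallback is that a commit selected for the new layer may reach objects stored only in earlier layers, and those still need a valid bit position in the new bitmap.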
@@ -577,7 +604,7 @@ int bitmap_writer_build(struct bitmap_writer *writer)
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
-	uint32_t *mapping;
+	uint32_t *mapping = NULL;
 	int closed = 1; /* until proven otherwise */
 
 	if (writer->show_progress)
@@ -1009,7 +1036,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
 	off_t *offsets = NULL;
-	uint32_t i;
+	uint32_t i, base_objects;
 
 	struct bitmap_disk_header header;
 
@@ -1035,6 +1062,12 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, writer->to_pack->nr_objects);
 
+	if (writer->midx)
+		base_objects = writer->midx->num_objects +
+			writer->midx->num_objects_in_base;
+	else
+		base_objects = 0;
+
 	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *stored = &writer->selected[i];
 		int commit_pos = oid_pos(&stored->commit->object.oid, index,
@@ -1043,7 +1076,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 
 		if (commit_pos < 0)
 			BUG(_("trying to write commit not in index"));
-		stored->commit_pos = commit_pos;
+		stored->commit_pos = commit_pos + base_objects;
 	}
 
 	write_selected_commits_v1(writer, f, offsets);
