
Commit 4a9c581

peff authored and gitster committed
pack-bitmap-write: reimplement bitmap writing
The bitmap generation code works by iterating over the set of commits for
which we plan to write bitmaps, and then for each one performing a
traditional traversal over the reachable commits and trees, filling in the
bitmap. Between two traversals, we can often reuse the previous bitmap
result as long as the first commit is an ancestor of the second. However,
our worst case is that we may end up doing "n" complete traversals to the
root in order to create "n" bitmaps.

In a real-world case (the shared-storage repo consisting of all GitHub
forks of chromium/chromium), we perform very poorly: generating bitmaps
takes ~3 hours, whereas we can walk the whole object graph in ~3 minutes.

This commit completely rewrites the algorithm, with the goal of accessing
each object only once. It works roughly like this:

  - generate a list of commits in topo-order using a single traversal

  - invert the edges of the graph (so that parents point at their children)

  - make one pass in reverse topo-order, generating a bitmap for each
    commit and passing the result along to child nodes

We generate correct results because each node we visit has already had all
of its ancestors added to the bitmap. And we make only two linear passes
over the commits.

We also visit each tree usually only once. When filling in a bitmap, we
don't bother to recurse into trees whose bit is already set in the bitmap
(since we know we've already done so when setting their bit). That means
that if commit A references tree T, none of its descendants will need to
open T again. I say "usually", though, because it is possible for a given
tree to be mentioned in unrelated parts of history (e.g., cherry-picking
to a parallel branch).

So we've accomplished our goal, and the resulting algorithm is pretty
simple to understand. But there are some downsides, at least with this
initial implementation:

  - we no longer reuse the results of any on-disk bitmaps when generating.
    So we'd expect to sometimes be slower than the original when bitmaps
    already exist. However, this is something we'll be able to add back in
    later.

  - we use much more memory. Instead of keeping one bitmap in memory at a
    time, we're passing them up through the graph. So our memory use
    should scale with the graph width (times the size of a bitmap).

So how does it perform? For a clone of linux.git, generating bitmaps from
scratch with the old algorithm took 63s. Using this algorithm it takes
205s. Which is much worse, but _might_ be acceptable if it behaved
linearly as the size grew. It also increases peak heap usage by ~1G.
That's not impossibly large, but not encouraging.

On the complete fork-network of torvalds/linux, it increases the peak RAM
usage by 40GB. Yikes. (I forgot to record the time it took, but the memory
usage was too much to consider this reasonable anyway).

On the complete fork-network of chromium/chromium, I ran out of memory
before succeeding. Some back-of-the-envelope calculations indicate it
would need 80+GB to complete.

So at this stage, we've managed to make things much worse. But because of
the way this new algorithm is structured, there are a lot of opportunities
for optimization on top. We'll start implementing those in the follow-on
patches.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent ccae08e commit 4a9c581

File tree

1 file changed: +161 -123 lines

pack-bitmap-write.c

Lines changed: 161 additions & 123 deletions
@@ -110,8 +110,6 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
 /**
  * Compute the actual bitmaps
  */
-static struct object **seen_objects;
-static unsigned int seen_objects_nr, seen_objects_alloc;
 
 static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused)
 {
@@ -127,21 +125,6 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm
 	writer.selected_nr++;
 }
 
-static inline void mark_as_seen(struct object *object)
-{
-	ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc);
-	seen_objects[seen_objects_nr++] = object;
-}
-
-static inline void reset_all_seen(void)
-{
-	unsigned int i;
-	for (i = 0; i < seen_objects_nr; ++i) {
-		seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN);
-	}
-	seen_objects_nr = 0;
-}
-
 static uint32_t find_object_pos(const struct object_id *oid)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
@@ -154,60 +137,6 @@ static uint32_t find_object_pos(const struct object_id *oid)
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
-static void show_object(struct object *object, const char *name, void *data)
-{
-	struct bitmap *base = data;
-	bitmap_set(base, find_object_pos(&object->oid));
-	mark_as_seen(object);
-}
-
-static void show_commit(struct commit *commit, void *data)
-{
-	mark_as_seen((struct object *)commit);
-}
-
-static int
-add_to_include_set(struct bitmap *base, struct commit *commit)
-{
-	khiter_t hash_pos;
-	uint32_t bitmap_pos = find_object_pos(&commit->object.oid);
-
-	if (bitmap_get(base, bitmap_pos))
-		return 0;
-
-	hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid);
-	if (hash_pos < kh_end(writer.bitmaps)) {
-		struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos);
-		bitmap_or_ewah(base, bc->bitmap);
-		return 0;
-	}
-
-	bitmap_set(base, bitmap_pos);
-	return 1;
-}
-
-static int
-should_include(struct commit *commit, void *_data)
-{
-	struct bitmap *base = _data;
-
-	if (!add_to_include_set(base, commit)) {
-		struct commit_list *parent = commit->parents;
-
-		mark_as_seen((struct object *)commit);
-
-		while (parent) {
-			parent->item->object.flags |= SEEN;
-			mark_as_seen((struct object *)parent->item);
-			parent = parent->next;
-		}
-
-		return 0;
-	}
-
-	return 1;
-}
-
 static void compute_xor_offsets(void)
 {
 	static const int MAX_XOR_OFFSET_SEARCH = 10;
@@ -248,79 +177,188 @@ static void compute_xor_offsets(void)
 	}
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
-{
-	static const double REUSE_BITMAP_THRESHOLD = 0.2;
+struct bb_commit {
+	struct commit_list *children;
+	struct bitmap *bitmap;
+	unsigned selected:1;
+	unsigned idx; /* within selected array */
+};
 
-	int i, reuse_after, need_reset;
-	struct bitmap *base = bitmap_new();
-	struct rev_info revs;
+define_commit_slab(bb_data, struct bb_commit);
 
-	writer.bitmaps = kh_init_oid_map();
-	writer.to_pack = to_pack;
+struct bitmap_builder {
+	struct bb_data data;
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+};
 
-	if (writer.show_progress)
-		writer.progress = start_progress("Building bitmaps", writer.selected_nr);
+static void bitmap_builder_init(struct bitmap_builder *bb,
+				struct bitmap_writer *writer)
+{
+	struct rev_info revs;
+	struct commit *commit;
+	unsigned int i;
 
-	repo_init_revisions(to_pack->repo, &revs, NULL);
-	revs.tag_objects = 1;
-	revs.tree_objects = 1;
-	revs.blob_objects = 1;
-	revs.no_walk = 0;
+	memset(bb, 0, sizeof(*bb));
+	init_bb_data(&bb->data);
 
-	revs.include_check = should_include;
 	reset_revision_walk();
+	repo_init_revisions(writer->to_pack->repo, &revs, NULL);
+	revs.topo_order = 1;
+
+	for (i = 0; i < writer->selected_nr; i++) {
+		struct commit *c = writer->selected[i].commit;
+		struct bb_commit *ent = bb_data_at(&bb->data, c);
+		ent->selected = 1;
+		ent->idx = i;
+		add_pending_object(&revs, &c->object, "");
+	}
 
-	reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD;
-	need_reset = 0;
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
 
-	for (i = writer.selected_nr - 1; i >= 0; --i) {
-		struct bitmapped_commit *stored;
-		struct object *object;
+	while ((commit = get_revision(&revs))) {
+		struct commit_list *p;
 
-		khiter_t hash_pos;
-		int hash_ret;
+		parse_commit_or_die(commit);
 
-		stored = &writer.selected[i];
-		object = (struct object *)stored->commit;
+		ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
+		bb->commits[bb->commits_nr++] = commit;
 
-		if (stored->bitmap == NULL) {
-			if (i < writer.selected_nr - 1 &&
-			    (need_reset ||
-			     !in_merge_bases(writer.selected[i + 1].commit,
-					     stored->commit))) {
-				bitmap_reset(base);
-				reset_all_seen();
-			}
+		for (p = commit->parents; p; p = p->next) {
+			struct bb_commit *ent = bb_data_at(&bb->data, p->item);
+			commit_list_insert(commit, &ent->children);
+		}
+	}
+}
 
-			add_pending_object(&revs, object, "");
-			revs.include_check_data = base;
+static void bitmap_builder_clear(struct bitmap_builder *bb)
+{
+	clear_bb_data(&bb->data);
+	free(bb->commits);
+	bb->commits_nr = bb->commits_alloc = 0;
+}
 
-			if (prepare_revision_walk(&revs))
-				die("revision walk setup failed");
+static void fill_bitmap_tree(struct bitmap *bitmap,
+			     struct tree *tree)
+{
+	uint32_t pos;
+	struct tree_desc desc;
+	struct name_entry entry;
 
-			traverse_commit_list(&revs, show_commit, show_object, base);
+	/*
+	 * If our bit is already set, then there is nothing to do. Both this
+	 * tree and all of its children will be set.
+	 */
+	pos = find_object_pos(&tree->object.oid);
+	if (bitmap_get(bitmap, pos))
+		return;
+	bitmap_set(bitmap, pos);
 
-			object_array_clear(&revs.pending);
+	if (parse_tree(tree) < 0)
+		die("unable to load tree object %s",
+		    oid_to_hex(&tree->object.oid));
+	init_tree_desc(&desc, tree->buffer, tree->size);
 
-			stored->bitmap = bitmap_to_ewah(base);
-			need_reset = 0;
-		} else
-			need_reset = 1;
+	while (tree_entry(&desc, &entry)) {
+		switch (object_type(entry.mode)) {
+		case OBJ_TREE:
+			fill_bitmap_tree(bitmap,
+					 lookup_tree(the_repository, &entry.oid));
+			break;
+		case OBJ_BLOB:
+			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			break;
+		default:
+			/* Gitlink, etc; not reachable */
+			break;
+		}
+	}
+
+	free_tree_buffer(tree);
+}
 
-		if (i >= reuse_after)
-			stored->flags |= BITMAP_FLAG_REUSE;
+static void fill_bitmap_commit(struct bb_commit *ent,
+			       struct commit *commit)
+{
+	if (!ent->bitmap)
+		ent->bitmap = bitmap_new();
 
-		hash_pos = kh_put_oid_map(writer.bitmaps, object->oid, &hash_ret);
-		if (hash_ret == 0)
-			die("Duplicate entry when writing index: %s",
-			    oid_to_hex(&object->oid));
+	/*
+	 * mark ourselves, but do not bother with parents; their values
+	 * will already have been propagated to us
+	 */
+	bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid));
+	fill_bitmap_tree(ent->bitmap, get_commit_tree(commit));
+}
 
-		kh_value(writer.bitmaps, hash_pos) = stored;
-		display_progress(writer.progress, writer.selected_nr - i);
+static void store_selected(struct bb_commit *ent, struct commit *commit)
+{
+	struct bitmapped_commit *stored = &writer.selected[ent->idx];
+	khiter_t hash_pos;
+	int hash_ret;
+
+	/*
+	 * the "reuse bitmaps" phase may have stored something here, but
+	 * our new algorithm doesn't use it. Drop it.
+	 */
+	if (stored->bitmap)
+		ewah_free(stored->bitmap);
+
+	stored->bitmap = bitmap_to_ewah(ent->bitmap);
+
+	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
+	if (hash_ret == 0)
+		die("Duplicate entry when writing index: %s",
+		    oid_to_hex(&commit->object.oid));
+	kh_value(writer.bitmaps, hash_pos) = stored;
+}
+
+void bitmap_writer_build(struct packing_data *to_pack)
+{
+	struct bitmap_builder bb;
+	size_t i;
+	int nr_stored = 0; /* for progress */
+
+	writer.bitmaps = kh_init_oid_map();
+	writer.to_pack = to_pack;
+
+	if (writer.show_progress)
+		writer.progress = start_progress("Building bitmaps", writer.selected_nr);
+	trace2_region_enter("pack-bitmap-write", "building_bitmaps_total",
+			    the_repository);
+
+	bitmap_builder_init(&bb, &writer);
+	for (i = bb.commits_nr; i > 0; i--) {
+		struct commit *commit = bb.commits[i-1];
+		struct bb_commit *ent = bb_data_at(&bb.data, commit);
+		struct commit *child;
+
+		fill_bitmap_commit(ent, commit);
+
+		if (ent->selected) {
+			store_selected(ent, commit);
+			nr_stored++;
+			display_progress(writer.progress, nr_stored);
+		}
+
+		while ((child = pop_commit(&ent->children))) {
+			struct bb_commit *child_ent =
+				bb_data_at(&bb.data, child);
+
+			if (child_ent->bitmap)
+				bitmap_or(child_ent->bitmap, ent->bitmap);
+			else
+				child_ent->bitmap = bitmap_dup(ent->bitmap);
+		}
+		bitmap_free(ent->bitmap);
+		ent->bitmap = NULL;
 	}
+	bitmap_builder_clear(&bb);
+
+	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
+			    the_repository);
 
-	bitmap_free(base);
 	stop_progress(&writer.progress);
 
 	compute_xor_offsets();
