Commit b7d6f23

ttaylorr authored and gitster committed
midx-write.c: use --stdin-packs when repacking

When constructing a new pack `git multi-pack-index repack` provides a list of objects which is the union of objects in all MIDX'd packs which were "included" in the repack.

Though correct, this typically yields a poorly structured pack, since providing the objects list over stdin does not give pack-objects a chance to discover the namehash values for each object, leading to sub-optimal delta selection.

We can use `--stdin-packs` instead, which has a couple of benefits:

- it does a supplemental walk over objects in the supplied list of packs to discover their namehash, leading to higher-quality delta selection

- it requires us to list far less data over stdin; instead of listing each object in the resulting pack, we need only list the constituent packs from which those objects were selected in the MIDX

Of course, this comes at a slight cost: though we save time on listing packs versus objects over stdin[^1] (around ~650 milliseconds), we add a non-trivial amount of time walking over the given objects in order to find better deltas.

In general, this is likely to more closely match the user's expectations (i.e. that packs generated via `git multi-pack-index repack` are written with high-quality deltas). But if not, we can always introduce a new option in pack-objects to disable the supplemental object walk, which would yield a pure CPU-time savings, at the cost of the on-disk size of the resulting pack.

[^1]: In a patched version of Git that doesn't perform the supplemental object walk in `pack-objects --stdin-packs`, we save around ~650ms (from 5.968 to 5.325 seconds) when running `git multi-pack-index repack --batch-size=0` on git.git with all objects packed, and all packs in a MIDX.

Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
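
To make the "namehash" point in the message above concrete, here is a minimal sketch of a path-based name hash. It is modeled on my recollection of Git's `pack_name_hash()` and should be treated as an assumption, not as the authoritative definition (see `pack-objects.h` in git.git for that). The function name `name_hash_sketch` is hypothetical. The idea it illustrates: the hash is derived from the trailing characters of an object's path, so an object handed to pack-objects as a bare object ID, with no path, ends up with a hash of 0 and cannot be grouped with similarly named files when delta candidates are sorted.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a path-based name hash (assumed to resemble
 * Git's pack_name_hash(); not copied from the Git source tree).
 *
 * Later characters carry the most weight, so paths sharing a suffix
 * such as ".c" produce nearby values. A NULL path, which is what a
 * bare object ID on stdin amounts to, hashes to 0.
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t hash = 0;
	unsigned char ch;

	if (!name)
		return 0;

	while ((ch = (unsigned char)*name++) != 0) {
		if (isspace(ch))
			continue; /* whitespace does not contribute */
		hash = (hash >> 2) + ((uint32_t)ch << 24);
	}
	return hash;
}
```

Under this sketch, `name_hash_sketch(NULL)` is 0 while any non-empty path yields a non-zero value, which is the information the supplemental walk in `--stdin-packs` is able to recover.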
1 parent 440e470 commit b7d6f23

1 file changed: +9 -9 lines changed

midx-write.c

Lines changed: 9 additions & 9 deletions
@@ -1474,7 +1474,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
 	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
 
-	strvec_push(&cmd.args, "pack-objects");
+	strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty",
+		     NULL);
 
 	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
 
@@ -1498,16 +1499,15 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	}
 
 	cmd_in = xfdopen(cmd.in, "w");
-
-	for (i = 0; i < m->num_objects; i++) {
-		struct object_id oid;
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-
-		if (!include_pack[pack_int_id])
+	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p = m->packs[i];
+		if (!p)
 			continue;
 
-		nth_midxed_object_oid(&oid, m, i);
-		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
+		if (include_pack[i])
+			fprintf(cmd_in, "%s\n", pack_basename(p));
+		else
+			fprintf(cmd_in, "^%s\n", pack_basename(p));
 	}
 	fclose(cmd_in);
 
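
The new loop in `midx_repack()` writes one line per pack instead of one line per object. Included packs are written as their basename and excluded packs get a leading `^`, which `pack-objects --stdin-packs` reads as "omit objects found in this pack". The following is a standalone sketch of that line format only. `write_pack_list` and its parameters are hypothetical stand-ins for the MIDX structures and are not part of Git.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Git): emit the stdin payload for
 * `pack-objects --stdin-packs` in the same shape as the patched loop.
 *
 *   names[i]   - basename of pack i, or NULL if the pack is unavailable
 *   include[i] - non-zero if pack i was selected for the repack
 *
 * Returns the number of lines written.
 */
static int write_pack_list(FILE *out, const char *const *names,
			   const unsigned char *include, size_t nr)
{
	size_t i;
	int written = 0;

	for (i = 0; i < nr; i++) {
		if (!names[i])
			continue; /* mirrors the `if (!p) continue;` check */

		/* "^" marks a pack whose objects must be excluded */
		fprintf(out, "%s%s\n", include[i] ? "" : "^", names[i]);
		written++;
	}
	return written;
}
```

Calling `write_pack_list(stdout, names, include, nr)` with two included packs and one excluded pack prints three short lines, whereas the pre-patch loop printed one hexadecimal object ID for every object in the included packs.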