Skip to content

Commit 70664d2

Browse files
derrickstoleegitster
authored andcommitted
pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible. This new walk is incompatible with some features and is ignored by others: * Object filters are not currently integrated with the path-walk API, such as sparse-checkout or tree depth. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. These alternative ways to specify the objects currently ignores the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 4bc0ba0 commit 70664d2

File tree

4 files changed

+168
-9
lines changed

4 files changed

+168
-9
lines changed

Documentation/git-pack-objects.adoc

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ SYNOPSIS
1616
[--cruft] [--cruft-expiration=<time>]
1717
[--stdout [--filter=<filter-spec>] | <base-name>]
1818
[--shallow] [--keep-true-parents] [--[no-]sparse]
19-
[--name-hash-version=<n>] < <object-list>
19+
[--name-hash-version=<n>] [--path-walk] < <object-list>
2020

2121

2222
DESCRIPTION
@@ -375,6 +375,17 @@ many different directories. At the moment, this version is not allowed
375375
when writing reachability bitmap files with `--write-bitmap-index` and it
376376
will be automatically changed to version `1`.
377377
378+
--path-walk::
379+
Perform compression by first organizing objects by path, then a
380+
second pass that compresses across paths as normal. This has the
381+
potential to improve delta compression especially in the presence
382+
of filenames that cause collisions in Git's default name-hash
383+
algorithm.
384+
+
385+
Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
386+
`--use-bitmap-index` option will be ignored in the presence of
387+
`--path-walk.`
388+
378389
379390
DELTA ISLANDS
380391
-------------

Documentation/technical/api-path-walk.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -69,4 +69,5 @@ Examples
6969

7070
See example usages in:
7171
`t/helper/test-path-walk.c`,
72+
`builtin/pack-objects.c`,
7273
`builtin/backfill.c`

builtin/pack-objects.c

Lines changed: 140 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,9 @@
4141
#include "promisor-remote.h"
4242
#include "pack-mtimes.h"
4343
#include "parse-options.h"
44+
#include "blob.h"
45+
#include "tree.h"
46+
#include "path-walk.h"
4447

4548
/*
4649
* Objects we are going to pack are collected in the `to_pack` structure.
@@ -217,6 +220,7 @@ static int delta_search_threads;
217220
static int pack_to_stdout;
218221
static int sparse;
219222
static int thin;
223+
static int path_walk;
220224
static int num_preferred_base;
221225
static struct progress *progress_state;
222226

@@ -4191,6 +4195,105 @@ static void mark_bitmap_preferred_tips(void)
41914195
}
41924196
}
41934197

4198+
static inline int is_oid_uninteresting(struct repository *repo,
4199+
struct object_id *oid)
4200+
{
4201+
struct object *o = lookup_object(repo, oid);
4202+
return !o || (o->flags & UNINTERESTING);
4203+
}
4204+
4205+
static int add_objects_by_path(const char *path,
4206+
struct oid_array *oids,
4207+
enum object_type type,
4208+
void *data)
4209+
{
4210+
struct object_entry **delta_list = NULL;
4211+
size_t oe_start = to_pack.nr_objects;
4212+
size_t oe_end;
4213+
unsigned int sub_list_nr;
4214+
unsigned int *processed = data;
4215+
4216+
/*
4217+
* First, add all objects to the packing data, including the ones
4218+
* marked UNINTERESTING (translated to 'exclude') as they can be
4219+
* used as delta bases.
4220+
*/
4221+
for (size_t i = 0; i < oids->nr; i++) {
4222+
int exclude;
4223+
struct object_info oi = OBJECT_INFO_INIT;
4224+
struct object_id *oid = &oids->oid[i];
4225+
4226+
/* Skip objects that do not exist locally. */
4227+
if (exclude_promisor_objects &&
4228+
oid_object_info_extended(the_repository, oid, &oi,
4229+
OBJECT_INFO_FOR_PREFETCH) < 0)
4230+
continue;
4231+
4232+
exclude = is_oid_uninteresting(the_repository, oid);
4233+
4234+
if (exclude && !thin)
4235+
continue;
4236+
4237+
add_object_entry(oid, type, path, exclude);
4238+
}
4239+
4240+
oe_end = to_pack.nr_objects;
4241+
4242+
/* We can skip delta calculations if it is a no-op. */
4243+
if (oe_end == oe_start || !window)
4244+
return 0;
4245+
4246+
sub_list_nr = 0;
4247+
if (oe_end > oe_start)
4248+
ALLOC_ARRAY(delta_list, oe_end - oe_start);
4249+
4250+
for (size_t i = 0; i < oe_end - oe_start; i++) {
4251+
struct object_entry *entry = to_pack.objects + oe_start + i;
4252+
4253+
if (!should_attempt_deltas(entry))
4254+
continue;
4255+
4256+
delta_list[sub_list_nr++] = entry;
4257+
}
4258+
4259+
/*
4260+
* Find delta bases among this list of objects that all match the same
4261+
* path. This causes the delta compression to be interleaved in the
4262+
* object walk, which can lead to confusing progress indicators. This is
4263+
* also incompatible with threaded delta calculations. In the future,
4264+
* consider creating a list of regions in the full to_pack.objects array
4265+
* that could be picked up by the threaded delta computation.
4266+
*/
4267+
if (sub_list_nr && window) {
4268+
QSORT(delta_list, sub_list_nr, type_size_sort);
4269+
find_deltas(delta_list, &sub_list_nr, window, depth, processed);
4270+
}
4271+
4272+
free(delta_list);
4273+
return 0;
4274+
}
4275+
4276+
static void get_object_list_path_walk(struct rev_info *revs)
4277+
{
4278+
struct path_walk_info info = PATH_WALK_INFO_INIT;
4279+
unsigned int processed = 0;
4280+
4281+
info.revs = revs;
4282+
info.path_fn = add_objects_by_path;
4283+
info.path_fn_data = &processed;
4284+
4285+
/*
4286+
* Allow the --[no-]sparse option to be interesting here, if only
4287+
* for testing purposes. Paths with no interesting objects will not
4288+
* contribute to the resulting pack, but only create noisy preferred
4289+
* base objects.
4290+
*/
4291+
info.prune_all_uninteresting = sparse;
4292+
4293+
if (walk_objects_by_path(&info))
4294+
die(_("failed to pack objects via path-walk"));
4295+
}
4296+
41944297
static void get_object_list(struct rev_info *revs, int ac, const char **av)
41954298
{
41964299
struct setup_revision_opt s_r_opt = {
@@ -4246,15 +4349,19 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
42464349
if (write_bitmap_index)
42474350
mark_bitmap_preferred_tips();
42484351

4249-
if (prepare_revision_walk(revs))
4250-
die(_("revision walk setup failed"));
4251-
mark_edges_uninteresting(revs, show_edge, sparse);
4252-
42534352
if (!fn_show_object)
42544353
fn_show_object = show_object;
4255-
traverse_commit_list(revs,
4256-
show_commit, fn_show_object,
4257-
NULL);
4354+
4355+
if (path_walk) {
4356+
get_object_list_path_walk(revs);
4357+
} else {
4358+
if (prepare_revision_walk(revs))
4359+
die(_("revision walk setup failed"));
4360+
mark_edges_uninteresting(revs, show_edge, sparse);
4361+
traverse_commit_list(revs,
4362+
show_commit, fn_show_object,
4363+
NULL);
4364+
}
42584365

42594366
if (unpack_unreachable_expiration) {
42604367
revs->ignore_missing_links = 1;
@@ -4464,6 +4571,8 @@ int cmd_pack_objects(int argc,
44644571
N_("use the sparse reachability algorithm")),
44654572
OPT_BOOL(0, "thin", &thin,
44664573
N_("create thin packs")),
4574+
OPT_BOOL(0, "path-walk", &path_walk,
4575+
N_("use the path-walk API to walk objects when possible")),
44674576
OPT_BOOL(0, "shallow", &shallow,
44684577
N_("create packs suitable for shallow fetches")),
44694578
OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep_on_disk,
@@ -4549,7 +4658,30 @@ int cmd_pack_objects(int argc,
45494658
window = 0;
45504659

45514660
strvec_push(&rp, "pack-objects");
4552-
if (thin) {
4661+
4662+
if (path_walk) {
4663+
const char *option = NULL;
4664+
if (filter_options.choice)
4665+
option = "--filter";
4666+
else if (use_delta_islands)
4667+
option = "--delta-islands";
4668+
else if (shallow)
4669+
option = "--shallow";
4670+
4671+
if (option) {
4672+
warning(_("cannot use %s with %s"),
4673+
option, "--path-walk");
4674+
path_walk = 0;
4675+
}
4676+
}
4677+
if (path_walk) {
4678+
strvec_push(&rp, "--boundary");
4679+
/*
4680+
* We must disable the bitmaps because we are removing
4681+
* the --objects / --objects-edge[-aggressive] options.
4682+
*/
4683+
use_bitmap_index = 0;
4684+
} else if (thin) {
45534685
use_internal_rev_list = 1;
45544686
strvec_push(&rp, shallow
45554687
? "--objects-edge-aggressive"

t/t5300-pack-object.sh

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -723,4 +723,19 @@ test_expect_success '--name-hash-version=2 and --write-bitmap-index are incompat
723723
! test_grep "currently, --write-bitmap-index requires --name-hash-version=1" err
724724
'
725725

726+
test_expect_success '--path-walk pack everything' '
727+
git -C server rev-parse HEAD >in &&
728+
git -C server pack-objects --stdout --revs --path-walk <in >out.pack &&
729+
git -C server index-pack --stdin <out.pack
730+
'
731+
732+
test_expect_success '--path-walk thin pack' '
733+
cat >in <<-EOF &&
734+
$(git -C server rev-parse HEAD)
735+
^$(git -C server rev-parse HEAD~2)
736+
EOF
737+
git -C server pack-objects --thin --stdout --revs --path-walk <in >out.pack &&
738+
git -C server index-pack --fix-thin --stdin <out.pack
739+
'
740+
726741
test_done

0 commit comments

Comments
 (0)