Commit 28b8a73

peff authored and gitster committed
pack-objects: add delta-islands support
Implement support for delta islands in git pack-objects and document how delta islands work in "Documentation/git-pack-objects.txt" and Documentation/config.txt.

This allows users to set up delta islands in their config and get the benefit of less disk usage, while cloning and fetching remain quite fast and not much more CPU intensive.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Christian Couder <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent f64ba53 commit 28b8a73

3 files changed: +161 -8 lines changed

Documentation/config.txt

Lines changed: 15 additions & 0 deletions
@@ -2585,6 +2585,21 @@ Note that changing the compression level will not automatically recompress
 	all existing objects. You can force recompression by passing the -F option
 	to linkgit:git-repack[1].
 
+pack.island::
+	An extended regular expression configuring a set of delta
+	islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
+	for details.
+
+pack.islandCore::
+	Specify an island name which gets to have its objects be
+	packed first. This creates a kind of pseudo-pack at the front
+	of one pack, so that the objects from the specified island are
+	hopefully faster to copy into any pack that should be served
+	to a user requesting these objects. In practice this means
+	that the island specified should likely correspond to what is
+	the most commonly cloned in the repo. See also "DELTA ISLANDS"
+	in linkgit:git-pack-objects[1].
+
 pack.deltaCacheSize::
 	The maximum memory in bytes used for caching deltas in
 	linkgit:git-pack-objects[1] before writing them out to a pack.

Documentation/git-pack-objects.txt

Lines changed: 97 additions & 0 deletions
@@ -289,6 +289,103 @@ Unexpected missing object will raise an error.
 --unpack-unreachable::
 	Keep unreachable objects in loose form. This implies `--revs`.
 
+--delta-islands::
+	Restrict delta matches based on "islands". See DELTA ISLANDS
+	below.
+
+
+DELTA ISLANDS
+-------------
+
+When possible, `pack-objects` tries to reuse existing on-disk deltas to
+avoid having to search for new ones on the fly. This is an important
+optimization for serving fetches, because it means the server can avoid
+inflating most objects at all and just send the bytes directly from
+disk. This optimization can't work when an object is stored as a delta
+against a base which the receiver does not have (and which we are not
+already sending). In that case the server "breaks" the delta and has to
+find a new one, which has a high CPU cost. Therefore it's important for
+performance that the set of objects in on-disk delta relationships match
+what a client would fetch.
+
+In a normal repository, this tends to work automatically. The objects
+are mostly reachable from the branches and tags, and that's what clients
+fetch. Any deltas we find on the server are likely to be between objects
+the client has or will have.
+
+But in some repository setups, you may have several related but separate
+groups of ref tips, with clients tending to fetch those groups
+independently. For example, imagine that you are hosting several "forks"
+of a repository in a single shared object store, and letting clients
+view them as separate repositories through `GIT_NAMESPACE` or separate
+repos using the alternates mechanism. A naive repack may find that the
+optimal delta for an object is against a base that is only found in
+another fork. But when a client fetches, they will not have the base
+object, and we'll have to find a new delta on the fly.
+
+A similar situation may exist if you have many refs outside of
+`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
+`refs/pull` or `refs/changes` used by some hosting providers). By
+default, clients fetch only heads and tags, and deltas against objects
+found only in those other groups cannot be sent as-is.
+
+Delta islands solve this problem by allowing you to group your refs into
+distinct "islands". Pack-objects computes which objects are reachable
+from which islands, and refuses to make a delta from an object `A`
+against a base which is not present in all of `A`'s islands. This
+results in slightly larger packs (because we miss some delta
+opportunities), but guarantees that a fetch of one island will not have
+to recompute deltas on the fly due to crossing island boundaries.
+
+When repacking with delta islands the delta window tends to get
+clogged with candidates that are forbidden by the config. Repacking
+with a big --window helps (and doesn't take as long as it otherwise
+might because we can reject some object pairs based on islands before
+doing any computation on the content).
+
+Islands are configured via the `pack.island` option, which can be
+specified multiple times. Each value is a left-anchored regular
+expression matching refnames. For example:
+
+-------------------------------------------
+[pack]
+island = refs/heads/
+island = refs/tags/
+-------------------------------------------
+
+puts heads and tags into an island (whose name is the empty string; see
+below for more on naming). Any refs which do not match those regular
+expressions (e.g., `refs/pull/123`) are not in any island. Any object
+which is reachable only from `refs/pull/` (but not heads or tags) is
+therefore not a candidate to be used as a base for `refs/heads/`.
+
+Refs are grouped into islands based on their "names", and two regexes
+that produce the same name are considered to be in the same
+island. The names are computed from the regexes by concatenating any
+capture groups from the regex, with a '-' dash in between. (And if
+there are no capture groups, then the name is the empty string, as in
+the above example.) This allows you to create arbitrary numbers of
+islands. Only up to 14 such capture groups are supported though.
+
+For example, imagine you store the refs for each fork in
+`refs/virtual/ID`, where `ID` is a numeric identifier. You might then
+configure:
+
+-------------------------------------------
+[pack]
+island = refs/virtual/([0-9]+)/heads/
+island = refs/virtual/([0-9]+)/tags/
+island = refs/virtual/([0-9]+)/(pull)/
+-------------------------------------------
+
+That puts the heads and tags for each fork in their own island (named
+"1234" or similar), and the pull refs for each go into their own
+"1234-pull".
+
+Note that we pick a single island for each regex to go into, using "last
+one wins" ordering (which allows repo-specific config to take precedence
+over user-wide config, and so forth).
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
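
The naming rules in the DELTA ISLANDS text (left-anchored extended regexes, capture groups joined with '-', "last one wins") can be sketched with POSIX `regcomp`/`regexec`. This is only an illustration of the documented behavior, not the code from `delta-islands.c`: `island_name`, `island_for_ref`, `MAX_MATCHES` and the fixed-size buffers are hypothetical names and limits chosen for the example.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Whole match plus the 14 capture groups the documentation allows. */
#define MAX_MATCHES 15

/*
 * Hypothetical helper: match `refname` against `pattern`, anchored at the
 * start, and build the island name by joining capture groups with '-'.
 * Returns 0 on a match (the name may be the empty string), -1 otherwise.
 */
static int island_name(const char *pattern, const char *refname,
		       char *out, size_t outlen)
{
	char anchored[256];
	regex_t re;
	regmatch_t m[MAX_MATCHES];
	size_t used = 0;
	int n, ret = -1;

	if (!outlen)
		return -1;
	out[0] = '\0';

	/* Left-anchor the pattern so it must match from the start of the ref. */
	n = snprintf(anchored, sizeof(anchored), "%s%s",
		     pattern[0] == '^' ? "" : "^", pattern);
	if (n < 0 || (size_t)n >= sizeof(anchored))
		return -1;
	if (regcomp(&re, anchored, REG_EXTENDED))
		return -1;

	if (regexec(&re, refname, MAX_MATCHES, m, 0) == 0) {
		ret = 0;
		for (size_t i = 1; i <= re.re_nsub && i < MAX_MATCHES; i++) {
			size_t len, sep;

			if (m[i].rm_so < 0)
				continue; /* group did not take part in the match */
			len = (size_t)(m[i].rm_eo - m[i].rm_so);
			sep = used ? 1 : 0;
			if (used + sep + len + 1 > outlen) {
				ret = -1; /* name does not fit in `out` */
				break;
			}
			if (sep)
				out[used++] = '-';
			memcpy(out + used, refname + m[i].rm_so, len);
			used += len;
			out[used] = '\0';
		}
	}
	regfree(&re);
	return ret;
}

/*
 * "Last one wins": try the configured patterns from last to first and use
 * the first one that matches. Returns -1 if the ref is in no island.
 */
static int island_for_ref(const char *const *patterns, size_t nr,
			  const char *refname, char *out, size_t outlen)
{
	while (nr--)
		if (island_name(patterns[nr], refname, out, outlen) == 0)
			return 0;
	return -1;
}
```

With the three `refs/virtual/...` patterns from the example config, `refs/virtual/1234/heads/master` would map to the island `1234` and `refs/virtual/1234/pull/7` to `1234-pull`, while a ref matching none of the patterns stays outside every island.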

builtin/pack-objects.c

Lines changed: 49 additions & 8 deletions
@@ -24,6 +24,7 @@
 #include "streaming.h"
 #include "thread-utils.h"
 #include "pack-bitmap.h"
+#include "delta-islands.h"
 #include "reachable.h"
 #include "sha1-array.h"
 #include "argv-array.h"
@@ -59,6 +60,7 @@ static struct packing_data to_pack;
 
 static struct pack_idx_entry **written_list;
 static uint32_t nr_result, nr_written, nr_seen;
+static uint32_t write_layer;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -93,6 +95,8 @@ static uint16_t write_bitmap_options;
 
 static int exclude_promisor_objects;
 
+static int use_delta_islands;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -607,7 +611,7 @@ static inline void add_to_write_order(struct object_entry **wo,
 			   unsigned int *endp,
 			   struct object_entry *e)
 {
-	if (e->filled)
+	if (e->filled || e->layer != write_layer)
 		return;
 	wo[(*endp)++] = e;
 	e->filled = 1;
@@ -710,13 +714,14 @@ static void compute_layer_order(struct object_entry **wo, unsigned int *wo_end)
 	 * Finally all the rest in really tight order
 	 */
 	for (i = last_untagged; i < to_pack.nr_objects; i++) {
-		if (!objects[i].filled)
+		if (!objects[i].filled && objects[i].layer == write_layer)
 			add_family_to_write_order(wo, wo_end, &objects[i]);
 	}
 }
 
 static struct object_entry **compute_write_order(void)
 {
+	uint32_t max_layers = 1;
 	unsigned int i, wo_end;
 
 	struct object_entry **wo;
@@ -748,14 +753,14 @@ static struct object_entry **compute_write_order(void)
 	 */
 	for_each_tag_ref(mark_tagged, NULL);
 
-	/*
-	 * Give the objects in the original recency order until
-	 * we see a tagged tip.
-	 */
+	if (use_delta_islands)
+		max_layers = compute_pack_layers(&to_pack);
+
 	ALLOC_ARRAY(wo, to_pack.nr_objects);
 	wo_end = 0;
 
-	compute_layer_order(wo, &wo_end);
+	for (; write_layer < max_layers; ++write_layer)
+		compute_layer_order(wo, &wo_end);
 
 	if (wo_end != to_pack.nr_objects)
 		die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects);
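
The write-order hunks above run the existing ordering pass once per layer and only place objects whose `layer` matches the current `write_layer`. A stripped-down sketch of that idea follows. `struct entry` and `layered_order` are hypothetical stand-ins for `struct object_entry` and the real ordering code, and the reading that layer 0 holds the `pack.islandCore` objects is an assumption based on the config documentation rather than something this diff shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct object_entry: only what the sketch needs. */
struct entry {
	uint32_t layer; /* assumed: 0 = core island, higher = everything else */
	int filled;     /* already placed in the write order? */
};

/*
 * Fill `order` layer by layer, keeping the input order within each layer.
 * Returns the number of entries placed; this should equal `n` when every
 * entry has a layer below `max_layers`.
 */
static size_t layered_order(struct entry *objs, size_t n, uint32_t max_layers,
			    struct entry **order)
{
	size_t end = 0;

	for (uint32_t layer = 0; layer < max_layers; layer++) {
		for (size_t i = 0; i < n; i++) {
			/* Same filter as add_to_write_order() in the hunk above. */
			if (objs[i].filled || objs[i].layer != layer)
				continue;
			order[end++] = &objs[i];
			objs[i].filled = 1;
		}
	}
	return end;
}
```

With `max_layers` left at 1, as it is when `--delta-islands` is not given, this reduces to a single pass over all objects, matching the old behavior.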
@@ -1514,7 +1519,8 @@ static void check_object(struct object_entry *entry)
 			break;
 		}
 
-		if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
+		if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL)) &&
+		    in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {
 			/*
 			 * If base_ref was set above that means we wish to
 			 * reuse delta data, and we even found that base
@@ -1830,6 +1836,11 @@ static int type_size_sort(const void *_a, const void *_b)
 		return -1;
 	if (a->preferred_base < b->preferred_base)
 		return 1;
+	if (use_delta_islands) {
+		int island_cmp = island_delta_cmp(&a->idx.oid, &b->idx.oid);
+		if (island_cmp)
+			return island_cmp;
+	}
 	if (a_size > b_size)
 		return -1;
 	if (a_size < b_size)
@@ -1978,6 +1989,9 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 	if (trg_size < src_size / 32)
 		return 0;
 
+	if (!in_same_island(&trg->entry->idx.oid, &src->entry->idx.oid))
+		return 0;
+
 	/* Load data if not already done */
 	if (!trg->data) {
 		read_lock();
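
Both `check_object()` and `try_delta()` now consult `in_same_island()`, whose body is not part of this diff. The documented rule is that a delta from `A` is refused when the base is not present in all of `A`'s islands, which amounts to a subset test on island sets. The sketch below models each set as a 32-bit mask; `island_mask` and `delta_allowed` are hypothetical names, and the real code may represent island membership differently.

```c
#include <stdint.h>

/* Hypothetical: bit i set means "reachable from island i" (max 32 islands). */
typedef uint32_t island_mask;

/*
 * A delta of `target` against `base` is allowed only when `base` is in
 * every island that `target` is in, i.e. target's set is a subset of
 * base's set. A target in no island accepts any base.
 */
static int delta_allowed(island_mask target, island_mask base)
{
	return (target & base) == target;
}
```

For instance, with a single island for heads and tags (bit 0), an object reachable only from `refs/pull/` has mask 0 and cannot serve as a base for an object with mask 1, which matches the first configuration example in the documentation.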
@@ -2516,6 +2530,9 @@ static void prepare_pack(int window, int depth)
 	uint32_t i, nr_deltas;
 	unsigned n;
 
+	if (use_delta_islands)
+		resolve_tree_islands(progress, &to_pack);
+
 	get_object_details();
 
 	/*
@@ -2679,13 +2696,29 @@ static void show_commit(struct commit *commit, void *data)
 
 	if (write_bitmap_index)
 		index_commit_for_bitmap(commit);
+
+	if (use_delta_islands)
+		propagate_island_marks(commit);
 }
 
 static void show_object(struct object *obj, const char *name, void *data)
 {
 	add_preferred_base_object(name);
 	add_object_entry(&obj->oid, obj->type, name, 0);
 	obj->flags |= OBJECT_ADDED;
+
+	if (use_delta_islands) {
+		const char *p;
+		unsigned depth = 0;
+		struct object_entry *ent;
+
+		for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
+			depth++;
+
+		ent = packlist_find(&to_pack, obj->oid.hash, NULL);
+		if (ent && depth > ent->tree_depth)
+			ent->tree_depth = depth;
+	}
 }
 
 static void show_object__ma_allow_any(struct object *obj, const char *name, void *data)
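
The new block in `show_object()` derives a depth from the object's path by counting '/' separators, and keeps the largest depth seen for that object. The same computation in isolation might look like the sketch below; `path_depth` and `record_depth` are hypothetical helpers written for this illustration.

```c
#include <string.h>

/* Count '/' separators: "Makefile" is depth 0, "builtin/pack-objects.c" is 1. */
static unsigned path_depth(const char *name)
{
	unsigned depth = 0;

	for (const char *p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
		depth++;
	return depth;
}

/* Keep the deepest path seen so far, as the hunk does with ent->tree_depth. */
static void record_depth(unsigned *stored, const char *name)
{
	unsigned depth = path_depth(name);

	if (depth > *stored)
		*stored = depth;
}
```

An object that appears at several paths therefore ends up with the depth of its deepest occurrence.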
@@ -3013,6 +3046,9 @@ static void get_object_list(int ac, const char **av)
 	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
 		return;
 
+	if (use_delta_islands)
+		load_delta_islands();
+
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	mark_edges_uninteresting(&revs, show_edge);
@@ -3192,6 +3228,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		  option_parse_missing_action },
 		OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
 			 N_("do not pack objects in promisor packfiles")),
+		OPT_BOOL(0, "delta-islands", &use_delta_islands,
+			 N_("respect islands during delta compression")),
 		OPT_END(),
 	};
 
@@ -3318,6 +3356,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (pack_to_stdout || !rev_list_all)
 		write_bitmap_index = 0;
 
+	if (use_delta_islands)
+		argv_array_push(&rp, "--topo-order");
+
 	if (progress && all_progress_implied)
 		progress = 2;
 