Skip to content

Commit cd846ba

Browse files
ttaylorrgitster
authored andcommitted
pack-objects: introduce '--stdin-packs=follow'
When invoked with '--stdin-packs', pack-objects will generate a pack which contains the objects found in the "included" packs, less any objects from "excluded" packs. Packs that exist in the repository but weren't specified as either included or excluded are in practice treated like the latter, at least in the sense that pack-objects won't include objects from those packs. This behavior forces us to include any cruft pack(s) in a repository's multi-pack index for the reasons described in ddee370 (builtin/repack.c: add cruft packs to MIDX during geometric repack, 2022-05-20). The full details are in ddee370, but the gist is if you have a once-unreachable object in a cruft pack which later becomes reachable via one or more commits in a pack generated with '--stdin-packs', you *have* to include that object in the MIDX via the copy in the cruft pack, otherwise we cannot generate reachability bitmaps for any commits which reach that object. Note that the traversal here is best-effort, similar to the existing traversal which provides name-hash hints. This means that the object traversal may hand us back a blob that does not actually exist. We *won't* see missing trees/commits with 'ignore_missing_links' because: - missing commit parents are discarded at the commit traversal stage by revision.c::process_parents() - missing tag objects are discarded by revision.c::handle_commit() - missing tree objects are discarded by the list-objects code in list-objects.c::process_tree() But we have to handle potentially-missing blobs specially by making a separate check to ensure they exist in the repository. Failing to do so would mean that we'd add an object to the packing list which doesn't actually exist, rendering us unable to write out the pack. This prepares us for new repacking behavior which will "resurrect" objects found in cruft or otherwise unspecified packs when generating new packs. In the context of geometric repacking, this may be used to maintain a sequence of geometrically-repacked packs, the union of which is closed under reachability, even in the case described earlier. Signed-off-by: Taylor Blau <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 63195f0 commit cd846ba

File tree

3 files changed

+193
-23
lines changed

3 files changed

+193
-23
lines changed

Documentation/git-pack-objects.adoc

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,13 +87,21 @@ base-name::
8787
reference was included in the resulting packfile. This
8888
can be useful to send new tags to native Git clients.
8989

90-
--stdin-packs::
90+
--stdin-packs[=<mode>]::
9191
Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
9292
from the standard input, instead of object names or revision
9393
arguments. The resulting pack contains all objects listed in the
9494
included packs (those not beginning with `^`), excluding any
9595
objects listed in the excluded packs (beginning with `^`).
9696
+
97+
When `mode` is "follow", objects from packs not listed on stdin receive
98+
special treatment. Objects within unlisted packs will be included if
99+
those objects are (1) reachable from the included packs, and (2) not
100+
found in any excluded packs. This mode is useful, for example, to
101+
resurrect once-unreachable objects found in cruft packs to generate
102+
packs which are closed under reachability up to the boundary set by the
103+
excluded packs.
104+
+
97105
Incompatible with `--revs`, or options that imply `--revs` (such as
98106
`--all`), with the exception of `--unpacked`, which is compatible.
99107

builtin/pack-objects.c

Lines changed: 64 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -284,6 +284,12 @@ static struct oidmap configured_exclusions;
284284
static struct oidset excluded_by_config;
285285
static int name_hash_version = -1;
286286

287+
enum stdin_packs_mode {
288+
STDIN_PACKS_MODE_NONE,
289+
STDIN_PACKS_MODE_STANDARD,
290+
STDIN_PACKS_MODE_FOLLOW,
291+
};
292+
287293
/**
288294
* Check whether the name_hash_version chosen by user input is appropriate,
289295
* and also validate whether it is compatible with other features.
@@ -3749,31 +3755,47 @@ static int add_object_entry_from_pack(const struct object_id *oid,
37493755
}
37503756

37513757
static void show_object_pack_hint(struct object *object, const char *name,
3752-
void *data UNUSED)
3758+
void *data)
37533759
{
3754-
struct object_entry *oe = packlist_find(&to_pack, &object->oid);
3755-
if (!oe)
3756-
return;
3760+
enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
3761+
if (mode == STDIN_PACKS_MODE_FOLLOW) {
3762+
if (object->type == OBJ_BLOB &&
3763+
!has_object(the_repository, &object->oid, 0))
3764+
return;
3765+
add_object_entry(&object->oid, object->type, name, 0);
3766+
} else {
3767+
struct object_entry *oe = packlist_find(&to_pack, &object->oid);
3768+
if (!oe)
3769+
return;
37573770

3758-
/*
3759-
* Our 'to_pack' list was constructed by iterating all objects packed in
3760-
* included packs, and so doesn't have a non-zero hash field that you
3761-
* would typically pick up during a reachability traversal.
3762-
*
3763-
* Make a best-effort attempt to fill in the ->hash and ->no_try_delta
3764-
* fields here in order to perhaps improve the delta selection
3765-
* process.
3766-
*/
3767-
oe->hash = pack_name_hash_fn(name);
3768-
oe->no_try_delta = name && no_try_delta(name);
3771+
/*
3772+
* Our 'to_pack' list was constructed by iterating all
3773+
* objects packed in included packs, and so doesn't have
3774+
* a non-zero hash field that you would typically pick
3775+
* up during a reachability traversal.
3776+
*
3777+
* Make a best-effort attempt to fill in the ->hash and
3778+
* ->no_try_delta fields here in order to perhaps
3779+
* improve the delta selection process.
3780+
*/
3781+
oe->hash = pack_name_hash_fn(name);
3782+
oe->no_try_delta = name && no_try_delta(name);
37693783

3770-
stdin_packs_hints_nr++;
3784+
stdin_packs_hints_nr++;
3785+
}
37713786
}
37723787

3773-
static void show_commit_pack_hint(struct commit *commit UNUSED,
3774-
void *data UNUSED)
3788+
static void show_commit_pack_hint(struct commit *commit, void *data)
37753789
{
3790+
enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
3791+
3792+
if (mode == STDIN_PACKS_MODE_FOLLOW) {
3793+
show_object_pack_hint((struct object *)commit, "", data);
3794+
return;
3795+
}
3796+
37763797
/* nothing to do; commits don't have a namehash */
3798+
37773799
}
37783800

37793801
static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3881,7 +3903,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
38813903

38823904
static void add_unreachable_loose_objects(struct rev_info *revs);
38833905

3884-
static void read_stdin_packs(int rev_list_unpacked)
3906+
static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
38853907
{
38863908
struct rev_info revs;
38873909

@@ -3913,7 +3935,7 @@ static void read_stdin_packs(int rev_list_unpacked)
39133935
traverse_commit_list(&revs,
39143936
show_commit_pack_hint,
39153937
show_object_pack_hint,
3916-
NULL);
3938+
&mode);
39173939

39183940
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
39193941
stdin_packs_found_nr);
@@ -4795,6 +4817,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
47954817
return is_not_in_promisor_pack_obj((struct object *) commit, data);
47964818
}
47974819

4820+
static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
4821+
int unset)
4822+
{
4823+
enum stdin_packs_mode *mode = opt->value;
4824+
4825+
if (unset)
4826+
*mode = STDIN_PACKS_MODE_NONE;
4827+
else if (!arg || !*arg)
4828+
*mode = STDIN_PACKS_MODE_STANDARD;
4829+
else if (!strcmp(arg, "follow"))
4830+
*mode = STDIN_PACKS_MODE_FOLLOW;
4831+
else
4832+
die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
4833+
4834+
return 0;
4835+
}
4836+
47984837
int cmd_pack_objects(int argc,
47994838
const char **argv,
48004839
const char *prefix,
@@ -4805,7 +4844,7 @@ int cmd_pack_objects(int argc,
48054844
struct strvec rp = STRVEC_INIT;
48064845
int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
48074846
int rev_list_index = 0;
4808-
int stdin_packs = 0;
4847+
enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
48094848
struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
48104849
struct list_objects_filter_options filter_options =
48114850
LIST_OBJECTS_FILTER_INIT;
@@ -4860,6 +4899,9 @@ int cmd_pack_objects(int argc,
48604899
OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
48614900
N_("include objects referred to by the index"),
48624901
1, PARSE_OPT_NONEG),
4902+
OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
4903+
N_("read packs from stdin"),
4904+
PARSE_OPT_OPTARG, parse_stdin_packs_mode),
48634905
OPT_BOOL(0, "stdin-packs", &stdin_packs,
48644906
N_("read packs from stdin")),
48654907
OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -5150,7 +5192,7 @@ int cmd_pack_objects(int argc,
51505192
progress_state = start_progress(the_repository,
51515193
_("Enumerating objects"), 0);
51525194
if (stdin_packs) {
5153-
read_stdin_packs(rev_list_unpacked);
5195+
read_stdin_packs(stdin_packs, rev_list_unpacked);
51545196
} else if (cruft) {
51555197
read_cruft_objects();
51565198
} else if (!use_internal_rev_list) {

t/t5331-pack-objects-stdin.sh

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -236,4 +236,124 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
236236
test_cmp expected-objects actual-objects
237237
'
238238

239+
objdir=.git/objects
240+
packdir=$objdir/pack
241+
242+
objects_in_packs () {
243+
for p in "$@"
244+
do
245+
git show-index <"$packdir/pack-$p.idx" || return 1
246+
done >objects.raw &&
247+
248+
cut -d' ' -f2 objects.raw | sort &&
249+
rm -f objects.raw
250+
}
251+
252+
test_expect_success '--stdin-packs=follow walks into unknown packs' '
253+
test_when_finished "rm -fr repo" &&
254+
255+
git init repo &&
256+
(
257+
cd repo &&
258+
259+
for c in A B C D
260+
do
261+
test_commit "$c" || return 1
262+
done &&
263+
264+
A="$(echo A | git pack-objects --revs $packdir/pack)" &&
265+
B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
266+
C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
267+
D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
268+
test_commit E &&
269+
270+
git prune-packed &&
271+
272+
cat >in <<-EOF &&
273+
pack-$B.pack
274+
^pack-$C.pack
275+
pack-$D.pack
276+
EOF
277+
278+
# With just --stdin-packs, pack "A" is unknown to us, so
279+
# only objects from packs "B" and "D" are included in
280+
# the output pack.
281+
P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
282+
objects_in_packs $B $D >expect &&
283+
objects_in_packs $P >actual &&
284+
test_cmp expect actual &&
285+
286+
# But with --stdin-packs=follow, objects from both
287+
# included packs reach objects from the unknown pack, so
288+
# objects from pack "A" is included in the output pack
289+
# in addition to the above.
290+
P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
291+
objects_in_packs $A $B $D >expect &&
292+
objects_in_packs $P >actual &&
293+
test_cmp expect actual &&
294+
295+
# And with --unpacked, we will pick up objects from unknown
296+
# packs that are reachable from loose objects. Loose object E
297+
# reaches objects in pack A, but there are three excluded packs
298+
# in between.
299+
#
300+
# The resulting pack should include objects reachable from E
301+
# that are not present in packs B, C, or D, along with those
302+
# present in pack A.
303+
cat >in <<-EOF &&
304+
^pack-$B.pack
305+
^pack-$C.pack
306+
^pack-$D.pack
307+
EOF
308+
309+
P=$(git pack-objects --stdin-packs=follow --unpacked \
310+
$packdir/pack <in) &&
311+
312+
{
313+
objects_in_packs $A &&
314+
git rev-list --objects --no-object-names D..E
315+
}>expect.raw &&
316+
sort expect.raw >expect &&
317+
objects_in_packs $P >actual &&
318+
test_cmp expect actual
319+
)
320+
'
321+
322+
stdin_packs__follow_with_only () {
323+
rm -fr stdin_packs__follow_with_only &&
324+
git init stdin_packs__follow_with_only &&
325+
(
326+
cd stdin_packs__follow_with_only &&
327+
328+
test_commit A &&
329+
test_commit B &&
330+
331+
git rev-parse "$@" >B.objects &&
332+
333+
echo A | git pack-objects --revs $packdir/pack &&
334+
B="$(git pack-objects $packdir/pack <B.objects)" &&
335+
336+
git cat-file --batch-check="%(objectname)" --batch-all-objects >objs &&
337+
for obj in $(cat objs)
338+
do
339+
rm -f $objdir/$(test_oid_to_path $obj) || return 1
340+
done &&
341+
342+
( cd $packdir && ls pack-*.pack ) >in &&
343+
git pack-objects --stdin-packs=follow --stdout >/dev/null <in
344+
)
345+
}
346+
347+
test_expect_success '--stdin-packs=follow tolerates missing blobs' '
348+
stdin_packs__follow_with_only HEAD HEAD^{tree}
349+
'
350+
351+
test_expect_success '--stdin-packs=follow tolerates missing trees' '
352+
stdin_packs__follow_with_only HEAD HEAD:B.t
353+
'
354+
355+
test_expect_success '--stdin-packs=follow tolerates missing commits' '
356+
stdin_packs__follow_with_only HEAD HEAD^{tree}
357+
'
358+
239359
test_done

0 commit comments

Comments
 (0)