Skip to content

Commit 339bce2

Browse files
ttaylorrgitster
authored andcommitted
builtin/pack-objects.c: add '--stdin-packs' option
In an upcoming commit, 'git repack' will want to create a pack comprised of all of the objects in some packs (the included packs) excluding any objects in some other packs (the excluded packs). This caller could iterate those packs themselves and feed the objects it finds to 'git pack-objects' directly over stdin, but this approach has a few downsides: - It requires every caller that wants to drive 'git pack-objects' in this way to implement pack iteration themselves. This forces the caller to think about details like what order objects are fed to pack-objects, which callers would likely rather not do. - If the set of objects in included packs is large, it requires sending a lot of data over a pipe, which is inefficient. - The caller is forced to keep track of the excluded objects, too, and make sure that it doesn't send any objects that appear in both included and excluded packs. But the biggest downside is the lack of a reachability traversal. Because the caller passes in a list of objects directly, those objects don't get a namehash assigned to them, which can have a negative impact on the delta selection process, causing 'git pack-objects' to fail to find good deltas even when they exist. The caller could formulate a reachability traversal themselves, but the only way to drive 'git pack-objects' in this way is to do a full traversal, and then remove objects in the excluded packs after the traversal is complete. This can be detrimental to callers who care about performance, especially in repositories with many objects. Introduce 'git pack-objects --stdin-packs' which remedies these four concerns. 'git pack-objects --stdin-packs' expects a list of pack names on stdin, where 'pack-xyz.pack' denotes that pack as included, and '^pack-xyz.pack' denotes it as excluded. The resulting pack includes all objects that are present in at least one included pack, and aren't present in any excluded pack. To address the delta selection problem, 'git pack-objects --stdin-packs' works as follows. First, it assembles a list of objects that it is going to pack, as above. Then, a reachability traversal is started, whose tips are any commits mentioned in included packs. Upon visiting an object, we find its corresponding object_entry in the to_pack list, and set its namehash parameter appropriately. To avoid the traversal visiting more objects than it needs to, the traversal is halted upon encountering an object which can be found in an excluded pack (by marking the excluded packs as kept in-core, and passing --no-kept-objects=in-core to the revision machinery). This can cause the traversal to halt early, for example if an object in an included pack is an ancestor of ones in excluded packs. But stopping early is OK, since filling in the namehash fields of objects in the to_pack list is only additive (i.e., having it helps the delta selection process, but leaving it blank doesn't impact the correctness of the resulting pack). Even still, it is unlikely that this hurts us much in practice, since the 'git repack --geometric' caller (which is introduced in a later commit) marks small packs as included, and large ones as excluded. During ordinary use, the small packs usually represent pushes after a large repack, and so are unlikely to be ancestors of objects that already exist in the repository. (I found it convenient while developing this patch to have 'git pack-objects' report the number of objects which were visited and got their namehash fields filled in during traversal. This is also included in the below patch via trace2 data lines). Suggested-by: Jeff King <[email protected]> Signed-off-by: Taylor Blau <[email protected]> Reviewed-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent c9fff00 commit 339bce2

File tree

3 files changed

+307
-2
lines changed

3 files changed

+307
-2
lines changed

Documentation/git-pack-objects.txt

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -85,6 +85,16 @@ base-name::
8585
reference was included in the resulting packfile. This
8686
can be useful to send new tags to native Git clients.
8787

88+
--stdin-packs::
89+
Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
90+
from the standard input, instead of object names or revision
91+
arguments. The resulting pack contains all objects listed in the
92+
included packs (those not beginning with `^`), excluding any
93+
objects listed in the excluded packs (beginning with `^`).
94+
+
95+
Incompatible with `--revs`, or options that imply `--revs` (such as
96+
`--all`), with the exception of `--unpacked`, which is compatible.
97+
8898
--window=<n>::
8999
--depth=<n>::
90100
These two options affect how the objects contained in

builtin/pack-objects.c

Lines changed: 200 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2986,6 +2986,190 @@ static int git_pack_config(const char *k, const char *v, void *cb)
29862986
return git_default_config(k, v, cb);
29872987
}
29882988

2989+
/* Counters for trace2 output when in --stdin-packs mode. */
2990+
static int stdin_packs_found_nr;
2991+
static int stdin_packs_hints_nr;
2992+
2993+
static int add_object_entry_from_pack(const struct object_id *oid,
2994+
struct packed_git *p,
2995+
uint32_t pos,
2996+
void *_data)
2997+
{
2998+
struct rev_info *revs = _data;
2999+
struct object_info oi = OBJECT_INFO_INIT;
3000+
off_t ofs;
3001+
enum object_type type;
3002+
3003+
display_progress(progress_state, ++nr_seen);
3004+
3005+
if (have_duplicate_entry(oid, 0))
3006+
return 0;
3007+
3008+
ofs = nth_packed_object_offset(p, pos);
3009+
if (!want_object_in_pack(oid, 0, &p, &ofs))
3010+
return 0;
3011+
3012+
oi.typep = &type;
3013+
if (packed_object_info(the_repository, p, ofs, &oi) < 0)
3014+
die(_("could not get type of object %s in pack %s"),
3015+
oid_to_hex(oid), p->pack_name);
3016+
else if (type == OBJ_COMMIT) {
3017+
/*
3018+
* commits in included packs are used as starting points for the
3019+
* subsequent revision walk
3020+
*/
3021+
add_pending_oid(revs, NULL, oid, 0);
3022+
}
3023+
3024+
stdin_packs_found_nr++;
3025+
3026+
create_object_entry(oid, type, 0, 0, 0, p, ofs);
3027+
3028+
return 0;
3029+
}
3030+
3031+
static void show_commit_pack_hint(struct commit *commit, void *_data)
3032+
{
3033+
/* nothing to do; commits don't have a namehash */
3034+
}
3035+
3036+
static void show_object_pack_hint(struct object *object, const char *name,
3037+
void *_data)
3038+
{
3039+
struct object_entry *oe = packlist_find(&to_pack, &object->oid);
3040+
if (!oe)
3041+
return;
3042+
3043+
/*
3044+
* Our 'to_pack' list was constructed by iterating all objects packed in
3045+
* included packs, and so doesn't have a non-zero hash field that you
3046+
* would typically pick up during a reachability traversal.
3047+
*
3048+
* Make a best-effort attempt to fill in the ->hash and ->no_try_delta
3049+
* here using a now in order to perhaps improve the delta selection
3050+
* process.
3051+
*/
3052+
oe->hash = pack_name_hash(name);
3053+
oe->no_try_delta = name && no_try_delta(name);
3054+
3055+
stdin_packs_hints_nr++;
3056+
}
3057+
3058+
static int pack_mtime_cmp(const void *_a, const void *_b)
3059+
{
3060+
struct packed_git *a = ((const struct string_list_item*)_a)->util;
3061+
struct packed_git *b = ((const struct string_list_item*)_b)->util;
3062+
3063+
/*
3064+
* order packs by descending mtime so that objects are laid out
3065+
* roughly as newest-to-oldest
3066+
*/
3067+
if (a->mtime < b->mtime)
3068+
return 1;
3069+
else if (b->mtime < a->mtime)
3070+
return -1;
3071+
else
3072+
return 0;
3073+
}
3074+
3075+
static void read_packs_list_from_stdin(void)
3076+
{
3077+
struct strbuf buf = STRBUF_INIT;
3078+
struct string_list include_packs = STRING_LIST_INIT_DUP;
3079+
struct string_list exclude_packs = STRING_LIST_INIT_DUP;
3080+
struct string_list_item *item = NULL;
3081+
3082+
struct packed_git *p;
3083+
struct rev_info revs;
3084+
3085+
repo_init_revisions(the_repository, &revs, NULL);
3086+
/*
3087+
* Use a revision walk to fill in the namehash of objects in the include
3088+
* packs. To save time, we'll avoid traversing through objects that are
3089+
* in excluded packs.
3090+
*
3091+
* That may cause us to avoid populating all of the namehash fields of
3092+
* all included objects, but our goal is best-effort, since this is only
3093+
* an optimization during delta selection.
3094+
*/
3095+
revs.no_kept_objects = 1;
3096+
revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
3097+
revs.blob_objects = 1;
3098+
revs.tree_objects = 1;
3099+
revs.tag_objects = 1;
3100+
3101+
while (strbuf_getline(&buf, stdin) != EOF) {
3102+
if (!buf.len)
3103+
continue;
3104+
3105+
if (*buf.buf == '^')
3106+
string_list_append(&exclude_packs, buf.buf + 1);
3107+
else
3108+
string_list_append(&include_packs, buf.buf);
3109+
3110+
strbuf_reset(&buf);
3111+
}
3112+
3113+
string_list_sort(&include_packs);
3114+
string_list_sort(&exclude_packs);
3115+
3116+
for (p = get_all_packs(the_repository); p; p = p->next) {
3117+
const char *pack_name = pack_basename(p);
3118+
3119+
item = string_list_lookup(&include_packs, pack_name);
3120+
if (!item)
3121+
item = string_list_lookup(&exclude_packs, pack_name);
3122+
3123+
if (item)
3124+
item->util = p;
3125+
}
3126+
3127+
/*
3128+
* First handle all of the excluded packs, marking them as kept in-core
3129+
* so that later calls to add_object_entry() discards any objects that
3130+
* are also found in excluded packs.
3131+
*/
3132+
for_each_string_list_item(item, &exclude_packs) {
3133+
struct packed_git *p = item->util;
3134+
if (!p)
3135+
die(_("could not find pack '%s'"), item->string);
3136+
p->pack_keep_in_core = 1;
3137+
}
3138+
3139+
/*
3140+
* Order packs by ascending mtime; use QSORT directly to access the
3141+
* string_list_item's ->util pointer, which string_list_sort() does not
3142+
* provide.
3143+
*/
3144+
QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
3145+
3146+
for_each_string_list_item(item, &include_packs) {
3147+
struct packed_git *p = item->util;
3148+
if (!p)
3149+
die(_("could not find pack '%s'"), item->string);
3150+
for_each_object_in_pack(p,
3151+
add_object_entry_from_pack,
3152+
&revs,
3153+
FOR_EACH_OBJECT_PACK_ORDER);
3154+
}
3155+
3156+
if (prepare_revision_walk(&revs))
3157+
die(_("revision walk setup failed"));
3158+
traverse_commit_list(&revs,
3159+
show_commit_pack_hint,
3160+
show_object_pack_hint,
3161+
NULL);
3162+
3163+
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
3164+
stdin_packs_found_nr);
3165+
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
3166+
stdin_packs_hints_nr);
3167+
3168+
strbuf_release(&buf);
3169+
string_list_clear(&include_packs, 0);
3170+
string_list_clear(&exclude_packs, 0);
3171+
}
3172+
29893173
static void read_object_list_from_stdin(void)
29903174
{
29913175
char line[GIT_MAX_HEXSZ + 1 + PATH_MAX + 2];
@@ -3489,6 +3673,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
34893673
struct strvec rp = STRVEC_INIT;
34903674
int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
34913675
int rev_list_index = 0;
3676+
int stdin_packs = 0;
34923677
struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
34933678
struct option pack_objects_options[] = {
34943679
OPT_SET_INT('q', "quiet", &progress,
@@ -3539,6 +3724,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
35393724
OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
35403725
N_("include objects referred to by the index"),
35413726
1, PARSE_OPT_NONEG),
3727+
OPT_BOOL(0, "stdin-packs", &stdin_packs,
3728+
N_("read packs from stdin")),
35423729
OPT_BOOL(0, "stdout", &pack_to_stdout,
35433730
N_("output pack to stdout")),
35443731
OPT_BOOL(0, "include-tag", &include_tag,
@@ -3645,7 +3832,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
36453832
use_internal_rev_list = 1;
36463833
strvec_push(&rp, "--indexed-objects");
36473834
}
3648-
if (rev_list_unpacked) {
3835+
if (rev_list_unpacked && !stdin_packs) {
36493836
use_internal_rev_list = 1;
36503837
strvec_push(&rp, "--unpacked");
36513838
}
@@ -3690,8 +3877,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
36903877
if (filter_options.choice) {
36913878
if (!pack_to_stdout)
36923879
die(_("cannot use --filter without --stdout"));
3880+
if (stdin_packs)
3881+
die(_("cannot use --filter with --stdin-packs"));
36933882
}
36943883

3884+
if (stdin_packs && use_internal_rev_list)
3885+
die(_("cannot use internal rev list with --stdin-packs"));
3886+
36953887
/*
36963888
* "soft" reasons not to use bitmaps - for on-disk repack by default we want
36973889
*
@@ -3750,7 +3942,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
37503942

37513943
if (progress)
37523944
progress_state = start_progress(_("Enumerating objects"), 0);
3753-
if (!use_internal_rev_list)
3945+
if (stdin_packs) {
3946+
/* avoids adding objects in excluded packs */
3947+
ignore_packed_keep_in_core = 1;
3948+
read_packs_list_from_stdin();
3949+
if (rev_list_unpacked)
3950+
add_unreachable_loose_objects();
3951+
} else if (!use_internal_rev_list)
37543952
read_object_list_from_stdin();
37553953
else {
37563954
get_object_list(rp.nr, rp.v);

t/t5300-pack-object.sh

Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -532,4 +532,101 @@ test_expect_success 'prefetch objects' '
532532
test_line_count = 1 donelines
533533
'
534534

535+
test_expect_success 'setup for --stdin-packs tests' '
536+
git init stdin-packs &&
537+
(
538+
cd stdin-packs &&
539+
540+
test_commit A &&
541+
test_commit B &&
542+
test_commit C &&
543+
544+
for id in A B C
545+
do
546+
git pack-objects .git/objects/pack/pack-$id \
547+
--incremental --revs <<-EOF
548+
refs/tags/$id
549+
EOF
550+
done &&
551+
552+
ls -la .git/objects/pack
553+
)
554+
'
555+
556+
test_expect_success '--stdin-packs with excluded packs' '
557+
(
558+
cd stdin-packs &&
559+
560+
PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
561+
PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
562+
PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
563+
564+
git pack-objects test --stdin-packs <<-EOF &&
565+
$PACK_A
566+
^$PACK_B
567+
$PACK_C
568+
EOF
569+
570+
(
571+
git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
572+
git show-index <$(ls .git/objects/pack/pack-C-*.idx)
573+
) >expect.raw &&
574+
git show-index <$(ls test-*.idx) >actual.raw &&
575+
576+
cut -d" " -f2 <expect.raw | sort >expect &&
577+
cut -d" " -f2 <actual.raw | sort >actual &&
578+
test_cmp expect actual
579+
)
580+
'
581+
582+
test_expect_success '--stdin-packs is incompatible with --filter' '
583+
(
584+
cd stdin-packs &&
585+
test_must_fail git pack-objects --stdin-packs --stdout \
586+
--filter=blob:none </dev/null 2>err &&
587+
test_i18ngrep "cannot use --filter with --stdin-packs" err
588+
)
589+
'
590+
591+
test_expect_success '--stdin-packs is incompatible with --revs' '
592+
(
593+
cd stdin-packs &&
594+
test_must_fail git pack-objects --stdin-packs --revs out \
595+
</dev/null 2>err &&
596+
test_i18ngrep "cannot use internal rev list with --stdin-packs" err
597+
)
598+
'
599+
600+
test_expect_success '--stdin-packs with loose objects' '
601+
(
602+
cd stdin-packs &&
603+
604+
PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
605+
PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
606+
PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
607+
608+
test_commit D && # loose
609+
610+
git pack-objects test2 --stdin-packs --unpacked <<-EOF &&
611+
$PACK_A
612+
^$PACK_B
613+
$PACK_C
614+
EOF
615+
616+
(
617+
git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
618+
git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
619+
git rev-list --objects --no-object-names \
620+
refs/tags/C..refs/tags/D
621+
622+
) >expect.raw &&
623+
ls -la . &&
624+
git show-index <$(ls test2-*.idx) >actual.raw &&
625+
626+
cut -d" " -f2 <expect.raw | sort >expect &&
627+
cut -d" " -f2 <actual.raw | sort >actual &&
628+
test_cmp expect actual
629+
)
630+
'
631+
535632
test_done

0 commit comments

Comments
 (0)