Commit 91badeb

ttaylorr authored and gitster committed
builtin/repack.c: implement --expire-to for storing pruned objects
When pruning objects with `--cruft`, `git repack` offers some
flexibility in selecting which objects are pruned via the
`--cruft-expiration` option.

This is useful for expiring only objects which are older than the grace
period. It makes substantially less likely [1] the races in which
to-be-pruned objects become reachable again as ancestors of freshly
pushed objects, leaving the repository in a corrupt state after pruning.

In practice, however, such races are impossible to avoid entirely, no
matter how long the grace period is. To prevent this race, it is often
advisable to temporarily put a repository into a read-only state. That
is not always practical, so some middle ground would be nice.

This patch introduces a new option, `--expire-to`, which teaches `git
repack` to write an additional cruft pack containing just the objects
which were pruned from the repository. The caller can specify a
directory outside of the current repository as the destination for this
second cruft pack.

This makes it possible to prune objects from a repository, while still
holding onto a supplemental copy of them outside of the original
repository. Having this copy on-disk makes it substantially easier to
recover objects when the aforementioned race is encountered.

`--expire-to` is implemented in a somewhat convoluted manner, which is
to take advantage of the fact that the first time `write_cruft_pack()`
is called, it adds the name of the cruft pack to the `names` string
list. That means the second time we call `write_cruft_pack()`, objects
in the previously-written cruft pack will be excluded.

As long as the caller ensures that no objects are expired during the
second pass, this is sufficient to generate a cruft pack containing all
objects which don't appear in any of the new packs written by `git
repack`, including the cruft pack. In other words, all of the objects
which are about to be pruned from the repository.

It is important to note that the destination in `--expire-to` does not
necessarily need to be a Git repository (though it can be). Notably, the
expired packs do not contain all ancestors of expired objects. So if the
source repository contains something like:

              <unreachable>
             /
    C1 --- C2
      \
       refs/heads/master

where C2 is unreachable, but has a parent (C1) which is reachable, and
C2 would be pruned, then the expiry pack will contain only C2, not C1.

[1]: https://lore.kernel.org/git/[email protected]/

Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent c12cda4 commit 91badeb

3 files changed, 167 additions and 0 deletions


Documentation/git-repack.txt

Lines changed: 6 additions & 0 deletions
@@ -74,6 +74,12 @@ to the new separate pack will be written.
 	immediately instead of waiting for the next `git gc` invocation.
 	Only useful with `--cruft -d`.

+--expire-to=<dir>::
+	Write a cruft pack containing pruned objects (if any) to the
+	directory `<dir>`. This option is useful for keeping a copy of
+	any pruned objects in a separate directory as a backup. Only
+	useful with `--cruft -d`.
+
 -l::
 	Pass the `--local` option to 'git pack-objects'. See
 	linkgit:git-pack-objects[1].

builtin/repack.c

Lines changed: 40 additions & 0 deletions
@@ -702,6 +702,10 @@ static int write_cruft_pack(const struct pack_objects_args *args,
 	 * By the time it is read here, it contains only the pack(s)
 	 * that were just written, which is exactly the set of packs we
 	 * want to consider kept.
+	 *
+	 * If `--expire-to` is given, the double-use served by `names`
+	 * ensures that the pack written to `--expire-to` excludes any
+	 * objects contained in the cruft pack.
 	 */
 	in = xfdopen(cmd.in, "w");
 	for_each_string_list_item(item, names)
@@ -755,6 +759,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int geometric_factor = 0;
 	int write_midx = 0;
 	const char *cruft_expiration = NULL;
+	const char *expire_to = NULL;

 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -804,6 +809,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("find a geometric progression with factor <N>")),
 		OPT_BOOL('m', "write-midx", &write_midx,
 				N_("write a multi-pack index of the resulting packs")),
+		OPT_STRING(0, "expire-to", &expire_to, N_("dir"),
+				N_("pack prefix to store a pack containing pruned objects")),
 		OPT_END()
 	};

@@ -1000,6 +1007,39 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				       &existing_kept_packs);
 		if (ret)
 			return ret;
+
+		if (delete_redundant && expire_to) {
+			/*
+			 * If `--expire-to` is given with `-d`, it's possible
+			 * that we're about to prune some objects. With cruft
+			 * packs, pruning is implicit: any objects from existing
+			 * packs that weren't picked up by new packs are removed
+			 * when their packs are deleted.
+			 *
+			 * Generate an additional cruft pack, with one twist:
+			 * `names` now includes the name of the cruft pack
+			 * written in the previous step. So the contents of
+			 * _this_ cruft pack exclude everything contained in the
+			 * existing cruft pack (that is, all of the unreachable
+			 * objects which are no older than
+			 * `--cruft-expiration`).
+			 *
+			 * To make this work, cruft_expiration must become NULL
+			 * so that this cruft pack doesn't actually prune any
+			 * objects. If it were non-NULL, this call would always
+			 * generate an empty pack (since every object not in the
+			 * cruft pack generated above will have an mtime older
+			 * than the expiration).
+			 */
+			ret = write_cruft_pack(&cruft_po_args, expire_to,
+					       pack_prefix,
+					       NULL,
+					       &names,
+					       &existing_nonkept_packs,
+					       &existing_kept_packs);
+			if (ret)
+				return ret;
+		}
 	}

 	string_list_sort(&names);
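
The two-pass trick described in the comment above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Git's implementation: `struct toy_object`, `toy_write_cruft_pack()`, and the `in_kept_pack` flag are hypothetical names invented here, and a per-object flag stands in for "this object lives in a pack whose name is already in `names`". The real `write_cruft_pack()` drives `git pack-objects --cruft` and works on packs rather than individual objects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Toy stand-in for an object; this is not a Git data structure. */
struct toy_object {
	const char *name;
	bool in_kept_pack; /* stored in a pack already listed in `names` */
	time_t mtime;
};

/*
 * Hypothetical model of one write_cruft_pack() pass.
 *
 * Objects already in a kept pack are skipped. If `expire` is true,
 * objects older than `cutoff` are skipped as well (they would be
 * pruned). Passing expire == false plays the role of a NULL
 * cruft_expiration. Every object written is marked as kept, which
 * mirrors the new pack's name being appended to `names`.
 *
 * Returns the number of object names stored in `out`.
 */
size_t toy_write_cruft_pack(struct toy_object *objs, size_t nr,
			    bool expire, time_t cutoff,
			    const char **out)
{
	size_t i, written = 0;

	for (i = 0; i < nr; i++) {
		if (objs[i].in_kept_pack)
			continue;
		if (expire && objs[i].mtime < cutoff)
			continue;
		out[written++] = objs[i].name;
		objs[i].in_kept_pack = true;
	}
	return written;
}
```

In this model, take three objects: C1 already in a new pack, C2 unreachable with an mtime older than the cutoff, and C3 unreachable with an mtime newer than the cutoff. The first call (with expiration) selects only C3. The second call (without expiration) selects only C2, which is exactly the object about to be pruned. C1 appears in neither result, which matches the commit message's caveat that reachable ancestors are not copied to the `--expire-to` pack.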

t/t7700-repack.sh

Lines changed: 121 additions & 0 deletions
@@ -482,4 +482,125 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
 	test_server_info_missing
 '

+test_expect_success '--expire-to stores pruned objects (now)' '
+	git init expire-to-now &&
+	(
+		cd expire-to-now &&
+
+		git branch -M main &&
+
+		test_commit base &&
+
+		git checkout -b cruft &&
+		test_commit --no-tag cruft &&
+
+		git rev-list --objects --no-object-names main..cruft >moved.raw &&
+		sort moved.raw >moved.want &&
+
+		git rev-list --all --objects --no-object-names >expect.raw &&
+		sort expect.raw >expect &&
+
+		git checkout main &&
+		git branch -D cruft &&
+		git reflog expire --all --expire=all &&
+
+		git init --bare expired.git &&
+		git repack -d \
+			--cruft --cruft-expiration="now" \
+			--expire-to="expired.git/objects/pack/pack" &&
+
+		expired="$(ls expired.git/objects/pack/pack-*.idx)" &&
+		test_path_is_file "${expired%.idx}.mtimes" &&
+
+		# Since the `--cruft-expiration` is "now", the effective
+		# behavior is to move _all_ unreachable objects out to
+		# the location in `--expire-to`.
+		git show-index <$expired >expired.raw &&
+		cut -d" " -f2 expired.raw | sort >expired.objects &&
+		git rev-list --all --objects --no-object-names \
+			>remaining.objects &&
+
+		# ...in other words, the combined contents of this
+		# repository and expired.git should be the same as the
+		# set of objects we started with.
+		cat expired.objects remaining.objects | sort >actual &&
+		test_cmp expect actual &&
+
+		# The "moved" objects (i.e., those in expired.git)
+		# should be the same as the cruft objects which were
+		# expired in the previous step.
+		test_cmp moved.want expired.objects
+	)
+'
+
+test_expect_success '--expire-to stores pruned objects (5.minutes.ago)' '
+	git init expire-to-5.minutes.ago &&
+	(
+		cd expire-to-5.minutes.ago &&
+
+		git branch -M main &&
+
+		test_commit base &&
+
+		# Create two classes of unreachable objects, one which
+		# is older than 5 minutes (stale), and another which is
+		# newer (recent).
+		for kind in stale recent
+		do
+			git checkout -b $kind main &&
+			test_commit --no-tag $kind || return 1
+		done &&
+
+		git rev-list --objects --no-object-names main..stale >in &&
+		stale="$(git pack-objects $objdir/pack/pack <in)" &&
+		mtime="$(test-tool chmtime --get =-600 $objdir/pack/pack-$stale.pack)" &&
+
+		# expect holds the set of objects we expect to find in
+		# this repository after repacking
+		git rev-list --objects --no-object-names recent >expect.raw &&
+		sort expect.raw >expect &&
+
+		# moved.want holds the set of objects we expect to find
+		# in expired.git
+		git rev-list --objects --no-object-names main..stale >out &&
+		sort out >moved.want &&
+
+		git checkout main &&
+		git branch -D stale recent &&
+		git reflog expire --all --expire=all &&
+		git prune-packed &&
+
+		git init --bare expired.git &&
+		git repack -d \
+			--cruft --cruft-expiration=5.minutes.ago \
+			--expire-to="expired.git/objects/pack/pack" &&
+
+		# Some of the remaining objects in this repository are
+		# unreachable, so use `cat-file --batch-all-objects`
+		# instead of `rev-list` to get their names
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>remaining.objects &&
+		sort remaining.objects >actual &&
+		test_cmp expect actual &&
+
+		(
+			cd expired.git &&
+
+			expired="$(ls objects/pack/pack-*.mtimes)" &&
+			test-tool pack-mtimes $(basename $expired) >out &&
+			cut -d" " -f1 out | sort >../moved.got &&
+
+			# Ensure that there are as many objects with the
+			# expected mtime as were moved to expired.git.
+			#
+			# In other words, ensure that the recorded
+			# mtimes of any moved objects was written
+			# correctly.
+			grep " $mtime$" out >matching &&
+			test_line_count = $(wc -l <../moved.want) matching
+		) &&
+		test_cmp moved.want moved.got
+	)
+'
+
 test_done
