Skip to content

Commit 0855ed9

Browse files
ttaylorrgitster
authored andcommitted
repack: avoid combining cruft packs with --max-cruft-size
In 37dc6d8 (builtin/repack.c: implement support for `--max-cruft-size`, 2023-10-02), we exposed new functionality that allowed repositories to specify the behavior of when we should combine multiple cruft packs together. This feature was designed to ensure that we never repacked cruft packs which were larger than the given threshold in order to provide tighter I/O bounds for repositories that have many unreachable objects. In essence, specifying '--max-cruft-size=N' instructed 'repack' to aggregate cruft packs together (in order of ascending size) until the combine size grows past 'N', and then make a new cruft pack whose contents includes the packs we rolled up. But this isn't quite how it works in practice. Suppose for example that we have two cruft packs which are each 100MiB in size. One might expect specifying "--max-cruft-size=200M" would combine these two packs together, and then avoid repacking them until a pruning GC takes place. In reality, 'repack' would try and aggregate these together, but writing a pack that is strictly smaller than 200 MiB (since pack-objects' "--max-pack-size" provides a strict bound for packs containing more than one object). So instead we'll write out a pack that is, say, 199 MiB in size, and then another 1 MiB pack containing the balance. If we later repack the repository without adding any new unreachable objects, we'll repeat the same exercise again, making the same 199 MiB and 1 MiB packs each time. This happens because of a poor choice to bolt the '--max-cruft-size' functionality onto pack-objects' '--max-pack-size', forcing us to generate packs which are always smaller than the provided threshold and thus subject to repacking. The following commit will introduce a new flag that implements something similar to the behavior above. Let's prepare for that by making repack's '--max-cruft-size' flag behave as an cruft pack-specific override for '--max-pack-size'. Do so by temporarily repurposing the 'collapse_small_cruft_packs()' function to instead generate a cruft pack using the same instructions as if we didn't specify any maximum pack size. The calling code looks something like: if (args->max_pack_size && !cruft_expiration) { collapse_small_cruft_packs(in, args->max_pack_size, existing); } else { for_each_string_list_item(item, &existing->non_kept_packs) fprintf(in, "-%s.pack\n", item->string); for_each_string_list_item(item, &existing->cruft_packs) fprintf(in, "-%s.pack\n", item->string); } This patch makes collapse_small_cruft_packs() behave identically to the 'else' arm of the conditional above. This repurposing of 'collapse_small_cruft_packs()' is intentional, since it will set us up nicely to introduce the new behavior in the following commit. Naturally, there is some test fallout in the test which exercises the old meaning of '--max-cruft-size'. Mark that test as failing for now to be dealt with in the following commit. Likewise, add a new test which explicitly tests the behavior of '--max-cruft-size' to place a hard limit on the size of any generated cruft pack(s). Note that this is a breaking change, as it alters the user-visible behavior of '--max-cruft-size'. But I'm OK changing this behavior in this instance, since the behavior wasn't accurate to begin with. Signed-off-by: Taylor Blau <[email protected]> Acked-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 7fb12bb commit 0855ed9

File tree

4 files changed

+67
-55
lines changed

4 files changed

+67
-55
lines changed

Documentation/git-repack.adoc

Lines changed: 3 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -77,15 +77,9 @@ to the new separate pack will be written.
7777
Only useful with `--cruft -d`.
7878

7979
--max-cruft-size=<n>::
80-
Repack cruft objects into packs as large as `<n>` bytes before
81-
creating new packs. As long as there are enough cruft packs
82-
smaller than `<n>`, repacking will cause a new cruft pack to
83-
be created containing objects from any combined cruft packs,
84-
along with any new unreachable objects. Cruft packs larger than
85-
`<n>` will not be modified. When the new cruft pack is larger
86-
than `<n>` bytes, it will be split into multiple packs, all of
87-
which are guaranteed to be at most `<n>` bytes in size. Only
88-
useful with `--cruft -d`.
80+
Overrides `--max-pack-size` for cruft packs. Inherits the value of
81+
`--max-pack-size` (if any) by default. See the documentation for
82+
`--max-pack-size` for more details.
8983

9084
--expire-to=<dir>::
9185
Write a cruft pack containing pruned objects (if any) to the

builtin/repack.c

Lines changed: 9 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -1022,28 +1022,19 @@ static int write_filtered_pack(const struct pack_objects_args *args,
10221022
return finish_pack_objects_cmd(&cmd, names, local);
10231023
}
10241024

1025-
static int existing_cruft_pack_cmp(const void *va, const void *vb)
1026-
{
1027-
struct packed_git *a = *(struct packed_git **)va;
1028-
struct packed_git *b = *(struct packed_git **)vb;
1029-
1030-
if (a->pack_size < b->pack_size)
1031-
return -1;
1032-
if (a->pack_size > b->pack_size)
1033-
return 1;
1034-
return 0;
1035-
}
1036-
1037-
static void collapse_small_cruft_packs(FILE *in, size_t max_size,
1025+
static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
10381026
struct existing_packs *existing)
10391027
{
1040-
struct packed_git **existing_cruft, *p;
1028+
struct packed_git *p;
10411029
struct strbuf buf = STRBUF_INIT;
1042-
size_t total_size = 0;
1043-
size_t existing_cruft_nr = 0;
10441030
size_t i;
10451031

1046-
ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr);
1032+
/*
1033+
* Squelch a -Wunused-function warning while we rationalize
1034+
* the behavior of --max-cruft-size. This function will become
1035+
* used again in a future commit.
1036+
*/
1037+
(void)retain_cruft_pack;
10471038

10481039
for (p = get_all_packs(the_repository); p; p = p->next) {
10491040
if (!(p->is_cruft && p->pack_local))
@@ -1056,37 +1047,14 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
10561047
if (!string_list_has_string(&existing->cruft_packs, buf.buf))
10571048
continue;
10581049

1059-
if (existing_cruft_nr >= existing->cruft_packs.nr)
1060-
BUG("too many cruft packs (found %"PRIuMAX", but knew "
1061-
"of %"PRIuMAX")",
1062-
(uintmax_t)existing_cruft_nr + 1,
1063-
(uintmax_t)existing->cruft_packs.nr);
1064-
existing_cruft[existing_cruft_nr++] = p;
1065-
}
1066-
1067-
QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
1068-
1069-
for (i = 0; i < existing_cruft_nr; i++) {
1070-
size_t proposed;
1071-
1072-
p = existing_cruft[i];
1073-
proposed = st_add(total_size, p->pack_size);
1074-
1075-
if (proposed <= max_size) {
1076-
total_size = proposed;
1077-
fprintf(in, "-%s\n", pack_basename(p));
1078-
} else {
1079-
retain_cruft_pack(existing, p);
1080-
fprintf(in, "%s\n", pack_basename(p));
1081-
}
1050+
fprintf(in, "-%s.pack\n", buf.buf);
10821051
}
10831052

10841053
for (i = 0; i < existing->non_kept_packs.nr; i++)
10851054
fprintf(in, "-%s.pack\n",
10861055
existing->non_kept_packs.items[i].string);
10871056

10881057
strbuf_release(&buf);
1089-
free(existing_cruft);
10901058
}
10911059

10921060
static int write_cruft_pack(const struct pack_objects_args *args,

t/t5329-pack-objects-cruft.sh

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -695,4 +695,56 @@ test_expect_success 'additional cruft blobs via gc.recentObjectsHook' '
695695
)
696696
'
697697

698+
test_expect_success 'split cruft packs with --max-cruft-size' '
699+
repo=cruft-with--max-cruft-size &&
700+
test_when_finished "rm -fr $repo" &&
701+
702+
git init "$repo" &&
703+
704+
(
705+
cd "$repo" &&
706+
707+
git config core.compression 0 &&
708+
709+
sz=$((1024 * 1024)) && # 1MiB
710+
test-tool genrandom foo $sz >foo &&
711+
test-tool genrandom bar $sz >bar &&
712+
foo="$(git hash-object -w -t blob foo)" &&
713+
bar="$(git hash-object -w -t blob bar)" &&
714+
715+
to=$packdir/pack &&
716+
# Pack together foo and bar into a single 2MiB pack.
717+
pack="$(git pack-objects $to <<-EOF
718+
$foo
719+
$bar
720+
EOF
721+
)" &&
722+
723+
# Then generate a cruft pack containing foo and bar.
724+
#
725+
# Generate the pack with --max-pack-size equal to the
726+
# size of one object, forcing us to write two cruft
727+
# packs.
728+
git pack-objects --cruft --max-pack-size=$sz $to <<-EOF &&
729+
-pack-$pack.pack
730+
EOF
731+
732+
ls $packdir/pack-*.mtimes >crufts &&
733+
test_line_count = 2 crufts &&
734+
735+
for cruft in $(cat crufts)
736+
do
737+
test-tool pack-mtimes "$(basename "$cruft")" || return 1
738+
done >actual.raw &&
739+
740+
cut -d" " -f1 <actual.raw | sort >actual &&
741+
sort >expect <<-EOF &&
742+
$foo
743+
$bar
744+
EOF
745+
746+
test_cmp expect actual
747+
)
748+
'
749+
698750
test_done

t/t7704-repack-cruft.sh

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -194,7 +194,7 @@ test_expect_success '--max-cruft-size combines existing packs when not too large
194194
)
195195
'
196196

197-
test_expect_success '--max-cruft-size combines smaller packs first' '
197+
test_expect_failure '--max-cruft-size combines smaller packs first' '
198198
git init max-cruft-size-consume-small &&
199199
(
200200
cd max-cruft-size-consume-small &&
@@ -354,13 +354,11 @@ test_expect_success 'multi-cruft with freshened objects (previously cruft)' '
354354
done >actual.raw &&
355355
sort actual.raw >actual &&
356356
357-
# Among the set of all cruft packs, we should see both
358-
# mtimes for object $foo and $bar, as well as the
357+
# Among the set of all cruft packs, we should see the
358+
# new mtimes for object $foo and $bar, as well as the
359359
# single new copy of $baz.
360360
sort >expect <<-EOF &&
361-
$foo $(cat foo.old)
362361
$foo $(cat foo.new)
363-
$bar $(cat bar.old)
364362
$bar $(cat bar.new)
365363
$baz $(cat baz.old)
366364
$quux $(cat quux.new)

0 commit comments

Comments
 (0)