
Commit 7dbabbb

peff authored and gitster committed
pack-objects: enforce --depth limit in reused deltas
Since 898b14c (pack-objects: rework check_delta_limit usage, 2007-04-16), we check the delta depth limit only when figuring out whether we should make a new delta. We don't consider it at all when reusing deltas, which means that packing once with --depth=250, and then again with --depth=50, the second pack may still contain chains larger than 50.

This is generally considered a feature, as the results of earlier high-depth repacks are carried forward, used for serving fetches, etc.

However, since we started using cross-pack deltas in c9af708 (pack-objects: use mru list when iterating over packs, 2016-08-11), we are no longer bounded by the length of an existing delta chain in a single pack.

Here's one particular pathological case: a sequence of N packs, each with 2 objects, the base of which is stored as a delta in a previous pack. If we chain all the deltas together, we have a cycle of length N. We break the cycle, but the tip delta is still at depth N-1. This is less unlikely than it might sound. See the included test for a reconstruction based on real-world actions.

I ran into such a case in the wild, where a client was rapidly sending packs, and we had accumulated 10,000 before doing a server-side repack. The pack that "git repack" tried to generate had a very deep chain, which caused pack-objects to run out of stack space in the recursive write_one().

This patch bounds the length of delta chains in the output pack based on --depth, regardless of whether they are caused by cross-pack deltas or existed in the input packs. This fixes the problem, but does have two possible downsides:

  1. High-depth aggressive repacks followed by "normal" repacks will throw away the high-depth chains. In the long run this is probably OK; investigation showed that high-depth repacks aren't actually beneficial, and we dropped the aggressive depth default to match the normal case in 07e7dbf (gc: default aggressive depth to 50, 2016-08-11).

  2. If you really do want to store high-depth deltas on disk, they may be discarded and a new delta computed when serving a fetch, unless you set pack.depth to match your high-depth size.

The implementation uses the existing search for delta cycles. That lets us compute the depth of any node based on the depth of its base, because we know the base is DFS_DONE by the time we look at it (modulo any cycles in the graph, but we know there cannot be any because we break them as we see them).

There is some subtlety worth mentioning, though. We record the depth of each object as we compute it. It might seem like we could save the per-object storage space by just keeping track of the depth of our traversal (i.e., have break_delta_chains() report how deep it went). But we may visit an object through multiple delta paths, and on subsequent paths we want to know its depth immediately, without having to walk back down to its final base (doing so would make our graph walk quadratic rather than linear).

Likewise, one could try to record the depth not from the base, but from our starting point (i.e., start recursion_depth at 0, and pass "recursion_depth + 1" to each invocation of break_delta_chains()). And then when recursion_depth gets too big, we know that we must cut the delta chain. But that technique is wrong if we do not visit the nodes in topological order. In a chain A->B->C, if we visit "C", then "B", then "A", we will never recurse deeper than 1 link (because we see at each node that we have already visited it).

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent ad36dc8 commit 7dbabbb

File tree

3 files changed: +115 additions, -0 deletions


builtin/pack-objects.c

Lines changed: 18 additions & 0 deletions

@@ -1541,6 +1541,8 @@ static int pack_offset_sort(const void *_a, const void *_b)
  * 2. Updating our size/type to the non-delta representation. These were
  *    either not recorded initially (size) or overwritten with the delta type
  *    (type) when check_object() decided to reuse the delta.
+ *
+ * 3. Resetting our delta depth, as we are now a base object.
  */
 static void drop_reused_delta(struct object_entry *entry)
 {
@@ -1554,6 +1556,7 @@ static void drop_reused_delta(struct object_entry *entry)
 		p = &(*p)->delta_sibling;
 	}
 	entry->delta = NULL;
+	entry->depth = 0;
 
 	oi.sizep = &entry->size;
 	oi.typep = &entry->type;
@@ -1572,6 +1575,9 @@ static void drop_reused_delta(struct object_entry *entry)
  * Follow the chain of deltas from this entry onward, throwing away any links
  * that cause us to hit a cycle (as determined by the DFS state flags in
  * the entries).
+ *
+ * We also detect too-long reused chains that would violate our --depth
+ * limit.
  */
 static void break_delta_chains(struct object_entry *entry)
 {
@@ -1589,6 +1595,18 @@ static void break_delta_chains(struct object_entry *entry)
 		 */
 		entry->dfs_state = DFS_ACTIVE;
 		break_delta_chains(entry->delta);
+
+		/*
+		 * Once we've recursed, our base (if we still have one) knows
+		 * its depth, so we can compute ours (and check it against
+		 * the limit).
+		 */
+		if (entry->delta) {
+			entry->depth = entry->delta->depth + 1;
+			if (entry->depth > depth)
+				drop_reused_delta(entry);
+		}
+
 		entry->dfs_state = DFS_DONE;
 		break;
pack-objects.h

Lines changed: 4 additions & 0 deletions

@@ -30,12 +30,16 @@ struct object_entry {
 
 	/*
 	 * State flags for depth-first search used for analyzing delta cycles.
+	 *
+	 * The depth is measured in delta-links to the base (so if A is a delta
+	 * against B, then A has a depth of 1, and B a depth of 0).
 	 */
 	enum {
 		DFS_NONE = 0,
 		DFS_ACTIVE,
 		DFS_DONE
 	} dfs_state;
+	int depth;
 };
 
 struct packing_data {

t/t5316-pack-delta-depth.sh

Lines changed: 93 additions & 0 deletions

@@ -0,0 +1,93 @@
+#!/bin/sh
+
+test_description='pack-objects breaks long cross-pack delta chains'
+. ./test-lib.sh
+
+# This mirrors a repeated push setup:
+#
+# 1. A client repeatedly modifies some files, makes a
+#    commit, and pushes the result. It does this N times
+#    before we get around to repacking.
+#
+# 2. Each push generates a thin pack with the new version of
+#    various objects. Let's consider some file in the root tree
+#    which is updated in each commit.
+#
+#    When generating push number X, we feed commit X-1 (and
+#    thus blob X-1) as a preferred base. The resulting pack has
+#    blob X as a thin delta against blob X-1.
+#
+#    On the receiving end, "index-pack --fix-thin" will
+#    complete the pack with a base copy of blob X-1.
+#
+# 3. In older versions of git, if we used the delta from
+#    pack X, then we'd always find blob X-1 as a base in the
+#    same pack (and generate a fresh delta).
+#
+#    But with the pack mru, we jump from delta to delta
+#    following the traversal order:
+#
+#      a. We grab blob X from pack X as a delta, putting it at
+#         the tip of our mru list.
+#
+#      b. Eventually we move onto commit X-1. We need other
+#         objects which are only in pack X-1 (in the test code
+#         below, it's the containing tree). That puts pack X-1
+#         at the tip of our mru list.
+#
+#      c. Eventually we look for blob X-1, and we find the
+#         version in pack X-1 (because it's the mru tip).
+#
+# Now we have blob X as a delta against X-1, which is a delta
+# against X-2, and so forth.
+#
+# In the real world, these small pushes would get exploded by
+# unpack-objects rather than "index-pack --fix-thin", but the
+# same principle applies to larger pushes (they only need one
+# repeatedly-modified file to generate the delta chain).
+
+test_expect_success 'create series of packs' '
+	test-genrandom foo 4096 >content &&
+	prev= &&
+	for i in $(test_seq 1 10)
+	do
+		cat content >file &&
+		echo $i >>file &&
+		git add file &&
+		git commit -m $i &&
+		cur=$(git rev-parse HEAD^{tree}) &&
+		{
+			test -n "$prev" && echo "-$prev"
+			echo $cur
+			echo "$(git rev-parse :file) file"
+		} | git pack-objects --stdout >tmp &&
+		git index-pack --stdin --fix-thin <tmp || return 1
+		prev=$cur
+	done
+'
+
+max_chain() {
+	git index-pack --verify-stat-only "$1" >output &&
+	perl -lne '
+		/chain length = (\d+)/ and $len = $1;
+		END { print $len }
+	' output
+}
+
+# Note that this whole setup is pretty reliant on the current
+# packing heuristics. We double-check that our test case
+# actually produces a long chain. If it doesn't, it should be
+# adjusted (or scrapped if the heuristics have become too unreliable).
+test_expect_success 'packing produces a long delta' '
+	# Use --window=0 to make sure we are seeing reused deltas,
+	# not computing a new long chain.
+	pack=$(git pack-objects --all --window=0 </dev/null pack) &&
+	test 9 = "$(max_chain pack-$pack.pack)"
+'
+
+test_expect_success '--depth limits depth' '
+	pack=$(git pack-objects --all --depth=5 </dev/null pack) &&
+	test 5 = "$(max_chain pack-$pack.pack)"
+'
+
+test_done
