Commit 77023ea

peff authored and gitster committed
t/perf: add tests for many-pack scenarios
Git's pack storage does efficient (log n) lookups in a single packfile's index, but if we have multiple packfiles, we have to linearly search each for a given object. This patch introduces some timing tests for cases where we have a large number of packs, so that we can measure any improvements we make in the following patches.

The main thing we want to time is object lookup. To do this, we measure "git rev-list --objects --all", which does a fairly large number of object lookups (essentially one per object in the repository).

However, we also measure the time to do a full repack, which is interesting for two reasons. One is that in addition to the usual pack lookup, it has its own linear iteration over the list of packs. And two is that because it is the tool one uses to go from an inefficient many-pack situation back to a single pack, we care about its performance not only at marginal numbers of packs, but at the extreme cases (e.g., if you somehow end up with 5,000 packs, it is the only way to get back to 1 pack, so we need to make sure it performs well).

We measure the performance of each command in three scenarios: 1 pack, 50 packs, and 1,000 packs.

The 1-pack case is a baseline; any optimizations we do to handle multiple packs cannot possibly perform better than this.

The 50-pack case is as far as Git should generally allow your repository to go, if you have auto-gc enabled with the default settings. So this represents the maximum performance improvement we would expect under normal circumstances.

The 1,000-pack case is hopefully rare, though I have seen it in the wild where automatic maintenance was broken for some time (and the repository continued to receive pushes). This represents cases where we care less about general performance, but want to make sure that a full repack command does not take excessively long.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 08bb350 commit 77023ea
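
The commit message ties the 50-pack scenario to auto-gc's default settings. As a rough illustration of how one might check where a given repository sits relative to that threshold, the sketch below parses the `packs:` line of `git count-objects -v` and compares it to a limit. The helper names `pack_count` and `exceeds_pack_limit` are hypothetical and not part of this commit. The sketch also assumes the relevant setting is `gc.autoPackLimit` (default 50), and note that `git count-objects` counts every pack, whereas auto-gc ignores packs marked with a `.keep` file.

```sh
#!/bin/sh
# Hypothetical helper (not part of this commit): read the output of
# `git count-objects -v` on stdin and print the value of its "packs:" line.
pack_count () {
	sed -n 's/^packs: //p'
}

# Hypothetical helper: succeed when the pack count ($1) is strictly greater
# than the limit ($2), which is the assumed condition for auto-gc to repack.
exceeds_pack_limit () {
	test "$1" -gt "$2"
}

# Example use inside a repository, left as comments so that loading this
# sketch does not itself require a repository:
#
#   n=$(git count-objects -v | pack_count)
#   limit=$(git config --get gc.autoPackLimit || echo 50)
#   if exceeds_pack_limit "$n" "$limit"
#   then
#   	echo "$n packs exceeds the limit of $limit"
#   fi
```

Keeping the parsing separate from the `git` invocation makes the helper easy to exercise with canned input.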

1 file changed: +87 -0 lines changed

t/perf/p5303-many-packs.sh

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+#!/bin/sh
+
+test_description='performance with large numbers of packs'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+# A real many-pack situation would probably come from having a lot of pushes
+# over time. We don't know how big each push would be, but we can fake it by
+# just walking the first-parent chain and having every 5 commits be their own
+# "push". This isn't _entirely_ accurate, as real pushes would have some
+# duplicate objects due to thin-pack fixing, but it's a reasonable
+# approximation.
+#
+# And then all of the rest of the objects can go in a single packfile that
+# represents the state before any of those pushes (actually, we'll generate
+# that first because in such a setup it would be the oldest pack, and we sort
+# the packs by reverse mtime inside git).
+repack_into_n () {
+	rm -rf staging &&
+	mkdir staging &&
+
+	git rev-list --first-parent HEAD |
+	sed -n '1~5p' |
+	head -n "$1" |
+	perl -e 'print reverse <>' \
+	>pushes
+
+	# create base packfile
+	head -n 1 pushes |
+	git pack-objects --delta-base-offset --revs staging/pack
+
+	# and then incrementals between each pair of commits
+	last= &&
+	while read rev
+	do
+		if test -n "$last"; then
+			{
+				echo "$rev" &&
+				echo "^$last"
+			} |
+			git pack-objects --delta-base-offset --revs \
+				staging/pack || return 1
+		fi
+		last=$rev
+	done <pushes &&
+
+	# and install the whole thing
+	rm -f .git/objects/pack/* &&
+	mv staging/* .git/objects/pack/
+}
+
+# Pretend we just have a single branch and no reflogs, and that everything is
+# in objects/pack; that makes our fake pack-building via repack_into_n()
+# much simpler.
+test_expect_success 'simplify reachability' '
+	tip=$(git rev-parse --verify HEAD) &&
+	git for-each-ref --format="option no-deref%0adelete %(refname)" |
+	git update-ref --stdin &&
+	rm -rf .git/logs &&
+	git update-ref refs/heads/master $tip &&
+	git symbolic-ref HEAD refs/heads/master &&
+	git repack -ad
+'
+
for nr_packs in 1 50 1000
67+
do
68+
test_expect_success "create $nr_packs-pack scenario" '
69+
repack_into_n $nr_packs
70+
'
71+
72+
test_perf "rev-list ($nr_packs)" '
73+
git rev-list --objects --all >/dev/null
74+
'
75+
76+
# This simulates the interesting part of the repack, which is the
77+
# actual pack generation, without smudging the on-disk setup
78+
# between trials.
79+
test_perf "repack ($nr_packs)" '
80+
git pack-objects --keep-true-parents \
81+
--honor-pack-keep --non-empty --all \
82+
--reflog --indexed-objects --delta-base-offset \
83+
--stdout </dev/null >/dev/null
84+
'
85+
done
86+
87+
test_done
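
The slicing that `repack_into_n` performs can be easier to follow on plain numbers than on commit IDs. The sketch below is an illustration, not the code from this commit: `select_pushes` and `emit_ranges` are hypothetical helper names, the input is a list of integers standing in for `git rev-list --first-parent HEAD` output (newest first), and `awk` stands in for both the GNU-specific `sed -n '1~5p'` and the `perl` reversal so that the example has no dependency beyond POSIX tools. Each printed line shows the pair of revisions that would be fed, one per line, to `git pack-objects --revs` for one simulated push.

```sh
#!/bin/sh
# Hypothetical helper: keep every fifth line of stdin (1st, 6th, 11th, ...),
# stop after $1 selected lines, and print the selection in reverse order,
# so that the oldest selected "commit" comes first.
select_pushes () {
	awk 'NR % 5 == 1' |
	head -n "$1" |
	awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

# Hypothetical helper: for each consecutive pair in the oldest-first list,
# print "<rev> ^<previous rev>", i.e. the objects new in that push.
# The first entry has no predecessor; in the real script it seeds the base
# pack, which holds everything reachable from the oldest selected commit.
emit_ranges () {
	last=
	while read -r rev
	do
		if test -n "$last"
		then
			printf '%s ^%s\n' "$rev" "$last"
		fi
		last=$rev
	done
}

# Demo: 12 fake commits, newest (1) first, split into 3 packs.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 |
select_pushes 3 |
emit_ranges
```

With this input the selection is 11, 6, 1 (oldest first), so the demo prints `6 ^11` and then `1 ^6`: one base pack for 11 plus two incremental packs, three in total.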
