Skip to content

Commit 5827a03

Browse files
peffgitster
authored andcommitted
fetch: use "quick" has_sha1_file for tag following
When we auto-follow tags in a fetch, we look at all of the tags advertised by the remote and fetch ones where we don't already have the tag, but we do have the object it peels to. This involves a lot of calls to has_sha1_file(), some of which we can reasonably expect to fail. Since 45e8a74 (has_sha1_file: re-check pack directory before giving up, 2013-08-30), this may cause many calls to reprepare_packed_git(), which is potentially expensive. This has gone unnoticed for several years because it requires a fairly unique setup to matter: 1. You need to have a lot of packs on the client side to make reprepare_packed_git() expensive (the most expensive part is finding duplicates in an unsorted list, which is currently quadratic). 2. You need a large number of tag refs on the server side that are candidates for auto-following (i.e., that the client doesn't have). Each one triggers a re-read of the pack directory. 3. Under normal circumstances, the client would auto-follow those tags and after one large fetch, (2) would no longer be true. But if those tags point to history which is disconnected from what the client otherwise fetches, then it will never auto-follow, and those candidates will impact it on every fetch. So when all three are true, each fetch pays an extra O(nr_tags * nr_packs^2) cost, mostly in string comparisons on the pack names. This was exacerbated by 47bf4b0 (prepare_packed_git_one: refactor duplicate-pack check, 2014-06-30) which uses a slightly more expensive string check, under the assumption that the duplicate check doesn't happen very often (and it shouldn't; the real problem here is how often we are calling reprepare_packed_git()). This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice accuracy for speed, in cases where we might be racy with a simultaneous repack. This is similar to the fix in 0eeb077 (index-pack: avoid excessive re-reading of pack directory, 2015-06-09). As with that case, it's OK for has_sha1_file() occasionally say "no I don't have it" when we do, because the worst case is not a corruption, but simply that we may fail to auto-follow a tag that points to it. Here are results from the included perf script, which sets up a situation similar to the one described above: Test HEAD^ HEAD ---------------------------------------------------------- 5550.4: fetch 11.21(10.42+0.78) 0.08(0.04+0.02) -99.3% Reported-by: Vegard Nossum <[email protected]> Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 0202c41 commit 5827a03

File tree

4 files changed

+112
-4
lines changed

4 files changed

+112
-4
lines changed

builtin/fetch.c

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -232,9 +232,10 @@ static void find_non_local_tags(struct transport *transport,
232232
* as one to ignore by setting util to NULL.
233233
*/
234234
if (ends_with(ref->name, "^{}")) {
235-
if (item && !has_object_file(&ref->old_oid) &&
235+
if (item &&
236+
!has_object_file_with_flags(&ref->old_oid, HAS_SHA1_QUICK) &&
236237
!will_fetch(head, ref->old_oid.hash) &&
237-
!has_sha1_file(item->util) &&
238+
!has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
238239
!will_fetch(head, item->util))
239240
item->util = NULL;
240241
item = NULL;
@@ -247,7 +248,8 @@ static void find_non_local_tags(struct transport *transport,
247248
* to check if it is a lightweight tag that we want to
248249
* fetch.
249250
*/
250-
if (item && !has_sha1_file(item->util) &&
251+
if (item &&
252+
!has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
251253
!will_fetch(head, item->util))
252254
item->util = NULL;
253255

@@ -267,7 +269,8 @@ static void find_non_local_tags(struct transport *transport,
267269
* We may have a final lightweight tag that needs to be
268270
* checked to see if it needs fetching.
269271
*/
270-
if (item && !has_sha1_file(item->util) &&
272+
if (item &&
273+
!has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
271274
!will_fetch(head, item->util))
272275
item->util = NULL;
273276

cache.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1116,6 +1116,7 @@ static inline int has_sha1_file(const unsigned char *sha1)
11161116

11171117
/* Same as the above, except for struct object_id. */
11181118
extern int has_object_file(const struct object_id *oid);
1119+
extern int has_object_file_with_flags(const struct object_id *oid, int flags);
11191120

11201121
/*
11211122
* Return true iff an alternate object database has a loose object

sha1_file.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3239,6 +3239,11 @@ int has_object_file(const struct object_id *oid)
32393239
return has_sha1_file(oid->hash);
32403240
}
32413241

3242+
int has_object_file_with_flags(const struct object_id *oid, int flags)
3243+
{
3244+
return has_sha1_file_with_flags(oid->hash, flags);
3245+
}
3246+
32423247
static void check_tree(const void *buf, size_t size)
32433248
{
32443249
struct tree_desc desc;

t/perf/p5550-fetch-tags.sh

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
#!/bin/sh
2+
3+
test_description='performance of tag-following with many tags
4+
5+
This tests a fairly pathological case, so rather than rely on a real-world
6+
case, we will construct our own repository. The situation is roughly as
7+
follows.
8+
9+
The parent repository has a large number of tags which are disconnected from
10+
the rest of history. That makes them candidates for tag-following, but we never
11+
actually grab them (and thus they will impact each subsequent fetch).
12+
13+
The child repository is a clone of parent, without the tags, and is at least
14+
one commit behind the parent (meaning that we will fetch one object and then
15+
examine the tags to see if they need followed). Furthermore, it has a large
16+
number of packs.
17+
18+
The exact values of "large" here are somewhat arbitrary; I picked values that
19+
start to show a noticeable performance problem on my machine, but without
20+
taking too long to set up and run the tests.
21+
'
22+
. ./perf-lib.sh
23+
24+
# make a long nonsense history on branch $1, consisting of $2 commits, each
25+
# with a unique file pointing to the blob at $2.
26+
create_history () {
27+
perl -le '
28+
my ($branch, $n, $blob) = @ARGV;
29+
for (1..$n) {
30+
print "commit refs/heads/$branch";
31+
print "committer nobody <[email protected]> now";
32+
print "data 4";
33+
print "foo";
34+
print "M 100644 $blob $_";
35+
}
36+
' "$@" |
37+
git fast-import --date-format=now
38+
}
39+
40+
# make a series of tags, one per commit in the revision range given by $@
41+
create_tags () {
42+
git rev-list "$@" |
43+
perl -lne 'print "create refs/tags/$. $_"' |
44+
git update-ref --stdin
45+
}
46+
47+
# create $1 nonsense packs, each with a single blob
48+
create_packs () {
49+
perl -le '
50+
my ($n) = @ARGV;
51+
for (1..$n) {
52+
print "blob";
53+
print "data <<EOF";
54+
print "$_";
55+
print "EOF";
56+
}
57+
' "$@" |
58+
git fast-import &&
59+
60+
git cat-file --batch-all-objects --batch-check='%(objectname)' |
61+
while read sha1
62+
do
63+
echo $sha1 | git pack-objects .git/objects/pack/pack
64+
done
65+
}
66+
67+
test_expect_success 'create parent and child' '
68+
git init parent &&
69+
git -C parent commit --allow-empty -m base &&
70+
git clone parent child &&
71+
git -C parent commit --allow-empty -m trigger-fetch
72+
'
73+
74+
test_expect_success 'populate parent tags' '
75+
(
76+
cd parent &&
77+
blob=$(echo content | git hash-object -w --stdin) &&
78+
create_history cruft 3000 $blob &&
79+
create_tags cruft &&
80+
git branch -D cruft
81+
)
82+
'
83+
84+
test_expect_success 'create child packs' '
85+
(
86+
cd child &&
87+
git config gc.auto 0 &&
88+
git config gc.autopacklimit 0 &&
89+
create_packs 500
90+
)
91+
'
92+
93+
test_perf 'fetch' '
94+
# make sure there is something to fetch on each iteration
95+
git -C child update-ref -d refs/remotes/origin/master &&
96+
git -C child fetch
97+
'
98+
99+
test_done

0 commit comments

Comments
 (0)