Commit 79bafd2

Merge branch 'jk/fewer-pack-rescan'
Internally we use 0{40} as a placeholder object name to signal to a codepath that there is no such object (e.g. the fast-forward check while "git fetch" stores a new remote-tracking ref says "we know there is no 'old' thing pointed at by the ref, as we are creating it anew" by passing 0{40} for the 'old' side), and we expect a codepath that locates an in-core object to return NULL as a sign that the object does not exist. Looking up an object that does not exist, however, is quite costly in a repository with a large number of packfiles. This access pattern has been optimized.

* jk/fewer-pack-rescan:
  sha1_file: fast-path null sha1 as a missing object
  everything_local: use "quick" object existence check
  p5551: add a script to test fetch pack-dir rescans
  t/perf/lib-pack: use fast-import checkpoint to create packs
  p5550: factor out nonsense-pack creation
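The fast path described above can be pictured with a small standalone sketch. It is not Git's code: `toy_store`, `toy_lookup` and the other `toy_` names are hypothetical, and the "rescan" is reduced to a counter. It assumes only what the message states, namely that an all-zero object name means "no such object" and that a failed lookup is the expensive case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RAWSZ 20 /* SHA-1 is 20 raw bytes, i.e. 40 hex digits */

/* Hypothetical toy object store; not Git's real data structures. */
struct toy_store {
	const unsigned char (*ids)[TOY_RAWSZ]; /* object names we know about */
	size_t nr;                             /* number of known objects */
	unsigned rescans;                      /* counts simulated pack-dir rescans */
};

static const unsigned char toy_null_id[TOY_RAWSZ]; /* zero-initialized: 0{40} */

static int toy_is_null_id(const unsigned char *id)
{
	return !memcmp(id, toy_null_id, TOY_RAWSZ);
}

/* Linear search over the known object names. */
static int toy_find(const struct toy_store *s, const unsigned char *id)
{
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (!memcmp(s->ids[i], id, TOY_RAWSZ))
			return 1;
	return 0;
}

/* Returns 0 if the object exists, -1 if it is missing. */
static int toy_lookup(struct toy_store *s, const unsigned char *id)
{
	if (toy_is_null_id(id))
		return -1; /* fast path: the placeholder can never exist */
	if (toy_find(s, id))
		return 0;
	s->rescans++; /* stand-in for the costly rescan on a miss */
	return toy_find(s, id) ? 0 : -1;
}
```

Looking up the all-zero name returns -1 without touching the `rescans` counter, while a miss on any other name pays for one simulated rescan before giving up.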
2 parents 4ca10aa + 87b5e23 commit 79bafd2

5 files changed: +87 -24 lines changed

fetch-pack.c

Lines changed: 2 additions & 1 deletion
@@ -716,7 +716,8 @@ static int everything_local(struct fetch_pack_args *args,
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
 
-		if (!has_object_file(&ref->old_oid))
+		if (!has_object_file_with_flags(&ref->old_oid,
+						OBJECT_INFO_QUICK))
 			continue;
 
 		o = parse_object(&ref->old_oid);
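The trade-off behind a "quick" existence check can be sketched in isolation. The names below (`toy_packdir`, `toy_has_object`, `TOY_LOOKUP_QUICK`) are hypothetical and only model the idea that a normal lookup rescans the pack directory after a miss, while a quick lookup reports the miss immediately; this is not the implementation of `has_object_file_with_flags`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LOOKUP_QUICK 1u /* hypothetical flag: do not rescan on a miss */

/* Hypothetical model of an in-core pack list and what is really on disk. */
struct toy_packdir {
	const char **names;   /* object names currently known in core */
	size_t nr;
	const char **on_disk; /* names that a rescan would discover */
	size_t on_disk_nr;
	unsigned rescans;     /* counts simulated pack-dir rescans */
};

/* Search only the in-core list. */
static int toy_scan(const struct toy_packdir *d, const char *name)
{
	size_t i;

	for (i = 0; i < d->nr; i++)
		if (!strcmp(d->names[i], name))
			return 1;
	return 0;
}

/* Returns 1 if the object is found, 0 otherwise. */
static int toy_has_object(struct toy_packdir *d, const char *name,
			  unsigned flags)
{
	if (toy_scan(d, name))
		return 1;
	if (flags & TOY_LOOKUP_QUICK)
		return 0; /* accept the miss without rescanning */

	/* Slow path: refresh the in-core list from "disk" and retry. */
	d->names = d->on_disk;
	d->nr = d->on_disk_nr;
	d->rescans++;
	return toy_scan(d, name);
}
```

In this model the quick variant can report an object as missing even though a rescan would have found it. That suits a caller such as the loop above, which only skips refs whose objects are absent, but it is a trade-off rather than a free speedup.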

sha1_file.c

Lines changed: 3 additions & 0 deletions
@@ -1164,6 +1164,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 				    lookup_replace_object(sha1) :
 				    sha1;
 
+	if (is_null_sha1(real))
+		return -1;
+
 	if (!oi)
 		oi = &blank_oi;
 

t/perf/lib-pack.sh

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Helpers for dealing with large numbers of packs.
+
+# create $1 nonsense packs, each with a single blob
+create_packs () {
+	perl -le '
+		my ($n) = @ARGV;
+		for (1..$n) {
+			print "blob";
+			print "data <<EOF";
+			print "$_";
+			print "EOF";
+			print "checkpoint"
+		}
+	' "$@" |
+	git fast-import
+}
+
+# create a large number of packs, disabling any gc which might
+# cause us to repack them
+setup_many_packs () {
+	git config gc.auto 0 &&
+	git config gc.autopacklimit 0 &&
+	git config fastimport.unpacklimit 0 &&
+	create_packs 500
+}

t/perf/p5550-fetch-tags.sh

Lines changed: 2 additions & 23 deletions
@@ -20,6 +20,7 @@ start to show a noticeable performance problem on my machine, but without
 taking too long to set up and run the tests.
 '
 . ./perf-lib.sh
+. "$TEST_DIRECTORY/perf/lib-pack.sh"
 
 # make a long nonsense history on branch $1, consisting of $2 commits, each
 # with a unique file pointing to the blob at $2.
@@ -44,26 +45,6 @@ create_tags () {
 	git update-ref --stdin
 }
 
-# create $1 nonsense packs, each with a single blob
-create_packs () {
-	perl -le '
-		my ($n) = @ARGV;
-		for (1..$n) {
-			print "blob";
-			print "data <<EOF";
-			print "$_";
-			print "EOF";
-		}
-	' "$@" |
-	git fast-import &&
-
-	git cat-file --batch-all-objects --batch-check='%(objectname)' |
-	while read sha1
-	do
-		echo $sha1 | git pack-objects .git/objects/pack/pack
-	done
-}
-
 test_expect_success 'create parent and child' '
 	git init parent &&
 	git -C parent commit --allow-empty -m base &&
@@ -84,9 +65,7 @@ test_expect_success 'populate parent tags' '
 test_expect_success 'create child packs' '
 	(
 		cd child &&
-		git config gc.auto 0 &&
-		git config gc.autopacklimit 0 &&
-		create_packs 500
+		setup_many_packs
 	)
 '
 

t/perf/p5551-fetch-rescan.sh

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='fetch performance with many packs
+
+It is common for fetch to consider objects that we might not have, and it is an
+easy mistake for the code to use a function like `parse_object` that might
+give the correct _answer_ on such an object, but do so slowly (due to
+re-scanning the pack directory for lookup failures).
+
+The resulting performance drop can be hard to notice in a real repository, but
+becomes quite large in a repository with a large number of packs. So this
+test creates a more pathological case, since any mistakes would produce a more
+noticeable slowdown.
+'
+. ./perf-lib.sh
+. "$TEST_DIRECTORY"/perf/lib-pack.sh
+
+test_expect_success 'create parent and child' '
+	git init parent &&
+	git clone parent child
+'
+
+
+test_expect_success 'create refs in the parent' '
+	(
+		cd parent &&
+		git commit --allow-empty -m foo &&
+		head=$(git rev-parse HEAD) &&
+		test_seq 1000 |
+		sed "s,.*,update refs/heads/& $head," |
+		$MODERN_GIT update-ref --stdin
+	)
+'
+
+test_expect_success 'create many packs in the child' '
+	(
+		cd child &&
+		setup_many_packs
+	)
+'
+
+test_perf 'fetch' '
+	# start at the same state for each iteration
+	obj=$($MODERN_GIT -C parent rev-parse HEAD) &&
+	(
+		cd child &&
+		$MODERN_GIT for-each-ref --format="delete %(refname)" refs/remotes |
+		$MODERN_GIT update-ref --stdin &&
+		rm -vf .git/objects/$(echo $obj | sed "s|^..|&/|") &&
+
+		git fetch
+	)
+'
+
+test_done
