Commit 85c8e36

filter-repo: accelerate is_ancestor() for --analyze mode
The --analyze mode was extremely slow for the freebsd/freebsd repo on github; digging in, the is_ancestor() function was being called a huge number of times -- about 22 times per commit on average (and about 17 million times overall). The analyze mode uses is_ancestor() to determine whether a rename equivalency class should be broken (i.e. renaming A->B means all versions of A and B are just different versions of the same file, but if someone adds a new A in some commit which contains the A->B rename in its history then this equivalence class no longer holds). Each is_ancestor() call potentially has to walk a tree of dependencies all the way back to a sufficient depth where it can realize that the commit cannot be an ancestor; this can be a very long walk.

We can speed this up by keeping track of some previous is_ancestor() results. If commit F is not an ancestor of commit G, then F cannot be an ancestor of children of G (unless that child has multiple parents; but even in that case F can only be an ancestor through one of the parents other than G). Similarly, if F is an ancestor of commit G, then F will always be an ancestor of any children of G. Cache results from previous calls to is_ancestor() and use them to accelerate subsequent calls.

Signed-off-by: Elijah Newren <[email protected]>
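The caching idea described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the code from git-filter-repo: the class name `AncestryGraphSketch` and the `add_commit()` helper are hypothetical, and commits are plain strings rather than the integer ids the real `AncestryGraph` uses. It only assumes what the message states, namely that each commit records a depth greater than that of any ancestor and that earlier answers for a pair can be reused while walking back from a descendant.

```python
class AncestryGraphSketch:
    """Hypothetical, simplified stand-in for an ancestry graph with caching."""

    def __init__(self):
        # commit -> (depth, parents); depth is one more than the max depth
        # of any parent, so every ancestor has a strictly smaller depth.
        self.graph = {}
        # (possible_ancestor, commit) -> bool, filled by is_ancestor().
        self._cached_is_ancestor = {}

    def add_commit(self, commit, parents):
        # Parents must already be in the graph (assumed insertion order).
        depth = 1 + max((self.graph[p][0] for p in parents), default=0)
        self.graph[commit] = (depth, list(parents))

    def is_ancestor(self, possible_ancestor, check):
        a = possible_ancestor
        original_pair = (a, check)
        a_depth = self.graph[a][0]
        pending = [check]
        visited = set()
        while pending:
            commit = pending.pop()
            cached = self._cached_is_ancestor.get((a, commit))
            if cached is False:
                # Known dead end: a is not reachable through this commit.
                continue
            if cached is True:
                # a is an ancestor of something reachable from check.
                self._cached_is_ancestor[original_pair] = True
                return True
            if commit in visited:
                continue
            visited.add(commit)
            depth, parents = self.graph[commit]
            if commit == a:
                self._cached_is_ancestor[original_pair] = True
                return True
            if depth <= a_depth:
                # Too shallow to have a as an ancestor; stop walking here.
                continue
            pending.extend(parents)
        self._cached_is_ancestor[original_pair] = False
        return False


# Small history: A - B - C - F on one line, A - D - E on a side branch.
g = AncestryGraphSketch()
g.add_commit("A", [])
g.add_commit("B", ["A"])
g.add_commit("C", ["B"])
g.add_commit("D", ["A"])
g.add_commit("E", ["D"])
g.add_commit("F", ["C"])

print(g.is_ancestor("B", "D"))  # False, and (B, D) is now cached
print(g.is_ancestor("B", "E"))  # False, the walk stops at the cached (B, D)
print(g.is_ancestor("B", "C"))  # True, and (B, C) is now cached
print(g.is_ancestor("B", "F"))  # True, answered from the cached (B, C)
```

In this sketch the cache is keyed on the pair, so a negative result for one parent of a merge only prunes that parent; the other parents are still walked, which matches the multiple-parent caveat in the message.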
1 parent f2dccbc commit 85c8e36

2 files changed, 36 additions and 6 deletions

git-filter-repo

Lines changed: 12 additions & 0 deletions
@@ -210,6 +210,9 @@ class AncestryGraph(object):
     # than the max depth of any of its ancestors.
     self.graph = {}
 
+    # Cached results from previous calls to is_ancestor().
+    self._cached_is_ancestor = {}
+
   def record_external_commits(self, external_commits):
     """
     Record in graph that each commit in external_commits exists, and is
@@ -245,20 +248,29 @@ class AncestryGraph(object):
     Return whether possible_ancestor is an ancestor of check
     """
     a, b = self.value[possible_ancestor], self.value[check]
+    original_pair = (a,b)
     a_depth = self.graph[a][0]
     ancestors = [b]
     visited = set()
     while ancestors:
       ancestor = ancestors.pop()
+      prev_pair = (a, ancestor)
+      if prev_pair in self._cached_is_ancestor:
+        if not self._cached_is_ancestor[prev_pair]:
+          continue
+        self._cached_is_ancestor[original_pair] = True
+        return True
       if ancestor in visited:
         continue
       visited.add(ancestor)
       depth, more_ancestors = self.graph[ancestor]
       if ancestor == a:
+        self._cached_is_ancestor[original_pair] = True
         return True
       elif depth <= a_depth:
         continue
       ancestors.extend(more_ancestors)
+    self._cached_is_ancestor[original_pair] = False
     return False
 
 class MailmapInfo(object):

t/t9390-filter-repo.sh

Lines changed: 24 additions & 6 deletions
@@ -529,6 +529,7 @@ test_expect_success 'setup analyze_me' '
 test_tick &&
 git commit -m initial &&
 
+git branch modify-fickle &&
 git branch other &&
 git mv fickle capricious &&
 test_tick &&
@@ -568,6 +569,21 @@ test_expect_success 'setup analyze_me' '
 test_tick &&
 git commit --allow-empty -m "Final commit, empty" &&
 
+git checkout modify-fickle &&
+echo "more stuff" >>fickle &&
+test_tick &&
+git commit -am "another more stuff commit" &&
+
+git checkout modify-fickle &&
+echo "more stuff" >>fickle &&
+test_tick &&
+git commit -am "another more stuff commit" &&
+
+test_tick &&
+git commit --allow-empty -m "Final commit, empty" &&
+
+git checkout master &&
+
 # Add a random extra unreferenced object
 echo foobar | git hash-object --stdin -w
 )
@@ -596,13 +612,13 @@ test_expect_success C_LOCALE_OUTPUT '--analyze' '
 
 cat >expect <<-EOF &&
 == Overall Statistics ==
-Number of commits: 9
+Number of commits: 12
 Number of filenames: 10
 Number of directories: 4
 Number of file extensions: 2
 
-Total unpacked size (bytes): 147
-Total packed size (bytes): 306
+Total unpacked size (bytes): 206
+Total packed size (bytes): 387
 
 EOF
 head -n 9 README >actual &&
@@ -612,6 +628,8 @@ test_expect_success C_LOCALE_OUTPUT '--analyze' '
 === Files by sha and associated pathnames in reverse size ===
 Format: sha, unpacked size, packed size, filename(s) object stored as
 a89c82a2d4b713a125a4323d25adda062cc0013d 44 48 numbers/medium.num
+c58ae2ffaf8352bd9860bf4bbb6ea78238dca846 35 41 fickle
+ccff62141ec7bae42e01a3dcb7615b38aa9fa5b3 24 40 fickle
 f00c965d8307308469e537302baa73048488f162 21 37 numbers/small.num
 2aa69a2a708eed00cb390e30f6bcc3eed773f390 20 36 whatever
 51b95456de9274c9a95f756742808dfd480b9b35 13 29 [capricious, fickle, mercurial]
@@ -624,7 +642,7 @@ test_expect_success C_LOCALE_OUTPUT '--analyze' '
 cat >expect <<-EOF &&
 === All directories by reverse size ===
 Format: unpacked size, packed size, date deleted, directory name
-147 306 <present> <toplevel>
+206 387 <present> <toplevel>
 65 85 2005-04-07 numbers
 13 58 <present> words
 10 40 <present> sequence
@@ -641,7 +659,7 @@ test_expect_success C_LOCALE_OUTPUT '--analyze' '
 cat >expect <<-EOF &&
 === All extensions by reverse size ===
 Format: unpacked size, packed size, date deleted, extension name
-82 221 <present> <no extension>
+141 302 <present> <no extension>
 65 85 2005-04-07 .num
 EOF
 test_cmp expect extensions-all-sizes.txt &&
@@ -656,12 +674,12 @@ test_expect_success C_LOCALE_OUTPUT '--analyze' '
 cat >expect <<-EOF &&
 === All paths by reverse accumulated size ===
 Format: unpacked size, packed size, date deleted, path name
+72 110 <present> fickle
 44 48 2005-04-07 numbers/medium.num
 8 38 <present> words/know
 21 37 2005-04-07 numbers/small.num
 20 36 <present> whatever
 13 29 <present> mercurial
-13 29 <present> fickle
 13 29 <present> capricious
 5 20 <present> words/to
 5 20 <present> sequence/to
