Skip to content

Commit 8320b1d

Browse files
peffgitster
authored andcommitted
revision: use a prio_queue to hold rewritten parents
This patch fixes a quadratic list insertion in rewrite_one() when pathspec limiting is combined with --parents. What happens is something like this: 1. We see that some commit X touches the path, so we try to rewrite its parents. 2. rewrite_one() loops forever, rewriting parents, until it finds a relevant parent (or hits the root and decides there are none). The heavy lifting is done by process_parent(), which uses try_to_simplify_commit() to drop parents. 3. process_parent() puts any intermediate parents into the &revs->commits list, inserting by commit date as usual. So if commit X is recent, and then there's a large chunk of history that doesn't touch the path, we may add a lot of commits to &revs->commits. And insertion by commit date is O(n) in the worst case, making the whole thing quadratic. We tried to deal with this long ago in fce87ae (Fix quadratic performance in rewrite_one., 2008-07-12). In that scheme, we cache the oldest commit in the list; if the new commit to be added is older, we can start our linear traversal there. This often works well in practice because parents are older than their descendants, and thus we tend to add older and older commits as we traverse. But this isn't guaranteed, and in fact there's a simple case where it is not: merges. Imagine we look at the first parent of a merge and see a very old commit (let's say 3 years old). And on the second parent, as we go back 3 years in history, we might have many commits. That one first-parent commit has polluted our oldest-commit cache; it will remain the oldest while we traverse a huge chunk of history, during which we have to fall back to the slow, linear method of adding to the list. Naively, one might imagine that instead of caching the oldest commit, we'd start at the last-added one. But that just makes some cases faster while making others slower (and indeed, while it made a real-world test case much faster, it does quite poorly in the perf test include here). Fundamentally, these are just heuristics; our worst case is still quadratic, and some cases will approach that. Instead, let's use a data structure with better worst-case performance. Swapping out revs->commits for something else would have repercussions all over the code base, but we can take advantage of one fact: for the rewrite_one() case, nobody actually needs to see those commits in revs->commits until we've finished generating the whole list. That leaves us with two obvious options: 1. We can generate the list _unordered_, which should be O(n), and then sort it afterwards, which would be O(n log n) total. This is "sort-after" below. 2. We can insert the commits into a separate data structure, like a priority queue. This is "prio-queue" below. I expected that sort-after would be the fastest (since it saves us the extra step of copying the items into the linked list), but surprisingly the prio-queue seems to be a bit faster. Here are timings for the new p0001.6 for all three techniques across a few repositories, as compared to master: master cache-last sort-after prio-queue -------------------------------------------------------------------------------------------- GIT_PERF_REPO=git.git 0.52(0.50+0.02) 0.53(0.51+0.02) +1.9% 0.37(0.33+0.03) -28.8% 0.37(0.32+0.04) -28.8% GIT_PERF_REPO=linux.git 20.81(20.74+0.07) 20.31(20.24+0.07) -2.4% 0.94(0.86+0.07) -95.5% 0.91(0.82+0.09) -95.6% GIT_PERF_REPO=llvm-project.git 83.67(83.57+0.09) 4.23(4.15+0.08) -94.9% 3.21(3.15+0.06) -96.2% 2.98(2.91+0.07) -96.4% A few items to note: - the cache-list tweak does improve the bad case for llvm-project.git that started my digging into this problem. But it performs terribly on linux.git, barely helping at all. - the sort-after and prio-queue techniques work well. They approach the timing for running without --parents at all, which is what you'd expect (see below for more data). - prio-queue just barely outperforms sort-after. As I said, I'm not really sure why this is the case, but it is. You can see it even more prominently in this real-world case on llvm-project.git: git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll where prio-queue routinely outperforms sort-after by about 7%. One guess is that the prio-queue may just be more efficient because it uses a compact array. There are three new perf tests: - "rev-list --parents" gives us a baseline for running with --parents. This isn't sped up meaningfully here, because the bad case is triggered only with simplification. But it's good to make sure we don't screw it up (now, or in the future). - "rev-list -- dummy" gives us a baseline for just traversing with pathspec limiting. This gives a lower bound for the next test (and it's also a good thing for us to be checking in general for regressions, since we don't seem to have any existing tests). - "rev-list --parents -- dummy" shows off the problem (and our fix) Here are the timings for those three on llvm-project.git, before and after the fix: Test master prio-queue ------------------------------------------------------------------------------ 0001.3: rev-list --parents 2.24(2.12+0.12) 2.22(2.11+0.11) -0.9% 0001.5: rev-list -- dummy 2.89(2.82+0.07) 2.92(2.89+0.03) +1.0% 0001.6: rev-list --parents -- dummy 83.67(83.57+0.09) 2.98(2.91+0.07) -96.4% Changes in the first two are basically noise, and you can see we approach our lower bound in the final one. Note that we can't fully get rid of the list argument from process_parents(). Other callers do have lists, and it would be hard to convert them. They also don't seem to have this problem (probably because they actually remove items from the list as they loop, meaning it doesn't grow so large in the first place). So this basically just drops the "cache_ptr" parameter (which was used only by the one caller we're fixing here) and replaces it with a prio_queue. Callers are free to use either data structure, depending on what they're prepared to handle. Reported-by: Björn Pettersson A <[email protected]> Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent aeb582a commit 8320b1d

File tree

2 files changed

+54
-22
lines changed

2 files changed

+54
-22
lines changed

revision.c

Lines changed: 36 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -911,26 +911,11 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
911911
commit->object.flags |= TREESAME;
912912
}
913913

914-
static void commit_list_insert_by_date_cached(struct commit *p, struct commit_list **head,
915-
struct commit_list *cached_base, struct commit_list **cache)
916-
{
917-
struct commit_list *new_entry;
918-
919-
if (cached_base && p->date < cached_base->item->date)
920-
new_entry = commit_list_insert_by_date(p, &cached_base->next);
921-
else
922-
new_entry = commit_list_insert_by_date(p, head);
923-
924-
if (cache && (!*cache || p->date < (*cache)->item->date))
925-
*cache = new_entry;
926-
}
927-
928914
static int process_parents(struct rev_info *revs, struct commit *commit,
929-
struct commit_list **list, struct commit_list **cache_ptr)
915+
struct commit_list **list, struct prio_queue *queue)
930916
{
931917
struct commit_list *parent = commit->parents;
932918
unsigned left_flag;
933-
struct commit_list *cached_base = cache_ptr ? *cache_ptr : NULL;
934919

935920
if (commit->object.flags & ADDED)
936921
return 0;
@@ -966,7 +951,9 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
966951
continue;
967952
p->object.flags |= SEEN;
968953
if (list)
969-
commit_list_insert_by_date_cached(p, list, cached_base, cache_ptr);
954+
commit_list_insert_by_date(p, list);
955+
if (queue)
956+
prio_queue_put(queue, p);
970957
}
971958
return 0;
972959
}
@@ -1006,7 +993,9 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
1006993
if (!(p->object.flags & SEEN)) {
1007994
p->object.flags |= SEEN;
1008995
if (list)
1009-
commit_list_insert_by_date_cached(p, list, cached_base, cache_ptr);
996+
commit_list_insert_by_date(p, list);
997+
if (queue)
998+
prio_queue_put(queue, p);
1010999
}
10111000
if (revs->first_parent_only)
10121001
break;
@@ -3335,14 +3324,14 @@ int prepare_revision_walk(struct rev_info *revs)
33353324
return 0;
33363325
}
33373326

3338-
static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp)
3327+
static enum rewrite_result rewrite_one_1(struct rev_info *revs,
3328+
struct commit **pp,
3329+
struct prio_queue *queue)
33393330
{
3340-
struct commit_list *cache = NULL;
3341-
33423331
for (;;) {
33433332
struct commit *p = *pp;
33443333
if (!revs->limited)
3345-
if (process_parents(revs, p, &revs->commits, &cache) < 0)
3334+
if (process_parents(revs, p, NULL, queue) < 0)
33463335
return rewrite_one_error;
33473336
if (p->object.flags & UNINTERESTING)
33483337
return rewrite_one_ok;
@@ -3356,6 +3345,31 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
33563345
}
33573346
}
33583347

3348+
static void merge_queue_into_list(struct prio_queue *q, struct commit_list **list)
3349+
{
3350+
while (q->nr) {
3351+
struct commit *item = prio_queue_peek(q);
3352+
struct commit_list *p = *list;
3353+
3354+
if (p && p->item->date >= item->date)
3355+
list = &p->next;
3356+
else {
3357+
p = commit_list_insert(item, list);
3358+
list = &p->next; /* skip newly added item */
3359+
prio_queue_get(q); /* pop item */
3360+
}
3361+
}
3362+
}
3363+
3364+
static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp)
3365+
{
3366+
struct prio_queue queue = { compare_commits_by_commit_date };
3367+
enum rewrite_result ret = rewrite_one_1(revs, pp, &queue);
3368+
merge_queue_into_list(&queue, &revs->commits);
3369+
clear_prio_queue(&queue);
3370+
return ret;
3371+
}
3372+
33593373
int rewrite_parents(struct rev_info *revs, struct commit *commit,
33603374
rewrite_parent_fn_t rewrite_parent)
33613375
{

t/perf/p0001-rev-list.sh

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,24 @@ test_perf 'rev-list --all --objects' '
1414
git rev-list --all --objects >/dev/null
1515
'
1616

17+
test_perf 'rev-list --parents' '
18+
git rev-list --parents HEAD >/dev/null
19+
'
20+
21+
test_expect_success 'create dummy file' '
22+
echo unlikely-to-already-be-there >dummy &&
23+
git add dummy &&
24+
git commit -m dummy
25+
'
26+
27+
test_perf 'rev-list -- dummy' '
28+
git rev-list HEAD -- dummy
29+
'
30+
31+
test_perf 'rev-list --parents -- dummy' '
32+
git rev-list --parents HEAD -- dummy
33+
'
34+
1735
test_expect_success 'create new unreferenced commit' '
1836
commit=$(git commit-tree HEAD^{tree} -p HEAD) &&
1937
test_export commit

0 commit comments

Comments
 (0)