Commit d12a8cf

jeffhostetler authored and gitster committed
unpack-trees: avoid duplicate ODB lookups during checkout
Teach traverse_trees_recursive() to not do redundant ODB lookups
when both directories refer to the same OID.

In operations such as read-tree and checkout, there will likely
be many peer directories that have the same OID when the differences
between the commits are relatively small. In these cases we can avoid
hitting the ODB multiple times for the same OID.

This patch handles n=2 and n=3 cases and simply copies the data rather
than repeating the fill_tree_descriptor().

================
On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced
the overall time by 0.75 seconds when cycling between 2 commits with a
single file difference.

(avg) before: 22.699
(avg) after:  21.955
===============

================
On Linux using p0006-read-tree-checkout.sh with linux.git:

Test                                                          HEAD^              HEAD
-------------------------------------------------------------------------------------------------------
0006.2: read-tree br_base br_ballast (57994)                  0.24(0.20+0.03)    0.24(0.22+0.01) +0.0%
0006.3: switch between br_base br_ballast (57994)             10.58(6.23+2.86)   10.67(5.94+2.87) +0.9%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)   0.60(0.44+0.17)    0.57(0.44+0.14) -5.0%
0006.5: switch between aliases (57994)                        0.59(0.48+0.13)    0.57(0.44+0.15) -3.4%
================

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
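The n=2 and n=3 shortcut described in the message can be illustrated outside of Git. What follows is a minimal sketch, not the code from this commit: load_tree(), same_id(), count_reads(), struct entry, struct desc, MAX_TREES and ID_LEN are hypothetical stand-ins, and the object database is simulated by a counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TREES 8	/* hypothetical cap, stands in for MAX_UNPACK_TREES */
#define ID_LEN 20	/* assumed ID width for this sketch */

struct entry { const unsigned char *id; };	/* NULL id: no tree in this slot */
struct desc { const void *data; size_t size; };	/* view of a loaded buffer */

static int odb_reads;	/* number of simulated object database reads */

/* Simulated expensive lookup; the caller owns the returned buffer. */
static void *load_tree(struct desc *d, const unsigned char *id)
{
	void *buf = NULL;

	d->data = NULL;
	d->size = 0;
	if (id) {
		odb_reads++;
		buf = malloc(ID_LEN);
		assert(buf);
		memcpy(buf, id, ID_LEN);
		d->data = buf;
		d->size = ID_LEN;
	}
	return buf;
}

/* Two slots match only when both carry an ID and the IDs are equal. */
static int same_id(const struct entry *a, const struct entry *b)
{
	return a->id && b->id && !memcmp(a->id, b->id, ID_LEN);
}

/* Load n peer entries and return how many simulated reads that took. */
static int count_reads(int n, const struct entry *e)
{
	struct desc d[MAX_TREES];
	void *buf[MAX_TREES];
	int i, nr_buf = 0, before = odb_reads;

	assert(n >= 0 && n <= MAX_TREES);
	for (i = 0; i < n; i++) {
		if (i > 0 && same_id(&e[i], &e[i - 1]))
			d[i] = d[i - 1];	/* borrow the previous slot's data */
		else if (i > 1 && same_id(&e[i], &e[i - 2]))
			d[i] = d[i - 2];	/* borrow from two slots back */
		else
			buf[nr_buf++] = load_tree(&d[i], e[i].id);
	}

	/* A real caller would walk d[0..n-1] here, before the buffers go away. */
	(void)d;

	for (i = 0; i < nr_buf; i++)
		free(buf[i]);	/* free each owned buffer exactly once */
	return odb_reads - before;
}
```

In this sketch, three entries sharing one ID cost a single read. Entries in the pattern A, B, B, A cost three reads, because only the two preceding slots are compared; that mirrors the message's statement that only the n=2 and n=3 cases are handled.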
1 parent 49800c9 commit d12a8cf

1 file changed, +33 -5 lines changed

unpack-trees.c

Lines changed: 33 additions & 5 deletions
@@ -531,12 +531,18 @@ static int switch_cache_bottom(struct traverse_info *info)
 	return ret;
 }
 
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
 				    struct traverse_info *info)
 {
 	int i, ret, bottom;
+	int nr_buf = 0;
 	struct tree_desc t[MAX_UNPACK_TREES];
 	void *buf[MAX_UNPACK_TREES];
 	struct traverse_info newinfo;
@@ -553,18 +559,40 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.pathlen += tree_entry_len(p) + 1;
 	newinfo.df_conflicts |= df_conflicts;
 
+	/*
+	 * Fetch the tree from the ODB for each peer directory in the
+	 * n commits.
+	 *
+	 * For 2- and 3-way traversals, we try to avoid hitting the
+	 * ODB twice for the same OID. This should yield a nice speed
+	 * up in checkouts and merges when the commits are similar.
+	 *
+	 * We don't bother doing the full O(n^2) search for larger n,
+	 * because wider traversals don't happen that often and we
+	 * avoid the search setup.
+	 *
+	 * When 2 peer OIDs are the same, we just copy the tree
+	 * descriptor data. This implicitly borrows the buffer
+	 * data from the earlier cell.
+	 */
 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && are_same_oid(&names[i], &names[i - 1]))
+			t[i] = t[i - 1];
+		else if (i > 1 && are_same_oid(&names[i], &names[i - 2]))
+			t[i] = t[i - 2];
+		else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[nr_buf++] = fill_tree_descriptor(t+i, sha1);
+		}
 	}
 
 	bottom = switch_cache_bottom(&newinfo);
 	ret = traverse_trees(n, t, &newinfo);
 	restore_cache_bottom(&newinfo, bottom);
 
-	for (i = 0; i < n; i++)
+	for (i = 0; i < nr_buf; i++)
 		free(buf[i]);
 
 	return ret;
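
The comment in the patch says that copying the descriptor "implicitly borrows the buffer data from the earlier cell". One way to picture that is a struct assignment that duplicates a read position while the underlying bytes stay shared. The sketch below assumes the descriptor behaves like a cursor over a buffer it does not own; struct cursor, cursor_init() and cursor_advance() are hypothetical names for illustration and are not Git APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree descriptor: a read cursor over borrowed bytes. */
struct cursor {
	const char *pos;	/* next unread byte */
	size_t remaining;	/* bytes left after pos */
};

/* Point the cursor at the start of buf; the cursor never owns buf. */
static void cursor_init(struct cursor *c, const char *buf, size_t len)
{
	c->pos = buf;
	c->remaining = len;
}

/* Consume up to n bytes and return how many were actually consumed. */
static size_t cursor_advance(struct cursor *c, size_t n)
{
	if (n > c->remaining)
		n = c->remaining;
	c->pos += n;
	c->remaining -= n;
	return n;
}
```

Under that assumption, after `second = first;` each copy advances independently while both keep reading the same memory. That is why the sketch needs only one owner for the buffer, and it is consistent with the patch freeing only the nr_buf buffers that fill_tree_descriptor() returned.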
