Skip to content

Commit 32a56df

Browse files
newrengitster
authored andcommitted
merge-ort: precompute subset of sources for which we need rename detection
rename detection works by trying to pair all file deletions (or "sources") with all file additions (or "destinations"), checking similarity, and then marking the sufficiently similar ones as renames. This can be expensive if there are many sources and destinations on a given side of history as it results in an N x M comparison matrix. However, there are many cases where we can compute in advance that detecting renames for some of the sources provides no useful information and thus that we can exclude those sources from the matrix. To see why, first note that the merge machinery uses detected renames in two ways: * directory rename detection: when one side of history renames a directory, and the other side of history adds new files to that directory, we want to be able to warn the user about the need to chose whether those new files stay in the old directory or move to the new one. * three-way content merging: in order to do three-way content merging of files, we need three different file versions. If one side of history renamed a file, then some of the content for the file is found under a different path than in the merge base or on the other side of history. Add a simple testcase showing the two kinds of reasons renames are relevant; it's a testcase that will only pass if we detect both kinds of needed renames. Other than the testcase added above, this commit concentrates just on the three-way content merging; it will punt and mark all sources as needed for directory rename detection, and leave it to future commits to narrow that down more. The point of three-way content merging is to reconcile changes made on *both* sides of history. What if the file wasn't modified on both sides? There are two possibilities: * If it wasn't modified on the renamed side: -> then we get to do exact rename detection, which is cheap. * If it wasn't modified on the unrenamed side: -> then detection of a rename for that source file is irrelevant That latter claim might be surprising at first, so let's walk through a case to show why rename detection for that source file is irrelevant. Let's use two filenames, old.c & new.c, with the following abbreviated object ids (and where the value '000000' is used to denote that the file is missing in that commit): old.c new.c MERGE_BASE: 01d01d 000000 MERGE_SIDE1: 01d01d 000000 MERGE_SIDE2: 000000 5e1ec7 If the rename *isn't* detected: then old.c looks like it was unmodified on one side and deleted on the other and should thus be removed. new.c looks like a new file we should keep as-is. If the rename *is* detected: then a three-way content merge is done. Since the version of the file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge will produce exactly the version of the file whose abbreviated object id is 5e1ec7. It will record that file at the path new.c, while removing old.c from the directory. Note that these two results are identical -- a single file named 'new.c' with object id 5e1ec7. In other words, it doesn't matter if the rename is detected in the case where the file is unmodified on the unrenamed side. Use this information to compute whether we need rename detection for each source created in add_pair(). It's probably worth noting that there used to be a few other edge or corner cases besides three-way content merges and directory rename detection where lack of rename detection could have affected the result, but those cases actually highlighted where conflict resolution methods were not consistent with each other. Fixing those inconsistencies were thus critically important to enabling this optimization. That work involved the following: * bringing consistency to add/add, rename/add, and rename/rename conflict types, as done back in the topic merged at commit ac193e0 ("Merge branch 'en/merge-path-collision'", 2019-01-04), and further extended in commits 2a7c16c ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10) and e8eb99d ("t642[23]: be more flexible for add/add conflicts involving pair renames", 2020-08-10) * making rename/delete more consistent with modify/delete as done in commits 1f3c9ba ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and 727c75b ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26) Since the set of relevant_sources we compute has not yet been narrowed down for directory rename detection, we do not pass it to diffcore_rename_extended() yet. That will be done after subsequent commits narrow down the list of relevant_sources needed for directory rename detection reasons. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 9799889 commit 32a56df

File tree

2 files changed

+103
-3
lines changed

2 files changed

+103
-3
lines changed

merge-ort.c

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -88,6 +88,20 @@ struct rename_info {
8888
*/
8989
struct strmap dir_renames[3];
9090

91+
/*
92+
* relevant_sources: deleted paths for which we need rename detection
93+
*
94+
* relevant_sources is a set of deleted paths on each side of
95+
* history for which we need rename detection. If a path is deleted
96+
* on one side of history, we need to detect if it is part of a
97+
* rename if either
98+
* * we need to detect renames for an ancestor directory
99+
* * the file is modified/deleted on the other side of history
100+
* If neither of those are true, we can skip rename detection for
101+
* that path.
102+
*/
103+
struct strset relevant_sources[3];
104+
91105
/*
92106
* needed_limit: value needed for inexact rename detection to run
93107
*
@@ -358,6 +372,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
358372
strmap_clear(&renames->dir_rename_count[i], 1);
359373

360374
strmap_func(&renames->dir_renames[i], 0);
375+
376+
strset_func(&renames->relevant_sources[i]);
361377
}
362378

363379
if (!reinitialize) {
@@ -533,12 +549,21 @@ static void add_pair(struct merge_options *opt,
533549
struct name_entry *names,
534550
const char *pathname,
535551
unsigned side,
536-
unsigned is_add /* if false, is_delete */)
552+
unsigned is_add /* if false, is_delete */,
553+
unsigned match_mask)
537554
{
538555
struct diff_filespec *one, *two;
539556
struct rename_info *renames = &opt->priv->renames;
540557
int names_idx = is_add ? side : 0;
541558

559+
if (!is_add) {
560+
unsigned content_relevant = (match_mask == 0);
561+
unsigned location_relevant = 1; /* FIXME: compute this */
562+
563+
if (content_relevant || location_relevant)
564+
strset_add(&renames->relevant_sources[side], pathname);
565+
}
566+
542567
one = alloc_filespec(pathname);
543568
two = alloc_filespec(pathname);
544569
fill_filespec(is_add ? two : one,
@@ -575,11 +600,13 @@ static void collect_rename_info(struct merge_options *opt,
575600

576601
/* Check for deletion on side */
577602
if ((filemask & 1) && !(filemask & side_mask))
578-
add_pair(opt, names, fullname, side, 0 /* delete */);
603+
add_pair(opt, names, fullname, side, 0 /* delete */,
604+
match_mask & filemask);
579605

580606
/* Check for addition on side */
581607
if (!(filemask & 1) && (filemask & side_mask))
582-
add_pair(opt, names, fullname, side, 1 /* add */);
608+
add_pair(opt, names, fullname, side, 1 /* add */,
609+
match_mask & filemask);
583610
}
584611
}
585612

@@ -3228,6 +3255,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
32283255
NULL, 1);
32293256
strmap_init_with_options(&renames->dir_renames[i],
32303257
NULL, 0);
3258+
strset_init_with_options(&renames->relevant_sources[i],
3259+
NULL, 0);
32313260
}
32323261

32333262
/*

t/t6423-merge-rename-directories.sh

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4895,6 +4895,77 @@ test_expect_merge_algorithm failure success '12f: Trivial directory resolve, cac
48954895
)
48964896
'
48974897

4898+
# Testcase 12g, Testcase with two kinds of "relevant" renames
4899+
# Commit O: somefile_O, subdir/{a_O,b_O}
4900+
# Commit A: somefile_A, subdir/{a_O,b_O,c_A}
4901+
# Commit B: newfile_B, newdir/{a_B,b_B}
4902+
# Expected: newfile_{merged}, newdir/{a_B,b_B,c_A}
4903+
4904+
test_setup_12g () {
4905+
test_create_repo 12g &&
4906+
(
4907+
cd 12g &&
4908+
4909+
mkdir -p subdir &&
4910+
test_write_lines upon a time there was a >somefile &&
4911+
test_write_lines 1 2 3 4 5 6 7 8 9 10 >subdir/a &&
4912+
test_write_lines one two three four five six >subdir/b &&
4913+
git add . &&
4914+
test_tick &&
4915+
git commit -m "O" &&
4916+
4917+
git branch O &&
4918+
git branch A &&
4919+
git branch B &&
4920+
4921+
git switch A &&
4922+
test_write_lines once upon a time there was a >somefile &&
4923+
> subdir/c &&
4924+
git add somefile subdir/c &&
4925+
test_tick &&
4926+
git commit -m "A" &&
4927+
4928+
git checkout B &&
4929+
git mv somefile newfile &&
4930+
git mv subdir newdir &&
4931+
echo repo >>newfile &&
4932+
test_write_lines 1 2 3 4 5 6 7 8 9 10 11 >newdir/a &&
4933+
test_write_lines one two three four five six seven >newdir/b &&
4934+
git add newfile newdir &&
4935+
test_tick &&
4936+
git commit -m "B"
4937+
)
4938+
}
4939+
4940+
test_expect_success '12g: Testcase with two kinds of "relevant" renames' '
4941+
test_setup_12g &&
4942+
(
4943+
cd 12g &&
4944+
4945+
git checkout A^0 &&
4946+
4947+
git -c merge.directoryRenames=true merge -s recursive B^0 &&
4948+
4949+
test_write_lines once upon a time there was a repo >expect &&
4950+
test_cmp expect newfile &&
4951+
4952+
git ls-files -s >out &&
4953+
test_line_count = 4 out &&
4954+
4955+
git rev-parse >actual \
4956+
HEAD:newdir/a HEAD:newdir/b HEAD:newdir/c &&
4957+
git rev-parse >expect \
4958+
B:newdir/a B:newdir/b A:subdir/c &&
4959+
test_cmp expect actual &&
4960+
4961+
test_must_fail git rev-parse HEAD:subdir/a &&
4962+
test_must_fail git rev-parse HEAD:subdir/b &&
4963+
test_must_fail git rev-parse HEAD:subdir/c &&
4964+
test_path_is_missing subdir/ &&
4965+
test_path_is_file newdir/c
4966+
)
4967+
'
4968+
48984969
###########################################################################
48994970
# SECTION 13: Checking informational and conflict messages
49004971
#

0 commit comments

Comments
 (0)