Commit 0324e8f

phillipwood authored and gitster committed
word diff: handle zero length matches
If find_word_boundaries() encounters a zero length match (which can be caused by matching a newline or using '*' instead of '+' in the regex) we stop splitting the input into words which generates an inaccurate diff. To fix this increment the start point when there is a zero length match and try a new match. This is safe as posix regular expressions always return the longest available match so a zero length match means there are no longer matches available from the current position.

Commit bf82940 (color-words: enable REG_NEWLINE to help user, 2009-01-17) prevented matching newlines in negated character classes but it is still possible for the user to have an explicit newline match in the regex which could cause a zero length match.

One could argue that having explicit newline matches or using '*' rather than '+' are user errors but it seems to be better to work round them than produce inaccurate diffs.

Signed-off-by: Phillip Wood <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 48bf2fa commit 0324e8f
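
The zero length match described in the commit message can be reproduced outside git with the plain POSIX regex API. The following is a minimal sketch, not code from this commit: `first_match()` is a hypothetical helper, and it uses the standard `regexec()` on a NUL-terminated string rather than git's `regexec_buf()`.

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper (not part of git): compile `pattern` as a POSIX
 * extended regex with REG_NEWLINE and report the offsets of the first
 * match in `text`. Returns 0 on a match, -1 on no match or on a
 * compile error.
 */
static int first_match(const char *pattern, const char *text,
		       int *so, int *eo)
{
	regex_t re;
	regmatch_t m[1];
	int ret;

	assert(pattern && text && so && eo);

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;

	ret = regexec(&re, text, 1, m, 0);
	if (!ret) {
		*so = (int)m[0].rm_so;	/* start offset of the match */
		*eo = (int)m[0].rm_eo;	/* end offset (exclusive) */
	}
	regfree(&re);
	return ret ? -1 : 0;
}
```

With the input `" abc"`, the pattern `[a-z]*` matches the empty string at offset 0 (`so == eo == 0`), because the leftmost position wins and no letter is available there. The pattern `[a-z]+` cannot match empty, so it reports the span from 1 to 4.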

2 files changed: +12 -3 lines changed

diff.c

Lines changed: 7 additions & 3 deletions
@@ -2053,17 +2053,21 @@ static void fn_out_diff_words_aux(void *priv,
 static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
 		int *begin, int *end)
 {
-	if (word_regex && *begin < buffer->size) {
+	while (word_regex && *begin < buffer->size) {
 		regmatch_t match[1];
 		if (!regexec_buf(word_regex, buffer->ptr + *begin,
 				 buffer->size - *begin, 1, match, 0)) {
 			char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
 					 '\n', match[0].rm_eo - match[0].rm_so);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
 			*begin += match[0].rm_so;
-			return *begin >= *end;
+			if (*begin == *end)
+				(*begin)++;
+			else
+				return *begin > *end;
+		} else {
+			return -1;
 		}
-		return -1;
 	}
 
 	/* find the next word */
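
The retry loop introduced by this hunk can be sketched as a standalone function. This is an illustration under stated assumptions rather than the code in diff.c: `next_word()` and `count_words()` are hypothetical names, the input is a NUL-terminated string, and the standard `regexec()` stands in for git's `regexec_buf()`. The return convention is also simplified to 0 for "word found" and -1 for "no more words".

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical stand-in for find_word_boundaries(). On success returns 0
 * with the next word in buf[*begin .. *end); returns -1 when the rest of
 * the input contains no word.
 */
static int next_word(const regex_t *re, const char *buf, size_t size,
		     size_t *begin, size_t *end)
{
	assert(re && buf && begin && end);

	while (*begin < size) {
		regmatch_t m[1];
		const char *nl;

		if (regexec(re, buf + *begin, 1, m, 0))
			return -1;	/* nothing matches in the remainder */

		/* A word never spans a newline: cut the match there. */
		nl = memchr(buf + *begin + m[0].rm_so, '\n',
			    (size_t)(m[0].rm_eo - m[0].rm_so));
		*end = nl ? (size_t)(nl - buf) : *begin + (size_t)m[0].rm_eo;
		*begin += (size_t)m[0].rm_so;

		if (*begin != *end)
			return 0;	/* non-empty word found */

		/* Zero length match: step one byte forward and retry. */
		(*begin)++;
	}
	return -1;
}

/* Illustrative driver: count the words `pattern` finds in `text`. */
static int count_words(const char *pattern, const char *text)
{
	regex_t re;
	size_t begin = 0, end = 0, size = strlen(text);
	int n = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;
	while (!next_word(&re, text, size, &begin, &end)) {
		n++;
		begin = end;	/* continue after the word just found */
	}
	regfree(&re);
	return n;
}
```

In this sketch, `[a-z]*` and `[a-z]+` both split `"ab cd\nef"` into the same three words. The empty matches at the space and at the newline are skipped one byte at a time instead of ending the scan. An explicit newline in the bracket expression, as in the new test, produces a match that the `memchr()` step truncates, to zero length when the match begins at the newline, and the same retry handles that case.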

t/t4034-diff-words.sh

Lines changed: 5 additions & 0 deletions
@@ -184,6 +184,11 @@ test_expect_success 'word diff with a regular expression' '
 	word_diff --color-words="[a-z]+"
 '
 
+test_expect_success 'word diff with zero length matches' '
+	cp expect.letter-runs-are-words expect &&
+	word_diff --color-words="[a-z${LF}]*"
+'
+
 test_expect_success 'set up a diff driver' '
 	git config diff.testdriver.wordRegex "[^[:space:]]" &&
 	cat <<-\EOF >.gitattributes
