
Commit dc2c44f

rscharfe authored and gitster committed
grep/pcre2: use PCRE2_UTF even with ASCII patterns

compile_pcre2_pattern() currently uses the option PCRE2_UTF only for patterns with non-ASCII characters. Patterns with ASCII wildcards can match non-ASCII strings, though. Without that option PCRE2 mishandles UTF-8 input, though -- it matches parts of multi-byte characters. Fix that by using PCRE2_UTF even for ASCII-only patterns.

This is a remake of the reverted ae39ba4 (grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data, 2021-10-15). The change to the condition and the test are simplified and more targeted.

Original-patch-by: Hamza Mahfooz <[email protected]>
Signed-off-by: René Scharfe <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent e9d7761 commit dc2c44f

2 files changed: +7 -1 lines changed


grep.c

Lines changed: 1 addition & 1 deletion

@@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
+	if (!opt->ignore_locale && is_utf8_locale() &&
 	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
 

t/t7812-grep-icase-non-ascii.sh

Lines changed: 6 additions & 0 deletions

@@ -123,4 +123,10 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE2,PCRE2_MATCH_INVALID_UTF 'PCRE v2: gr
 	test_cmp invalid-0xe5 actual
 '
 
+test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: grep non-literal ASCII from UTF-8' '
+	git grep --perl-regexp -h -o -e ll. file >actual &&
+	echo "lló" >expected &&
+	test_cmp expected actual
+'
+
 test_done
