Commit afce00e

Fix compile loop in 32-bit mode for characters above the Unicode limit when caseless and ucp are set.
1 parent 0820852 commit afce00e


ChangeLog

Lines changed: 8 additions & 4 deletions
@@ -174,10 +174,14 @@ undefined behaviour.
 that its end is handled similarly to other recursions. This has altered the
 behaviour of /|(?0)./endanchored which was previously not right.
 
-48. Improved the test for looping recursion by checking the last referenced
-character as well as the current character. This allows some patterns that
-previously triggered the check to run to completion instead of giving the loop
-error.
+48. Improved the test for looping recursion by checking the last referenced
+character as well as the current character. This allows some patterns that
+previously triggered the check to run to completion instead of giving the loop
+error.
+
+49. In 32-bit mode, the compiler looped for the pattern /[\x{ffffffff}]/ when
+PCRE2_CASELESS and PCRE2_UCP (but not PCRE2_UTF) were set. Fixed by not trying
+to look for other cases for characters above the Unicode range.
 
 
 Version 10.42 11-December-2022

src/pcre2_compile.c

Lines changed: 5 additions & 1 deletion
@@ -5155,10 +5155,14 @@ unsigned int co;
 
 /* Find the first character that has an other case. If it has multiple other
 cases, return its case offset value. When CASELESS_RESTRICT is set, ignore the
-multi-case entries that begin with ASCII values. */
+multi-case entries that begin with ASCII values. In 32-bit mode, a value
+greater than the Unicode maximum ends the range. */
 
 for (c = *cptr; c <= d; c++)
   {
+#if PCRE2_CODE_UNIT_WIDTH == 32
+  if (c > MAX_UTF_CODE_POINT) return -1;
+#endif
   if ((co = UCD_CASESET(c)) != 0 &&
       (!restricted || PRIV(ucd_caseless_sets)[co] > 127))
     {
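
The guard added in this hunk can be illustrated in isolation. The following is a minimal standalone sketch, not the PCRE2 code: `sketch_first_cased`, `sketch_has_other_case`, and `SKETCH_MAX_UTF_CODE_POINT` are hypothetical names, and the ASCII-only case test merely stands in for the real Unicode property lookup. It shows why a scan over an unsigned 32-bit range that ends at 0xffffffff needs an explicit exit: `c <= end` can never become false there, because incrementing 0xffffffff wraps to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the Unicode maximum code point, U+10FFFF. */
#define SKETCH_MAX_UTF_CODE_POINT 0x10ffffu

/* Hypothetical stand-in for a Unicode case lookup: only ASCII letters
   are treated as having another case. */
static int sketch_has_other_case(uint32_t c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Scan [start, end] for the first character that has another case.
   Returns 1 and stores the character in *found, or -1 if there is none. */
static int sketch_first_cased(uint32_t start, uint32_t end, uint32_t *found)
{
  uint32_t c;
  assert(start <= end);
  for (c = start; c <= end; c++)
    {
    /* Nothing above the Unicode maximum has another case, so stop here.
       This also prevents c from wrapping when end is 0xffffffff. */
    if (c > SKETCH_MAX_UTF_CODE_POINT) return -1;
    if (sketch_has_other_case(c))
      {
      *found = c;
      return 1;
      }
    }
  return -1;
}
```

With this guard, a call such as `sketch_first_cased(0xffffffffu, 0xffffffffu, &found)` returns -1 on the first iteration instead of continuing from zero after the wrap.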

testdata/testinput12

Lines changed: 4 additions & 0 deletions
@@ -573,4 +573,8 @@
 /\X++/
     a\x{110000}\x{ffffffff}
 
+# This used to loop in 32-bit mode; it will fail in 16-bit mode.
+/[\x{ffffffff}]/caseless,ucp
+    \x{ffffffff}xyz
+
 # End of testinput12

testdata/testoutput12-16

Lines changed: 5 additions & 0 deletions
@@ -1823,4 +1823,9 @@ Failed: error 134 at offset 11: character code point value in \x{} or \o{} is to
 ** Truncation will probably give the wrong result.
  0: a\x00\x{ffff}
 
+# This used to loop in 32-bit mode; it will fail in 16-bit mode.
+/[\x{ffffffff}]/caseless,ucp
+Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
+    \x{ffffffff}xyz
+
 # End of testinput12

testdata/testoutput12-32

Lines changed: 5 additions & 0 deletions
@@ -1817,4 +1817,9 @@ No match
     a\x{110000}\x{ffffffff}
  0: a\x{110000}\x{ffffffff}
 
+# This used to loop in 32-bit mode; it will fail in 16-bit mode.
+/[\x{ffffffff}]/caseless,ucp
+    \x{ffffffff}xyz
+ 0: \x{ffffffff}
+
 # End of testinput12
