Commit 7d59dde

Implement PCRE2_DISABLE_RECURSELOOP_CHECK
1 parent 037aecd commit 7d59dde

17 files changed: +196 -87 lines changed

ChangeLog

Lines changed: 5 additions & 0 deletions
@@ -203,6 +203,11 @@ substitution testing.
 56. Insert omitted setting of subject length in match data at the end of
 pcre2_jit_match().
 
+57. Implemented PCRE2_DISABLE_RECURSELOOP_CHECK for pcre2_match() to enable
+some apparently looping recursions to run to completion and therefore match the
+JIT behaviour. With this set, real loops will eventually get caught by match or
+heap limits or run out of resource.
+
 
 Version 10.42 11-December-2022
 ------------------------------

doc/html/pcre2_match.html

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@ <h1>pcre2_match man page</h1>
 PCRE2_ANCHORED Match only at the first position
 PCRE2_COPY_MATCHED_SUBJECT
 On success, make a private subject copy
+PCRE2_DISABLE_RECURSELOOP_CHECK
+Only useful in rare cases; use with care
 PCRE2_ENDANCHORED Pattern can match only at end of subject
 PCRE2_NOTBOL Subject string is not the beginning of a line
 PCRE2_NOTEOL Subject string is not the end of a line

doc/html/pcre2api.html

Lines changed: 26 additions & 5 deletions
@@ -2820,14 +2820,16 @@ <h1>pcre2api man page</h1>
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
 zero. The only bits that may be set are PCRE2_ANCHORED,
-PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
-PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
-PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_DISABLE_RECURSELOOP_CHECK, PCRE2_ENDANCHORED,
+PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
+PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
+Their action is described below.
 </P>
 <P>
 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
 the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
-interpretive code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT
+interpretive code in <b>pcre2_match()</b> is run.
+PCRE2_DISABLE_RECURSELOOP_CHECK is ignored by JIT, but apart from PCRE2_NO_JIT
 (obviously), the remaining options are supported for JIT matching.
 <pre>
 PCRE2_ANCHORED
@@ -2853,6 +2855,25 @@ <h1>pcre2api man page</h1>
 <b>pcre2_match_data_free()</b> is called to free the match data block. It is also
 automatically freed if the match data block is re-used for another match
 operation.
+<pre>
+PCRE2_DISABLE_RECURSELOOP_CHECK
+</pre>
+This option is relevant only to <b>pcre2_match()</b> for interpretive matching.
+It is ignored when JIT is used, and is forbidden for <b>pcre2_dfa_match()</b>.
+</P>
+<P>
+The use of recursion in patterns can lead to infinite loops. In the
+interpretive matcher these would be eventually caught by the match or heap
+limits, but this could take a long time and/or use a lot of memory if the
+limits are large. There is therefore a check at the start of each recursion.
+If the same group is still active from a previous call, and the current subject
+pointer is the same as it was at the start of that group, and the furthest
+inspected character of the subject has not changed, an error is generated.
+</P>
+<P>
+There are rare cases of matches that would complete, but nevertheless trigger
+this error. This option disables the check. It is provided mainly for testing
+when comparing JIT and interpretive behaviour.
 <pre>
 PCRE2_ENDANCHORED
 </pre>
@@ -4140,7 +4161,7 @@ <h1>pcre2api man page</h1>
 </P>
 <br><a name="SEC43" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 January 2024
+Last updated: 27 January 2024
 <br>
 Copyright &copy; 1997-2024 University of Cambridge.
 <br>
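
The recursion loop check that the pcre2api text in this commit describes can be pictured with a small standalone model. This is only a sketch under stated assumptions: it does not use PCRE2 and is not the library's implementation. Every name in it (active_recursion, recursion_state, enter_recursion, leave_recursion, MAX_DEPTH) is hypothetical, and the disable_check field merely stands in for PCRE2_DISABLE_RECURSELOOP_CHECK. It applies the three documented conditions: same group still active, same subject position as at the start of that group, and no change in the furthest inspected character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64 /* arbitrary bound, chosen only for this sketch */

/* One active recursion (all names here are hypothetical, not PCRE2's). */
typedef struct {
  unsigned group;   /* number of the group being recursed into */
  size_t start;     /* subject offset when the recursion began */
  size_t furthest;  /* furthest subject offset inspected at that time */
} active_recursion;

typedef struct {
  active_recursion stack[MAX_DEPTH];
  size_t depth;
  bool disable_check; /* stands in for PCRE2_DISABLE_RECURSELOOP_CHECK */
} recursion_state;

/* Try to enter a recursion into `group`. Returns false when the check
   reports a loop; otherwise records the recursion and returns true. */
bool enter_recursion(recursion_state *st, unsigned group,
                     size_t offset, size_t furthest)
{
  assert(st->depth < MAX_DEPTH);
  if (!st->disable_check) {
    for (size_t i = 0; i < st->depth; i++) {
      const active_recursion *r = &st->stack[i];
      /* Same group, same subject position, nothing new inspected. */
      if (r->group == group && r->start == offset &&
          r->furthest == furthest)
        return false;
    }
  }
  st->stack[st->depth].group = group;
  st->stack[st->depth].start = offset;
  st->stack[st->depth].furthest = furthest;
  st->depth++;
  return true;
}

/* Leave the most recently entered recursion. */
void leave_recursion(recursion_state *st)
{
  assert(st->depth > 0);
  st->depth--;
}
```

In this model, a second call to enter_recursion() with the same group, offset and furthest values is rejected unless disable_check is set. With the check disabled, only the MAX_DEPTH bound stops a genuine loop, which loosely mirrors how the documentation says real loops are then left to the match or heap limits.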

doc/html/pcre2test.html

Lines changed: 14 additions & 13 deletions
@@ -1209,18 +1209,19 @@ <h1>pcre2test man page</h1>
 <a href="pcreapi.html"><b>pcreapi</b></a>
 for a description of their effects.
 <pre>
-  anchored              set PCRE2_ANCHORED
-  endanchored           set PCRE2_ENDANCHORED
-  dfa_restart           set PCRE2_DFA_RESTART
-  dfa_shortest          set PCRE2_DFA_SHORTEST
-  no_jit                set PCRE2_NO_JIT
-  no_utf_check          set PCRE2_NO_UTF_CHECK
-  notbol                set PCRE2_NOTBOL
-  notempty              set PCRE2_NOTEMPTY
-  notempty_atstart      set PCRE2_NOTEMPTY_ATSTART
-  noteol                set PCRE2_NOTEOL
-  partial_hard (or ph)  set PCRE2_PARTIAL_HARD
-  partial_soft (or ps)  set PCRE2_PARTIAL_SOFT
+  anchored                   set PCRE2_ANCHORED
+  endanchored                set PCRE2_ENDANCHORED
+  dfa_restart                set PCRE2_DFA_RESTART
+  dfa_shortest               set PCRE2_DFA_SHORTEST
+  disable_recurseloop_check  set PCRE2_DISABLE_RECURSELOOP_CHECK
+  no_jit                     set PCRE2_NO_JIT
+  no_utf_check               set PCRE2_NO_UTF_CHECK
+  notbol                     set PCRE2_NOTBOL
+  notempty                   set PCRE2_NOTEMPTY
+  notempty_atstart           set PCRE2_NOTEMPTY_ATSTART
+  noteol                     set PCRE2_NOTEOL
+  partial_hard (or ph)       set PCRE2_PARTIAL_HARD
+  partial_soft (or ps)       set PCRE2_PARTIAL_SOFT
 </pre>
 The partial matching modifiers are provided with abbreviations because they
 appear frequently in tests.
@@ -2192,7 +2193,7 @@ <h1>pcre2test man page</h1>
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 January 2024
+Last updated: 27 January 2024
 <br>
 Copyright &copy; 1997-2024 University of Cambridge.
 <br>

doc/pcre2.txt

Lines changed: 44 additions & 24 deletions
@@ -2752,42 +2752,62 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
 
 The unused bits of the options argument for pcre2_match() must be zero.
 The only bits that may be set are PCRE2_ANCHORED,
-PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO-
-TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their
-action is described below.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_DISABLE_RECURSELOOP_CHECK, PCRE2_EN-
+DANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
+PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PAR-
+TIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
 
 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not sup-
 ported by the just-in-time (JIT) compiler. If it is set, JIT matching
-is disabled and the interpretive code in pcre2_match() is run. Apart
-from PCRE2_NO_JIT (obviously), the remaining options are supported for
-JIT matching.
+is disabled and the interpretive code in pcre2_match() is run.
+PCRE2_DISABLE_RECURSELOOP_CHECK is ignored by JIT, but apart from
+PCRE2_NO_JIT (obviously), the remaining options are supported for JIT
+matching.
 
 PCRE2_ANCHORED
 
 The PCRE2_ANCHORED option limits pcre2_match() to matching at the first
-matching position. If a pattern was compiled with PCRE2_ANCHORED, or
-turned out to be anchored by virtue of its contents, it cannot be made
-unachored at matching time. Note that setting the option at match time
+matching position. If a pattern was compiled with PCRE2_ANCHORED, or
+turned out to be anchored by virtue of its contents, it cannot be made
+unachored at matching time. Note that setting the option at match time
 disables JIT matching.
 
 PCRE2_COPY_MATCHED_SUBJECT
 
-By default, a pointer to the subject is remembered in the match data
-block so that, after a successful match, it can be referenced by the
-substring extraction functions. This means that the subject's memory
-must not be freed until all such operations are complete. For some ap-
-plications where the lifetime of the subject string is not guaranteed,
-it may be necessary to make a copy of the subject string, but it is
-wasteful to do this unless the match is successful. After a successful
-match, if PCRE2_COPY_MATCHED_SUBJECT is set, the subject is copied and
-the new pointer is remembered in the match data block instead of the
-original subject pointer. The memory allocator that was used for the
-match block itself is used. The copy is automatically freed when
-pcre2_match_data_free() is called to free the match data block. It is
+By default, a pointer to the subject is remembered in the match data
+block so that, after a successful match, it can be referenced by the
+substring extraction functions. This means that the subject's memory
+must not be freed until all such operations are complete. For some ap-
+plications where the lifetime of the subject string is not guaranteed,
+it may be necessary to make a copy of the subject string, but it is
+wasteful to do this unless the match is successful. After a successful
+match, if PCRE2_COPY_MATCHED_SUBJECT is set, the subject is copied and
+the new pointer is remembered in the match data block instead of the
+original subject pointer. The memory allocator that was used for the
+match block itself is used. The copy is automatically freed when
+pcre2_match_data_free() is called to free the match data block. It is
 also automatically freed if the match data block is re-used for another
 match operation.
 
+PCRE2_DISABLE_RECURSELOOP_CHECK
+
+This option is relevant only to pcre2_match() for interpretive match-
+ing. It is ignored when JIT is used, and is forbidden for
+pcre2_dfa_match().
+
+The use of recursion in patterns can lead to infinite loops. In the in-
+terpretive matcher these would be eventually caught by the match or
+heap limits, but this could take a long time and/or use a lot of memory
+if the limits are large. There is therefore a check at the start of
+each recursion. If the same group is still active from a previous
+call, and the current subject pointer is the same as it was at the
+start of that group, and the furthest inspected character of the sub-
+ject has not changed, an error is generated.
+
+There are rare cases of matches that would complete, but nevertheless
+trigger this error. This option disables the check. It is provided
+mainly for testing when comparing JIT and interpretive behaviour.
+
 PCRE2_ENDANCHORED
 
 If the PCRE2_ENDANCHORED option is set, any string that pcre2_match()
@@ -3978,11 +3998,11 @@ AUTHOR
 
 REVISION
 
-Last updated: 19 January 2024
+Last updated: 27 January 2024
 Copyright (c) 1997-2024 University of Cambridge.
 
 
-PCRE2 10.43 19 January 2024 PCRE2API(3)
+PCRE2 10.43 27 January 2024 PCRE2API(3)
 ------------------------------------------------------------------------------
 
 

doc/pcre2_match.3

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,4 @@
-.TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33"
+.TH PCRE2_MATCH 3 "27 January 2024" "PCRE2 10.43"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -50,6 +50,8 @@ terminated by a binary zero code unit. The options are:
 PCRE2_ANCHORED Match only at the first position
 PCRE2_COPY_MATCHED_SUBJECT
 On success, make a private subject copy
+PCRE2_DISABLE_RECURSELOOP_CHECK
+Only useful in rare cases; use with care
 PCRE2_ENDANCHORED Pattern can match only at end of subject
 PCRE2_NOTBOL Subject string is not the beginning of a line
 PCRE2_NOTEOL Subject string is not the end of a line

doc/pcre2api.3

Lines changed: 25 additions & 6 deletions
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "19 January 2024" "PCRE2 10.43"
+.TH PCRE2API 3 "27 January 2024" "PCRE2 10.43"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2804,13 +2804,15 @@ the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
 zero. The only bits that may be set are PCRE2_ANCHORED,
-PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
-PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
-PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_DISABLE_RECURSELOOP_CHECK, PCRE2_ENDANCHORED,
+PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
+PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
+Their action is described below.
 .P
 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
 the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
-interpretive code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT
+interpretive code in \fBpcre2_match()\fP is run.
+PCRE2_DISABLE_RECURSELOOP_CHECK is ignored by JIT, but apart from PCRE2_NO_JIT
 (obviously), the remaining options are supported for JIT matching.
 .sp
 PCRE2_ANCHORED
@@ -2836,6 +2838,23 @@ the match block itself is used. The copy is automatically freed when
 \fBpcre2_match_data_free()\fP is called to free the match data block. It is also
 automatically freed if the match data block is re-used for another match
 operation.
+.sp
+PCRE2_DISABLE_RECURSELOOP_CHECK
+.sp
+This option is relevant only to \fBpcre2_match()\fP for interpretive matching.
+It is ignored when JIT is used, and is forbidden for \fBpcre2_dfa_match()\fP.
+.P
+The use of recursion in patterns can lead to infinite loops. In the
+interpretive matcher these would be eventually caught by the match or heap
+limits, but this could take a long time and/or use a lot of memory if the
+limits are large. There is therefore a check at the start of each recursion.
+If the same group is still active from a previous call, and the current subject
+pointer is the same as it was at the start of that group, and the furthest
+inspected character of the subject has not changed, an error is generated.
+.P
+There are rare cases of matches that would complete, but nevertheless trigger
+this error. This option disables the check. It is provided mainly for testing
+when comparing JIT and interpretive behaviour.
 .sp
 PCRE2_ENDANCHORED
 .sp
@@ -4148,6 +4167,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 19 January 2024
+Last updated: 27 January 2024
 Copyright (c) 1997-2024 University of Cambridge.
 .fi

doc/pcre2demo.3

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-.TH PCRE2DEMO 3 "19 January 2024" "PCRE2 10.43-RC1"
+.TH PCRE2DEMO 3 "27 January 2024" "PCRE2 10.43-RC1"
 .\"AUTOMATICALLY GENERATED BY PrepareRelease - do not EDIT!
 .SH NAME
 PCRE2DEMO - A demonstration C program for PCRE2

doc/pcre2test.1

Lines changed: 15 additions & 14 deletions
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "19 January 2024" "PCRE 10.43"
+.TH PCRE2TEST 1 "27 January 2024" "PCRE 10.43"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1174,18 +1174,19 @@ The following modifiers set options for \fBpcre2_match()\fP or
 .\"
 for a description of their effects.
 .sp
-  anchored              set PCRE2_ANCHORED
-  endanchored           set PCRE2_ENDANCHORED
-  dfa_restart           set PCRE2_DFA_RESTART
-  dfa_shortest          set PCRE2_DFA_SHORTEST
-  no_jit                set PCRE2_NO_JIT
-  no_utf_check          set PCRE2_NO_UTF_CHECK
-  notbol                set PCRE2_NOTBOL
-  notempty              set PCRE2_NOTEMPTY
-  notempty_atstart      set PCRE2_NOTEMPTY_ATSTART
-  noteol                set PCRE2_NOTEOL
-  partial_hard (or ph)  set PCRE2_PARTIAL_HARD
-  partial_soft (or ps)  set PCRE2_PARTIAL_SOFT
+  anchored                   set PCRE2_ANCHORED
+  endanchored                set PCRE2_ENDANCHORED
+  dfa_restart                set PCRE2_DFA_RESTART
+  dfa_shortest               set PCRE2_DFA_SHORTEST
+  disable_recurseloop_check  set PCRE2_DISABLE_RECURSELOOP_CHECK
+  no_jit                     set PCRE2_NO_JIT
+  no_utf_check               set PCRE2_NO_UTF_CHECK
+  notbol                     set PCRE2_NOTBOL
+  notempty                   set PCRE2_NOTEMPTY
+  notempty_atstart           set PCRE2_NOTEMPTY_ATSTART
+  noteol                     set PCRE2_NOTEOL
+  partial_hard (or ph)       set PCRE2_PARTIAL_HARD
+  partial_soft (or ps)       set PCRE2_PARTIAL_SOFT
 .sp
 The partial matching modifiers are provided with abbreviations because they
 appear frequently in tests.
@@ -2169,6 +2170,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 19 January 2024
+Last updated: 27 January 2024
 Copyright (c) 1997-2024 University of Cambridge.
 .fi

doc/pcre2test.txt

Lines changed: 15 additions & 14 deletions
@@ -1080,18 +1080,19 @@ SUBJECT MODIFIERS
 The following modifiers set options for pcre2_match() or
 pcre2_dfa_match(). See pcreapi for a description of their effects.
 
-  anchored              set PCRE2_ANCHORED
-  endanchored           set PCRE2_ENDANCHORED
-  dfa_restart           set PCRE2_DFA_RESTART
-  dfa_shortest          set PCRE2_DFA_SHORTEST
-  no_jit                set PCRE2_NO_JIT
-  no_utf_check          set PCRE2_NO_UTF_CHECK
-  notbol                set PCRE2_NOTBOL
-  notempty              set PCRE2_NOTEMPTY
-  notempty_atstart      set PCRE2_NOTEMPTY_ATSTART
-  noteol                set PCRE2_NOTEOL
-  partial_hard (or ph)  set PCRE2_PARTIAL_HARD
-  partial_soft (or ps)  set PCRE2_PARTIAL_SOFT
+  anchored                   set PCRE2_ANCHORED
+  endanchored                set PCRE2_ENDANCHORED
+  dfa_restart                set PCRE2_DFA_RESTART
+  dfa_shortest               set PCRE2_DFA_SHORTEST
+  disable_recurseloop_check  set PCRE2_DISABLE_RECURSELOOP_CHECK
+  no_jit                     set PCRE2_NO_JIT
+  no_utf_check               set PCRE2_NO_UTF_CHECK
+  notbol                     set PCRE2_NOTBOL
+  notempty                   set PCRE2_NOTEMPTY
+  notempty_atstart           set PCRE2_NOTEMPTY_ATSTART
+  noteol                     set PCRE2_NOTEOL
+  partial_hard (or ph)       set PCRE2_PARTIAL_HARD
+  partial_soft (or ps)       set PCRE2_PARTIAL_SOFT
 
 The partial matching modifiers are provided with abbreviations because
 they appear frequently in tests.
@@ -1997,8 +1998,8 @@ AUTHOR
 
 REVISION
 
-Last updated: 19 January 2024
+Last updated: 27 January 2024
 Copyright (c) 1997-2024 University of Cambridge.
 
 
-PCRE 10.43 19 January 2024 PCRE2TEST(1)
+PCRE 10.43 27 January 2024 PCRE2TEST(1)
