
Commit ed69a3a

NWilson and IsaacOscar authored
Add pcre2_substitute checks to enforce pattern, subject, offset and options haven't changed (#807)
* Check for pattern/subject/offset/option changes when using PCRE2_SUBSTITUTE_MATCHED.
* Return PCRE2_ERROR_DFA_UFUNC if using PCRE2_SUBSTITUTE_MATCHED after a call to pcre2_dfa_match().
* Add new error codes to pcre2_substitute when using PCRE2_SUBSTITUTE_MATCHED.
* Change the behaviour of the matching methods so that the match_data fields are populated on all matches with "(rc >= 0 || rc==NO_MATCH || rc==PARTIAL)". We previously ensured that every call to a match method guarantees to set the rc field on the match_data.
* Add modifiers to pcre2test to better exercise these pcre2_substitute conditions

---------

Co-authored-by: Isaac Oscar Gariano <[email protected]>
1 parent 783b5f8 commit ed69a3a

24 files changed (+1358 / -658 lines changed)

doc/html/pcre2_substitute.html

Lines changed: 3 additions & 2 deletions
@@ -91,8 +91,9 @@ <h2>
 </p>
 <p>
 If PCRE2_SUBSTITUTE_MATCHED is set, <i>match_data</i> must be non-NULL; its
-contents must be the result of a call to <b>pcre2_match()</b> using the same
-pattern and subject.
+contents must be the result of a call to <b>pcre2_match()</b> (or
+<b>pcre2_jit_match()</b>) using the same pattern, subject pointer, effective
+subject length, start offset, and match options.
 </p>
 <p>
 The function returns the number of substitutions, which may be zero if there

doc/html/pcre2api.html

Lines changed: 21 additions & 5 deletions
@@ -3863,11 +3863,15 @@ <h2><a name="SEC38" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a></h
 options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
 <i>match_data</i> block must be provided, and it must have already been used for
-an external call to <b>pcre2_match()</b> with the same pattern and subject
-arguments. The data in the <i>match_data</i> block (return code, offset vector)
-is then used for the first substitution instead of calling <b>pcre2_match()</b>
-from within <b>pcre2_substitute()</b>. This allows an application to check for a
-match before choosing to substitute, without having to repeat the match.
+an external call to <b>pcre2_match()</b> (or <b>pcre2_jit_match()</b>) with the
+same pattern, subject, effective subject length, start offset, and match option
+arguments (substitute-specific options can be added to the <i>options</i>
+argument). If any of these parameters is changed, <b>pcre2_substitute()</b>
+returns an error. The data in the <i>match_data</i> block (return code, offset
+vector) is used for the first substitution instead of calling
+<b>pcre2_match()</b> from within <b>pcre2_substitute()</b>. This allows an
+application to check for a match before choosing to substitute, without having
+to repeat the match.
 </p>
 <p>
 The contents of the externally supplied match data block are not changed when
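The parameter check described in this hunk can be modelled in a few lines of plain C. This is a minimal sketch under stated assumptions, not PCRE2's implementation: it does not use `pcre2.h`; the `call_args` record, the `SKETCH_*` constants and the result codes are hypothetical (the real error codes added by this commit are not reproduced); and "effective subject length" is assumed to mean the length after a zero-terminated marker (the role played by PCRE2_ZERO_TERMINATED) has been resolved with `strlen()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_ZERO_TERMINATED: "length unknown, use
   strlen()". This sketch does not include or use pcre2.h. */
#define SKETCH_ZERO_TERMINATED SIZE_MAX

/* Hypothetical mask of option bits that only the substitute call may add.
   The real PCRE2_SUBSTITUTE_* values are not modelled here. */
#define SKETCH_SUBSTITUTE_ONLY_BITS 0xFF00u

/* The arguments of one call to a matching function (hypothetical type). */
typedef struct {
  const void *code;      /* compiled pattern, compared by identity */
  const char *subject;   /* subject pointer, compared by identity */
  size_t length;         /* as passed; may be SKETCH_ZERO_TERMINATED */
  size_t start_offset;
  uint32_t options;
} call_args;

/* Hypothetical result codes for this sketch only. */
typedef enum {
  SKETCH_OK = 0,
  SKETCH_DIFFERENT_PATTERN,
  SKETCH_DIFFERENT_SUBJECT,
  SKETCH_DIFFERENT_LENGTH,
  SKETCH_DIFFERENT_OFFSET,
  SKETCH_DIFFERENT_OPTIONS
} sketch_check;

/* Resolve the effective subject length: an explicit length is used as-is,
   the zero-terminated marker is replaced by strlen(). */
static size_t effective_length(const call_args *a) {
  if (a->length != SKETCH_ZERO_TERMINATED) return a->length;
  assert(a->subject != NULL);  /* strlen() needs a real string */
  return strlen(a->subject);
}

/* Compare the earlier match's arguments with the substitute call's.
   Only pointers are compared, never the subject's contents. */
static sketch_check check_same_call(const call_args *matched,
                                    const call_args *subst) {
  if (matched->code != subst->code) return SKETCH_DIFFERENT_PATTERN;
  if (matched->subject != subst->subject) return SKETCH_DIFFERENT_SUBJECT;
  if (effective_length(matched) != effective_length(subst))
    return SKETCH_DIFFERENT_LENGTH;
  if (matched->start_offset != subst->start_offset)
    return SKETCH_DIFFERENT_OFFSET;
  /* Substitute-only bits may be added; all other option bits must match. */
  if ((matched->options & ~SKETCH_SUBSTITUTE_ONLY_BITS) !=
      (subst->options & ~SKETCH_SUBSTITUTE_ONLY_BITS))
    return SKETCH_DIFFERENT_OPTIONS;
  return SKETCH_OK;
}
```

In this model an explicit length of 3 and a zero-terminated `"abc"` count as the same effective length, while a second buffer holding identical text counts as a different subject, because the comparison is by pointer. That mirrors the documented behaviour that PCRE2 verifies the subject is a pointer to the same buffer but cannot, in general, verify its contents.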
@@ -3883,6 +3887,18 @@ <h2><a name="SEC38" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a></h
 UTF setting and the number of capturing parentheses in the pattern.
 </p>
 <p>
+When using PCRE2_SUBSTITUTE_MATCHED, you should not modify the subject string
+in between the prior call to <b>pcre2_match()</b> and <b>pcre2_substitute()</b>,
+as the substitution assumes that the passed-in ovector is compatible with the
+subject string. Although PCRE2 does verify that the subject is a pointer to the
+same buffer, it cannot in general verify whether the contents of the buffer have
+changed. For example, if the subject buffer is mutated from one valid UTF-8
+string to another valid string, of the same length in code units, the ovector
+offsets are no longer guaranteed to point to the start of a character. Beware
+that with PCRE2_SUBSTITUTE_MATCHED in UTF mode, the subject string is not
+re-scanned for UTF validity when <b>pcre2_substitute()</b> first uses it.
+</p>
+<p>
 The default action of <b>pcre2_substitute()</b> is to return a copy of the
 subject string with matched substrings replaced. However, if
 PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
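The UTF-8 hazard described in the new paragraph can be reproduced without PCRE2. The plain-C sketch below uses hypothetical helper names (they are not part of the PCRE2 API) to test whether recorded byte offsets still land on UTF-8 character boundaries of the buffer's current contents. It does not show what `pcre2_substitute()` does internally; it only illustrates why an in-place edit that keeps the buffer valid and the same length can still invalidate the offsets from an earlier match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if byte offset `off` in the UTF-8 buffer `s` of `len` bytes is the
   start of a character or the end of the buffer. UTF-8 continuation bytes
   have the bit pattern 10xxxxxx, so they can never start a character. */
static bool utf8_is_char_start(const unsigned char *s, size_t len, size_t off) {
  assert(off <= len);
  return off == len || (s[off] & 0xC0) != 0x80;
}

/* True if both offsets of one ovector pair (start, end) are in range and
   still fall on character boundaries of the buffer's current contents. */
static bool ovector_pair_on_boundaries(const unsigned char *s, size_t len,
                                       size_t start, size_t end) {
  return start <= end && end <= len &&
         utf8_is_char_start(s, len, start) &&
         utf8_is_char_start(s, len, end);
}
```

For example, with the buffer "a" followed by "é" (bytes 61 C3 A9), a match of "é" records the pair (1, 3), and both offsets are character starts. If the same three bytes are then rewritten in place as "é" followed by "a" (C3 A9 61), the buffer is still valid UTF-8 and still three code units long, but offset 1 now points at the continuation byte A9.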

doc/html/pcre2test.html

Lines changed: 12 additions & 2 deletions
@@ -1194,7 +1194,8 @@ <h3>
 heapframes_size show match data heapframes size
 jitstack=&#60;n&#62; set size of JIT stack
 mark show mark values
-replace=&#60;string&#62; specify a replacement string
+null_substitute_match_data substitute with NULL match data
+replace=&#60;str&#62; specify a replacement string
 startchar show starting character when relevant
 substitute_callout use substitution callouts
 substitute_case_callout use substitution case callouts
@@ -1369,11 +1370,12 @@ <h3>
 null_context match with a NULL context
 null_replacement substitute with NULL replacement
 null_subject match with NULL subject
+null_substitute_match_data substitute with NULL match data
 offset=&#60;n&#62; set starting offset
 offset_limit=&#60;n&#62; set offset limit
 ovector=&#60;n&#62; set size of output vector
 recursion_limit=&#60;n&#62; obsolete synonym for depth_limit
-replace=&#60;string&#62; specify a replacement string
+replace=&#60;str&#62; specify a replacement string
 startchar show startchar when relevant
 startoffset=&#60;n&#62; same as offset=&#60;n&#62;
 substitute_callout use substitution callouts
@@ -1385,6 +1387,7 @@ <h3>
 substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
 substitute_skip=&#60;n&#62; skip substitution number n
 substitute_stop=&#60;n&#62; skip substitution number n and greater
+substitute_subject=&#60;str&#62; specify a different subject for substitution
 substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
 substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
 zero_terminate pass the subject as zero-terminated
@@ -1615,6 +1618,13 @@ <h3>
 A replacement string is ignored with POSIX and DFA matching. Specifying partial
 matching provokes an error return ("bad option value") from
 <b>pcre2_substitute()</b>.
+<br>
+<br>
+The <b>substitute_subject</b> modifier may be used to test the use of the PCRE2
+API, in which a client calls <b>pcre2_match()</b> followed by <b>pcre2_substitute()</b>
+with PCRE2_SUBSTITUTE_MATCHED, but the client performs an unexpected and
+unsupported modification of the subject buffer in-place, in between the match
+and substitution.
 </p>
 <h3>
 Testing substitute callouts
