Commit cfce191

Extra details in PCRE2_SUBSTITUTE_MATCHED docs
1 parent 57cd1e6 commit cfce191

3 files changed, with 60 additions and 35 deletions


doc/html/pcre2api.html

Lines changed: 16 additions & 7 deletions
@@ -3854,7 +3854,8 @@ <h2><a name="SEC38" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a></h
 <p>
 If <i>match_data</i> is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
 provided block is used for all calls to <b>pcre2_match()</b>, and its contents
-afterwards are the result of the final call. For global changes, this will
+afterwards are the result of the final call made internally by
+<b>pcre2_substitute()</b> to the matching function. For global changes, this will
 always be a no-match error. The contents of the ovector within the match data
 block may or may not have been changed.
 </p>
@@ -3864,8 +3865,8 @@ <h2><a name="SEC38" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a></h
 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
 <i>match_data</i> block must be provided, and it must have already been used for
 an external call to <b>pcre2_match()</b> (or <b>pcre2_jit_match()</b>) with the
-same pattern, subject, effective subject length, start offset, and match option
-arguments (substitute-specific options can be added to the <i>options</i>
+same pattern, subject pointer, effective subject length, start offset, and match
+option arguments (substitute-specific options can be added to the <i>options</i>
 argument). If any of these parameters is changed, <b>pcre2_substitute()</b>
 returns an error. The data in the <i>match_data</i> block (return code, offset
 vector) is used for the first substitution instead of calling
@@ -3874,11 +3875,19 @@ <h2><a name="SEC38" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a></h
 to repeat the match.
 </p>
 <p>
+If the contents of the subject buffer are mutated in between <b>pcre2_match()</b>
+and a call to <b>pcre2_substitute()</b> with PCRE2_SUBSTITUTE_MATCHED, the
+behaviour is unsafe; in particular, in this case, PCRE2 is unable to ensure that
+the offsets in the ovector point to the start of characters (with UTF-encoded
+input).
+</p>
+<p>
 The contents of the externally supplied match data block are not changed when
-PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
-<b>pcre2_match()</b> is called after the first substitution to check for further
-matches, but this is done using an internally obtained match data block, thus
-always leaving the external block unchanged.
+PCRE2_SUBSTITUTE_MATCHED is set, and so the match block is permitted for use in
+another call using PCRE2_SUBSTITUTE_MATCHED. If PCRE2_SUBSTITUTE_GLOBAL is also
+set, <b>pcre2_match()</b> is called after the first substitution to check for
+further matches, but this is done using an internally obtained match data block,
+thus always leaving the external block unchanged.
 </p>
 <p>
 The <i>code</i> argument is not used for matching before the first substitution

doc/pcre2.txt

Lines changed: 29 additions & 21 deletions
@@ -3711,30 +3711,38 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
 
 If match_data is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
 provided block is used for all calls to pcre2_match(), and its contents
-afterwards are the result of the final call. For global changes, this
+afterwards are the result of the final call made internally by
+pcre2_substitute() to the matching function. For global changes, this
 will always be a no-match error. The contents of the ovector within the
 match data block may or may not have been changed.
 
-As well as the usual options for pcre2_match(), a number of additional
-options can be set in the options argument of pcre2_substitute(). One
-such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
-match_data block must be provided, and it must have already been used
-for an external call to pcre2_match() (or pcre2_jit_match()) with the
-same pattern, subject, effective subject length, start offset, and
-match option arguments (substitute-specific options can be added to the
-options argument). If any of these parameters is changed, pcre2_substi-
-tute() returns an error. The data in the match_data block (return code,
-offset vector) is used for the first substitution instead of calling
-pcre2_match() from within pcre2_substitute(). This allows an applica-
-tion to check for a match before choosing to substitute, without having
-to repeat the match.
-
-The contents of the externally supplied match data block are not
-changed when PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTI-
-TUTE_GLOBAL is also set, pcre2_match() is called after the first sub-
-stitution to check for further matches, but this is done using an in-
-ternally obtained match data block, thus always leaving the external
-block unchanged.
+As well as the usual options for pcre2_match(), a number of additional
+options can be set in the options argument of pcre2_substitute(). One
+such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
+match_data block must be provided, and it must have already been used
+for an external call to pcre2_match() (or pcre2_jit_match()) with the
+same pattern, subject pointer, effective subject length, start offset,
+and match option arguments (substitute-specific options can be added to
+the options argument). If any of these parameters is changed,
+pcre2_substitute() returns an error. The data in the match_data block
+(return code, offset vector) is used for the first substitution instead
+of calling pcre2_match() from within pcre2_substitute(). This allows an
+application to check for a match before choosing to substitute, without
+having to repeat the match.
+
+If the contents of the subject buffer are mutated in between
+pcre2_match() and a call to pcre2_substitute() with PCRE2_SUBSTI-
+TUTE_MATCHED, the behaviour is unsafe; in particular, in this case,
+PCRE2 is unable to ensure that the offsets in the ovector point to the
+start of characters (with UTF-encoded input).
+
+The contents of the externally supplied match data block are not
+changed when PCRE2_SUBSTITUTE_MATCHED is set, and so the match block is
+permitted for use in another call using PCRE2_SUBSTITUTE_MATCHED. If
+PCRE2_SUBSTITUTE_GLOBAL is also set, pcre2_match() is called after the
+first substitution to check for further matches, but this is done using
+an internally obtained match data block, thus always leaving the exter-
+nal block unchanged.
 
 The code argument is not used for matching before the first substitu-
 tion when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided,
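
Note on the new "mutated subject buffer" paragraph: the hazard it describes can be illustrated without PCRE2 at all. The sketch below is a hypothetical helper (utf8_offset_is_char_start is not a PCRE2 function) that only checks whether a byte offset lands on a UTF-8 continuation byte. It assumes an 8-bit UTF-8 subject and does not validate the encoding in any other way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2: returns true when `off` is a
   plausible match offset in a UTF-8 buffer of `len` bytes, meaning it is
   either the end of the buffer or a byte that is not a continuation byte
   (bit pattern 10xxxxxx). */
bool utf8_offset_is_char_start(const unsigned char *buf, size_t len, size_t off)
{
    assert(buf != NULL);
    if (off > len) return false;   /* outside the buffer */
    if (off == len) return true;   /* end of subject is a valid offset */
    return (buf[off] & 0xC0) != 0x80;
}
```

For example, if a match on the bytes "x=1" recorded offset 1 and the application then overwrote the first two bytes with 0xC3 0xA9 (the UTF-8 encoding of "é"), offset 1 would point into the middle of a character. An ovector saved from the earlier pcre2_match() call would then no longer be safe to reuse.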

doc/pcre2api.3

Lines changed: 15 additions & 7 deletions
@@ -3861,7 +3861,8 @@ allocate memory for the compiled code.
 .P
 If \fImatch_data\fP is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
 provided block is used for all calls to \fBpcre2_match()\fP, and its contents
-afterwards are the result of the final call. For global changes, this will
+afterwards are the result of the final call made internally by
+\fBpcre2_substitute()\fP to the matching function. For global changes, this will
 always be a no-match error. The contents of the ovector within the match data
 block may or may not have been changed.
 .P
.P
@@ -3870,20 +3871,27 @@ options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
 \fImatch_data\fP block must be provided, and it must have already been used for
 an external call to \fBpcre2_match()\fP (or \fBpcre2_jit_match()\fP) with the
-same pattern, subject, effective subject length, start offset, and match option
-arguments (substitute-specific options can be added to the \fIoptions\fP
+same pattern, subject pointer, effective subject length, start offset, and match
+option arguments (substitute-specific options can be added to the \fIoptions\fP
 argument). If any of these parameters is changed, \fBpcre2_substitute()\fP
 returns an error. The data in the \fImatch_data\fP block (return code, offset
 vector) is used for the first substitution instead of calling
 \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows an
 application to check for a match before choosing to substitute, without having
 to repeat the match.
 .P
+If the contents of the subject buffer are mutated in between \fBpcre2_match()\fP
+and a call to \fBpcre2_substitute()\fP with PCRE2_SUBSTITUTE_MATCHED, the
+behaviour is unsafe; in particular, in this case, PCRE2 is unable to ensure that
+the offsets in the ovector point to the start of characters (with UTF-encoded
+input).
+.P
 The contents of the externally supplied match data block are not changed when
-PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
-\fBpcre2_match()\fP is called after the first substitution to check for further
-matches, but this is done using an internally obtained match data block, thus
-always leaving the external block unchanged.
+PCRE2_SUBSTITUTE_MATCHED is set, and so the match block is permitted for use in
+another call using PCRE2_SUBSTITUTE_MATCHED. If PCRE2_SUBSTITUTE_GLOBAL is also
+set, \fBpcre2_match()\fP is called after the first substitution to check for
+further matches, but this is done using an internally obtained match data block,
+thus always leaving the external block unchanged.
 .P
 The \fIcode\fP argument is not used for matching before the first substitution
 when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
