Commit a8ab3be

Improve support for partial matching in pcre2_substitute (#858)
In 10.46 and earlier, partial matching in pcre2_substitute was partly possible using PCRE2_SUBSTITUTE_MATCHED, but that capability was removed in 10.47. This commit restores the behaviour in an extended and corrected form.
1 parent 88a083c commit a8ab3be

9 files changed: +385 -253 lines changed

doc/html/pcre2api.html

Lines changed: 9 additions & 0 deletions
@@ -3925,6 +3925,15 @@ <h2><a name="SEC38" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a></h
 can be used to separate them if necessary.
 </p>
 <p>
+Partial matching is supported, with limitations: if matching succeeds but with a
+partial match, then pcre2_substitute returns PCRE2_ERROR_PARTIAL. When
+partial-matching (either of PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT is passed),
+then PCRE2_SUBSTITUTE_REPLACEMENT_ONLY must also be set, or else
+PCRE2_ERROR_BADOPTION is returned. Similarly, certain replacement items
+($' and $_) cause PCRE2_ERROR_PARTIALSUBS to be returned when partial-matching,
+even if a complete match is found.
+</p>
+<p>
 The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
 variable that contains the length, in code units, of the output buffer. If the
 function is successful, the value is updated to contain the length in code

doc/pcre2.txt

Lines changed: 255 additions & 247 deletions
Large diffs are not rendered by default.

doc/pcre2api.3

Lines changed: 8 additions & 0 deletions
@@ -3928,6 +3928,14 @@ below)
 .\"
 can be used to separate them if necessary.
 .P
+Partial matching is supported, with limitations: if matching succeeds but with a
+partial match, then pcre2_substitute returns PCRE2_ERROR_PARTIAL. When
+partial-matching (either of PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT is passed),
+then PCRE2_SUBSTITUTE_REPLACEMENT_ONLY must also be set, or else
+PCRE2_ERROR_BADOPTION is returned. Similarly, certain replacement items
+($' and $_) cause PCRE2_ERROR_PARTIALSUBS to be returned when partial-matching,
+even if a complete match is found.
+.P
 The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
 variable that contains the length, in code units, of the output buffer. If the
 function is successful, the value is updated to contain the length in code
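
The rules in the added paragraph can be summarised as a small decision function. The following is a library-free sketch only: it does not call PCRE2, the `DEMO_` names and `demo_` functions are hypothetical, and the order of the checks is inferred from the `src/pcre2_substitute.c` diff and the test output in this commit. It also assumes that the matcher found some match, partial or complete.

```c
#include <assert.h>

/* Stand-in result codes. The numeric values are the ones visible in this
   commit (header diff and test output); the DEMO_ names are hypothetical. */
enum
  {
  DEMO_OK                = 0,
  DEMO_ERROR_PARTIAL     = -2,
  DEMO_ERROR_BADOPTION   = -34,
  DEMO_ERROR_PARTIALSUBS = -76
  };

/* Models the outcome of a substitution call. All inputs are plain flags:
   partial_requested - PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT was passed
   replacement_only  - PCRE2_SUBSTITUTE_REPLACEMENT_ONLY was passed
   match_is_partial  - the matcher stopped on a partial match
   uses_after_match  - the replacement contains $' or $_ */
int demo_partial_outcome(int partial_requested, int replacement_only,
                         int match_is_partial, int uses_after_match)
{
  /* Option validation happens before any matching. */
  if (partial_requested && !replacement_only) return DEMO_ERROR_BADOPTION;

  /* A partial match cannot be substituted. */
  if (match_is_partial) return DEMO_ERROR_PARTIAL;

  /* Even a complete match fails if the replacement needs after-match text. */
  if (partial_requested && uses_after_match) return DEMO_ERROR_PARTIALSUBS;

  return DEMO_OK;
}

/* Replays four cases from the new testinput2 block against the model. */
void demo_self_check(void)
{
  /* ab\=ph,replace=FOO */
  assert(demo_partial_outcome(1, 0, 1, 0) == DEMO_ERROR_BADOPTION);
  /* ab\=ph,substitute_replacement_only,replace=FOO */
  assert(demo_partial_outcome(1, 1, 1, 0) == DEMO_ERROR_PARTIAL);
  /* abc\=ph,substitute_replacement_only,replace=FOO */
  assert(demo_partial_outcome(1, 1, 0, 0) == DEMO_OK);
  /* abc\=ph,substitute_replacement_only,replace=>$_< */
  assert(demo_partial_outcome(1, 1, 0, 1) == DEMO_ERROR_PARTIALSUBS);
}
```

Calling `demo_self_check()` from a test harness checks the model against the expected results recorded in `testdata/testoutput2`.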

src/pcre2.h.generic

Lines changed: 1 addition & 0 deletions
@@ -438,6 +438,7 @@ released, the numbers must not be changed. */
 #define PCRE2_ERROR_DIFFSUBSOFFSET (-73)
 #define PCRE2_ERROR_DIFFSUBSOPTIONS (-74)
 #define PCRE2_ERROR_BAD_BACKSLASH_K (-75)
+#define PCRE2_ERROR_PARTIALSUBS (-76)
 
 
 /* Request types for pcre2_pattern_info() */

src/pcre2.h.in

Lines changed: 1 addition & 0 deletions
@@ -438,6 +438,7 @@ released, the numbers must not be changed. */
 #define PCRE2_ERROR_DIFFSUBSOFFSET (-73)
 #define PCRE2_ERROR_DIFFSUBSOPTIONS (-74)
 #define PCRE2_ERROR_BAD_BACKSLASH_K (-75)
+#define PCRE2_ERROR_PARTIALSUBS (-76)
 
 
 /* Request types for pcre2_pattern_info() */

src/pcre2_error.c

Lines changed: 2 additions & 0 deletions
@@ -305,7 +305,9 @@ static const unsigned char match_error_texts[] =
   "substitute subject differs from prior match call\0"
   "substitute start offset differs from prior match call\0"
   "substitute options differ from prior match call\0"
+  /* 75 */
   "disallowed use of \\K in lookaround\0"
+  "replacement $' or $_ not supported with partial match\0"
   ;
 
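
The error texts above are stored as one array of NUL-terminated strings placed back to back, which is why the new message is simply appended. A sketch of how such a table can be indexed is shown below. It uses a made-up three-entry table and a hypothetical `demo_nth_text` helper, and it assumes the lookup works by skipping terminators. It is not the code from `pcre2_error.c`.

```c
#include <stddef.h>
#include <string.h>

/* Demo table only. Each entry ends in an explicit \0; the compiler adds one
   more NUL at the very end, which acts as an empty-string terminator. */
const char demo_texts[] =
  "no error\0"
  "first demo message\0"
  "second demo message\0"
  ;

/* Returns the n-th entry (counting from 0), or NULL if n is out of range. */
const char *demo_nth_text(const char *table, int n)
{
  const char *p = table;
  while (n-- > 0)
    {
    if (*p == '\0') return NULL;   /* ran off the end of the table */
    p += strlen(p) + 1;            /* skip this entry and its terminator */
    }
  return (*p == '\0') ? NULL : p;
}
```

With this layout, adding a message means appending one more string before the closing semicolon, at the position that matches its number.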

src/pcre2_substitute.c

Lines changed: 34 additions & 6 deletions
@@ -756,6 +756,7 @@ BOOL overflowed = FALSE;
 BOOL use_existing_match;
 BOOL replacement_only;
 BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
+BOOL partial = (options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0;
 PCRE2_UCHAR temp[6];
 PCRE2_UCHAR null_str[1] = { 0xcd };
 PCRE2_SPTR original_subject = subject;
@@ -783,10 +784,16 @@ if (mcontext != NULL)
   substitute_case_callout_data = mcontext->substitute_case_callout_data;
   }
 
-/* Partial matching is not valid. This must come after setting *blength to
-PCRE2_UNSET, so as not to imply an offset in the replacement. */
+/* Partial matching is supported, with limitations. We allow matching in partial
+mode, however, if a partial match is found, the substitution will fail with a
+PCRE2_ERROR_PARTIAL error. Additionally, outputting the after-match text is not
+allowed (PCRE2_ERROR_BADOPTION), and certain replacement items such as $' and $_
+are not supported (PCRE2_ERROR_PARTIALSUBS).
 
-if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
+This must come after setting *blength to PCRE2_UNSET, so as not to imply an
+offset in the replacement. */
+
+if (partial && (options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) == 0)
   return PCRE2_ERROR_BADOPTION;
 
 /* Validate length and find the end of the replacement. A NULL replacement of
@@ -1125,8 +1132,16 @@ for (;;)
 if (next == CHAR_GRAVE_ACCENT || next == CHAR_APOSTROPHE)
   {
   ++ptr;
+
+  /* (Sanity-check ovector before reading from it.) */
   rc = pcre2_substring_length_bynumber(match_data, 0, &sublength);
-  if (rc < 0) goto PTREXIT; /* (Sanity-check ovector before reading from it.) */
+  /* LCOV_EXCL_START */
+  if (rc < 0)
+    {
+    PCRE2_DEBUG_UNREACHABLE();
+    goto PTREXIT;
+    }
+  /* LCOV_EXCL_STOP */
 
   if (next == CHAR_GRAVE_ACCENT)
     {
@@ -1135,6 +1150,12 @@ for (;;)
   }
 else
   {
+  if (partial)
+    {
+    rc = PCRE2_ERROR_PARTIALSUBS;
+    goto PTREXIT;
+    }
+
   subptr = subject + ovector[1];
   subptrend = subject + length;
   }
@@ -1145,12 +1166,19 @@ for (;;)
   {
   /* Java, .NET support $_ for "entire input string". */
   ++ptr;
+
+  if (partial)
+    {
+    rc = PCRE2_ERROR_PARTIALSUBS;
+    goto PTREXIT;
+    }
+
   subptr = subject;
   subptrend = subject + length;
   goto SUBPTR_SUBSTITUTE;
   }
-else if (next == CHAR_PLUS &&
-         !(ptr+1 < repend && ptr[1] == CHAR_LEFT_CURLY_BRACKET))
+if (next == CHAR_PLUS &&
+    !(ptr+1 < repend && ptr[1] == CHAR_LEFT_CURLY_BRACKET))
   {
   /* Perl supports $+ for "highest captured group" (not the same as $^N
   which is mainly only useful inside Perl's match callbacks). We also
testdata/testinput2

Lines changed: 29 additions & 0 deletions
@@ -8322,4 +8322,33 @@ a)"xI
 foo
 fooBAR
 
+# --------------
+# Tests for substitution with partial
+# --------------
+
+/(a)b+/
+\= Expect to fail with "bad option"
+ab\=ph,replace=FOO
+\= Expect to fail with "partial match"
+ab\=ph,substitute_replacement_only,replace=FOO
+\= Expect success
+abc\=ph,substitute_replacement_only,replace=FOO
+zabc\=ph,substitute_replacement_only,replace=>$&|$1|$`<
+\= Expect to fail with PCRE2_ERROR_PARTIALSUBS
+abc\=ph,substitute_replacement_only,replace=>$_<
+abc\=ph,substitute_replacement_only,replace=>$'<
+
+/(a)b+/
+\= Expect to fail with "bad option"
+ab\=ps,replace=FOO
+\= Expect to fail with "partial match"
+a\=ps,substitute_replacement_only,replace=FOO
+\= Expect success
+ab\=ps,substitute_replacement_only,replace=FOO
+abc\=ps,substitute_replacement_only,replace=FOO
+zabc\=ps,substitute_replacement_only,replace=>$&|$1|$`<
+\= Expect to fail with PCRE2_ERROR_PARTIALSUBS
+abc\=ps,substitute_replacement_only,replace=>$_<
+abc\=ps,substitute_replacement_only,replace=>$'<
+
 # End of testinput2

testdata/testoutput2

Lines changed: 46 additions & 0 deletions
@@ -23666,6 +23666,52 @@ Failed: error -72: substitute subject differs from prior match call
 fooBAR
 1: X::textY
 
+# --------------
+# Tests for substitution with partial
+# --------------
+
+/(a)b+/
+\= Expect to fail with "bad option"
+ab\=ph,replace=FOO
+Failed: error -34: bad option value
+\= Expect to fail with "partial match"
+ab\=ph,substitute_replacement_only,replace=FOO
+Failed: error -2: partial match
+\= Expect success
+abc\=ph,substitute_replacement_only,replace=FOO
+1: FOO
+zabc\=ph,substitute_replacement_only,replace=>$&|$1|$`<
+1: >ab|a|z<
+\= Expect to fail with PCRE2_ERROR_PARTIALSUBS
+abc\=ph,substitute_replacement_only,replace=>$_<
+Failed: error -76 at offset 3 in replacement: replacement $' or $_ not supported with partial match
+here: >$_ |<--| <
+abc\=ph,substitute_replacement_only,replace=>$'<
+Failed: error -76 at offset 3 in replacement: replacement $' or $_ not supported with partial match
+here: >$' |<--| <
+
+/(a)b+/
+\= Expect to fail with "bad option"
+ab\=ps,replace=FOO
+Failed: error -34: bad option value
+\= Expect to fail with "partial match"
+a\=ps,substitute_replacement_only,replace=FOO
+Failed: error -2: partial match
+\= Expect success
+ab\=ps,substitute_replacement_only,replace=FOO
+1: FOO
+abc\=ps,substitute_replacement_only,replace=FOO
+1: FOO
+zabc\=ps,substitute_replacement_only,replace=>$&|$1|$`<
+1: >ab|a|z<
+\= Expect to fail with PCRE2_ERROR_PARTIALSUBS
+abc\=ps,substitute_replacement_only,replace=>$_<
+Failed: error -76 at offset 3 in replacement: replacement $' or $_ not supported with partial match
+here: >$_ |<--| <
+abc\=ps,substitute_replacement_only,replace=>$'<
+Failed: error -76 at offset 3 in replacement: replacement $' or $_ not supported with partial match
+here: >$' |<--| <
+
 # End of testinput2
 Error -80: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data
