Commit 2b290f2

pod and comments: Note escape vs quote

Fixes #15221

The documentation and comments were misleading about conflating quoting a metacharacter and escaping it. Since \Q stands for quote, we have to continue to use that terminology. This commit clarifies that the two terms are often equivalent. This also adds detail about quotemeta and \Q.
1 parent 06cf97e commit 2b290f2
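
As a rough illustration of the equivalence this commit documents, the sketch below compares `quotemeta`, the older `s/(\W)/\\$1/` idiom, and `\Q...\E` on an ASCII-only string. The sample strings and variable names are hypothetical and not part of the commit; the `/a` flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical input containing several regex metacharacters.
my $raw = 'a.b*c(1+1)';

# quotemeta puts a backslash before every ASCII non-word character.
my $quoted = quotemeta $raw;

# The older idiom from perlre; /a keeps \W on ASCII rules.
(my $manual = $raw) =~ s/(\W)/\\$1/ag;

# A string that the unquoted pattern matches only through its metacharacters.
my $other = 'xaXbbc11y';
my $unquoted_hit = ($other =~ /$raw/)          ? 1 : 0;
my $quoted_hit   = ($other =~ /\Q$raw\E/)      ? 1 : 0;
my $literal_hit  = ("x${raw}y" =~ /\Q$raw\E/)  ? 1 : 0;

print "quotemeta result: $quoted\n";
print "same as manual idiom: ", ($quoted eq $manual ? "yes" : "no"), "\n";
print "unquoted=$unquoted_hit quoted=$quoted_hit literal=$literal_hit\n";
```

For ASCII input the three approaches should agree; the differences the commit describes only arise for non-ASCII code points and under `use locale`.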

7 files changed: 70 additions and 29 deletions
pod/perldiag.pod

Lines changed: 4 additions & 4 deletions
@@ -2602,8 +2602,8 @@ and perl's F</dev/null> emulation was unable to create an empty temporary file.
 (W regexp)(F) A character class range must start and end at a literal
 character, not another character class like C<\d> or C<[:alpha:]>. The "-"
 in your false range is interpreted as a literal "-". In a C<(?[...])>
-construct, this is an error, rather than a warning. Consider quoting
-the "-", "\-". The S<<-- HERE> shows whereabouts in the regular expression
+construct, this is an error, rather than a warning. Consider escaping
+the "-" as "\-". The S<<-- HERE> shows whereabouts in the regular expression
 the problem was discovered. See L<perlre>.
 
 =item Fatal VMS error (status=%d) at %s, line %d
@@ -5453,7 +5453,7 @@ S<<-- HERE> in m/%s/
 (F) Within regular expression character classes ([]) the syntax beginning
 with "[." and ending with ".]" is reserved for future extensions. If you
 need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[."
+character class, just escape the square brackets with the backslash: "\[."
 and ".\]". The S<<-- HERE> shows whereabouts in the regular expression the
 problem was discovered. See L<perlre>.
 
@@ -5463,7 +5463,7 @@ S<<-- HERE> in m/%s/
 (F) Within regular expression character classes ([]) the syntax beginning
 with "[=" and ending with "=]" is reserved for future extensions. If you
 need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[="
+character class, just escape the square brackets with the backslash: "\[="
 and "=\]". The S<<-- HERE> shows whereabouts in the regular expression the
 problem was discovered. See L<perlre>.
 
pod/perlfunc.pod

Lines changed: 5 additions & 0 deletions
@@ -6536,6 +6536,11 @@ the C<\Q> escape in double-quoted strings.
 
 If EXPR is omitted, uses L<C<$_>|perlvar/$_>.
 
+The motivation behind this is to make all characters in EXPR match their
+literal selves. Otherwise any metacharacters in it could trigger
+their "magic" matching behaviors. The characters this function has been
+applied to are said to be "quoted" or "escaped".
+
 quotemeta (and C<\Q> ... C<\E>) are useful when interpolating strings into
 regular expressions, because by default an interpolated variable will be
 considered a mini-regular expression. For example:

pod/perlre.pod

Lines changed: 50 additions & 14 deletions
@@ -1350,24 +1350,60 @@ X</p> X<p modifier>
 
 =head2 Quoting metacharacters
 
-Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
-C<\w>, C<\n>. Unlike some other regular expression languages, there
-are no backslashed symbols that aren't alphanumeric. So anything
-that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
-always
-interpreted as a literal character, not a metacharacter. This was
-once used in a common idiom to disable or quote the special meanings
-of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-"word" characters:
+(Also known as "escaping".)
+
+To cause a metacharacter to match its literal self, you precede it with
+a backslash. Unlike some other regular expression languages, any
+sequence consisting of a backslash followed by a non-alphanumeric
+matches that non-alphanumeric, literally. So things like C<\\>, C<\(>,
+C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> are always interpreted as the
+literal character that follows the backslash.
+
+(That's not true when an alphanumeric character is preceded by a
+backslash. There are a few such "escape sequences", like C<\w>, which have
+special matching behaviors in Perl. All such are currently limited to
+ASCII-range alphanumerics.)
+
+But a non-alphanumeric will always match literally when preceded by a
+backslash. Hence simply adding backslashes before all non-"word"
+characters can be used to disable the special meanings of regular
+expression metacharacters in a string that you want to use for a
+pattern.
 
 $pattern =~ s/(\W)/\\$1/g;
 
-(If C<use locale> is set, then this depends on the current locale.)
-Today it is more common to use the C<L<quotemeta()|perlfunc/quotemeta>>
-function or the C<\Q> metaquoting escape sequence to disable all
-metacharacters' special meanings like this:
+then
 
-/$unquoted\Q$quoted\E$unquoted/
+$string =~ s/$pattern/foo/;
+
+(If C<use locale> is in effect, the current locale can affect the
+results.)
+
+This template used to be a common paradigm, but these days it is more
+usual to use the C<L<quotemeta()|perlfunc/quotemeta>>
+function or especially the C<\Q> metaquoting escape sequence to disable all
+metacharacters' special meanings.
+
+In fact, C<quotemeta> effectively does the same thing as the old
+paradigm, but without any locale dependence. That is,
+
+quotemeta $pattern;
+
+does either of these:
+
+$pattern =~ s/(\W)/\\$1/ug;
+$pattern =~ s/(\W)/\\$1/ag;
+
+The first statement applies if the
+L<< C<unicode_strings>|feature/"The 'unicode_strings' feature" >> is
+enabled; the second if that feature is disabled.
+
+S<C<\Q>...C<\E>> acts identically, but you don't have to have a separate
+scalar to hold C<$pattern>, and it has the significant added flexibility
+to allow you to selectively apply it to any portions of the matched
+string you choose; like this:
+
+$string =~ s/$unquoted\Q$quoted\E$unquoted/foo/;
 
 Beware that if you put literal backslashes (those not inside
 interpolated variables) between C<\Q> and C<\E>, double-quotish

pod/perlrebackslash.pod

Lines changed: 7 additions & 7 deletions
@@ -90,8 +90,8 @@ as C<Not in [].>
 \o{} Octal escape sequence.
 \p{}, \pP Match any character with the given Unicode property.
 \P{}, \PP Match any character without the given property.
-\Q Quote (disable) pattern metacharacters till \E. Not
-in [].
+\Q Quote (disable) pattern metacharacters till \E.
+(Also called "escape".) Not in [].
 \r Return character.
 \R Generic new line. Not in [].
 \s Match any whitespace character.
@@ -350,11 +350,11 @@ them, until either the end of the pattern or the next occurrence of
 C<\E>, whichever comes first. They provide functionality similar to what
 the functions C<lc> and C<uc> provide.
 
-C<\Q> is used to quote (disable) pattern metacharacters, up to the next
-C<\E> or the end of the pattern. C<\Q> adds a backslash to any character
-that could have special meaning to Perl. In the ASCII range, it quotes
-every character that isn't a letter, digit, or underscore. See
-L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
+C<\Q> is used to quote or escape (disable) pattern metacharacters, up to
+the next C<\E> or the end of the pattern. C<\Q> adds a backslash to any
+character that could have special meaning to Perl. In the ASCII range,
+it quotes every character that isn't a letter, digit, or underscore.
+See L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
 code points. Using this ensures that any character between C<\Q> and
 C<\E> will be matched literally, not interpreted as a metacharacter by
 the regex engine.
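
The selective quoting described for `\Q`...`\E` can be sketched as follows. This is only an illustration under assumed inputs: the strings and variable names are hypothetical and do not come from the patch.

```perl
use strict;
use warnings;

# Hypothetical literal text that happens to contain metacharacters.
my $literal = 'v1.0+';

# A fragment that is meant to stay a real regular expression.
my $digits = '\d+';

# In a double-quoted string, \Q...\E backslashes the non-word characters.
my $as_string = "\Q$literal\E";

# Only $literal is quoted; $digits keeps its special meaning.
my $exact_hit  = ('release v1.0+ build 42' =~ /\Q$literal\E build $digits/) ? 1 : 0;
my $strict_hit = ('release v100 build 42'  =~ /\Q$literal\E build $digits/) ? 1 : 0;

# Without \Q, the "." and "+" in $literal act as metacharacters.
my $loose_hit  = ('release v100 build 42'  =~ /$literal build $digits/) ? 1 : 0;

print "quoted form: $as_string\n";
print "exact=$exact_hit strict=$strict_hit loose=$loose_hit\n";
```

Quoting only the interpolated literal keeps the rest of the pattern usable as a regular expression, which is the flexibility the perlre text attributes to `\Q`...`\E` over a separate `quotemeta` call.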

pod/perlreref.pod

Lines changed: 1 addition & 1 deletion
@@ -318,7 +318,7 @@ Captured groups are numbered according to their I<opening> paren.
 fc Foldcase a string
 
 pos Return or set current match position
-quotemeta Quote metacharacters
+quotemeta Quote metacharacters (escape their normal meaning)
 reset Reset m?pattern? status
 study Analyze string for optimizing matching
 
pod/perlretut.pod

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ C<"["> respectively; other gotchas apply.
 The significance of each of these will be explained
 in the rest of the tutorial, but for now, it is important only to know
 that a metacharacter can be matched as-is by putting a backslash before
-it:
+it. This is called "escaping" or "quoting" it. Some examples:
 
 "2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
 "2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +

pp.c

Lines changed: 2 additions & 2 deletions
@@ -5082,7 +5082,7 @@ PP(pp_quotemeta)
 else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(s, s + len)) {
 if (
 #ifdef USE_LOCALE_CTYPE
-/* In locale, we quote all non-ASCII Latin1 chars.
+/* In locale, we escape all non-ASCII Latin1 chars.
 * Otherwise use the quoting rules */
 
 IN_LC_RUNTIME(LC_CTYPE)
@@ -5116,7 +5116,7 @@ PP(pp_quotemeta)
 }
 }
 else {
-/* For non UNI_8_BIT (and hence in locale) just quote all \W
+/* For non UNI_8_BIT (and hence in locale) just escape all \W
 * including everything above ASCII */
 while (len--) {
 if (!isWORDCHAR_A(*s))
