Commit 1e5e035

khwilliamsonbook
authored andcommitted
feature enhanced_re_xx
1 parent 14c9f41 commit 1e5e035

1 file changed

+88
-0
lines changed

ppcs/ppc0026-enhanced-regex-xx.md

@@ -0,0 +1,88 @@
# RFC - enhanced regex /xx

## Preamble

Author: Karl Williamson <[email protected]>
ID: KHW-0001
Status: Draft

## Abstract

Let programmers improve the readability of regular expression patterns beyond
what is possible now.

## Motivation

Regular expression patterns were designed for concision rather than clarity.

The /x regular expression pattern modifier was created to allow comments and
white space in patterns, making them more readable. It has two shortcomings.
It has no effect inside bracketed character classes, and the pattern silently
compiles to something unintended when the programmer forgets to escape literal
white space or, much worse, a literal '#'. An unescaped '#' silently swallows
the rest of the line, which was supposed to be part of the pattern.

I eventually added /xx to at least allow tabs and blanks inside bracketed
character classes. This allows a very minor improvement in their readability.
I could not figure out a way to extend this to allow comments and multiple
lines inside such a class without making it even more likely that the pattern
would silently compile to something unintended. But now, I think this RFC
fixes that.
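
The current behavior described above can be observed directly. The following is a minimal sketch, assuming Perl 5.26 or later (needed for /xx); the variable names are illustrative only.

```perl
use v5.26;    # /xx requires Perl 5.26 or later
use warnings;

# Under /x, white space inside a bracketed character class is still literal,
# so this class matches 'a', 'b', and a space.
my $x_class = qr/ [a b] /x;

# Under /xx, blanks and tabs inside the class are ignored as well.
my $xx_class = qr/ [ a b ] /xx;

# Under /x, an unescaped '#' outside a class starts a comment, so the
# '.* bar \z' below never becomes part of the pattern.
my $swallowed = qr/ \A foo # .* bar \z
                  /x;

say "/x class matches a space: ",  (" " =~ $x_class  ? "yes" : "no");
say "/xx class matches a space: ", (" " =~ $xx_class ? "yes" : "no");
say "'foo-anything' matches: ",    ("foo-anything" =~ $swallowed ? "yes" : "no");
```

The third pattern compiles without any diagnostic even though most of it has been discarded, which is the silent failure this section is concerned with.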

## Specification

I propose adding a new opt-in feature. Call it, for now, "feature
enhanced_re_xx". Within its scope, the /xx modifier would change things so
that inside a bracketed character class [...], any vertical space would be
treated as a blank, that is, essentially ignored. Any unescaped '#' would
begin a comment that ends at the end of the line.

This would change the existing /x behavior, in which the portion of the line
after a '#' is still scanned for the pattern's terminating delimiter.
Under this feature, terminate a pattern before any '#' on its line.
If an unescaped terminating delimiter is found after a '#' on a line, a warning
would be raised.
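
The existing delimiter scan can be seen with a small experiment. This is a sketch of today's behavior only, not of the proposed feature; the patterns are compiled with string eval so that the failure can be observed without aborting the script, and the variable names are illustrative.

```perl
use strict;
use warnings;

# A '/' inside what looks like a comment still ends the pattern today,
# so "or" is parsed as modifiers and compilation fails.
my $broken = eval <<'END';
qr/ (cat|dog)   # either/or
  /x
END
my $error = $@;

# Escaping the delimiter keeps the whole comment inside the pattern.
my $fixed = eval <<'END';
qr/ (cat|dog)   # either\/or
  /x
END

print "unescaped delimiter: ", (defined $broken ? "compiled" : "failed to compile"), "\n";
print "escaped delimiter:   ", (defined $fixed  ? "compiled" : "failed to compile"), "\n";
```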

An unescaped '#' within a comment would also raise a warning. So

    $a[$i] =~ qr/ [ a-z        # We need to match the lowercase alphabetics
                    ! @ # . *  # And certain punctuation
                    0-9        # And the digits (which can only occur in $a[0])
                  ]
                /xx;

would warn.

It might be that an unescaped '#' that isn't of the form \s+#\s+ should also
warn, to catch cases such as the second line of the above example being just

    !@#.*

Also, inside [...], the text after a '#' would be checked for an unescaped ']'
on the same line, and a warning raised if one is found. So, something like

    $a[$i] =~ qr/ [ a-z # . * ]
                  [ A-Z ]
                /xx;

would warn. Either escape the '#' or the ']' to suppress it, depending on what
your intent was.
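
The three proposed warnings can be approximated outside the regex compiler. The following is only a sketch: `lint_enhanced_xx` is a hypothetical helper, not part of Perl or of this proposal, and it makes simplifying assumptions (one level of [...], no POSIX classes, no special handling of a leading ']'). It takes the source text of a pattern plus its delimiter and returns one message per condition found.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): report the conditions the text
# above proposes to warn about, given a pattern's source text and delimiter.
sub lint_enhanced_xx {
    my ($source, $delim) = @_;
    my @found;
    my $in_class = 0;
    my $line_no  = 0;

    for my $line (split /\n/, $source) {
        $line_no++;
        my $in_comment = 0;

        # Walk the line one token at a time; a backslash plus the next
        # character counts as a single escaped token.
        while ($line =~ /\G(\\.|.)/g) {
            my $tok = $1;
            if ($in_comment) {
                push @found, "line $line_no: unescaped '#' inside a comment"
                    if $tok eq '#';
                push @found, "line $line_no: unescaped ']' after '#' in a class"
                    if $tok eq ']' && $in_class;
                push @found, "line $line_no: unescaped delimiter after '#'"
                    if $tok eq $delim;
            }
            elsif ($tok eq '#') { $in_comment = 1 }
            elsif ($tok eq '[') { $in_class   = 1 }
            elsif ($tok eq ']') { $in_class   = 0 }
        }
    }
    return @found;
}

my @report = lint_enhanced_xx("[ a-z # . * ]\n[ A-Z ]", '/');
print "$_\n" for @report;
```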

I think these checks would catch essentially all cases where a '#' was
intended to be taken literally rather than to start a comment.

I can't think of anything to catch blanks/tabs being unintentionally ignored.

I also propose that unescaped '#' and vertical space inside bracketed character
classes under /xx be deprecated. /xx has been available only since 5.26, so
there's not a huge amount of code that uses it. After the deprecation cycle,
the feature could become automatic, not opt-in, and /xx would have the new
meaning.

Note there is no change to plain /x.

Copyright (C) 2022 Karl Williamson

This document and code and documentation within it may be used, redistributed
and/or modified under the same terms as Perl itself.
