Commit df43b7e

Merge pull request #869 from ossf/regex_standards_status
Add status of adding \A and \z to all platforms
2 parents 54f7c32 + be93fae commit df43b7e

1 file changed: +33 -0 lines changed

docs/Correctly-Using-Regular-Expressions-Rationale.md

Lines changed: 33 additions & 0 deletions
@@ -477,6 +477,39 @@ Many developers believe that regex notation is the same everywhere, even though
Such changes would take years to adopt. Even worse, these changes might not be accepted in some cases because some people may think that merely making something possible is adequate. We don’t agree; we think it’s important to make it _easy_ to do the secure action, not just possible, and it’s best to make avoidable mistakes less likely. These changes require implementations in many systems and modifications of many specifications; doing this has been historically challenging. Still, such changes would reduce the likelihood of these problems worldwide.

#### Status of adding \A and \z across ecosystems

As previously noted, one start is to have a _single_ regex syntax
that _always_ means "match beginning of input" and "match end of input",
_even_ when a multi-line mode is enabled.
This notation is especially important for security, because it makes it
practical to use regexes for input validation.

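
The difference can be seen in a minimal sketch. It is written in Python, where end-of-input is spelled \Z rather than \z, as the Python item in this section notes. The pattern, the function names, and the sample input are illustrative assumptions, not taken from any real application. The sketch compares line anchors with input anchors when multi-line mode is enabled:

```python
import re

# Assumption for illustration: valid input is one or more lowercase ASCII letters.
# Both patterns are compiled with multi-line mode enabled.
LINE_ANCHORED = re.compile(r"^[a-z]+$", re.MULTILINE)
INPUT_ANCHORED = re.compile(r"\A[a-z]+\Z", re.MULTILINE)


def is_valid_line_anchored(text: str) -> bool:
    # ^ and $ match at every line boundary in multi-line mode,
    # so a single valid line anywhere in the input is enough to pass.
    return LINE_ANCHORED.search(text) is not None


def is_valid_input_anchored(text: str) -> bool:
    # \A and \Z (Python's spelling of end-of-input) ignore multi-line mode,
    # so the whole input must consist of lowercase letters.
    return INPUT_ANCHORED.search(text) is not None


# Hypothetical hostile input: a valid first line followed by extra content.
attack = "abc\n<script>alert(1)</script>"

print(is_valid_line_anchored(attack))   # the line-anchored check accepts it
print(is_valid_input_anchored(attack))  # the input-anchored check rejects it
print(is_valid_input_anchored("abc"))   # plain valid input still passes
```

With `re.MULTILINE` set, the line-anchored validator accepts input whose first line happens to be valid, while the input-anchored validator accepts only input that is valid from its first character to its last.
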
Many platforms already support \A and \z for beginning-of-input
and end-of-input, respectively.
These platforms are
Perl, .NET/C#, Java, PHP, PCRE, Go, the Rust regex crate, RE2, and Ruby.

If the following platforms made adjustments, the notation would
be nearly universal:

* POSIX: On 2024-04-24 the Austin Group "accepted as marked"
[bug 1919](https://www.austingroupbugs.net/view.php?id=1919), the
proposed change to add \A and \z to extended regular expressions (EREs).
They decided not to require the older basic regular expression (BRE) syntax to support it at this time,
but do plan to add a NOTE saying "a future version of this standard is likely
to require such characters to be supported in a regular expression.
Implementors are encouraged to provide this as an extension using
"\A" for the beginning and "\z" for the end of strings as they are
already in widespread use for this purpose in other languages."
* ECMAScript/JavaScript: In 2021 Ron Buckton (@rbuckton) created the proposal
[Regular Expression Buffer Boundaries for ECMAScript](https://github.com/tc39/proposal-regexp-buffer-boundaries)
to add \A and \z to ECMAScript/JavaScript, and it advanced to stage 2,
but it seems to be stuck there. We intend to see if we can help it advance.
* Python: Python supports \A, but it uses the unique \Z instead of the
\z used everywhere else.
We'll ask whether support for \z can be added.

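
To illustrate the Python case: in Python's `re` module, \Z matches only at the very end of the string, whereas in several other engines (for example Perl and Java) \Z also matches before a final newline and \z is the strict form. The sketch below uses hypothetical helper and variable names. It probes whether the running interpreter accepts \z and falls back to \Z otherwise; whether a given Python version accepts \z is treated as unknown and checked at run time.

```python
import re


def end_of_input_anchor() -> str:
    # Hypothetical helper: try the widely used \z first and fall back to
    # Python's own \Z if the re module rejects \z as a bad escape.
    try:
        re.compile(r"\z")
    except re.error:
        return r"\Z"
    return r"\z"


END = end_of_input_anchor()

# Strict validator: digits only, anchored to the whole input.
STRICT = re.compile(r"\A[0-9]+" + END)

# Common but looser form: in Python, $ also matches just before a trailing newline.
LOOSE = re.compile(r"^[0-9]+$")

print(STRICT.search("123") is not None)    # whole input is digits
print(STRICT.search("123\n") is not None)  # trailing newline is rejected
print(LOOSE.search("123\n") is not None)   # trailing newline slips through
```

Probing at run time is only one possible design; writing \Z directly is simpler when the code only ever runs on Python, at the cost of a pattern that means something slightly different on other platforms.
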
## Authors and contributors

We would like to thank the following contributors:
