|
1 |
| -# Regular Expression Syntax |
| 1 | +<!-- |
| 2 | +Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. |
| 3 | +--> |
| 4 | + |
| 5 | +# Regex Literal Interior Syntax |
2 | 6 |
|
3 | 7 | - Authors: Hamish Knight, Michael Ilseman
|
4 | 8 |
|
5 | 9 | ## Introduction
|
6 | 10 |
|
7 |
| -We aim to parse a superset of the syntax accepted by a variety of popular regular expression engines. |
| 11 | +Regex literals declare a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. Formalizing regex literals in Swift requires choosing a delimiter strategy (e.g. `#/.../#` or `re'...'`), detailing the syntax accepted in between the delimiters ("interior syntax"), and specifying actual types and any relevant protocols for the literal itself. |
| 12 | + |
| 13 | +This proposal-component focuses on the interior syntax, which is large enough for its own targeted discussion ahead of the full proposal. Regex literal interior syntax will be part of Swift's source-compatibility story (and to some extent binary compatibility), so we present a detailed and comprehensive design. |
| 14 | + |
| 15 | +## Motivation |
8 | 16 |
|
9 |
| -**TODO(Michael): Elaborate** |
| 17 | +Swift aims to be a pragmatic programming language, balancing (TODO: prose). Rather than pursue a novel interior syntax, (TODO: prose). |
10 | 18 |
|
11 |
| -## Engines supported |
| 19 | +Regex interior syntax is part of a larger [proposal](https://forums.swift.org/t/pitch-regular-expression-literals/52820), which in turn is part of a larger [string processing effort](https://forums.swift.org/t/declarative-string-processing-overview/52459). |
12 | 20 |
|
13 |
| -We aim to implement a syntactic superset of: |
| 21 | +## Proposed Solution |
14 | 22 |
|
15 |
| -- [PCRE 2][pcre2-syntax], an "industry standard" of sorts, and a rough superset of Perl, Python, etc. |
16 |
| -- [Oniguruma][oniguruma-syntax], an internationalization-oriented engine with some modern features |
| 23 | +We propose accepting a syntactic "superset" of the following existing regular expression engines: |
| 24 | + |
| 25 | +- [PCRE 2][pcre2-syntax], an "industry standard" and a rough superset of Perl, Python, etc. |
| 26 | +- [Oniguruma][oniguruma-syntax], a modern engine with additional features. |
17 | 27 | - [ICU][icu-syntax], used by NSRegularExpression, a Unicode-focused engine.
|
18 |
| -- [.NET][.net-syntax]'s regular expressions, which support delimiter-balancing and some interesting minor details on conditional patterns. |
19 |
| -- **TODO: List Java here? It doesn't really add any more syntax than the above other than `\p{javaLowerCase}`** |
| 28 | +- [.NET][.net-syntax], which adds delimiter-balancing and some interesting minor details around conditional patterns. |
| 29 | + |
| 30 | +To our knowledge, all other popular regex engines support a subset of the above syntaxes. |
| 31 | + |
| 32 | +We also support [UTS#18][uts18]'s full set of character class operators (to our knowledge no other engine does). Beyond that, UTS#18 deals with semantics rather than syntax, and what syntax it uses is covered by the above list. We also parse `\p{javaLowerCase}`, meaning we support a superset of Java 8 as well. |
20 | 33 |
|
21 |
| -We also intend to achieve at least Level 1 (**TODO: do we want to promise Level 2?**) [UTS#18][uts18] conformance, which specifies regular expression matching semantics without mandating any particular syntax. However we can infer syntactic feature sets from its guidance. |
| 34 | +Note that there are minor syntactic incompatibilities and ambiguities involved in this approach. Each is addressed in the relevant sections below. |
22 | 35 |
|
23 |
| -**TODO(Michael): Rework and expand prose** |
24 | 36 |
|
25 | 37 | ## Detailed Design
|
26 | 38 |
|
|