Commit 93cc5ca

Update RegexSyntax.md
Co-authored-by: Michael Ilseman <[email protected]>
1 parent 1beb1f0 commit 93cc5ca

1 file changed: +23 -11 lines changed

Documentation/Evolution/RegexSyntax.md

Lines changed: 23 additions & 11 deletions
@@ -1,26 +1,38 @@
-# Regular Expression Syntax
+<!--
+Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal.
+-->
+
+# Regex Literal Interior Syntax
 
 - Authors: Hamish Knight, Michael Ilseman
 
 ## Introduction
 
-We aim to parse a superset of the syntax accepted by a variety of popular regular expression engines.
+Regex literals declare a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. Formalizing regex literals in Swift requires choosing a delimiter strategy (e.g. `#/.../#` or `re'...'`), detailing the syntax accepted in between the delimiters ("interior syntax"), and specifying actual types and any relevant protocols for the literal itself.
+
+This proposal-component focuses on the interior syntax, which is large enough for its own targeted discussion ahead of the full proposal. Regex literal interior syntax will be part of Swift's source-compatibility story (and to some extent binary compatibility), so we present a detailed and comprehensive design.
+
+## Motivation
 
-**TODO(Michael): Elaborate**
+Swift aims to be a pragmatic programming language, balancing (TODO: prose). Rather than pursue a novel interior syntax, (TODO: prose).
 
-## Engines supported
+Regex interior syntax is part of a larger [proposal](https://forums.swift.org/t/pitch-regular-expression-literals/52820), which in turn is part of a larger [string processing effort](https://forums.swift.org/t/declarative-string-processing-overview/52459).
 
-We aim to implement a syntactic superset of:
+## Proposed Solution
 
-- [PCRE 2][pcre2-syntax], an "industry standard" of sorts, and a rough superset of Perl, Python, etc.
-- [Oniguruma][oniguruma-syntax], an internationalization-oriented engine with some modern features
+We propose accepting a syntactic "superset" of the following existing regular expression engines:
+
+- [PCRE 2][pcre2-syntax], an "industry standard" and a rough superset of Perl, Python, etc.
+- [Oniguruma][oniguruma-syntax], a modern engine with additional features.
 - [ICU][icu-syntax], used by NSRegularExpression, a Unicode-focused engine.
-- [.NET][.net-syntax]'s regular expressions, which support delimiter-balancing and some interesting minor details on conditional patterns.
-- **TODO: List Java here? It doesn't really add any more syntax than the above other than `\p{javaLowerCase}`**
+- [.NET][.net-syntax], which adds delimiter-balancing and some interesting minor details around conditional patterns.
+
+To our knowledge, all other popular regex engines support a subset of the above syntaxes.
+
+We also support [UTS#18][uts18]'s full set of character class operators (to our knowledge no other engine does). Beyond that, UTS#18 deals with semantics rather than syntax, and what syntax it uses is covered by the above list. We also parse `\p{javaLowerCase}`, meaning we support a superset of Java 8 as well.
 
-We also intend to achieve at least Level 1 (**TODO: do we want to promise Level 2?**) [UTS#18][uts18] conformance, which specifies regular expression matching semantics without mandating any particular syntax. However we can infer syntactic feature sets from its guidance.
+Note that there are minor syntactic incompatibilities and ambiguities involved in this approach. Each is addressed in the relevant sections below.
 
-**TODO(Michael): Rework and expand prose**
 
 ## Detailed Design
 