Documentation/Evolution/RegexSyntax.md (25 additions, 19 deletions)

--- a/Documentation/Evolution/RegexSyntax.md
+++ b/Documentation/Evolution/RegexSyntax.md
@@ -38,7 +38,7 @@ We propose accepting a syntactic "superset" of the following existing regular ex
 
 To our knowledge, all other popular regex engines support a subset of the above syntaxes.
 
-We also support [UTS#18][uts18]'s full set of character class operators (to our knowledge no other engine does). Beyond that, UTS#18 deals with semantics rather than syntax, and what syntax it uses is covered by the above list. We also parse `\p{javaLowerCase}`, meaning we support a superset of Java 8 as well.
+We also support [UTS#18][uts18]'s full set of character class operators (to our knowledge no other engine does). Beyond that, UTS#18 deals with semantics rather than syntax, and what syntax it uses is covered by the above list. We also parse Java's properties (e.g. `\p{javaLowerCase}`), meaning we support a superset of Java 8 as well.
 
 Note that there are minor syntactic incompatibilities and ambiguities involved in this approach. Each is addressed in the relevant sections below
 
@@ -48,19 +48,25 @@ Regex literal interior syntax will be part of Swift's source-compatibility story
 
 We propose the following syntax for use inside Swift regex literals.
 
-<details><summary>Grammar Conventions</summary>
+<details><summary>Grammar Notation</summary>
 
-Elements of the grammar are defined using the syntax `Element -> <Definition>`.
+For the grammar sections, we use a modified PEG-like notation, in which the grammar also describes an unambiguous top-down parsing algorithm.
 
-Quoted characters e.g `'abc'`, `"abc"` in the grammar match against the literal characters. Unquoted names e.g `Concatenation` refer to other definitions in the grammar.
+- `<Element> -> <Definition>` gives the definition of `Element`
+- The `|` operator specifies a choice of alternatives
+- `'x'` is the literal character `x`, otherwise it's a reference to x
+  + A literal `'` is spelled `"'"`
+- Postfix `*` `+` and `?` denote zero-or-more, one-or-more, and zero-or-one
+- Range quantifiers, like `{1...4}`, use Swift range syntax as convention.
+- Basic custom character classes are written like `[0-9a-zA-Z]`
+- Prefix `!` operator means the next element must not appear (a zero-width assertion)
+- Parenthesis group for the purposes of quantification
+- Builtins use angle brackets:
+  - `<Int>` refers to an integer, `<Char>` a character, etc.
+  - `<Space>` is any whitespace character
+  - `<EOL>` is the end-of-line anchor (e.g. `$` in regex).
 
-The `|` operator is used to specify that the grammar can match against either branch of the operator, similar to a regular expression. Similarly, `*`, `+`, and `?` are used to quantify an element of the grammar, with the same meaning as in regular expressions. Range quantifiers `{...}` may also be used, though we adopt a more explicit syntax that uses the Swift `..<` & `...` operators, e.g `{1...4}`.
-
-Basic custom character classes may appear in the grammar, and have the same meaning as in a regular expression. For example `[0-9a-zA-Z]` expresses the digits `0` to `9` and the letters `a` to `z` (both upper and lowercase).
-
-The `!` prefix operator is used to specify that the following grammar element must not appear at that position.
-
-Grammar elements may be surrounded by parentheses for the purposes of quantification.
+For example, `(!'|' !')' ConcatComponent)*` means any number (zero or more) occurrences of `ConcatComponent` so long as the initial character is neither a literal `|` nor a literal `)`.
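To illustrate how the new notation reads as a top-down parsing algorithm, here is a minimal Swift sketch of the example rule `(!'|' !')' ConcatComponent)*`. It is not the proposal's parser: `Parser`, `peek()`, `notAhead(_:)`, `parseConcatComponent()` and `parseConcatenation()` are hypothetical names, and `ConcatComponent` is simplified to "any single character" purely for illustration.

```swift
// Hypothetical toy grammar, for illustration only (not the real regex grammar):
//   Concatenation   -> (!'|' !')' ConcatComponent)*
//   ConcatComponent -> <Char>        (simplified: any single character)

struct Parser {
    let input: [Character]
    var pos = 0

    init(_ text: String) { input = Array(text) }

    // Look at the next character without consuming it.
    func peek() -> Character? { pos < input.count ? input[pos] : nil }

    // `!'c'`: succeeds iff the next character is not `c`.
    // It never advances `pos`, which is what makes `!` a zero-width assertion.
    func notAhead(_ c: Character) -> Bool { peek() != c }

    // ConcatComponent -> <Char>   (toy definition: consume one character)
    mutating func parseConcatComponent() -> Character? {
        guard let c = peek() else { return nil }
        pos += 1
        return c
    }

    // Concatenation -> (!'|' !')' ConcatComponent)*
    // Postfix `*`: keep looping while both lookahead checks pass and a
    // component parses; a failed iteration consumes nothing.
    mutating func parseConcatenation() -> String {
        var result: [Character] = []
        while notAhead("|"), notAhead(")"), let c = parseConcatComponent() {
            result.append(c)
        }
        return String(result)
    }
}

var demo = Parser("ab|cd")
print(demo.parseConcatenation())  // prints ab
```

Because `notAhead(_:)` only peeks, the `!'|'` and `!')'` checks leave the cursor on the `|` or `)` that ended the loop, so an enclosing rule (such as an alternation) can consume it next.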