Documentation/Evolution/DelimiterSyntax.md
Lines changed: 38 additions & 12 deletions
@@ -25,7 +25,7 @@ The use of a two letter prefix allows for easy future extensibility of such lite
25
25
26
26
There are a few items of regex grammar that use the single quote character as a metacharacter. These include named group definitions and references such as `(?'name')`, `(?('name'))`, `\g'name'`, `\k'name'`, as well as callout syntax `(?C'arg')`. The use of a single quote conflicts with the `re'...'` delimiter as it will be considered the end of the literal. Fortunately, alternative syntax exists for all of these constructs, e.g `(?<name>)`, `\k<name>`, and `(?C"arg")`.
27
27
28
-
As such, the single quote variants of the syntax will be considered invalid in a `re'...'` literal, and users must use the alternative syntax. If a raw variant of the syntax `re#'...'#` of the syntax is later added, that may also be used. In order to improve diagnostic behavior, the compiler will attempt to scan ahead when encountering the ending sequences `(?`, `(?(`, `\g`, `\k` and `(?C`. This will enable a more accurate error to be emitted that suggests the alternative syntax.
28
+
As such, the single quote variants of the syntax will be considered invalid in a `re'...'` literal, and users must use the alternative syntax instead. If a raw variant of the syntax, `re#'...'#`, is later added, that may also be used. In order to improve diagnostic behavior, the compiler will attempt to scan ahead when encountering the ending sequences `(?`, `(?(`, `\g`, `\k` and `(?C`. This will enable a more accurate error to be emitted that suggests the alternative syntax.
29
29
30
30
## Future Directions
31
31
@@ -35,7 +35,7 @@ The `re'...'` syntax could be naturally extended to supporting "raw text" throug
35
35
36
36
In particular:
37
37
38
-
- `\` and `'` characters would become literal, e.g `re#''\n''#` expresses a regular expression pattern that literally matches against the characters `'\n'` (including the quotes).
38
+
- `\` and `'` characters would become literal, e.g `re#''\n''#` expresses a regular expression pattern that literally matches against the characters `'\n'` (including the quotes). **TODO: Do we really want to treat backslash as literal? Seems consistent, but escape sequences are frequently used in regex.**
39
39
- Any number of `#` characters may surround the literal.
40
40
- Escape sequences would require the same number of `#` characters as in the delimiter to be treated specially. For example, `re##'\##n'##` would be required for a newline character sequence.
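
As a rough illustration of the delimiter rule in the list above, the following sketch extracts the pattern text from a literal surrounded by any number of `#` characters. The function name `rawRegexContents` is hypothetical and not part of the proposal. It only locates the closing delimiter, treats `\` and `'` as literal, and does not interpret escape sequences such as `\##n`.

```swift
/// Hypothetical helper, for illustration only: returns the pattern text of a
/// literal shaped like re'...' or re#'...'# (any number of `#`), or nil if
/// the input does not have that shape.
func rawRegexContents(_ source: String) -> String? {
    var rest = Substring(source)
    guard rest.starts(with: "re") else { return nil }
    rest = rest.dropFirst(2)

    // Count the `#` characters that open the delimiter.
    let poundCount = rest.prefix(while: { $0 == "#" }).count
    rest = rest.dropFirst(poundCount)
    guard rest.first == "'" else { return nil }
    rest = rest.dropFirst()

    // The literal ends at the first `'` followed by the same number of `#`.
    // Simplification: with zero `#` characters, an escaped quote is not handled.
    let closing = "'" + String(repeating: "#", count: poundCount)
    var index = rest.startIndex
    while index < rest.endIndex {
        if rest[index...].starts(with: closing) {
            return String(rest[rest.startIndex..<index])
        }
        index = rest.index(after: index)
    }
    return nil // No matching closing delimiter was found.
}
```

Under these assumptions, `rawRegexContents(#"re#''\n''#"#)` yields the four characters `'\n'`, quotes included, because a lone `'` inside the pattern is not followed by the single `#` needed to close the literal.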
41
41
@@ -70,17 +70,17 @@ Forward slashes are a regex term of art, and are used as the delimiters for rege
70
70
71
71
The obvious parsing ambiguity with `/.../` delimiters is with comment syntaxes.
72
72
73
-
- An empty regex literal would conflict with line comment syntax `//`. But this isn't a particularly useful thing to express, and could be disallowed.
73
+
- An empty regex literal would conflict with line comment syntax `//`. But this isn't a particularly useful thing to express, and can therefore be disallowed without significant impact.
74
74
75
75
- The obvious choice for a multi-line regular expression literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. However, `///` already begins a documentation comment in Swift, so a different multi-line delimiter would be needed, with no obvious choice.
76
76
77
-
- There is also a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example:
77
+
- There is a conflict with block comment syntax when a block comment surrounds a regex literal ending with `*`, for example:
78
78
79
-
```swift
80
-
/*
81
-
let regex = /x*/
82
-
*/
83
-
```
79
+
```swift
80
+
/*
81
+
let regex = /x*/
82
+
*/
83
+
```
84
84
85
85
In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal; however, it is much more likely to occur in a regular expression given the prevalence of the `*` quantifier.
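
The premature termination can be reproduced with a small scanner. This is a minimal sketch, not the compiler's lexer: the function name `leadingBlockComment` is hypothetical, and it assumes only that block comments open with `/*`, close with `*/`, and may nest.

```swift
/// Hypothetical helper, for illustration only: returns the text of the block
/// comment that starts at the beginning of `source`, or nil if there is none
/// or it is unterminated. Nested `/* ... */` pairs are tracked by depth.
func leadingBlockComment(in source: String) -> String? {
    guard source.starts(with: "/*") else { return nil }
    var depth = 0
    var index = source.startIndex
    while index < source.endIndex {
        let rest = source[index...]
        if rest.starts(with: "/*") {
            depth += 1
            index = source.index(index, offsetBy: 2)
        } else if rest.starts(with: "*/") {
            depth -= 1
            index = source.index(index, offsetBy: 2)
            // The comment closes as soon as the outermost `/*` is matched.
            if depth == 0 { return String(source[..<index]) }
        } else {
            index = source.index(after: index)
        }
    }
    return nil
}

let example = "/*\nlet regex = /x*/\n*/"
let comment = leadingBlockComment(in: example)
```

Here `comment` covers only the first two lines of `example`: the `*/` that ends the regex `/x*/` also closes the comment, leaving the final `*/` outside it.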
86
86
@@ -90,7 +90,11 @@ let regex = /x*/
90
90
91
91
#### Regex limitations
92
92
93
-
Another ambiguity with `/.../` arises when it is used to start a new line. This is particularly problematic for result builders, where we expect it to be frequently used, for example:
93
+
In order to help avoid parsing ambiguities, a regex literal will not be parsed if it starts with a space, tab, or `)` character, though the latter is already invalid regex syntax.
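
The rule stated above amounts to a check on the character that follows the opening `/`. The sketch below is an assumption about how such a check could look; `mayStartRegexLiteral` is a hypothetical name, and the real parser would apply further context checks.

```swift
/// Hypothetical check, for illustration only: decides whether the character
/// immediately after an opening `/` allows a regex literal to be parsed.
func mayStartRegexLiteral(firstCharacter: Character) -> Bool {
    switch firstCharacter {
    case " ", "\t", ")":
        // A leading space or tab suggests an infix operator, and a leading
        // `)` suggests an unapplied operator reference such as `reduce(1, /)`.
        return false
    default:
        return true
    }
}
```

An escaped leading space passes this check, since the first character is then `\` rather than the space itself.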
94
+
95
+
<details><summary>Rationale</summary>
96
+
97
+
This is due to two main ambiguities. The first arises when a `/.../` regex literal is used to start a new line. This is particularly problematic for result builders, where we expect it to be frequently used, for example:
94
98
95
99
```swift
96
100
Builder {
@@ -100,7 +104,7 @@ Builder {
100
104
}
101
105
```
102
106
103
-
This is parsed as a single operator chain, however it is likely the user is expecting a regex literal. To resolve this ambiguity, a regex literal may not start with a space or tab character. This takes advantage of the fact that infix operators require consistent spacing.
107
+
This is parsed as a single operator chain, however it is likely the user is expecting a regex literal. To resolve this ambiguity, a regex literal may not start with a space or tab character. This takes advantage of the fact that infix operators require consistent spacing on either side.
104
108
105
109
If a space or tab is needed as the first character, it must be escaped, e.g:
106
110
@@ -112,7 +116,27 @@ Builder {
112
116
}
113
117
```
114
118
115
-
**TODO: Regex starting with `)`**
119
+
The second ambiguity arises with Swift's ability to pass an unapplied operator reference as an argument to a function, for example:
120
+
121
+
```swift
122
+
let arr: [Double] = [2, 3, 4]
123
+
let x = arr.reduce(1, /) /5
124
+
```
125
+
126
+
The `/` in the call to `reduce` is in a valid expression context, and as such could be parsed as the start of a regular expression literal. To help mitigate this ambiguity, a regex literal will not be parsed if the first character is `)`. Note this would not be valid regex syntax anyway.
127
+
128
+
This is also applicable to unapplied operator references in parentheses and tuples.
129
+
130
+
It should be noted that this only mitigates the issue, as another ambiguity arises if the next character is a comma:
However we feel that starting a regex with a comma is likely to be a common case, and as such we intend to change the parser such that the above becomes a regex literal.
138
+
139
+
</details>
116
140
117
141
#### Language changes required
118
142
@@ -161,6 +185,8 @@ foo(/, /)
161
185
162
186
**TODO: Or do we want to ban it as the starting character?**
163
187
188
+
</details>
189
+
164
190
#### Editor Considerations
165
191
166
192
As described above, there would be a lot involved in handling the parsing ambiguities with `/.../` delimiters. It's one thing to do this in the compiler. But the language also has to be understood by a plethora of source code editors. Those editors either need to encode all those ambiguities, or they need to provide a "best effort" at handling the most common cases. It's all too common for editors to take the "best effort" route. There's a long history of complaints with editors that don't completely support a language's features. And indeed, there's plenty of history of editors that don't correctly support regular expression literals in other languages. By choosing a literal that is easily parsed, we should avoid seeing those complaints regarding Swift.