Documentation/Evolution/DelimiterSyntax.md
Lines changed: 30 additions & 13 deletions
@@ -20,11 +20,11 @@ let regex = try! Regex(compiling: pattern)
20
20
// regex: Regex<AnyRegexOutput>
21
21
```
22
22
23
-
The ability to compile regex patterns at runtime is useful for cases where it is e.g provided as user input, however it is suboptimal when the pattern is statically known for a number of reasons:
23
+
The ability to compile regex patterns at run time is useful for cases where it is e.g provided as user input, however it is suboptimal when the pattern is statically known for a number of reasons:
24
24
25
-
- Regex syntax errors aren't detected until runtime, and explicit error handling (e.g `try!`) is required to deal with these errors.
25
+
- Regex syntax errors aren't detected until run time, and explicit error handling (e.g `try!`) is required to deal with these errors.
26
26
- No special source tooling support, such as syntactic highlighting, code completion, and refactoring support, is available.
27
-
- Capture types aren't known until runtime, and as such a dynamic `AnyRegexOutput` capture type must be used.
27
+
- Capture types aren't known until run time, and as such a dynamic `AnyRegexOutput` capture type must be used.
28
28
- The syntax is overly verbose, especially for e.g an argument to a matching function.
29
29
30
30
## Proposed solution
@@ -39,7 +39,7 @@ let regex = /(?<identifier>[[:alpha:]]\w*) = (?<hex>[0-9A-F]+)/
39
39
40
40
Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternatives). Their ubiquity and familiarity makes them a compelling choice for Swift.
41
41
42
-
A regex literal may also be spelled using an extended syntax `#/.../#`, which allows the placement of an arbitrary number of balanced `#` characters around a regex literal. This syntax allows regex literals to contain unescaped forward slashes, and may be used without needing to upgrade to a new language mode.
42
+
A regex literal may also be spelled using an extended syntax `#/.../#`, which allows the placement of an arbitrary number of balanced `#` characters around a regex literal. This syntax allows regex literals to contain unescaped forward slashes, and may be used without needing to upgrade to a new language mode. This syntax further allows a multi-line mode when the opening delimiter is followed by a new line.
43
43
44
44
Within a regex literal, the compiler will parse the regex syntax outlined in [the Regex Syntax pitch][internal-syntax], and diagnose any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Using a literal allows editors to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see [Regex builder DSL][regex-dsl]).
45
45
@@ -86,7 +86,7 @@ Unnamed capture groups produce unlabeled tuple elements and must be referenced b
86
86
87
87
### Extended delimiters `#/.../#`, `##/.../##`
88
88
89
-
A regex literal may be surrounded by an arbitrary number of balanced pound characters. This is a similar to raw string literal syntax introduced by [SE-0200], and allows a regex literal to use forward slashes without the need to escape them, e.g:
89
+
A regex literal may be surrounded by an arbitrary number of balanced pound characters. This is somewhat similar to the raw string literal syntax introduced by [SE-0200], and allows a regex literal to use forward slashes without the need to escape them, e.g:
90
90
91
91
```swift
92
92
let regex = #/usr/lib/modules/([^/]+)/vmlinuz/#
@@ -99,7 +99,7 @@ Additionally, this syntax provides a way to write a regex literal without needin
99
99
100
100
This syntax differs from raw string literals `#"..."#` in that it does not treat backslashes as literal within the regex. A string literal `#"\n"#` represents the literal characters `\n`. However a regex literal `#/\n/#` remains a newline escape sequence.
101
101
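The difference in backslash handling can be illustrated with a minimal sketch, assuming Swift 5.7 or later. The constant names (`rawString`, `newline`, `twoLines`, `oneLine`) are hypothetical and do not appear in the proposal.

```swift
// Assumes Swift 5.7 or later, where regex literals are available.
let rawString = #"\n"#      // raw string: a backslash followed by "n"
let newline = #/\n/#        // regex literal: "\n" is still a newline escape

let twoLines = "a\nb"       // contains a real newline character
let oneLine = #"a\nb"#      // contains a backslash and "n", no newline

print(rawString.count)             // number of characters in the raw string
print(twoLines.contains(newline))  // whether the regex finds a newline
print(oneLine.contains(newline))   // same check against the raw string
```

The raw string keeps its backslash as a literal character, whereas the regex written with the same number of pound characters still interprets `\n` as an escape sequence.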
102
-
One of the primary motivations behind this escaping behavior in raw string literals is that it allows the contents to be easily transportable to/from e.g external files where escaping is unnecessary. For string literals, this suggests that backslashes be treated as literal by default. For regex literals however, it instead suggests that backslashes should retain their semantic meaning, as it enables interoperability with regexes taken from outside your code without having to adjust escape sequences to match the delimiters used.
102
+
One of the primary motivations behind this escaping behavior in raw string literals is that it allows the contents to be easily transportable to/from e.g external files where escaping is unnecessary. For string literals, this suggests that backslashes be treated as literal by default. For regex literals however, it instead suggests that backslashes should retain their semantic meaning. This enables interoperability with regexes taken from outside your code without having to adjust escape sequences to match the delimiters used.
103
103
104
104
With string literals, escaping can be tricky without the use of raw syntax, as backslashes may have semantic meaning to the consumer, rather than the compiler. For example:
105
105
@@ -117,6 +117,24 @@ let regex = /\\\w\s*=\s*\d+/
117
117
118
118
Backslashes still require escaping to be treated as literal, however we don't expect this to be as common of an occurrence as needing to write a regex escape sequence such as `\s`, `\w`, or `\p{...}`, within a regex literal with extended delimiters `#/.../#`.
119
119
120
+
#### Multi-line mode
121
+
122
+
Extended regex delimiters additionally support a multi-line mode when the opening delimiter is followed by a new line. For example:
123
+
124
+
```swift
125
+
let regex = #/
126
+
# Match a line of the format e.g "DEBIT 03/03/2022 Totally Legit Shell Corp $2,000,000.00"
127
+
(?<kind>\w+) \s\s+
128
+
(?<date>\S+) \s\s+
129
+
(?<account> (?: (?!\s\s) . )+) \s\s+ # Note that account names may contain spaces.
130
+
(?<amount>.*)
131
+
/#
132
+
```
133
+
134
+
In this mode, [extended regex syntax][extended-regex-syntax] `(?x)` is enabled by default. This means that whitespace becomes non-semantic, and end-of-line comments are supported with `# comment` syntax.
135
+
136
+
This mode is supported with any (non-zero) number of pound characters in the delimiter. Similar to multi-line strings introduced by [SE-0168], the closing delimiter must appear on a new line. To avoid parsing confusion, such a literal will not be parsed if a closing delimiter is not present. This avoids inadvertently treating the rest of the file as regex if you only type the opening.
137
+
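As a smaller illustration of the rules above, the following is a minimal sketch, assuming Swift 5.7 or later. The names `assignment`, `key`, and `value` are hypothetical and chosen only for this example.

```swift
// Assumes Swift 5.7 or later. The opening delimiter is followed by a new line,
// so the literal is parsed in multi-line mode with extended syntax enabled.
let assignment = #/
  (?<key> \w+ )    # the name being assigned
  \s* = \s*        # literal spaces are ignored, so whitespace is matched with \s
  (?<value> \d+ )  # a run of digits
/#

// The closing delimiter sits on its own line, as the rules above require.
if let match = "width = 42".firstMatch(of: assignment) {
    // Named capture groups become labeled elements of the match output.
    print(match.output.key, match.output.value)
}
```

Because whitespace is non-semantic in this mode, the spaces around `=` in the input must be matched explicitly with `\s*` rather than by spaces typed in the pattern.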
120
138
### Ambiguities with comment syntax
121
139
122
140
Perhaps the most obvious parsing ambiguity with `/.../` delimiters is with comment syntax.
@@ -266,13 +284,7 @@ This takes advantage of the fact that a regex literal will not be parsed if the
266
284
267
285
## Future Directions
268
286
269
-
### Multi-line literals
270
-
271
-
The obvious choice for a multi-line regex literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. But this signifies a (documentation) comment, so a different multi-line delimiter would be needed, with no obvious choice. However, it's not clear that we need multi-line regex literals. The existing literals can be used inside a regex builder DSL.
272
-
273
-
### Regex extended syntax
274
-
275
-
Allowing non-semantic whitespace and other features of the extended syntax would be highly desired, with no obvious choice for a literal. Perhaps the need is also lessened by the ability to use regex literals inside the regex builder DSL.
287
+
**TODO: Do we have any other future directions now that extended multi-line syntax has been subsumed?**
276
288
277
289
## Alternatives Considered
278
290
@@ -319,6 +331,10 @@ It should also be noted that `#regex(...)` would introduce a syntactic inconsist
319
331
320
332
We could reduce the visual weight of `#regex(...)` by only requiring `#(...)`. However it would still retain the same issues, such as still looking potentially visually noisy as an argument, and having suboptimal behavior for parenthesis balancing. It is also not clear why regex literals would deserve such privileged syntax.
321
333
334
+
### Using a different delimiter for multi-line
335
+
336
+
Instead of re-using the extended delimiter syntax `#/.../#` for multi-line regex literals, we could choose a different delimiter for it. Unfortunately, the obvious choice for a multi-line regex literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. This signifies a (documentation) comment, and as such would not be viable.
337
+
322
338
### Reusing string literal syntax
323
339
324
340
Instead of supporting a first-class literal kind for regex, we could instead allow users to write a regex in a string literal, and parse, diagnose, and generate the appropriate code when it's coerced to the `Regex` type.
@@ -352,3 +368,4 @@ We therefore feel this would be a much less compelling feature without first cla