|
1 | 1 | # Regular Expression Literal Delimiters
|
2 | 2 |
|
3 |
| -- Authors: Hamish Knight, Michael Ilseman |
| 3 | +- Authors: Hamish Knight, Michael Ilseman, David Ewing |
4 | 4 |
|
5 | 5 | ## Introduction
|
6 | 6 |
|
@@ -29,6 +29,7 @@ To do this, a heuristic will be used when lexing a regex literal, and will check
|
29 | 29 |
|
30 | 30 | **TODO: Or do we want to insist on the user using raw `re#'...'#` syntax?**
|
31 | 31 |
|
| 32 | + |
32 | 33 | ## Future Directions
|
33 | 34 |
|
34 | 35 | ### Raw literals
|
@@ -66,29 +67,29 @@ We could choose to shorten the literal prefix to just `r`. However this could po
|
66 | 67 |
|
67 | 68 | ### Forward slashes `/.../`
|
68 | 69 |
|
69 |
| -Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternative choices). However, they would be an awkward fit in Swift's language grammar, and would not provide a path for extensibility. |
| 70 | +Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternative choices). However, they would be an awkward fit in Swift's language grammar, and would not provide a path for extensibility. Here we give an extensive list of drawbacks to the choice. While no individual issue is terribly bad and each could be overcome, the list of issues is quite long. |
70 | 71 |
|
71 | 72 | #### Parsing ambiguities
|
72 | 73 |
|
73 |
| -The primary parsing ambiguity with `/.../` delimiters is with comment syntax. |
| 74 | +The obvious parsing ambiguity with `/.../` delimiters is with comment syntaxes. |
74 | 75 |
|
75 |
| -An empty regex literal would conflict with line comment syntax `//`. While this isn't a particularly useful thing to express, it may lead to an awkward user typing experience. In particular, as you begin to type a regex literal, a comment could be formed before you start typing the contents. This could however be mitigated by source tooling. |
| 76 | +- An empty regex literal would conflict with line comment syntax `//`. But this isn't a particularly useful thing to express, and could be disallowed. |
76 | 77 |
|
77 |
| -Line comment syntax additionally means that a potential multi-line version of a regular expression literal would not be able to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. |
| 78 | +- The obvious choice for a multi-line regular expression literal would be `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. However, `///` would be lexed as a line comment (and is already used for documentation comments), so a different multi-line delimiter would be needed, with no obvious choice. |
78 | 79 |
|
79 |
| -There is also a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example: |
| 80 | +- There is also a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example: |
80 | 81 |
|
81 | 82 | ```swift
|
82 | 83 | /*
|
83 | 84 | let regex = /x*/
|
84 | 85 | */
|
85 | 86 | ```
|
86 | 87 |
|
87 |
| -In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal, however it is much more likely to occur in a regular expression given the prevalence of the `*` quantifier. |
| 88 | + In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal; however, it is much more likely to occur in a regular expression, given the prevalence of the `*` quantifier. |
88 | 89 |
|
89 |
| -Block comment syntax also means that a regex literal would not be able to start with the `*` character, however this is less of a concern as it would not be valid regex syntax. |
| 90 | +- Block comment syntax also means that a regex literal would not be able to start with the `*` character; however, this is less of a concern, as it would not be valid regex syntax. |
90 | 91 |
|
91 |
| -Finally, there would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required `x + /y/` for regex literal interpretation. |
| 92 | +- Finally, there would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g. `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required, as in `x + /y/`, for the expression to be interpreted as a regex literal. |
92 | 93 |
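The infix-operator point in the last bullet can be reproduced in Swift as it stands today, without any regex literal support. The following is a minimal sketch, not part of the proposal: the `+/` operator and its "add, then halve" semantics are hypothetical and exist only to show that `x+/y` is lexed as a single operator token when no whitespace separates the characters.

```swift
// Hypothetical custom operator, declared only to illustrate the lexing rule.
infix operator +/ : AdditionPrecedence

// Arbitrary semantics chosen for this sketch: add the operands, then halve.
func +/ (lhs: Int, rhs: Int) -> Int {
    return (lhs + rhs) / 2
}

let x = 6
let y = 4

// Without whitespace, `+/` is lexed as one infix operator token.
let packed = x+/y

// With whitespace, `+` and `/` are two separate infix operators.
let spaced = x + y / 2

print(packed, spaced)
```

Because `+/` is already a legal operator spelling, a lexer cannot treat the `/` in `x+/y/` as the start of a regex literal without changing the meaning of code like this.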
|
93 | 94 | #### Regex limitations
|
94 | 95 |
|
@@ -145,11 +146,28 @@ let x = !/y / .foo()
|
145 | 146 | ```
|
146 | 147 |
|
147 | 148 | Otherwise it would be interpreted as the prefix operator `!/` by default, and require parens `!(/y /)` for regex parsing.
|
| 149 | + |
| 150 | + |
| 151 | + |
| 152 | +**TODO: More cases from Slack discussion** |
| 153 | + |
| 154 | +```swift |
| 155 | +func foo(_ x: (Int, Int) -> Int, _ y: (Int, Int) -> Int) {} |
| 156 | +foo(/, /) |
| 157 | +``` |
| 158 | + |
| 159 | +`foo(/, "(") / 2` !!! |
| 160 | + |
| 161 | + |
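To make the `foo(/, "(") / 2` case concrete, here is a minimal sketch of code that is valid today and that a `/.../` literal could reinterpret. The overload of `foo` below is hypothetical (it is not the `foo` declared above), and the sketch assumes a compiler mode in which a bare `/.../` is not lexed as a regex literal.

```swift
// Hypothetical overload for illustration: takes a binary operator and a
// string, and returns an Int so that the result can itself be divided.
func foo(_ op: (Int, Int) -> Int, _ s: String) -> Int {
    return op(8, 2) + s.count
}

// Read as: call `foo` with the `/` operator and the string "(", then
// divide the result by 2. Under `/.../` delimiters, the span between the
// two slashes could instead be taken as the body of a regex literal.
let halved = foo(/, "(") / 2

print(halved)
```

The parentheses between the two slashes are balanced, so a heuristic that only looks for an unbalanced `)` would presumably not rule out the regex reading here.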
148 | 162 |
|
149 | 163 | ##### Comma as the starting character of a regex literal
|
150 | 164 |
|
151 | 165 | **TODO: Or do we want to ban it as the starting character?**
|
152 |
| - |
| 166 | + |
| 167 | +#### Editor Considerations |
| 168 | + |
| 169 | +Many source editors in use today rely on rather simplistic syntax coloring, and there is a long history of complaints online about how regular expressions in Perl, JavaScript and Ruby are colored. While the most popular editors do a very good job of recognizing the most common forms of a regular expression in each language, most still don't get it entirely right, because doing so takes a lot of work. If the choice of delimiter makes Swift regular expressions as difficult to parse as those in these other languages, it becomes a barrier to entry for support by those editors. |
| 170 | + |
153 | 171 | ### Pound slash `#/.../#`
|
154 | 172 |
|
155 | 173 | This would be less syntactically ambiguous than `/.../`, while retaining some of the term-of-art familiarity. It would also provide a natural path through which to introduce `/.../` in a new language mode, as users could drop the `#` characters once they upgrade.
|
|