Commit c79f457

Expand on parsing issues with / as delimiter. Add a note about editor support.
1 parent 381a3c7 commit c79f457

1 file changed

+28
-10
lines changed

Documentation/Evolution/DelimiterSyntax.md

Lines changed: 28 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Regular Expression Literal Delimiters
22

3-
- Authors: Hamish Knight, Michael Ilseman
3+
- Authors: Hamish Knight, Michael Ilseman, David Ewing
44

55
## Introduction
66

@@ -29,6 +29,7 @@ To do this, a heuristic will be used when lexing a regex literal, and will check
2929

3030
**TODO: Or do we want to insist on the user using raw `re#'...'#` syntax?**
3131

32+
3233
## Future Directions
3334

3435
### Raw literals
@@ -66,29 +67,29 @@ We could choose to shorten the literal prefix to just `r`. However this could po
6667

6768
### Forward slashes `/.../`
6869

69-
Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternative choices). However, they would be an awkward fit in Swift's language grammar, and would not provide a path for extensibility.
70+
Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternative choices). However, they would be an awkward fit in Swift's language grammar, and would not provide a path for extensibility. Here we give an extensive list of drawbacks to the choice. While no individual issue is terribly bad and each could be overcome, the list of issues is quite long.
7071

7172
#### Parsing ambiguities
7273

73-
The primary parsing ambiguity with `/.../` delimiters is with comment syntax.
74+
The obvious parsing ambiguity with `/.../` delimiters is with comment syntaxes.
7475

75-
An empty regex literal would conflict with line comment syntax `//`. While this isn't a particularly useful thing to express, it may lead to an awkward user typing experience. In particular, as you begin to type a regex literal, a comment could be formed before you start typing the contents. This could however be mitigated by source tooling.
76+
- An empty regex literal would conflict with line comment syntax `//`. But this isn't a particularly useful thing to express, and could be disallowed.
7677

77-
Line comment syntax additionally means that a potential multi-line version of a regular expression literal would not be able to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`.
78+
- The obvious choice for a multi-line regular expression literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. However, `///` already begins a line comment, so a different multi-line delimiter would be needed, with no obvious choice.
7879

79-
There is also a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example:
80+
- There is also a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example:
8081

8182
```swift
8283
/*
8384
let regex = /x*/
8485
*/
8586
```
8687

87-
In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal, however it is much more likely to occur in a regular expression given the prevalence of the `*` quantifier.
88+
In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal, however it is much more likely to occur in a regular expression given the prevalence of the `*` quantifier.
8889

89-
Block comment syntax also means that a regex literal would not be able to start with the `*` character, however this is less of a concern as it would not be valid regex syntax.
90+
- Block comment syntax also means that a regex literal would not be able to start with the `*` character, however this is less of a concern as it would not be valid regex syntax.
9091

91-
Finally, there would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required `x + /y/` for regex literal interpretation.
92+
- Finally, there would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g. `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required, as in `x + /y/`, for regex literal interpretation.
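
The infix operator point above can be illustrated with a minimal sketch. It assumes a hypothetical `+/` operator with made-up "midpoint" semantics, defined only to show that `+/` is already a legal operator name in Swift, so `x+/y` written without whitespace lexes as a single infix operator rather than `+` followed by something starting with `/`.

```swift
// Hypothetical operator, declared only for illustration; it is not part of
// the proposal or the standard library.
infix operator +/ : AdditionPrecedence

/// Returns the integer midpoint of two values (illustrative semantics only).
func +/ (lhs: Int, rhs: Int) -> Int {
    (lhs + rhs) / 2
}

let x = 4
let y = 8

// No whitespace on either side, so the lexer forms the single infix operator `+/`.
let midpoint = x+/y
print(midpoint)
```

Because the lexer groups adjacent operator characters together, any regex literal syntax beginning with `/` has to coexist with operators like this one.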
9293

9394
#### Regex limitations
9495

@@ -145,11 +146,28 @@ let x = !/y / .foo()
145146
```
146147

147148
Otherwise it would be interpreted as the prefix operator `!/` by default, and require parens `!(/y /)` for regex parsing.
149+
150+
151+
152+
**TODO: More cases from Slack discussion**
153+
154+
```swift
155+
func foo(_ x: (Int, Int) -> Int, _ y: (Int, Int) -> Int) {}
156+
foo(/, /)
157+
```
158+
159+
`foo(/, "(") / 2` !!!
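
As a rough sketch of why these cases matter: when `/` is only an operator, it can be passed as an unapplied function value, so the text between two such references can look like regex contents. The example below uses a hypothetical helper `apply` (not from the proposal) and wraps each operator in parentheses, which is one spelling that refers to the operator function explicitly. Whether that spelling would be the required disambiguation under `/.../` delimiters is an assumption, not something settled here.

```swift
// Hypothetical helper: applies each binary function to fixed sample inputs.
func apply(_ f: (Int, Int) -> Int, _ g: (Int, Int) -> Int) -> (Int, Int) {
    (f(12, 4), g(9, 3))
}

// Parenthesized operator references are plain function values of type
// (Int, Int) -> Int, leaving no opening `/` directly followed by regex-like text.
let quotients = apply((/), (/))
print(quotients)
```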
160+
161+
148162

149163
##### Comma as the starting character of a regex literal
150164

151165
**TODO: Or do we want to ban it as the starting character?**
152-
166+
167+
#### Editor Considerations
168+
169+
Many source editors in use today do rather simplistic syntax coloring of programming languages, and there's a long history of complaints on the internet about syntax coloring of regular expressions in Perl, JavaScript and Ruby. While the most popular editors do a very good job recognizing the most common incantations of a regular expression in each language, most still don't get it 100% right. There's just a lot of work involved in doing that. If parsing Swift regular expressions is as difficult as in these other languages because of the choice of delimiter, it becomes a barrier to entry for support by those editors.
170+
153171
### Pound slash `#/.../#`
154172

155173
This would be less syntactically ambiguous than `/.../`, while retaining some of the term-of-art familiarity. It would also provide a natural path through which to introduce `/.../` in a new language mode, as users could drop the `#` characters once they upgrade.
