Commit b0d2e07

Add a draft proposal for tagged strings.
1 parent cbe55c3 commit b0d2e07

# Tagged Strings

Authors: Bob Nystrom

Status: **Draft**

Summary: Use Dart's string literal syntax to create values of user-defined types
by allowing an identifier before a string to identify a "tag processor" that
controls how the string literal and its interpolated expressions are evaluated.

## Motivation

JavaScript has a feature called [tagged template literals][]. This proposal
essentially brings that to Dart. Why is something like this useful? Here's one
detailed example:

[tagged template literals]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates

### Code literals for macros

The language team is currently investigating adding [macros] to Dart. These
macros are written in Dart and produce Dart code. This means we need some sort
of API for constructing objects that represent pieces of Dart syntax. The best
API for creating Dart syntax *is* Dart syntax. The obvious approach is to have
users place that syntax in string literals and parse it:

[macros]: https://github.com/dart-lang/language/blob/master/working/macros/feature-specification.md

```dart
var code = Code.parse('var n = 123;');
```

But macros may need to produce code objects for different parts of the Dart
grammar: expressions, statements, declarations, etc. Dart's grammar uses
the same syntax in different contexts to mean different things. For example:

```dart
var code = Code.parse('{}');
```

Does this create syntax for an empty map literal or an empty block statement?
Without knowing where in the grammar the `{}` is intended to appear, there's
no way to unambiguously parse it. The code creation API needs a way for users
to specify what kind of grammar they are creating. We could expose multiple
API entrypoints:

```dart
var map = Expression.parse('{}');
var block = Statement.parse('{}');
```

This works, but is verbose. We could get clever with extension getters:

```dart
var map = '{}'.expression;
var block = '{}'.statement;
```

This is shorter, but not exactly idiomatic.

There is a bigger problem. Macros often compose code out of other pieces of
syntax. For example:

```dart
var add = Expression.parse('2 + 3');
var multiply = Expression.parse('4 * $add');
```

Here, we are composing a binary multiplication out of `4` and another expression
object. The intent is that `2 + 3` should become the right operand to the `*`.
But the `4 * $add` string interpolation simply calls `toString()` on the operand
and stuffs the result directly in, yielding `4 * 2 + 3`.
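
The precedence problem can be reproduced in today's Dart without any macro
API. The following is a minimal sketch that assumes a hypothetical `Expr` class
(a stand-in invented for illustration, not part of any proposed API) whose
`toString()` returns its source text:

```dart
// Hypothetical stand-in for a parsed expression; not part of any real API.
class Expr {
  final String source;
  Expr(this.source);

  @override
  String toString() => source;
}

// Naive composition: ordinary interpolation just calls toString().
String naiveMultiply(Expr operand) => '4 * $operand';

// Manual fix: the author must remember to parenthesize every operand.
String safeMultiply(Expr operand) => '4 * ($operand)';
```

Passing `Expr('2 + 3')` to `naiveMultiply` produces the text `4 * 2 + 3`, which
evaluates to 11, while the intended `4 * (2 + 3)` evaluates to 20. The manual
fix works, but every macro author has to remember it at every interpolation.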

We want macro authors to be able to easily compose syntax without having to
worry about operator precedence, commas as separators, semicolons as
terminators, etc. In other words, we want Dart string interpolation syntax to be
user-programmable in the way that `for-in` loop syntax is.

## Tagged strings

A **tagged string** is a string literal prefixed with an identifier, like:

```dart
var add = expr '2 + 3';
var subtract = expr '7 - 5';
var multiply = expr '4 * $add / $subtract';
```

Here, the `expr` before each string marks that string as a tagged string. A
tagged string is syntactic sugar for a call to a user-defined **tag processor**
function that has control over how the string literal's string parts and
interpolated expressions are evaluated and composed together.

The above code is essentially seen by the compiler as:

```dart
var add = exprStringLiteral(['2 + 3'], []);
var subtract = exprStringLiteral(['7 - 5'], []);
var multiply = exprStringLiteral(['4 * ', ' / ', ''],
    [() => add, () => subtract]);
```

The literal text parts are pulled out into one list, which in the last example
ends with an empty string because the literal ends with an interpolation. The
interpolated expressions are each wrapped in closures and put into a second
list. Then these are passed to a function whose name is based on the tag
identifier. Wrapping the interpolated expressions in closures gives the tag
processor control over when or if the expressions are evaluated.
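
To illustrate why deferring evaluation matters, here is a sketch of what a
`log` tag processor might look like, written against the desugared form so it
runs in current Dart. The `loggingEnabled` flag and `logOutput` list are
assumptions made up for this example and do not refer to any existing logging
package:

```dart
// Hypothetical global switch; a real logging package would have its own.
bool loggingEnabled = false;

// Collected messages, so the sketch has something observable.
final List<String> logOutput = [];

// Tag processor that a `log '...'` tagged string would desugar to.
void logStringLiteral(
    List<String> strings, List<Object? Function()> values) {
  // When logging is off, no closure is invoked, so no interpolated
  // expression is ever evaluated.
  if (!loggingEnabled) return;

  var buffer = StringBuffer();
  for (var i = 0; i < values.length; i++) {
    buffer.write(strings[i]);
    try {
      buffer.write(values[i]());
    } catch (error) {
      // An expression that throws must not take the program down.
      buffer.write('<error: $error>');
    }
  }
  buffer.write(strings.last);
  logOutput.add(buffer.toString());
}
```

With `loggingEnabled` left `false`, the processor returns before touching any
closure, so an expensive interpolated expression costs nothing beyond the
allocation of the closure itself.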

Since the intent of this feature is brevity, we expect users to choose short tag
names like `expr` here, `html`, `css`, etc. Since those names are likely to
collide with other variables, the language implicitly appends `StringLiteral` to
the tag name to determine the name of the tag processor. This lets users use
short tag names without having to worry about name collisions.

In the above example, those tagged strings could end up calling a tag processor
that looks something like:

```dart
Code exprStringLiteral(
    List<String> strings,
    List<Object? Function()> values) {
  var buffer = StringBuffer();
  for (var i = 0; i < values.length; i++) {
    // Emit the literal text that precedes this interpolation.
    buffer.write(strings[i]);
    // Invoke the closure to evaluate the interpolated expression.
    var value = values[i]();
    if (value is Expression) {
      // Parenthesize subexpressions to preserve precedence.
      buffer.write('(' + value.toSource() + ')');
    } else {
      buffer.write(value);
    }
  }

  // There is always one more string part than there are values.
  buffer.write(strings.last);
  return Expression.parse(buffer.toString());
}
```

Note that this toy implementation implicitly wraps values that are
subexpressions in parentheses to avoid precedence errors. The interpolated
expressions passed to a tag processor do not need to evaluate to strings. It's
up to the processor to define which kinds of values are allowed.

Note also that the tag processor does not have to *return* a string either. Here
it returns `Code`. While tagged strings are based on Dart string literal syntax,
they can produce an object of any type the user wants.

### Other uses

The driving motivation for adding the feature now is so that we can make it
more pleasant to author macros, but this is a general purpose Dart language
feature that any Dart user can use. Some ideas:

*   An `html` API could be used to compose HTML out of pieces of strings while
    ensuring that the resulting string is correctly [sanitized][].

*   An `sql` API could ensure that interpolated expressions are correctly quoted
    and escaped to avoid [SQL injection][].

*   The [`BigInt`][bigint] class could expose a tag processor so that large
    integers can be created like:

    ```dart
    int '12345678901234567890'
    ```

    instead of:

    ```dart
    BigInt.parse('12345678901234567890')
    ```

*   A logging framework could avoid evaluating the interpolated expressions
    entirely when logging is currently disabled in order to improve performance.
    When logging is enabled, it can catch exceptions thrown by the interpolated
    expressions to ensure that logging itself cannot crash the program.

*   If tagged strings become used for embedded sub-languages like `html`, `css`,
    etc., then Dart IDEs could potentially syntax highlight the contents of
    those strings according to their tagged language.

[sanitized]: https://en.wikipedia.org/wiki/HTML_sanitization
[sql injection]: https://xkcd.com/327/
[bigint]: https://api.dart.dev/stable/2.14.4/dart-core/BigInt-class.html
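
As a rough illustration of the `html` idea, the sketch below escapes every
interpolated value unless it is already a trusted fragment. The `Html` wrapper
class and the `htmlStringLiteral` processor are hypothetical names invented for
this example. Only `HtmlEscape` from `dart:convert` is an existing API. Note
that this performs escaping only, which is a much weaker guarantee than full
sanitization:

```dart
import 'dart:convert';

// Hypothetical wrapper marking markup that is already safe to embed.
class Html {
  final String markup;
  Html(this.markup);
}

// Tag processor that an `html '...'` tagged string would desugar to.
Html htmlStringLiteral(
    List<String> strings, List<Object? Function()> values) {
  const escape = HtmlEscape();
  var buffer = StringBuffer();
  for (var i = 0; i < values.length; i++) {
    // Literal parts were written by the programmer and are trusted.
    buffer.write(strings[i]);
    var value = values[i]();
    if (value is Html) {
      // Nested fragments were already escaped when they were built.
      buffer.write(value.markup);
    } else {
      // Everything else is treated as untrusted text.
      buffer.write(escape.convert('$value'));
    }
  }
  buffer.write(strings.last);
  return Html(buffer.toString());
}
```

Because the processor returns `Html` rather than `String`, fragments compose
without being escaped twice, while plain strings coming from user input are
always escaped. An `sql` processor could follow the same shape, collecting
interpolated values as bound parameters instead of escaping them.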

## Grammar

The grammar requires a little adjusting because of raw and adjacent strings:

```
stringLiteral ::=
    taggedStringLiteral
  | ( multilineString
    | singleLineString
    | RAW_SINGLE_LINE_STRING
    | RAW_MULTI_LINE_STRING )+

taggedStringLiteral ::= identifier ( multilineString | singleLineString )+

singleLineString ::= // remove raw
    SINGLE_LINE_STRING_SQ_BEGIN_END
  | SINGLE_LINE_STRING_SQ_BEGIN_MID expression
    (SINGLE_LINE_STRING_SQ_MID_MID expression)*
    SINGLE_LINE_STRING_SQ_MID_END
  | SINGLE_LINE_STRING_DQ_BEGIN_END
  | SINGLE_LINE_STRING_DQ_BEGIN_MID expression
    (SINGLE_LINE_STRING_DQ_MID_MID expression)*
    SINGLE_LINE_STRING_DQ_MID_END

multilineString ::= // remove raw
    MULTI_LINE_STRING_SQ_BEGIN_END
  | MULTI_LINE_STRING_SQ_BEGIN_MID expression
    (MULTI_LINE_STRING_SQ_MID_MID expression)*
    MULTI_LINE_STRING_SQ_MID_END
  | MULTI_LINE_STRING_DQ_BEGIN_END
  | MULTI_LINE_STRING_DQ_BEGIN_MID expression
    (MULTI_LINE_STRING_DQ_MID_MID expression)*
    MULTI_LINE_STRING_DQ_MID_END
```

Basically, a string literal can be a tagged string or an untagged string. A
tagged string is an identifier followed by a series of non-raw untagged adjacent
strings. An untagged string is a series of adjacent strings which may include
raw strings.

If the identifier before a string literal is `r`, it is considered a raw string,
not a string tagged with `r`.

## Static semantics

A tagged string is an identifier followed by a series of adjacent string
literals which may contain interpolated expressions. This is treated as
syntactic sugar for a function call with two list arguments.

### Desugaring

The tag identifier is suffixed with `StringLiteral` to determine the tag
processor name.

Adjacent strings are implicitly concatenated into a single string as in current
Dart.

The string is split into string parts and interpolation expressions. All of the
string literal parts from the `SINGLE_LINE_*` and `MULTI_LINE_*` rules are
collected in order and put in an object that implements `List<String>`.

Each `expression` is wrapped in a closure of type `Object? Function()` that
evaluates and returns the expression when invoked. These closures are collected
in order into an object that implements `List<Object? Function()>`.

**TODO: What if an interpolated expression uses `await`? We could implicitly
make the closure `async` in that case and require the tag processor to
handle a future result. Or we could make it a compile-time error like we do
when using `await` in the initializer of a `late` variable.**

The structure of the grammar is such that the list of string parts will always
be one element longer than the list of expressions. If there are no expressions,
there will be one string part. If an interpolated expression begins the string,
there will be a zero-length initial string part. Likewise, if an interpolated
expression ends the string, there will be a zero-length string part at the end
of the parts list. Some examples:

```dart
//                 string parts    expressions
tag ''          // ''              (none)
tag 'str'       // 'str'           (none)
tag '$e'        // '', ''          e
tag '@$e'       // '@', ''         e
tag '$e!'       // '', '!'         e
tag '@$e!'      // '@', '!'        e
tag '$e$f'      // '', '', ''      e, f
tag '@$e#$f!'   // '@', '#', '!'   e, f
```
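
One way to check an understanding of the splitting rule is a processor that
simply reproduces ordinary interpolation. The sketch below is an illustration
only. The name `plainStringLiteral` is hypothetical, and the function is called
directly in its desugared form since the tag syntax does not exist yet:

```dart
// Hypothetical processor that reproduces ordinary string interpolation.
String plainStringLiteral(
    List<String> strings, List<Object? Function()> values) {
  // The desugaring guarantees one more string part than expressions.
  assert(strings.length == values.length + 1);

  var buffer = StringBuffer();
  for (var i = 0; i < values.length; i++) {
    // Alternate: string part, then the value of the next expression.
    buffer.write(strings[i]);
    buffer.write(values[i]());
  }
  // The final string part has no expression after it.
  buffer.write(strings.last);
  return buffer.toString();
}
```

For the last row of the table, calling
`plainStringLiteral(['@', '#', '!'], [() => 1, () => 2])` yields `@1#2!`, the
same text that the untagged literal `'@$e#$f!'` would produce with `e = 1` and
`f = 2`. The guaranteed extra string part is what lets the loop stay free of
bounds checks.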

The tagged string literal is replaced with a call to the tag processor function.
The list of string parts and the list of expression closures (which may be
empty) are passed to that function as positional arguments.

### Static typing

It is a compile-time error if:

*   The tag name suffixed with `StringLiteral` does not resolve to a function
    that can be called with two positional arguments.
*   `List<String>` cannot be assigned to the first parameter's type.
*   `List<Object? Function()>` cannot be assigned to the second parameter's
    type.

The type of a tagged string literal expression is the return type of the
corresponding tag processor function.
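
To make the typing rule concrete, here is a sketch of a processor for the
`int` tag from the `BigInt` idea. The function is hypothetical (`BigInt`
exposes no such API today) and the choice to reject interpolation is an
assumption of this example, not something the proposal requires:

```dart
// Hypothetical processor for an `int '...'` tagged string.
BigInt intStringLiteral(
    List<String> strings, List<Object? Function()> values) {
  // This sketch accepts only literal digits, with no interpolation.
  if (values.isNotEmpty) {
    throw ArgumentError('Interpolation is not supported in int literals.');
  }
  // With no expressions there is exactly one string part.
  return BigInt.parse(strings.single);
}
```

Under the rule above, the expression `int '12345678901234567890'` would have
static type `BigInt`, because that is the declared return type of
`intStringLiteral`. Both parameters satisfy the assignability checks since
their types are exactly `List<String>` and `List<Object? Function()>`.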

## Runtime semantics

This feature is purely syntactic sugar, so there are no runtime semantics
beyond the behavior of the Dart code that the tagged string desugars to.
