|
| 1 | +# Tagged Strings |
| 2 | + |
| 3 | +Authors: Bob Nystrom |
| 4 | + |
| 5 | +Status: **Draft** |
| 6 | + |
| 7 | +Summary: Use Dart's string literal syntax to create values of user-defined types |
| 8 | +by allowing an identifier before a string to identify a "tag processor" that |
| 9 | +controls how the string literal and its interpolated expressions are evaluated. |
| 10 | + |
| 11 | +## Motivation |
| 12 | + |
| 13 | +JavaScript has a feature called [tagged template literals][]. This proposal |
| 14 | +essentially brings that to Dart. Why is something like this useful? Here's one |
| 15 | +detailed example: |
| 16 | + |
| 17 | +[tagged template literals]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates |
| 18 | + |
| 19 | +### Code literals for macros |
| 20 | + |
| 21 | +The language team is currently investigating adding [macros] to Dart. These macros |
| 22 | +are written in Dart and produce Dart code. This means we need some sort of API |
| 23 | +for constructing objects that represent pieces of Dart syntax. The best API for |
| 24 | +creating Dart syntax *is* Dart syntax. The obvious approach is to have users |
| 25 | +place that syntax in string literals and parse it: |
| 26 | + |
| 27 | +[macros]: https://github.com/dart-lang/language/blob/master/working/macros/feature-specification.md |
| 28 | + |
| 29 | +```dart |
| 30 | +var code = Code.parse('var n = 123;'); |
| 31 | +``` |
| 32 | + |
| 33 | +But macros may need to produce code objects for different parts of the Dart |
| 34 | +grammar—expressions, statements, declarations, etc. Dart's grammar uses |
| 35 | +the same syntax in different contexts to mean different things. For example: |
| 36 | + |
| 37 | +```dart |
| 38 | +var code = Code.parse('{}'); |
| 39 | +``` |
| 40 | + |
| 41 | +Does this create syntax for an empty map literal or an empty block statement? |
| 42 | +Without knowing where in the grammar the `{}` is intended to appear, there's |
| 43 | +no way to unambiguously parse it. The code creation API needs a way for users |
| 44 | +to specify what kind of grammar they are creating. We could expose multiple |
| 45 | +API entrypoints: |
| 46 | + |
| 47 | +```dart |
| 48 | +var map = Expression.parse('{}'); |
| 49 | +var block = Statement.parse('{}'); |
| 50 | +``` |
| 51 | + |
| 52 | +This works, but is verbose. We could get clever with extension getters: |
| 53 | + |
| 54 | +```dart |
| 55 | +var map = '{}'.expression; |
| 56 | +var block = '{}'.statement; |
| 57 | +``` |
| 58 | + |
| 59 | +This is shorter, but not exactly idiomatic. |
| 60 | + |
| 61 | +There is a bigger problem. Macros often compose code out of other pieces of |
| 62 | +syntax. For example: |
| 63 | + |
| 64 | +```dart |
| 65 | +var add = Expression.parse('2 + 3'); |
| 66 | +var multiply = Expression.parse('4 * $add'); |
| 67 | +``` |
| 68 | + |
| 69 | +Here, we are composing a binary multiplication out of `4` and another expression |
| 70 | +object. The intent is that `2 + 3` should become the right operand to the `*`. |
| 71 | +But the `4 * $add` string interpolation simply calls `toString()` on the operand |
| 72 | +and stuffs the result directly in, yielding `4 * 2 + 3`, which parses as `(4 * 2) + 3`. |
| 73 | + |
| 74 | +We want macro authors to be able to easily compose syntax without having to |
| 75 | +worry about operator precedence, commas as separators, semicolons as |
| 76 | +terminators, etc. In other words, we want Dart string interpolation syntax to be |
| 77 | +user-programmable in the way that `for-in` loop syntax is. |
| 78 | + |
| 79 | +## Tagged strings |
| 80 | + |
| 81 | +A **tagged string** is a string literal prefixed with an identifier, like: |
| 82 | + |
| 83 | +```dart |
| 84 | +var add = expr '2 + 3'; |
| 85 | +var subtract = expr '7 - 5'; |
| 86 | +var multiply = expr '4 * $add / $subtract'; |
| 87 | +``` |
| 88 | + |
| 89 | +Here, the `expr` before each string marks that string as a tagged string. A |
| 90 | +tagged string is syntactic sugar for a call to a user-defined **tag processor** |
| 91 | +function that has control over how the string literal's string parts and |
| 92 | +interpolated expressions are evaluated and composed together. |
| 93 | + |
| 94 | +The above code is essentially seen by the compiler as: |
| 95 | + |
| 96 | +```dart |
| 97 | +var add = exprStringLiteral(['2 + 3'], []); |
| 98 | +var subtract = exprStringLiteral(['7 - 5'], []); |
| 99 | +var multiply = exprStringLiteral(['4 * ', ' / ', ''], |
| 100 | + [() => add, () => subtract]); |
| 101 | +``` |
| 102 | + |
| 103 | +The literal text parts are pulled out into one list. The interpolated |
| 104 | +expressions are each wrapped in closures and put into a second list. Then these |
| 105 | +are passed to a function whose name is based on the tag identifier. Wrapping |
| 106 | +the interpolated expressions in closures gives the tag processor control over |
| 107 | +when or if the expressions are evaluated. |
| 108 | + |
| 109 | +Since the intent of this feature is brevity, we expect users to choose short tag |
| 110 | +names like `expr` here, `html`, `css`, etc. Since those names are likely to |
| 111 | +collide with other variables, the language implicitly appends `StringLiteral` to |
| 112 | +the tag name to determine the name of the tag processor. This lets users use |
| 113 | +short tag names without having to worry about name collisions. |
| 114 | + |
| 115 | +In the above example, those tagged strings could end up calling a tag processor |
| 116 | +that looks something like: |
| 117 | + |
| 118 | +```dart |
| 119 | +Code exprStringLiteral( |
| 120 | + List<String> strings, |
| 121 | + List<Object? Function()> values) { |
| 122 | + var buffer = StringBuffer(); |
| 123 | + for (var i = 0; i < values.length; i++) { |
| 124 | + buffer.write(strings[i]); |
| 125 | + var value = values[i](); |
| 126 | + if (value is Expression) { |
| 127 | + buffer.write('(' + value.toSource() + ')'); |
| 128 | + } else { |
| 129 | + buffer.write(value); |
| 130 | + } |
| 131 | + } |
| 132 | +
|
| 133 | + buffer.write(strings.last); |
| 134 | + return Expression.parse(buffer.toString()); |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +Note that this toy implementation implicitly wraps values that are subexpressions in parentheses to avoid precedence errors. The interpolated expressions passed to a tag processor do not need to evaluate to strings. It's up to the processor to define which kinds of values are allowed. |
| 139 | + |
| 140 | +Note also that the tag processor does not have to *return* a string either. Here |
| 141 | +it returns `Code`. While tagged strings are based on Dart string literal syntax, |
| 142 | +they can produce an object of any type the user wants. |
| 143 | + |
| 144 | +### Other uses |
| 145 | + |
| 146 | +The driving motivation for adding the feature now is so that we can make it |
| 147 | +more pleasant to author macros, but this is a general purpose Dart language |
| 148 | +feature that any Dart user can use. Some ideas: |
| 149 | + |
| 150 | +* An `html` API could be used to compose HTML out of pieces of strings while |
| 151 | + ensuring that the resulting string is correctly [sanitized][]. |
| 152 | + |
| 153 | +* An `sql` API could ensure that interpolated expressions are correctly quoted |
| 154 | + and escaped to avoid [SQL injection][]. |
| 155 | + |
| 156 | +* The [`BigInt`][bigint] class could expose a tag processor so that large |
| 157 | + integers can be created like: |
| 158 | + |
| 159 | + ```dart |
| 160 | + int '12345678901234567890' |
| 161 | + ``` |
| 162 | +
|
| 163 | + instead of: |
| 164 | +
|
| 165 | + ```dart |
| 166 | + BigInt.parse('12345678901234567890') |
| 167 | + ``` |
| 168 | +
|
| 169 | +* A logging framework could avoid evaluating the interpolated expressions |
| 170 | + entirely when logging is currently disabled in order to improve performance. |
| 171 | + When logging is enabled, it can catch exceptions thrown by the interpolated |
| 172 | + expressions to ensure that logging itself cannot crash the program. |
| 173 | +
|
| 174 | +* If tagged strings become widely used for embedded sub-languages like `html`, |
| 175 | + `css`, etc., then Dart IDEs could potentially syntax highlight the contents of |
| 176 | + those strings according to their tagged language. |
| 177 | +
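The logging idea above can be sketched in today's Dart by writing the desugared call by hand. This is only an illustration: the `log` tag, the `loggingEnabled` flag, and the `logLines` sink are hypothetical names chosen for this example, not part of the proposal.

```dart
// Hypothetical switch and sink for this sketch.
bool loggingEnabled = false;
final List<String> logLines = [];

// Tag processor that a hypothetical `log` tag would resolve to.
void logStringLiteral(
    List<String> strings, List<Object? Function()> values) {
  // When logging is off, return before invoking any closure, so the
  // interpolated expressions are never evaluated.
  if (!loggingEnabled) return;

  var buffer = StringBuffer();
  for (var i = 0; i < values.length; i++) {
    buffer.write(strings[i]);
    try {
      buffer.write(values[i]());
    } catch (error) {
      // A failing interpolation is recorded instead of propagating.
      buffer.write('<error: $error>');
    }
  }
  buffer.write(strings.last);
  logLines.add(buffer.toString());
}

void main() {
  var evaluations = 0;
  Object? expensive() {
    evaluations++;
    return 42;
  }

  // Desugared form of: log 'answer: ${expensive()}';
  logStringLiteral(['answer: ', ''], [() => expensive()]);
  print(evaluations); // 0: the closure was never invoked.

  loggingEnabled = true;
  logStringLiteral(['answer: ', ''], [() => expensive()]);
  // Desugared form of a log line whose interpolation throws.
  logStringLiteral(['oops: ', ''], [() => throw StateError('bad')]);
  print(logLines.length); // 2: both lines were logged, neither crashed.
}
```

Because the processor receives closures rather than values, the decision to evaluate an expression at all stays with the processor.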
|
| 178 | +[sanitized]: https://en.wikipedia.org/wiki/HTML_sanitization |
| 179 | +[sql injection]: https://xkcd.com/327/ |
| 180 | +[bigint]: https://api.dart.dev/stable/2.14.4/dart-core/BigInt-class.html |
| 181 | +
|
| 182 | +## Grammar |
| 183 | +
|
| 184 | +The grammar requires a little adjusting because of raw and adjacent strings: |
| 185 | +
|
| 186 | +``` |
| 187 | +stringLiteral ::= |
| 188 | + taggedStringLiteral |
| 189 | + | ( multilineString |
| 190 | + | singleLineString |
| 191 | + | RAW_SINGLE_LINE_STRING |
| 192 | + | RAW_MULTI_LINE_STRING )+ |
| 193 | + |
| 194 | +taggedStringLiteral ::= identifier ( multilineString | singleLineString )+ |
| 195 | + |
| 196 | +singleLineString ::= // remove raw |
| 197 | + SINGLE_LINE_STRING_SQ_BEGIN_END |
| 198 | + | SINGLE_LINE_STRING_SQ_BEGIN_MID expression |
| 199 | + (SINGLE_LINE_STRING_SQ_MID_MID expression)* |
| 200 | + SINGLE_LINE_STRING_SQ_MID_END |
| 201 | + | SINGLE_LINE_STRING_DQ_BEGIN_END |
| 202 | + | SINGLE_LINE_STRING_DQ_BEGIN_MID expression |
| 203 | + (SINGLE_LINE_STRING_DQ_MID_MID expression)* |
| 204 | + SINGLE_LINE_STRING_DQ_MID_END |
| 205 | + |
| 206 | +multilineString ::= // remove raw |
| 207 | + MULTI_LINE_STRING_SQ_BEGIN_END |
| 208 | + | MULTI_LINE_STRING_SQ_BEGIN_MID expression |
| 209 | + (MULTI_LINE_STRING_SQ_MID_MID expression)* |
| 210 | + MULTI_LINE_STRING_SQ_MID_END |
| 211 | + | MULTI_LINE_STRING_DQ_BEGIN_END |
| 212 | + | MULTI_LINE_STRING_DQ_BEGIN_MID expression |
| 213 | + (MULTI_LINE_STRING_DQ_MID_MID expression)* |
| 214 | + MULTI_LINE_STRING_DQ_MID_END |
| 215 | +``` |
| 216 | +
|
| 217 | +Basically, a string literal can be a tagged string or an untagged string. A |
| 218 | +tagged string is an identifier followed by a series of non-raw untagged adjacent |
| 219 | +strings. An untagged string is a series of adjacent strings which may include |
| 220 | +raw strings. |
| 221 | +
|
| 222 | +If the identifier before a string literal is `r`, it is considered a raw string, |
| 223 | +not a string tagged with `r`. |
| 224 | +
|
| 225 | +## Static semantics |
| 226 | +
|
| 227 | +A tagged string is an identifier followed by a series of adjacent string |
| 228 | +literals which may contain interpolated expressions. This is treated as |
| 229 | +syntactic sugar for a function call with two list arguments. |
| 230 | +
|
| 231 | +### Desugaring |
| 232 | +
|
| 233 | +The tag identifier is suffixed with `StringLiteral` to determine the tag |
| 234 | +processor name. |
| 235 | +
|
| 236 | +Adjacent strings are implicitly concatenated into a single string as in current |
| 237 | +Dart. |
| 238 | +
|
| 239 | +The string is split into string parts and interpolation expressions. All of the |
| 240 | +string literal parts from the `SINGLE_LINE_*` and `MULTI_LINE_*` rules are |
| 241 | +collected in order and put in an object that implements `List<String>`. |
| 242 | +
|
| 243 | +Each `expression` is wrapped in a closure of type `Object? Function()` that |
| 244 | +evaluates and returns the expression when invoked. These closures are collected |
| 245 | +in order into an object that implements `List<Object? Function()>`. |
| 246 | +
|
| 247 | +**TODO: What if an interpolated expression uses `await`? We could implicitly |
| 248 | +make the closure `async` in that case and require the tag processor to |
| 249 | +handle a future result. Or we could make it a compile-time error like we do |
| 250 | +when using `await` in the initializer of a `late` variable.** |
| 251 | +
|
| 252 | +The structure of the grammar is such that the list of string parts will always |
| 253 | +be one element longer than the list of expressions. If there are no expressions, |
| 254 | +there will be one string part. If an interpolated expression begins the string, |
| 255 | +there will be a zero-length initial string part. Likewise, if an interpolated |
| 256 | +expression ends the string, there will be a zero-length string part at the end |
| 257 | +of the parts list. Some examples: |
| 258 | +
|
| 259 | +```dart |
| 260 | +// string parts expressions |
| 261 | +tag '' // '' (none) |
| 262 | +tag 'str' // 'str' (none) |
| 263 | +tag '$e' // '', '' e |
| 264 | +tag '@$e' // '@', '' e |
| 265 | +tag '$e!' // '', '!' e |
| 266 | +tag '@$e!' // '@', '!' e |
| 267 | +tag '$e$f' // '', '', '' e, f |
| 268 | +tag '@$e#$f!' // '@', '#', '!' e, f |
| 269 | +``` |
| 270 | + |
| 271 | +The tagged string literal is replaced with a call to the tag processor function. |
| 272 | +The list of string parts and the list of expression closures (which may be |
| 273 | +empty) are passed to that function as positional arguments, in that order. |
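As a rough illustration of the invariant described above, the following sketch writes a tag processor that reproduces ordinary string interpolation and calls it in desugared form. The `plain` tag and the `plainStringLiteral` name are hypothetical and exist only for this example.

```dart
// Hypothetical processor for a `plain` tag: interleaves parts and values
// the way untagged interpolation would.
String plainStringLiteral(
    List<String> strings, List<Object? Function()> values) {
  // The desugaring guarantees one more string part than expressions.
  if (strings.length != values.length + 1) {
    throw ArgumentError('expected ${values.length + 1} string parts');
  }

  var buffer = StringBuffer(strings.first);
  for (var i = 0; i < values.length; i++) {
    buffer
      ..write(values[i]())
      ..write(strings[i + 1]);
  }
  return buffer.toString();
}

void main() {
  var e = 1, f = 2;

  // Desugared form of: plain '@$e#$f!'
  print(plainStringLiteral(['@', '#', '!'], [() => e, () => f])); // @1#2!

  // Desugared form of: plain '$e$f' (all three parts are empty strings).
  print(plainStringLiteral(['', '', ''], [() => e, () => f])); // 12
}
```

Since the first part always exists, a processor can seed its output with `strings.first` and then alternate value, part, value, part without special-casing strings that begin or end with an interpolation.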
| 274 | + |
| 275 | +### Static typing |
| 276 | + |
| 277 | +It is a compile-time error if: |
| 278 | + |
| 279 | +* The tag name suffixed with `StringLiteral` does not resolve to a function |
| 280 | + that can be called with two positional arguments. |
| 281 | +* `List<String>` cannot be assigned to the first parameter's type. |
| 282 | +* `List<Object? Function()>` cannot be assigned to the second parameter's type. |
| 283 | + |
| 284 | +The type of a tagged string literal expression is the return type of the |
| 285 | +corresponding tag processor function. |
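One reading of these rules is that a tag processor may declare wider parameter types than the lists it receives, and that its return type becomes the type of the tagged string expression. The sketch below assumes that reading; the `count` tag and `countStringLiteral` are hypothetical names, and the call is written in desugared form.

```dart
// Hypothetical processor for a `count` tag. `List<String>` is assignable to
// `Iterable<String>`, and `List<Object? Function()>` is assignable to
// `Iterable<Object? Function()>`, so this signature satisfies the rules above.
int countStringLiteral(
    Iterable<String> strings, Iterable<Object? Function()> values) {
  // Returns how many interpolated expressions the literal contained.
  return values.length;
}

void main() {
  var x = 1, y = 2;

  // Desugared form of: var n = count 'a$x b$y';
  // The static type of the expression is the processor's return type, int.
  int n = countStringLiteral(['a', ' b', ''], [() => x, () => y]);
  print(n); // 2
}
```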
| 286 | + |
| 287 | +## Runtime semantics |
| 288 | + |
| 289 | +This feature is purely syntactic sugar, so there are no runtime semantics |
| 290 | +beyond the behavior of the Dart code that the tagged string desugars to. |