| 1 | +- Feature Name: code_literals |
| 2 | +- Start Date: 2023-06-18 |
| 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Add a new kind of multi-line string literal for embedding code which |
| 10 | +plays nicely with `rustfmt`. |
| 11 | + |
| 12 | +# Motivation |
| 13 | +[motivation]: #motivation |
| 14 | + |
| 15 | + - Embedding code as a literal string within a Rust program is often |
| 16 | + necessary. A prominent example is the `sqlx` crate, which |
| 17 | + has the user write SQL queries as string literals within the program. |
| 18 | + - Rust already supports several kinds of multi-line string literal, |
| 19 | + but none of them are well suited for embedding code. |
| 20 | + |
| 21 | + 1. Normal string literals, eg. `"a string literal"`. These can be |
| 22 | + written over multiple lines, but require special characters |
| 23 | + to be escaped. Whitespace is significant within the literal, |
| 24 | + which means that `rustfmt` cannot fix the indentation of the |
| 25 | + code block. For example, beginning with this code: |
| 26 | + |
| 27 | + ```rust |
| 28 | + if some_condition { |
| 29 | + do_something_with( |
| 30 | + " |
| 31 | + a nicely |
| 32 | + indented code |
| 33 | + string |
| 34 | + " |
| 35 | + ); |
| 36 | + } |
| 37 | + ``` |
| 38 | + |
| 39 | + If the indentation is changed, such as by removing the |
| 40 | + conditional, then `rustfmt` must re-format the code like so: |
| 41 | + |
| 42 | + ```rust |
| 43 | + do_something_with( |
| 44 | + " |
| 45 | + a nicely |
| 46 | + indented code |
| 47 | + string |
| 48 | + " |
| 49 | + ); |
| 50 | + ``` |
| 51 | + |
| 52 | + To do otherwise would be to change the value of |
| 53 | + the string literal. |
| 54 | + |
| 55 | + 2. Normal string literals with backslash escaping, eg. |
| 56 | + ```rust |
| 57 | + " |
| 58 | + this way\ |
| 59 | + whitespace at\ |
| 60 | + the beginning\ |
| 61 | + of lines can\ |
| 62 | + be ignored\ |
| 63 | + " |
| 64 | + ``` |
| 65 | + |
| 66 | + This approach still suffers from the need to escape special |
| 67 | + characters. The backslashes at the end of every line are |
| 68 | + tedious to write, and are problematic if whitespace is |
| 69 | + meaningful within the code. For example, if Python code |
| 70 | + were being embedded, then the indentation would be lost. |
| 71 | + Finally, although `rustfmt` could in principle reformat |
| 72 | + these strings, in practice doing so in a reasonable way |
| 73 | + is complicated and so this has never been enabled. |
| 74 | + |
| 75 | + 3. Raw string literals, eg. `r#"I can use "s!"#` |
| 76 | + |
| 77 | + This solves the problem of special characters, but suffers |
| 78 | + from the same inability to be reformatted, and the trick |
| 79 | + of using an `\` at the end of each line cannot be applied |
| 80 | + because escape characters are not recognised. |
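
The behaviour of the three existing literal kinds can be checked in current Rust. The following is a minimal sketch, not part of the RFC text; the function name `existing_literals` is hypothetical and chosen only for illustration.

```rust
/// Returns one example of each existing literal kind discussed above.
/// (Hypothetical helper, used only to illustrate the motivation.)
fn existing_literals() -> (&'static str, &'static str, &'static str) {
    // 1. Plain literal: the source indentation becomes part of the value,
    //    so re-indenting the source would change the string.
    let plain = "
        if a:
            print(42)
    ";

    // 2. Backslash continuation: the newline and ALL leading whitespace on
    //    the next line are skipped, so the Python indentation is lost.
    let continued = "if a:\n\
            print(42)\n";

    // 3. Raw literal: quotes need no escaping, but `\n` stays as the two
    //    characters backslash and `n`; no escapes are processed.
    let raw = r#"say "hi"\n"#;

    (plain, continued, raw)
}
```

None of the three yields the intended `"if a:\n    print(42)\n"` without either hand-written escapes or source text that `rustfmt` cannot safely re-indent.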
| 81 | + |
| 82 | +# Guide-level explanation |
| 83 | +[guide-level-explanation]: #guide-level-explanation |
| 84 | + |
| 85 | +In addition to string literals and raw string literals, a third type |
| 86 | +of string literal exists: code string literals. |
| 87 | + |
| 88 | +```rust |
| 89 | + let code = ``` |
| 90 | + This is a code string literal |
| 91 | + |
| 92 | + I can use special characters like "" and \ freely. |
| 93 | + |
| 94 | + Indentation is preserved *relative* to the indentation level |
| 95 | + of the first line. |
| 96 | + |
| 97 | + It is an error for a line to have "negative" indentation (ie. be |
| 98 | + indented less than the indentation of the opening backticks) unless |
| 99 | + the line is empty. |
| 100 | + ```; |
| 101 | +``` |
| 102 | + |
| 103 | +`rustfmt` will automatically adjust the indentation of the code string |
| 104 | +literal as a whole to match the surrounding context, but will never |
| 105 | +change the relative indentation within such a literal. |
| 106 | + |
| 107 | +Anything directly after the opening backticks is not considered |
| 108 | +part of the string literal. It may be used as a language hint or |
| 109 | +processed by macros (similar to the treatment of doc comments). |
| 110 | + |
| 111 | +```rust |
| 112 | +let sql = ```sql |
| 113 | + SELECT * FROM table; |
| 114 | + ```; |
| 115 | +``` |
| 116 | + |
| 117 | +Similar to raw string literals, there is no way to escape characters |
| 118 | +within a code string literal. It is expected that procedural macros |
| 119 | +would build upon code string literals to add support for such |
| 120 | +functionality as required. |
| 121 | + |
| 122 | +If it is necessary to include triple backticks within a code string |
| 123 | +literal, more than three backticks may be used to enclose the |
| 124 | +literal, eg. |
| 125 | + |
| 126 | +```rust |
| 127 | +let code = ```` |
| 128 | + ``` |
| 129 | +````; |
| 130 | +``` |
| 131 | + |
| 132 | +# Reference-level explanation |
| 133 | +[reference-level-explanation]: #reference-level-explanation |
| 134 | + |
| 135 | +A code string literal will begin and end with three or more backticks. |
| 136 | +The number of backticks in the terminator must match the number used |
| 137 | +to begin the literal. |
| 138 | + |
| 139 | +The value of the string literal will be determined using the following |
| 140 | +steps: |
| 141 | + |
| 142 | +1. Start from the first newline after the opening backticks. |
| 143 | +2. Take the string exactly as written until the closing backticks. |
| 144 | +3. Remove equal numbers of spaces or tabs from every non-empty line |
| 145 | + until the first character of the first non-empty line is neither |
| 146 | + a space nor a tab, or until every line is empty. |
| 147 | + Raise a compile error if this cannot be done |
| 148 | + due to a "negative" indent or inconsistent whitespace (eg. if |
| 149 | + some lines are indented using tabs and some using spaces). |
| 150 | + |
| 151 | +Here are some edge case examples: |
| 152 | + |
| 153 | +```rust |
| 154 | + // Empty string |
| 155 | + assert_eq!(```foo |
| 156 | + ```, ""); |
| 157 | + |
| 158 | + // Newline |
| 159 | + assert_eq!(``` |
| 160 | + |
| 161 | + ```, "\n"); |
| 162 | + |
| 163 | + // No terminating newline |
| 164 | + assert_eq!(``` |
| 165 | + bar```, "bar"); |
| 166 | + |
| 167 | + // Terminating newline |
| 168 | + assert_eq!(``` |
| 169 | + bar |
| 170 | + ```, "bar\n"); |
| 171 | + |
| 172 | + // Preserved indent |
| 173 | + assert_eq!(``` |
| 174 | + if a: |
| 175 | + print(42) |
| 176 | + ```, "if a:\n print(42)\n"); |
| 177 | + |
| 178 | + // Relative indent |
| 179 | + assert_eq!(``` |
| 180 | + if a: |
| 181 | + print(42) |
| 182 | + ```, "if a:\n print(42)\n"); |
| 183 | + |
| 184 | + // Relative to first non-empty line |
| 185 | + assert_eq!(``` |
| 186 | + |
| 187 | + |
| 188 | + if a: |
| 189 | + print(42) |
| 190 | + ```, "\n\nif a:\n print(42)\n"); |
| 191 | +``` |
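
The value computation described in the steps above can be sketched as an ordinary Rust function. This is an illustration under stated assumptions, not the proposed compiler implementation: the names `dedent_code_literal` and `DedentError` are hypothetical, the input is assumed to be the raw text between the first newline after the opening backticks and the closing backticks, line endings are assumed to be `\n`, and whitespace-only lines are assumed to count as empty (which is what the "Empty string" and "Newline" examples suggest).

```rust
/// Hypothetical error type for this sketch; the RFC only says "compile error".
#[derive(Debug, PartialEq)]
pub enum DedentError {
    /// 1-based index of a non-empty line that does not start with the
    /// indentation of the first non-empty line ("negative" indent or
    /// inconsistent tabs/spaces).
    BadIndent { line: usize },
}

/// Assumption of this sketch: a line of only spaces/tabs counts as empty.
fn is_blank(line: &str) -> bool {
    line.chars().all(|c| c == ' ' || c == '\t')
}

/// Strips the indentation of the first non-empty line from every
/// non-empty line of `body`; empty lines are emitted as empty.
pub fn dedent_code_literal(body: &str) -> Result<String, DedentError> {
    let lines: Vec<&str> = body.split('\n').collect();

    // Leading spaces/tabs of the first non-empty line, or "" if all are empty.
    let prefix = lines
        .iter()
        .copied()
        .find(|l| !is_blank(l))
        .map(|l| {
            let end = l.find(|c: char| c != ' ' && c != '\t').unwrap_or(l.len());
            &l[..end]
        })
        .unwrap_or("");

    let mut out: Vec<&str> = Vec::with_capacity(lines.len());
    for (i, line) in lines.iter().enumerate() {
        if is_blank(line) {
            out.push("");
        } else if let Some(rest) = line.strip_prefix(prefix) {
            out.push(rest);
        } else {
            // The line is indented less than, or differently from, the prefix.
            return Err(DedentError::BadIndent { line: i + 1 });
        }
    }
    Ok(out.join("\n"))
}
```

Comparing against the exact prefix string, rather than counting whitespace characters, is one way to reject both "negative" indentation and mixed tabs and spaces with a single check.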
| 192 | + |
| 193 | +The text between the opening backticks and the first newline is |
| 194 | +preserved within the AST, but is otherwise unused. |
| 195 | + |
| 196 | +# Drawbacks |
| 197 | +[drawbacks]: #drawbacks |
| 198 | + |
| 199 | +The main drawback is increased complexity of the language: |
| 200 | + |
| 201 | +1. It adds a symbol (the backtick) that the language did not previously use. |
| 202 | +2. It adds a third way of writing string literals. |
| 203 | + |
| 204 | +# Rationale and alternatives |
| 205 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 206 | + |
| 207 | +There is a lot of room to bike-shed the syntax. |
| 208 | +If there is significant opposition to the backtick syntax, then an |
| 209 | +alternative syntax such as: |
| 210 | +``` |
| 211 | +code" |
| 212 | + string |
| 213 | +" |
| 214 | +``` |
| 215 | +could be used. |
| 216 | + |
| 217 | +Similarly, the use of more than three backticks may be unpopular. |
| 218 | +It's not clear how important it is to be able to nest backticks |
| 219 | +within backticks, but a syntax mirroring raw string literals could |
| 220 | +be used instead, eg. |
| 221 | +``` |
| 222 | +`# foo |
| 223 | + string |
| 224 | +#` |
| 225 | +``` |
| 226 | + |
| 227 | +There is also the question of whether the backtick syntax would |
| 228 | +interfere with the ability to paste Rust code snippets into such |
| 229 | +blocks. Experimentally, markdown parsers do not seem to have any |
| 230 | +problems with this (as demonstrated in this document). |
| 231 | + |
| 232 | +# Prior art |
| 233 | +[prior-art]: #prior-art |
| 234 | + |
| 235 | +The proposed syntax is primarily based on markdown code block syntax, |
| 236 | +which is widely used and should be familiar to most programmers. |
| 237 | + |
| 238 | + |
| 239 | +# Unresolved questions |
| 240 | +[unresolved-questions]: #unresolved-questions |
| 241 | + |
| 242 | +- None |
| 243 | + |
| 244 | +# Future possibilities |
| 245 | +[future-possibilities]: #future-possibilities |
| 246 | + |
| 247 | +- Macro authors could perform further processing |
| 248 | + on code string literals. These macros could add support for string |
| 249 | + interpolation, escaping, etc. without needing to further complicate |
| 250 | + the language itself. |
| 251 | + |
| 252 | +- Procedural macros could look at the text following the opening |
| 253 | + backticks and use that to influence code generation, eg. |
| 254 | + |
| 255 | + ```rust |
| 256 | + query!(```postgresql |
| 257 | + <query> |
| 258 | + ```) |
| 259 | + ``` |
| 260 | + |
| 261 | + could parse the query in a PostgreSQL specific way. |
| 262 | + |
| 263 | +- Code literals could be used by crates like `html-macro` |
| 264 | + or `quote` to provide better surface syntax and faster |
| 265 | + compilation. |
| 266 | + |
| 267 | +- Code literals could be used with the `asm!` macro to avoid |
| 268 | + needing a new string on every line. |