Commit cc7e7c8

Propose code string literals
1 parent c1c42c3 commit cc7e7c8

1 file changed

+268
-0
lines changed

text/0000-code-literals.md

Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
- Feature Name: code_literals
- Start Date: 2023-06-18
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add a new kind of multi-line string literal for embedding code which
plays nicely with `rustfmt`.

# Motivation
[motivation]: #motivation

- Embedding code as a literal string within a Rust program is often
  necessary. A prominent example is the `sqlx` crate, which
  has the user write SQL queries as string literals within the program.
- Rust already supports several kinds of multi-line string literal,
  but none of them are well suited for embedding code.

  1. Normal string literals, e.g. `"a string literal"`. These can be
     written over multiple lines, but require special characters
     to be escaped. Whitespace is significant within the literal,
     which means that `rustfmt` cannot fix the indentation of the
     code block. For example, beginning with this code:

     ```rust
     if some_condition {
         do_something_with(
             "
             a nicely
             indented code
             string
             "
         );
     }
     ```

     If the indentation is changed, such as by removing the
     conditional, then `rustfmt` must re-format the code like so:

     ```rust
     do_something_with(
         "
             a nicely
             indented code
             string
             "
     );
     ```

     To do otherwise would be to change the value of
     the string literal.

  2. Normal string literals with backslash escaping, e.g.
     ```rust
     "
         this way\
         whitespace at\
         the beginning\
         of lines can\
         be ignored\
     "
     ```

     This approach still suffers from the need to escape special
     characters. The backslashes at the end of every line are
     tedious to write, and are problematic if whitespace is
     meaningful within the code. For example, if Python code
     were being embedded, then the indentation would be lost.
     Finally, although `rustfmt` could in principle reformat
     these strings, in practice doing so in a reasonable way
     is complicated and so this has never been enabled.

  3. Raw string literals, e.g. `r#"I can use "s!"#`

     This solves the problem of special characters, but suffers
     from the same inability to be reformatted, and the trick
     of using a `\` at the end of each line cannot be applied
     because escape characters are not recognised.
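
The behaviour of the three existing forms can be checked in today's Rust. The following is a minimal sketch; the function names are illustrative only and are not part of this proposal.

```rust
/// 1. Normal literal written inside indented code: the source
///    indentation becomes part of the value.
fn normal_literal() -> &'static str {
    "
        SELECT 1
    "
}

/// 2. Backslash continuation: the newline and all leading whitespace
///    on the following line are dropped, so relative indentation is lost.
fn continued_literal() -> &'static str {
    "if a:\
        print(42)"
}

/// 3. Raw literal: quotes and backslashes are taken verbatim, so a
///    trailing `\` stays in the value and the line break is kept.
fn raw_literal() -> &'static str {
    r#"say "hi" \
    there"#
}
```

Here `normal_literal()` carries eight spaces before `SELECT`, `continued_literal()` collapses to `if a:print(42)`, and `raw_literal()` keeps both the backslash and the indentation of its second line.
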

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

In addition to string literals and raw string literals, a third type
of string literal exists: code string literals.

```rust
    let code = ```
        This is a code string literal

        I can use special characters like "" and \ freely.

        Indentation is preserved *relative* to the indentation level
        of the first line.

        It is an error for a line to have "negative" indentation (i.e. be
        indented less than the indentation of the opening backticks) unless
        the line is empty.
        ```;
```

`rustfmt` will automatically adjust the indentation of the code string
literal as a whole to match the surrounding context, but will never
change the relative indentation within such a literal.

Anything directly after the opening backticks is not considered
part of the string literal. It may be used as a language hint or
processed by macros (similar to the treatment of doc comments).

```rust
    let sql = ```sql
        SELECT * FROM table;
        ```;
```

Similar to raw string literals, there is no way to escape characters
within a code string literal. It is expected that procedural macros
would build upon code string literals to add support for such
functionality as required.

If it is necessary to include triple backticks within a code string
literal, more than three backticks may be used to enclose the
literal, e.g.

```rust
    let code = ````
        ```
        ````;
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

A code string literal will begin and end with three or more backticks.
The number of backticks in the terminator must match the number used
to begin the literal.
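
As a rough illustration of the delimiter rule, the sketch below splits the source text of a code string literal into the text after the opening backticks and the raw body. `split_code_literal` is a hypothetical helper, not part of any compiler API. It assumes the literal ends at the first run of backticks as long as the opening run; this RFC does not pin down how longer runs inside the body should be treated.

```rust
/// Hypothetical helper: splits `src` (which must start at the opening
/// backticks) into (text after the opening backticks, raw body).
/// Returns `None` if the input is not a well-formed code string literal.
fn split_code_literal(src: &str) -> Option<(&str, &str)> {
    // Count the opening backticks; at least three are required.
    let ticks = src.chars().take_while(|&c| c == '`').count();
    if ticks < 3 {
        return None;
    }
    let fence = &src[..ticks];
    let rest = &src[ticks..];

    // Everything up to the first newline is the language hint and is
    // not part of the string value.
    let newline = rest.find('\n')?;
    let hint = rest[..newline].trim();
    let body = &rest[newline + 1..];

    // Assumption: the terminator is the first occurrence of a run of
    // backticks with the same length as the opening run.
    let end = body.find(fence)?;
    Some((hint, &body[..end]))
}
```

With four backticks as the delimiter, a body consisting of three backticks is returned intact, which is the nesting case described in the guide-level explanation.
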

The value of the string literal will be determined using the following
steps:

1. Start immediately after the first newline following the opening backticks.
2. Take the string exactly as written until the closing backticks.
3. Remove equal numbers of spaces or tabs from every non-empty line
   until the first character of the first non-empty line is neither
   a space nor a tab, or until every line is empty.
   Raise a compile error if this could not be done
   due to a "negative" indent or inconsistent whitespace (e.g. if
   some lines are indented using tabs and some using spaces).
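
The steps above can be sketched as an ordinary function over the raw text between the first newline and the closing backticks. This is only one possible reading of the rules: `dedent_code_literal` is a hypothetical name, errors are returned as strings rather than compiler diagnostics, and only completely empty lines are treated as empty.

```rust
/// Hypothetical sketch of step 3: strips the indentation of the first
/// non-empty line from every non-empty line of `body`.
fn dedent_code_literal(body: &str) -> Result<String, String> {
    let lines: Vec<&str> = body.split('\n').collect();

    // The leading spaces/tabs of the first non-empty line form the
    // prefix that every other non-empty line must also start with.
    let prefix = lines
        .iter()
        .copied()
        .find(|line| !line.is_empty())
        .map(|line| {
            let end = line
                .find(|c: char| c != ' ' && c != '\t')
                .unwrap_or(line.len());
            &line[..end]
        })
        .unwrap_or("");

    let mut out: Vec<&str> = Vec::with_capacity(lines.len());
    for (index, line) in lines.iter().copied().enumerate() {
        if line.is_empty() {
            // Empty lines are exempt from the indentation check.
            out.push(line);
        } else if let Some(stripped) = line.strip_prefix(prefix) {
            out.push(stripped);
        } else {
            // Covers both a "negative" indent and mixed tabs/spaces.
            return Err(format!(
                "line {}: indentation does not start with the first line's indentation",
                index + 1
            ));
        }
    }
    Ok(out.join("\n"))
}
```

Under these assumptions, a body whose lines are indented by four and eight spaces comes back indented by zero and four, while a line indented less than the first non-empty line, or with a tab where the first line used spaces, is rejected.
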

Here are some edge case examples:

```rust
// Empty string
assert_eq!(```foo
```, "");

// Newline
assert_eq!(```

```, "\n");

// No terminating newline
assert_eq!(```
bar```, "bar");

// Terminating newline
assert_eq!(```
bar
```, "bar\n");

// Preserved indent
assert_eq!(```
if a:
    print(42)
```, "if a:\n    print(42)\n");

// Relative indent
assert_eq!(```
    if a:
        print(42)
    ```, "if a:\n    print(42)\n");

// Relative to first non-empty line
assert_eq!(```


    if a:
        print(42)
    ```, "\n\nif a:\n    print(42)\n");
```

The text between the opening backticks and the first newline is
preserved within the AST, but is otherwise unused.

# Drawbacks
[drawbacks]: #drawbacks

The main drawback is increased complexity of the language:

1. It adds a new symbol (the backtick) to the language, which was not previously used.
2. It adds a third way of writing string literals.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

There is plenty of room to bikeshed the syntax.
If there is significant opposition to the backtick syntax, then an
alternative syntax such as:
```
code"
    string
"
```
could be used.

Similarly, the use of more than three backticks may be unpopular.
It's not clear how important it is to be able to nest backticks
within backticks, but a syntax mirroring raw string literals could
be used instead, e.g.
```
`# foo
    string
#`
```

There is also the question of whether the backtick syntax would
interfere with the ability to paste Rust code snippets into Markdown
code blocks. Experimentally, markdown parsers do not seem to have any
problems with this (as demonstrated in this document).

# Prior art
[prior-art]: #prior-art

The proposed syntax is primarily based on markdown code block syntax,
which is widely used and should be familiar to most programmers.


# Unresolved questions
[unresolved-questions]: #unresolved-questions

- None

# Future possibilities
[future-possibilities]: #future-possibilities

- Macro authors could perform further processing
  on code string literals. These macros could add support for string
  interpolation, escaping, etc. without needing to further complicate
  the language itself.

- Procedural macros could look at the text following the opening
  backticks and use that to influence code generation, e.g.

  ```rust
  query!(```postgresql
      <query>
      ```)
  ```

  could parse the query in a PostgreSQL-specific way.

- Code literals could be used by crates like `html-macro`
  or `quote` to provide better surface syntax and faster
  compilation.

- Code literals could be used with the `asm!` macro to avoid
  needing a new string on every line.
