From ae8a3a212b7e98ca719cbcfcc7f61bb3f981d592 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Thu, 5 Jun 2025 04:43:28 +0100 Subject: [PATCH 01/43] RFC: Dedented String Literals --- text/3830-dedented-string-literals.md | 1296 +++++++++++++++++++++++++ 1 file changed, 1296 insertions(+) create mode 100644 text/3830-dedented-string-literals.md diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md new file mode 100644 index 00000000000..8a1c12b1e96 --- /dev/null +++ b/text/3830-dedented-string-literals.md @@ -0,0 +1,1296 @@ +- Feature Name: `dedented_string_literals` +- Start Date: 2025-06-05 +- RFC PR: [rust-lang/rfcs#3830](https://github.com/rust-lang/rfcs/pull/3830) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Add dedented string literals: `d"string"`. + +With: + +```rs +let sql = d" + create table student( + id int primary key, + name text + ) + "; +``` + +Being equivalent to: + +```rs +let sql = "\ +create table student( + id int primary key, + name text +)"; +``` + +# Motivation +[motivation]: #motivation + +Problem: Embedding formatted text in Rust's string literals forces us to make a choice: + +- Sacrifice readability of the output +- Sacrifice readability of the source code + +## Introduction + +### Sacrifice readability of the output + +In order to print the following: + +```sql +create table student( + id int primary key, + name text +) +``` + +The initial attempt might look as follows: + +```rust +fn main() { + println!(" + create table student( + id int primary key, + name text + ) + "); +} +``` + +Which outputs (using `^` to mark the beginning of a line, and `·` to mark a leading space): + +```sql +^ +^········create table student( +^············id int primary key, +^············name text +^········) +^···· +^ +``` + +The output is formatted in an unconventional way, containing excessive 
leading whitespace. + +The alternative produces sane output, but at the cost of making the code less readable: + +### Sacrifice readability of the source code + +In order for the output to be more sensible, we must sacrifice readability of the source code: + +```rust +fn main() { + println!( + "\ +create table student( + id int primary key, + name text +)"); +} +``` + +The above example would output the expected: + +```sql +create table student( + id int primary key, + name text +) +``` + +But the improvement in output comes at a cost: + +1. We now have to escape the first newline: + + ```diff + fn main() { + println!( + + "\ + create table student( + id int primary key, + name text + )"); + } + ``` + + This is not possible to do in raw strings, so the source ends up looking even worse for them: the first line of the SQL statement is indented further in the source code than the lines nested inside it: + + ```diff + fn main() { + println!( + + r#"create table student( + id int primary key, + name text + )"#); + } + ``` + +2. The SQL statement does not have any indentation relative to the surrounding code. + + This is contrary to how we would normally write code, with indentation 1 level deeper than the surrounding code. + + ```diff + fn main() { + println!( + "\ + +create table student( + + id int primary key, + + name text + +)"); + } + ``` + + This makes it confusing to tell which scope the string belongs to. This is especially true when there are multiple scopes involved: + + ```rs + fn main() { + { + println!( + "\ + create table student( + id int primary key, + name text + )"); + } + println!( + "\ + create table student( + id int primary key, + name text + )"); + { + { + println!( + "\ + create table student( + id int primary key, + name text + )"); + + } + } + } + ``` + + All of the strings end up on the same level, despite them being in different scopes. + +3. 
The closing double-quote must be put at the beginning of the line, in order not to introduce trailing whitespace: + + ```diff + fn main() { + println!( + "\ + create table student( + id int primary key, + name text + +)"); + } + ``` + +As you can see, we have to choose one or the other. In either case we have to give something up. + +Sometimes, we are *forced* into the second option: sacrificing readability of the source. + +In some cases, producing excessive whitespace will change the meaning of the output. + +Consider whitespace-sensitive languages such as Python or Haskell, or content which is meant to be read by people, like generated Markdown. Here we *can't* sacrifice readability of the output, so our source code must become harder to understand. + +But, what if we could have the best of both worlds? + +### Dedented string literals + +In order to solve these problems, the RFC proposes dedented string literals of the form: `d"string"`. + +Leading whitespace common to every line, up to the indentation of the closing quote, is stripped from dedented string literals at compile time. + +This allows us to have a more readable version of the above: + +```rust +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace (will be removed) +} +``` + +All of the above problems are gracefully solved: + +1. Indentation level inside the string is the same as what is in the output. +1. It does not require escaping the first newline for it to look readable. +1. Nicely composes with raw string literals: `dr#"string"#`, in which the first newline *cannot* be escaped. +1. Indentation level of the statement is larger than the `println!` call, + making it more obvious at a glance that the string is inside the call. +1. The closing parenthesis in the SQL statement aligns with `create table` + and is 1 level deeper than `println!`. 
+ +Now, consider the example with multiple nested scopes again: + +```rs +fn main() { + { + println!(d" + create table student( + id int primary key, + name text + ) + "); + } + println!(d" + create table student( + id int primary key, + name text + ) + "); + { + { + println!(d" + create table student( + id int primary key, + name text + ) + "); + } + } +} +``` + +It is immediately more obvious which string belongs to which scope. + +## Closing quote controls the removed indentation + +All of the common whitespace between each line, which has a higher indentation than the indentation of the line of closing quote (contained in the last line) is stripped. + +Here are a few examples to demonstrate. + +### No indentation is stripped when the closing quote has no indentation + +The output is the same as what is in the source code. + +This allows all lines to have a common indentation. + +```rust +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) +"); +// no common leading whitespace = nothing to remove +} +``` + +In the above example, the closing quote is on the very first character. Common indentation is not stripped at all. + +Prints: + +```sql + create table student( + id int primary key, + name text + ) +``` + +Outcome: **No indentation is removed. Output contains 2 levels of indentation. Source contains 2 levels of indentation**. + +### Strip 1 level of indentation + +In order to strip the first level of indentation, the ending quote is aligned to the `println!` call. + +```rust +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^ // common leading whitespace (will be removed) +} +``` + +The indentation of the closing double quote is 4 spaces. The 4 spaces will be removed from each line. + +Prints: + +```sql + create table student( + id int primary key, + name text + ) +``` + +Outcome: **1 indentation level in the output, 1 indentation level has been stripped from the source**. 
+ +### Strip *all* indentation + +All indentation can be stripped by placing the closing double quote on the same level as content of the dedented string literal: + +```rust +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace (will be removed) +} +``` + +The indentation of the ending double quote is 8 spaces. This common prefix of leading whitespace characters will be removed from the beginning of each line. + +Prints: + +```sql +create table student( + id int primary key, + name text +) +``` + +Result: **all indentation from source is stripped**. + +Indenting the closing double quote further will have zero impact. +The dedentation will never remove non-whitespace characters. + +Each of the following **examples** print: + +```sql +create table student( + id int primary key, + name text +) +``` + +**Examples**: + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace: 8 spaces +^^^^^^^^^^^^ // closing quote indentation: 12 spaces +} + +// spaces removed from the beginning of each line = min(8, 12) = 8 +``` + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace: 8 spaces +^^^^^^^^^^^^^^^^ // closing quote indentation: 16 spaces +} +// spaces removed from the beginning of each line = min(8, 16) = 8 +``` + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace: 8 spaces +^^^^^^^^^^^^^^^^^^^^ // closing quote indentation: 20 spaces +} +// spaces removed from the beginning of each line = min(8, 20) = 8 +``` + +## Composition with other string literal modifiers, such as raw string literals and byte string literals + +Dedented string literals `d"string"` are a new modifier for strings. 
+ +They are similar to byte strings `b"string"` and raw strings `r#"string"#`. + +They compose with every other string literal modifier. + +To be precise, the RFC introduces 6 new types of string literals: + +- Dedented string literal: `d"string"` +- Dedented raw string literal: `dr#"string"#` +- Dedented byte string literal: `db"string"` +- Dedented raw byte string literal: `dbr#"string"#` +- Dedented C string literal: `dc"string"` +- Dedented raw C string literal: `dcr#"string"#` + +The `format_args!` macro, and by extension all wrapper macros that pass arguments to `format_args!` under the hood, also accept dedented string literals: + +```rs +fn main() { + let table_name = "student"; + + println!(d" + create table {table_name}( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace (will be removed) +} +``` + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Any kind of string literal can turn into a "dedented" string literal if it is prefixed with a `d`: + +- Strings: `"string"` -> `d"string"` +- Raw strings: `r#"string"#` -> `dr#"string"#` +- Byte strings: `b"string"` -> `db"string"` +- ...and others... + +> [!NOTE] +> +> The above list is a slight simplification. +> There are a few rules that apply to dedented string literals which we will get to shortly. + +An example comparing regular `"string"`s and dedented `d"string"`s: + +```rust +let regular = " + I am a regular string literal. + "; + +// All of the visible whitespace is kept. +assert_eq!(regular, "\n I am a regular string literal.\n "); + +// ↓ newline is removed +let dedented = d" + I am a dedented string literal! + "; //^ newline is removed +//^^ whitespace is removed + +assert_eq!(dedented, "I am a dedented string literal!"); +``` + +Common indentation of all lines up to, but **not including**, the column of the closing quote `"` is removed from the beginning of each line. 
+ +Indentation to the right of the closing quote's column is kept: + +```rs +// ↓ newline is removed +let dedented = d" + I am a dedented string literal! + "; //^ newline is removed +//^^ whitespace is removed +// ^^^^ indentation after the double quote is kept + +assert_eq!(dedented, " I am a dedented string literal!"); +``` + +Dedented string literals make it easy to embed multi-line strings that you would like to keep formatted according to the rest of the code: + +```rs +let py = d" + def hello(): + print('Hello, world!') + + hello() + "; +//^^ removed + +let expected = "def hello():\n print('Hello, world!')\n\nhello()"; +assert_eq!(py, expected); +``` + +They compose with all string literals, such as C strings `c"string"`, raw strings `r#"string"#` and byte strings `b"string"`: + +```rs +// dedented raw string +let py = dr#" + def hello(): + print("Hello, world!") + + hello() + "#; +//^^ removed + +let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; +assert_eq!(py, expected); +``` + +You can use them in formatting macros, such as `println!`, `write!`, `assert_eq!`, `format_args!` and similar: + +```rs +let message = "Hello, world!"; + +let py = format!(dr#" + def hello(): + print("{message}") + + hello() + "#); +//^^ removed + +let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; +assert_eq!(py, expected); +``` + +By placing the ending quote earlier than the first non-whitespace character in any of the lines, you can reduce how much space is removed from the beginning of each line: + +```rs +// `String` implements `std::fmt::Write`, which is what `write!` and `writeln!` need here. +use std::fmt::Write as _; + +let message = "Hello, world!"; +let mut py = String::new(); + +// Note: Using `writeln!` because the final newline from dedented strings is removed. (more info later) + +writeln!(py, d" + def hello(): + "); +//^^ removed + +// Note: We want to add 2 newlines here. 
+// - `writeln!` adds 1 newline at the end +// - An additional empty line is added +// to insert the 2nd newline + +// Remember, dedented string literals strip the last newline. +writeln!(py, dr#" + print("{message}") + +"#); +//^^ kept + +write!(py, d" +hello() + "); +//^^^^^^^^^^ No whitespace is removed here. +// If the closing quote is after the common indentation +// (in this case there is no common indentation at all), +// all of the whitespace is stripped + +let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; +assert_eq!(py, expected); +``` + +## Rules + +### Dedented string literals must begin with a newline + +All dedented string literals must begin with a newline. +This newline is removed. + +The following is invalid: + +```rust +// ↓ error: expected literal newline. +// note: dedented string literals must start with a literal newline +// help: insert a literal newline here: +let py = d"def hello(): + print('Hello, world!') + + hello() + "; +``` + +Escape-code newline is not supported, it must be a literal newline: + +```rust +// ↓ error: expected literal newline, but found escaped newline. +// note: dedented string literals must start with a literal newline +let py = d"\ndef hello(): + print('Hello, world!') + + hello() + "; +``` + +This is the correct syntax for the first line: + +```rust +// OK +let py = d" + def hello(): + print('Hello, world!') + + hello() + "; +``` + +### Last line must be empty, and preceded by a literal newline + +The line which contains the closing quote `"` must be empty, and the character before the last line must be a literal newline character. 
+ +This is invalid: + +```rust +let py = d" + def hello(): + print('Hello, world!') + + hello()"; +// ^ error: expected literal newline +// note: in dedented string literals, the line +// which contains the closing quote must be empty +``` + +Neither is using an escaped newline `\n` instead of the literal newline: + +```rust +let py = d" + def hello(): + print('Hello, world!') + + hello()\n"; +// ^ error: expected literal newline, but found escaped newline +// note: in dedented string literals, the line +// which contains the closing quote must be empty +``` + +This is the correct syntax for the last line: + +```rust +let py = d" + def hello(): + print('Hello, world!') + + hello() + "; +// OK +``` + +Benefits the above rules bring include: + +- The above rules make all dedented string literals you'll find in Rust consistent. +- It allows easily changing the indentation level without having to insert a newline sometimes. +- It gives the ability for us to tell a regular string literal from a dedented string literal at a glance. + +### No confusing whitespace escapes + +In dedented string literals, using the escapes `\r`, `\n` or `\t` is disallowed. + +This helps, making it obvious what will be stripped from the string content. + +Consider the following invalid dedented string: + +```rust +let py = d" + def hello():\n \tprint('Hello, world!')\r\n + hello() + "; +// error: ^^ newline escapes are not allowed in dedented strings +// error: ^^^^ newline escapes are not +// allowed in dedented strings +// error: ^^ tab escapes are not allowed in dedented strings +``` + +If that was allowed, it would not be immediately obvious where the whitespace should be stripped. + +In fact, it would be quite tricky to figure out. Therefore using these escape characters is disallowed. 
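The rules above, together with the `min(common, closing)` behaviour shown earlier, can be approximated at runtime. What follows is a minimal sketch under stated assumptions, not the proposed compiler implementation: `dedent` and `leading_spaces` are hypothetical names, the input is the raw text between the quotes, only the space character counts as indentation (how to treat mixed tabs is not settled by this RFC), and whitespace-only lines are ignored when computing the common indentation, which is one possible reading of the rules.

```rust
/// Number of leading space characters on a line.
/// Assumption: tabs are not treated as indentation in this sketch.
fn leading_spaces(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Hypothetical runtime model of a dedented string literal.
/// `raw` is everything between the opening and closing quote.
fn dedent(raw: &str) -> Result<String, &'static str> {
    // The literal must begin with a literal newline, which is removed.
    let rest = raw
        .strip_prefix('\n')
        .ok_or("expected a literal newline after the opening quote")?;

    // The closing line must be preceded by a literal newline (also removed).
    let (content, closing) = rest
        .rsplit_once('\n')
        .ok_or("expected a literal newline before the closing line")?;

    // The closing line may contain nothing but indentation.
    if !closing.chars().all(|c| c == ' ') {
        return Err("the line containing the closing quote must be empty");
    }

    // Common indentation of the lines that contain more than spaces.
    // Assumption: if there are no such lines, the common indentation is 0.
    let common = content
        .split('\n')
        .filter(|line| !line.trim_matches(' ').is_empty())
        .map(leading_spaces)
        .min()
        .unwrap_or(0);

    // Never remove more than the closing quote's indentation allows.
    let strip = common.min(closing.len());

    let lines: Vec<&str> = content
        .split('\n')
        // Never remove more than a line actually has.
        .map(|line| &line[strip.min(leading_spaces(line))..])
        .collect();

    Ok(lines.join("\n"))
}
```

Under these assumptions, `dedent("\n        a\n    ")` keeps 4 of the 8 spaces because the closing quote sits at column 4, while a literal with no newline before its closing line is rejected.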
+ +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## String Literals + +6 new [string literal](https://doc.rust-lang.org/reference/tokens.html#characters-and-strings) types: + +Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with `\ln` for the purpose of the explanation. + +| | Example | `#` sets[^nsets] | Characters | Escapes | +|----------------------------------------------|-----------------|------------|-------------|---------------------| +| Dedented String | `d"\ln EXAMPLE \ln"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) * | +| Dedented Raw string | `dr#"\ln EXAMPLE \ln"#` | <256 | All Unicode | `N/A` | +| Dedented Byte string | `db"\ln EXAMPLE \ln"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) * | +| Dedented Raw byte string | `dbr#"\ln EXAMPLE \ln"#` | <256 | All ASCII | `N/A` * | +| Dedented C string | `dc"\ln EXAMPLE \ln"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) * | +| Dedented Raw C string | `dcr#"\ln EXAMPLE \ln"#` | <256 | All Unicode | `N/A` * | + +* +- `\n`, `\r` and `\t` literal escapes are never allowed in dedented strings. + +## Interaction with macros + +- `format_args!` and wrapper macros such as `println!` can accept dedented string literals: `format!(d"...")`. +- `concat!` accepts dedented strings, just like it accepts raw strings. Each dedented string passed to `concat!` is dedented before concatenation. +- The `literal` macro fragment specifier accepts all of the 6 new string literals. + +## Algorithm for dedented strings + +1. The opening line (the line containing the opening quote `"`) + - Must only contain a literal newline character after the `"` token + - This newline is removed. +1. 
The closing line (the line containing the closing quote `"`) + - Must contain only whitespace before the closing quote + - This whitespace is the *closing indentation*. + - The closing indentation is removed. +1. The character immediately before the closing line must be a literal newline character. + - This newline is removed. +1. The *common indentation* is calculated. + + It is the largest amount of leading whitespace shared by all non-empty lines. + +1. For each non-empty line, remove the smallest amount of leading whitespace that satisfies: + + - `min(common indentation, closing indentation)` + + What this means is: + - Even if a line is indented by more than the closing indentation + - Only the amount equal to the closing indentation, or less, will be removed. + - Never more than the line actually has. + +### Edge Cases + +> [!NOTE] +> +> `•` denotes a space. + +````rs +// the whitespace at the start of non-empty lines is not part +// of the calculation for "common indentation" +// amongst non-empty lines +// +// remove the smallest amount of leading whitespace +assert_eq!( + d" +••••hello +•• +••••world + ", +^^^^ // common leading whitespace (will be removed) + + "hello\nworld" +); + +// line consisting of only spaces is allowed + +// However, nothing is removed because the: + +// > common indentation of all non-empty lines + +// is 0 here. (all lines are empty) + +// so min(0, x) = 0 -> remove 0 characters +assert_eq!( + d" +•••••••• + ", + + "••••••••" +); + +// no whitespace removed either +assert_eq!( + d" +•••••••• +", + + "••••••••" +); + +// explanation: +// +// Initially we have: +// +// ```rust +// let _ = d" +// +// "; +// ``` +// +// The literal newline directly after the opening `"` is removed. We get: +// +// ```rust +// let _ = " +// "; +// ``` +// +// The literal newline directly before the line containing +// the closing `"` is removed. We get: +// +// ```rust +// let _ = ""; +// ``` +// +// An empty string. 
+assert_eq!( + d" + +", + + "" +); + +// error: Expected a literal newline character +// before the line containing the closing quote +// +// note: The literal newline character after the opening quote +// is removed in all cases +#[expect_compile_error] +let _ = d" + "; +```` + +# Drawbacks +[drawbacks]: #drawbacks + +- Contributes to the increase of string literal modifiers by adding a new variant. + + While at the moment the variety of string literal modifiers is small, it is worth thinking about the implications of their number growing exponentially. + + Currently, Rust has 7 types of string literals. This RFC will increase that to 13, because each string literal can be prefixed with a `d` to make it dedented. + + In the future Rust might get additional types of "string modifiers", and each combination will need to + be accounted for. + +- Increases complexity of the language. While it builds upon existing concepts, it is yet another thing for people to learn. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +## Design + +### The choice of `d"string"` specifically + +The syntax of `d"string"` is chosen for the following reasons: + +- Fits with existing string modifiers, such as `b"string"`, `r#"string"#` and `c"string"` +- Composes with existing string modifiers: `db"string"`, `dc"string"`, `dr#"string"#`, and `dbr#"string"#`. +- Does not introduce a lot of new syntax. Dedented string literals can be explained in terms of existing language features. +- The abbreviation `d` for `dedent` is both clear, and not taken by any of the other string modifiers. +- Adding a single letter `d` before a string literal to turn it into a dedented string literal is an incredibly easy modification. +- Rust reserves space for additional string modifiers. 
+ + Adding this feature does not require a new edition. It is backwards-compatible in Edition 2021 and later, as the prefixed string literal syntax has been [reserved](https://doc.rust-lang.org/edition-guide/rust-2021/reserving-syntax.html) since that edition. + +The choice for `d` to come before all other modifiers is not arbitrary. + +Consider `dbr` and all possible alternatives: + +1. `dbr`: dedented byte raw string +1. `bdr`: byte dedented raw string +1. `brd`: byte raw dedented string + +The first example reads in the most natural manner. The other two don't. + + + +### Requirement of first and final newline + +As mentioned earlier in the RFC: + +- There must be a literal newline present directly after the opening quote `"`. +- There must be a literal newline present directly before the line containing the closing quote `"`. + +Having this as a hard requirement will make usages of dedented string literals more consistent. + +Consider the following, which is invalid: + +```rs +fn main() { + // ERROR + println!(d"create table student( + id int primary key, + name text + ) + "); +} +``` + +- The `d"` and `create` in `d"create` are not separated by whitespace, which makes it harder to see where the content begins. They have to be mentally separated. +- Additionally, the indentation of `create` does not align with what it will look like in the output, which makes the result less obvious. We would like to avoid that, so it is a **hard error** to not have a literal newline there. + +The following is also incorrect, as there is no newline before the line containing the closing quote: + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + )"); // ERROR +} +``` + +- Having the closing quote **always** be on its own line makes it more obvious to the reader from which column onwards leading indentation will be removed. +- In the example above, it is not immediately clear which column that would be. 
+- It is easy to modify the common indentation level of the string in the future, as you do not have to create a new line. + +## Differences from RFC 3450 + +[RFC #3450: Propose code string literals](https://github.com/rust-lang/rfcs/pull/3450) is similar to this one. This section explains how the two differ. + +Differences: + +- #3450 uses `h` as the modifier instead of `d`. + + It proposes using `h` as an abbreviation for [Here document](https://en.wikipedia.org/wiki/Here_document). + + The term is likely to be less known, and may raise confusion. + + Additionally, here documents are more associated with "code blocks". While this feature is useful for code blocks, it is not just for them. + + The `d` mnemonic for **dedent**, by contrast, clearly describes what actually happens to the strings. + +- #3450 allows writing an *info string*, like in Markdown. + + It proposes the ability to write: + + ```rs + let sql = d"sql + SELECT * FROM table; + "; + ``` + + Here the `sql` does not affect the output, but can aid in syntax highlighting and such. + + 1. This is not necessary, as at the moment you can add a block comment next to the string, which syntax highlighters can use *today* to inject whatever language is specified. + + ```rs + let sql = /* sql */ "SELECT * FROM table;"; + ``` + + 2. It is out of scope for this RFC. + + It would be a backward-compatible change to make for a future RFC, if it's desired. + + 3. [Expression attributes](https://github.com/rust-lang/rust/issues/15701) are likely to be more suitable for this purpose. (not part of this RFC) + + ```rs + let sql = #[editor::language("sql")] "SELECT * FROM table;"; + ``` + +- RFC #3450 makes the "code strings" always end with a newline, with the ability to prepend a minus before the closing quote in order to remove the final newline. 
+ + However, in this RFC the following: + + ```rs + print!(d" + a + "); + ^^^^ // common leading whitespace (will be removed) + ``` + + Prints: `a` + + **Without** an ending newline. + + In order to add a newline at the end, you have to add a newline in the source code: + + ```rs + print!(d" + a + + "); + ^^^^ // common leading whitespace (will be removed) + ``` + + The above prints: + + ``` + a + ``` + + **With** a newline. + + Additionally, finishing with `-"` instead of `"` is not seen anywhere in the language, and would not fit in. + +## Use a macro instead + +What are the benefits over using a macro? + +The [`indoc`](https://crates.io/crates/indoc) crate is similar to the feature this RFC proposes. + +The macros the crate exports help create dedented strings: + +- `eprintdoc!` +- `formatdoc!` +- `indoc!` +- `printdoc!` +- `writedoc!` + +These macros would no longer be necessary, as the dedented string literals compose with the underlying macro call. (Dedented strings can be passed to `format_args!`). + +The benefits of replacing these, and similar macros with language features are described below. + +### Reduces the proliferation of macros + +Macros can make code harder to understand. They can transform the inputs in arbitrary ways. Contributors have to learn them, increasing the entry barrier for a new project. + +For the above reason, projects may be hesitant to use crates that provide this as it would make contributing harder. + +The dedent macros will be possible to replace using the dedented string literals proposed in this RFC. Examples, using the `indoc` crate's macros specifically: + +- `eprintdoc!`: Calls `eprint!` under the hood, dedenting the passed string. + + Before: + + ```rs + eprintdoc! {" + GET {url} + Accept: {mime} + ", + ^^^^ // common leading whitespace (will be removed) + url = "http://localhost:8080", + mime = "application/json", + } + ``` + + With dedented string literals: + + ```rs + eprintln! 
{ + d" + GET {url} + Accept: {mime} + ", + ^^^^ // common leading whitespace (will be removed) + url = "http://localhost:8080", + mime = "application/json", + } + ``` + + Both snippets print: + + ``` + GET http://localhost:8080 + Accept: application/json + ``` + + Note that `eprintdoc!` does not remove the final line, that's why we use `eprintln` instead of `eprint`. + +- `indoc!`: Dedents the passed string. + + Before: + + ```rs + indoc! {r#" + def hello(): + print("Hello, world!") + + hello() + "#} + ^^^^ // common leading whitespace (will be removed) + ``` + + With dedented string literals: + + ```rs + dr#" + def hello(): + print("Hello, world!") + + hello() + + "# + ^^^^ // common leading whitespace (will be removed) + ``` + + Both snippets evaluate to: + + ```py + def hello(): + print("Hello, world!") + + hello() + ``` + + Note that `indoc!` does not remove the final line, that's why we add an additional newline after `hello()`. + +As a bonus, not only does it unify many macros under a single language feature. + +It also allows us to trivially create new macros that automatically make use of the feature in a backwards-compatible way. + +Take for instance the `text!` macro exported from the `iced` crate: + +```rs +macro_rules! text { + ($($arg:tt)*) => { + $crate::Text::new(format!($($arg)*)) + }; +} +``` + +In order to ergonomically supply a dedented string to it, one needs to re-create the macro: + +```rs +macro_rules! textdoc { + ($($arg:tt)*) => { + iced::text!(formatdoc!($($arg)*)) + }; +} +``` + +That's not a problem for *this* example, however with more involved macros such as [ones from the `log` crate](https://docs.rs/log/0.4.27/src/log/macros.rs.html#165-186) it becomes a problem. 
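The token forwarding that such wrapper macros rely on can already be observed with raw string literals. The sketch below is illustrative only: `text!` here is a hypothetical stand-in that returns a `String` rather than the `iced` type, and `request_line` is an invented helper. It shows that a wrapper which passes its tokens to `format!` untouched accepts whatever literal kinds `format!` accepts, without being written with them in mind.

```rust
// Hypothetical stand-in for a wrapper macro such as `iced::text!`.
// It forwards every token to `format!` without inspecting it.
macro_rules! text {
    ($($arg:tt)*) => {
        format!($($arg)*)
    };
}

// Invented helper used only to exercise the macro.
fn request_line(url: &str, mime: &str) -> String {
    // A raw string literal passes straight through the wrapper to `format!`.
    text!(r#"GET "{url}" Accept: {mime}"#, url = url, mime = mime)
}
```

The wrapper itself never needs to change when the underlying formatting macro learns to accept a new kind of literal.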
+ +With this RFC, re-implementing the macros is not going to be necessary anymore, as you can just pass in the dedented string literals: + +```rs +text!(d" + GET {url} + Accept: {mime} +") +^^^^ // common leading whitespace (will be removed) +``` + +The language feature works with any user-defined macros that pass their arguments to `format_args!` under the hood. + +### Improved compile times + +Having dedented strings as a language feature could reduce compile time. + +- Users do not have to compile the crate *or* its dependencies. +- There is no need for procedural macro expansion to take place in order to un-indent the macro. This step happens directly in the compiler. + +## Use a crate instead + +What are the benefits over using a crate, such as `indoc`? + +1. Having dedented strings as a language feature allows them to be used in Rust snippets + and examples where said examples would not otherwise have a dependency on the crate. + + This makes the feature more discoverable. + +2. Dedented strings are a "nice-to-have", if they were a core language feature they would likely be used + much more, but as this functionality is currently only available in a crate, it is unlikely people + would want to add a dependency just for dedented strings, especially for one-off usecases. + +3. No need to know about the specific crate, which most projects may not depend on. + + Learn the feature once, and use it anywhere. + +4. Reduce the entry barrier to contribution to projects + + Crates may be hesitant in adding a dependency on a dedented string crate because it would + be *yet another* thing for contributors to learn and be aware of. + +## Impact of *not* implementing this RFC + +- The Rust ecosystem will continue to rely on third-party crates like `indoc` that provide dedented string literals which only work with the macros provided by the crate. + + Composing them with macros from a different crate may not always be ergonomic. 
+- Examples and snippets of Rust code that would otherwise not depend on any dependency will not benefit from dedented string literals. +- Crates that would otherwise benefit from the feature, but do not consider it worth enough to add an additional dependency for, will not benefit from dedented string literals. + +# Prior art +[prior-art]: #prior-art + +In other languages: + +- _Java_ - [text blocks](https://openjdk.java.net/jeps/378) using triple-quotes. +- _Kotlin_ - [raw strings](https://kotlinlang.org/docs/strings.html#raw-strings) using triple-quotes and `.trimIndent()`. +- _Scala_ - [multiline strings](https://docs.scala-lang.org/overviews/scala-book/two-notes-about-strings.html) + using triple-quotes and `.stripMargin`. +- _C#_ - [Raw string literals](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/raw-string) +- _Python_ - [multiline strings](https://docs.python.org/3/library/textwrap.html) using triple-quotes + to avoid escaping and `textwrap.dedent`. +- _Jsonnet_ - [text blocks](https://jsonnet.org/learning/tutorial.html) with `|||` as a delimiter. +- _Bash_ - [`<<-` Heredocs](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04). +- _Ruby_ - [`<<~` Heredocs](https://www.rubyguides.com/2018/11/ruby-heredoc/). +- _Swift_ - [multiline string literals](https://docs.swift.org/swift-book/LanguageGuide/StringsAndCharacters.html#ID286) + using triple-quotes - strips margin based on whitespace before closing + delimiter. +- _Nix_ - [indented strings](https://nix.dev/manual/nix/2.29/language/string-literals.html) +- _Scala_ - [stripMargin](https://www.scala-lang.org/api/2.12.7/scala/collection/immutable/StringLike.html#stripMargin:String) +- _PHP_ - `<<<` [heredoc/nowdoc](https://wiki.php.net/rfc/flexible_heredoc_nowdoc_syntaxes#closing_marker_indentation) + The indentation of the closing marker dictates the amount of whitespace to + strip from each line. 
+- _JavaScript_ - [Proposal String Dedent](https://github.com/tc39/proposal-string-dedent) +- _MoonBit_ - [Multi-line Strings](https://docs.moonbitlang.com/en/latest/language/fundamentals.html#string) +- _Haskell_ - [Multi-line Strings](https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/multiline_strings.html) + +In the Rust ecosystem: + +- [`dedent`](https://docs.rs/dedent/0.1.1/dedent/macro.dedent.html) +- [`textwrap-macros`](https://docs.rs/textwrap-macros/0.3.0/textwrap_macros/macro.dedent.html) +- [`indoc`](https://docs.rs/indoc/latest/indoc/) + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +What should happen if we have tabs (represented by `→`) and literal spaces (represented by `•`) mixed together? + +```rust +let py = d" +→→→→def hello(): +→→→→••••print('Hello, world!') + +•→••hello() +→→••"; +``` + +# Future possibilities +[future-possibilities]: #future-possibilities + +## Relax rules around whitespace characters + +Currently for the purposes of this RFC, all dedented strings disallow using whitespace escaped characters: `\t`, `\r` and `\n`. + +This restriction could be lifted in specific situations in the future by a different RFC. In any version. Without requiring an edition. + +In theory it could be possible to employ some more advanced heuristics in order to allow characters like `\t` in some places, such as in a line after non-empty characters. + +The above idea is not part of this RFC, just a mere speculation what could be done in the future. + +## More string modifiers + +At some point, Rust might gain new types of string modifiers. Such as `o"string"` which would create a `String`, for example. (only speculative) + +Supporting these new hypothetical string modifiers means that the interaction between all possible string modifiers needs to be taken into account. + +Each new string modifier could *double* the variety of string literals, possibly leading to combinatorial explosion. 
+ +## `rustfmt` support + +Formatting tooling such as `rustfmt` will be able to make modifications to the source it previously would not have been able to modify, due to the modifications changing output of the program. + +If indentation of the dedented string does not match the surrounding code: + +```rust +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^ // common leading whitespace (will be removed) +} +``` + +It could be automatically formatted by adding additional leading indentation, in order to align it with the surrounding source code: + +```rust +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +^^^^^^^^ // common leading whitespace (will be removed) +} +``` + +This would never modify the output, but make the source code more pleasant - and bring more automation and consistency to the Rust ecosystem. + +With regular string literals, this isn't possible - as modifying the whitespace in the string changes the output. + +## `clippy` lint + +There could be a lint which detects strings which could be written clearer as dedented string literals. 
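As a rough illustration of the `rustfmt` point above (re-indenting a dedented string never changes its value), the sketch below models the dedent step as a runtime function. It is a sketch under stated assumptions, not the RFC's implementation: `dedent` is a hypothetical helper, it takes the text between the quotes as input, and it treats only spaces as indentation (mixed tabs and spaces are an unresolved question).

```rust
/// Hypothetical runtime model of the compile-time dedent step.
/// `raw` is the text between the quotes of a `d"..."` literal.
/// Returns `None` when the opening or closing line is malformed.
/// Simplification: at least one content line is required.
fn dedent(raw: &str) -> Option<String> {
    // Opening line: a literal newline must directly follow the opening quote.
    let body = raw.strip_prefix('\n')?;

    // Closing line: everything after the last newline, spaces only.
    // Both that newline and the closing indentation are dropped.
    let (content, closing) = body.rsplit_once('\n')?;
    if !closing.chars().all(|c| c == ' ') {
        return None;
    }
    let indent = closing.len();

    // Content lines: remove at most `indent` leading spaces,
    // and never more than the line actually has.
    let lines: Vec<&str> = content
        .split('\n')
        .map(|line| {
            let leading = line.len() - line.trim_start_matches(' ').len();
            &line[leading.min(indent)..]
        })
        .collect();

    Some(lines.join("\n"))
}
```

Under these assumptions, feeding the `create table` example to `dedent` with 4 spaces of indentation or with 8 spaces gives the same string, which is why a formatter could shift the whole literal freely.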
From b98bd7ff7bd6e05f9af46dc17cc3e6927cb194e4 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+nik-rev@users.noreply.github.com> Date: Thu, 5 Jun 2025 19:54:02 +0100 Subject: [PATCH 02/43] fix: remove `#` Co-authored-by: Jacob Lifshay --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 8a1c12b1e96..aaa5bd62693 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -428,7 +428,7 @@ To be precise, the RFC introduces 6 new types of string literals: - Dedented string literal: `d"string"` - Dedented raw string literal: `dr#"string"` -- Dedented byte string literal: `db#"string"` +- Dedented byte string literal: `db"string"` - Dedented byte raw string literal: `dbr#"string"#` - Dedented C string literal: `dc"string"` - Dedented C raw string literal: `dcr#"string"#` From 5ced05a3f10eb4956c2edf9c362d33ba06b6d339 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Thu, 5 Jun 2025 19:55:10 +0100 Subject: [PATCH 03/43] fix: add `#` --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index aaa5bd62693..1f263230022 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -427,7 +427,7 @@ They compose with other every other string literal modifier. 
To be precise, the RFC introduces 6 new types of string literals: - Dedented string literal: `d"string"` -- Dedented raw string literal: `dr#"string"` +- Dedented raw string literal: `dr#"string"#` - Dedented byte string literal: `db"string"` - Dedented byte raw string literal: `dbr#"string"#` - Dedented C string literal: `dc"string"` From b4590acb5a09aa2889586e5d12a96c64f85e5945 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 12:10:28 +0100 Subject: [PATCH 04/43] Clarify why indenting closing quote further is not a syntax error --- text/3830-dedented-string-literals.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 1f263230022..0d322e98563 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -931,6 +931,24 @@ fn main() { - In the example above, it is not immediately clear where that would be from. - It easy to modify the common indentation level of the string in the future, as you do not have to create a new line. +### Allowing the closing line to be indented more than previous lines + +Having the quote be indented further than the first non-whitespace character in the +string content is allowed: + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +} +``` + +The reasoning is that turning this into a syntax error is too strict, when it can be auto-fixed by tooling like `rustfmt`. + ## Differences from RFC 3450 The [RFC #3450: Propose code string literals](https://github.com/rust-lang/rfcs/pull/3450) is similar to this one, however this RFC is different and this section explains why. 
From ac324bd9e37880f229b4cf97e68e74a7adb2009d Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 12:26:01 +0100 Subject: [PATCH 05/43] Relax rules around escaped whitespace characters --- text/3830-dedented-string-literals.md | 86 ++++++++++++++------------- 1 file changed, 44 insertions(+), 42 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 0d322e98563..dfd5aeb2300 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -677,29 +677,6 @@ Benefits the above rules bring include: - It allows easily changing the indentation level without having to insert a newline sometimes. - It gives the ability for us to tell a regular string literal from a dedented string literal at a glance. -### No confusing whitespace escapes - -In dedented string literals, using the escapes `\r`, `\n` or `\t` is disallowed. - -This helps, making it obvious what will be stripped from the string content. - -Consider the following invalid dedented string: - -```rust -let py = d" - def hello():\n \tprint('Hello, world!')\r\n - hello() - "; -// error: ^^ newline escapes are not allowed in dedented strings -// error: ^^^^ newline escapes are not -// allowed in dedented strings -// error: ^^ tab escapes are not allowed in dedented strings -``` - -If that was allowed, it would not be immediately obvious where the whitespace should be stripped. - -In fact, it would be quite tricky to figure out. Therefore using these escape characters is disallowed. 
- # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -711,15 +688,12 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` | | Example | `#` sets[^nsets] | Characters | Escapes | |----------------------------------------------|-----------------|------------|-------------|---------------------| -| Dedented String | `d"\ln EXAMPLE \ln"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) * | +| Dedented String | `d"\ln EXAMPLE \ln"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) | | Dedented Raw string | `dr#"\ln EXAMPLE \ln"#` | <256 | All Unicode | `N/A` | -| Dedented Byte string | `db"\ln EXAMPLE \ln"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) * | -| Dedented Raw byte string | `dbr#"\ln EXAMPLE \ln"#` | <256 | All ASCII | `N/A` * | -| Dedented C string | `dc"\ln EXAMPLE \ln"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) * | -| Dedented Raw C string | `dcr#"\ln EXAMPLE \ln"#` | <256 | All Unicode | `N/A` * | - -* -- `\n`, `\r` and `\t` literal escapes are never allowed in dedented strings. 
+| Dedented Byte string | `db"\ln EXAMPLE \ln"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) | +| Dedented Raw byte string | `dbr#"\ln EXAMPLE \ln"#` | <256 | All ASCII | `N/A` | +| Dedented C string | `dc"\ln EXAMPLE \ln"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) | +| Dedented Raw C string | `dcr#"\ln EXAMPLE \ln"#` | <256 | All Unicode | `N/A` | ## Interaction with macros @@ -729,6 +703,12 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ## Algorithm for dedented strings +> [!NOTE] +> +> Whitespace escape characters such as `\t`, `\r` and `\n` are treated as literal code when present in the content of the dedented string, therefore the normal dedentation rules apply to them. +> +> This does not apply to `\n` after the opening quote, nor the `\n` before the line containing the closing quote. In this case escaping the newline is not allowed, it has to be a literal newline. (As described previously.) + 1. The opening line (the line containing the opening quote `"`) - Must only contain a literal newline character after the `"` token - This newline is removed. @@ -774,6 +754,19 @@ assert_eq!( "hello\nworld" ); +// This example has the same whitespace as the previous example. +// However, here we make use of whitespace escape characters +// +// This might make code more confusing, so one of the future-possibilities +// is to have a warn-by-default lint to disallow these characters in dedented strings. +assert_eq!( + d" +\thello\n••\n\tworld + ", + + "hello\nworld" +); + // line consisting of only spaces is allowed // However, nothing is removed because the: @@ -1255,16 +1248,6 @@ let py = d" # Future possibilities [future-possibilities]: #future-possibilities -## Relax rules around whitespace characters - -Currently for the purposes of this RFC, all dedented strings disallow using whitespace escaped characters: `\t`, `\r` and `\n`. 
- -This restriction could be lifted in specific situations in the future by a different RFC. In any version. Without requiring an edition. - -In theory it could be possible to employ some more advanced heuristics in order to allow characters like `\t` in some places, such as in a line after non-empty characters. - -The above idea is not part of this RFC, just a mere speculation what could be done in the future. - ## More string modifiers At some point, Rust might gain new types of string modifiers. Such as `o"string"` which would create a `String`, for example. (only speculative) @@ -1309,6 +1292,25 @@ This would never modify the output, but make the source code more pleasant - and With regular string literals, this isn't possible - as modifying the whitespace in the string changes the output. -## `clippy` lint +## `clippy` lint to convert strings into dedented string literals There could be a lint which detects strings which could be written clearer as dedented string literals. + +## `rustc` warn-by-default lint to disallow whitespace escape characters + +In the following example: + +```rs +assert_eq!( + d" +\thello\n••\n\tworld + ", // note: This is a tab, not 4 spaces. +//^^ common leading whitespace (will be removed) + + "hello\nworld" +); +``` + +Using escaped whitespace characters is the same as if the characters were written literally. (in the *content* of the string. This excludes the requirement of a **literal** newline after the double quote and before the line of the closing quote). + +This is confusing, and might not work in the way people expect it to work. A warn-by-default lint could be added to disallow `\n`, `\t` and `\r` in dedented strings. 
(Or for instance, only allow `\t` anytime after the stripped indentation) From 9bae185740fd804353d27e40a806aac740ccd785 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 12:59:54 +0100 Subject: [PATCH 06/43] Move section on crate-provided macros under the "Use a crate instead" section --- text/3830-dedented-string-literals.md | 52 +++++++++++++-------------- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index dfd5aeb2300..ed6e7df8291 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1021,9 +1021,29 @@ Differences: Additionally, finishing with `-"` instead of `"` is not seen anywhere in the language, and would not fit in. -## Use a macro instead +## Use a crate instead + +What are the benefits over using a crate, such as `indoc`? + +1. Having dedented strings as a language feature allows them to be used in Rust snippets + and examples where said examples would not otherwise have a dependency on the crate. + + This makes the feature more discoverable. + +2. Dedented strings are a "nice-to-have", if they were a core language feature they would likely be used + much more, but as this functionality is currently only available in a crate, it is unlikely people + would want to add a dependency just for dedented strings, especially for one-off usecases. + +3. No need to know about the specific crate, which most projects may not depend on. -What are the benefits over using a macro? + Learn the feature once, and use it anywhere. + +4. Reduce the entry barrier to contribution to projects + + Crates may be hesitant in adding a dependency on a dedented string crate because it would + be *yet another* thing for contributors to learn and be aware of. + +### Crate macros The [`indoc`](https://crates.io/crates/indoc) crate is similar to the feature this RFC proposes. 
@@ -1039,7 +1059,7 @@ These macros would no longer be necessary, as the dedented string literals compo The benefits of replacing these, and similar macros with language features are described below. -### Reduces the proliferation of macros +#### Reduces the proliferation of macros Macros can make code harder to understand. They can transform the inputs in arbitrary ways. Contributors have to learn them, increasing the entry barrier for a new project. @@ -1161,35 +1181,13 @@ text!(d" The language feature works with any user-defined macros that pass their arguments to `format_args!` under the hood. -### Improved compile times +#### Improved compile times -Having dedented strings as a language feature could reduce compile time. +Having dedented strings as a language feature, instead of relying on a macro provided by a crate could reduce compile time. - Users do not have to compile the crate *or* its dependencies. - There is no need for procedural macro expansion to take place in order to un-indent the macro. This step happens directly in the compiler. -## Use a crate instead - -What are the benefits over using a crate, such as `indoc`? - -1. Having dedented strings as a language feature allows them to be used in Rust snippets - and examples where said examples would not otherwise have a dependency on the crate. - - This makes the feature more discoverable. - -2. Dedented strings are a "nice-to-have", if they were a core language feature they would likely be used - much more, but as this functionality is currently only available in a crate, it is unlikely people - would want to add a dependency just for dedented strings, especially for one-off usecases. - -3. No need to know about the specific crate, which most projects may not depend on. - - Learn the feature once, and use it anywhere. - -4. 
Reduce the entry barrier to contribution to projects - - Crates may be hesitant in adding a dependency on a dedented string crate because it would - be *yet another* thing for contributors to learn and be aware of. - ## Impact of *not* implementing this RFC - The Rust ecosystem will continue to rely on third-party crates like `indoc` that provide dedented string literals which only work with the macros provided by the crate. From a589e0d2601501eac441590e4b460c72e9b18d38 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 16:04:12 +0100 Subject: [PATCH 07/43] Explain why a built-in macro would not suffice --- text/3830-dedented-string-literals.md | 245 ++++++++++++++++++++++++++ 1 file changed, 245 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index ed6e7df8291..e50d698a524 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1188,6 +1188,251 @@ Having dedented strings as a language feature, instead of relying on a macro pro - Users do not have to compile the crate *or* its dependencies. - There is no need for procedural macro expansion to take place in order to un-indent the macro. This step happens directly in the compiler. +## Use a built-in macro instead + +What about using a compiler built-in macro like `dedent!("string")` instead of a language-built in string modifier such as `d"string"`? + +### Advantages + +- Will likely have similar performance to the literal itself. 
+ +### Disadvantages + +#### The macro will be unable to capture variables from the surrounding scope + +One of the major benefits of having dedented string literals is that you'll be able to use them in formatting macros: + +```rs +let message = "Hello, world!"; + +// `{message}` is interpolated +let py = format!(dr#" + def hello(): + print("{message}") + + hello() + "#); +//^^ removed + +let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; +assert_eq!(py, expected); +``` + +In the above example, the variable `message` is captured and used directly in the `format!` macro call. + +However, this feature would not be possible with a `dedent!` macro. + +Consider the following code: + +```rs +fn main() { + let foo = "foo"; + let bar = "bar"; + + let x = format!(concat!("{foo}", "bar")); +} +``` + +It attempts to create a string `{foo}bar` which is passed to `format!`. Due to limitations, it does not compile: + +``` +error: there is no argument named `foo` + --> src/main.rs:5:21 + | +5 | let x = format!(concat!("{foo}", "bar")); + | ^^^^^^^^^^^^^^^^^^^^^^^ + | + = note: did you intend to capture a variable `foo` from the surrounding scope? + = note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro + +error: could not compile `dedented` (bin "dedented") due to 1 previous error +``` + +Importantly: + +> to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro + +A `dedent!` macro would have the same limitation: Namely the string is created from the expansion of a macro. + +The problem with `dedent!` is that we expect it to be largely used with formatting macros such as `format!` and `println!` to make use of string interpolation. + +Implementing dedented string literals as a macro will significantly limit their functionality. 
+ +Consider a conversion from a regular string literal that prints some HTML: + +```rust + writeln!(w, " \ + \ + \n\ + \n\ + \n
\ + \n

\ + \n {h1}\ + \n {nav}\ + \n

") +``` + +Into a dedented string literal: + +```rust + writeln!(w, dr#" + + + +
+

+ {h1} + {nav} +

+ "#); +``` + +The above conversion is elegant for these reasons: +- It is a simple modification by prepending `d` before the string literal +- All of the escaped sequences are removed, the whitespace removal is taken care of by the dedented string literal +- Since we can now use a raw string, we no longer have to escape the quotes +- Notably: **All of the interpolated variables continue to work as before**. + +With a dedented string *macro*, it's a much more involved process. The above will fail to compile because strings expanded from macros cannot capture variables like that. + +The problem being that we have to re-write all of the captured variables to pass them to the `writeln!` and not the dedented string itself: + +```rust + writeln!(w, + dedent!(r#" + + + +
+

+ {} + {} +

+ "#, + rel, + h1, + nav) + ); +``` + +Which is unfortunate. + +It might lead users to choose not to use this feature. + +#### Limits macros + +Macro fragment specifier `$lit: literal` is able to accept dedented string literals. + +However, it won't be able to accept string literals created from a macro. + +Today, the following code: + +```rs +macro_rules! foo { + ($lit:literal) => {{}}; +} + +fn main() { + foo!(concat!("foo", "bar")); +} +``` + +Fails to compile: + +```rs +error: no rules expected `concat` + --> src/main.rs:6:10 + | +1 | macro_rules! foo { + | ---------------- when calling this macro +... +6 | foo!(concat!("foo", "bar")); + | ^^^^^^ no rules expected this token in macro call + | +note: while trying to match meta-variable `$lit:literal` + --> src/main.rs:2:6 + | +2 | ($lit:literal) => {{}}; + | ^^^^^^^^^^^^ + +error: could not compile `dedented` (bin "dedented") due to 1 previous error +``` + +A `dedent!()` macro will have the same restriction. + +This limits yet again where the dedented strings count be used. + +#### Consistency + +It would be inconsistent to have dedicated syntax for raw string literals `r#"str"#`, but be forced to use a macro for dedented string literals. + +The modifiers `b"str"` and `r#"str"#` are placed in front of string literals. + +They do *no* allocation, only transforming the string at compile-time. + +We do not use macros like `byte!("str")` or `raw!("str")` to use them, so having to use `dedent!("str")` would feel inconsistent. + +Dedentation also happens at compile-time, transforming the string literal similar to how raw string literals `r#"str"#` do. + +However, macros like `format!("{foo}bar")` allocate. That's one of reasons why there are no `f"{foo}bar"` strings. In Rust, allocation is explicit. 
+ +Someone learning about dedented strings, and learning that they're accessible as a macro rather than a string modifier similar to how `r#"string"#` is, may incorrectly assume that the reason why dedented strings require a macro is because allocation happens, and Rust is explicit in this regard. + +And when they learn about the actual behaviour, it will be surprising. + +#### Wrapping the string in a macro call causes an additional level of indentation + +With dedented string literals: + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +} +``` + +With a `dedent!` built-in macro: + +```rs +fn main() { + println!( + dedent!(" + create table student( + id int primary key, + name text + ) + ") + ); +} +``` + +With [postfix macros](https://github.com/rust-lang/rfcs/pull/2442), the situation would be better: + +```rs +fn main() { + println!(" + create table student( + id int primary key, + name text + ) + ".dedent!()); +} +``` + +However, since that RFC currently [does not](https://github.com/rust-lang/rfcs/pull/2442#issuecomment-2567115172) look like it will be included anytime soon, the ergonomics of this feature should not be blocked on postfix macros. + +#### Composability + +Dedented string literal modifier `d` composes with *all* existing string literal modifiers. + +Converting a string literal into a dedented string literal is simple, just add a `d` and fix the compile errors if necessary. + +If dedented strings were accessible as a macro `dedent!()` instead, this would be a harder transformation to do - because you now have to wrap the whole string in parentheses and write `dedent!`. + ## Impact of *not* implementing this RFC - The Rust ecosystem will continue to rely on third-party crates like `indoc` that provide dedented string literals which only work with the macros provided by the crate.
From 50725a673b67711f83744b55c1201dc895da654f Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 16:26:08 +0100 Subject: [PATCH 08/43] Add example how the last line of a dedented string could be formatted --- text/3830-dedented-string-literals.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index e50d698a524..9e9cb2dc0bf 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -940,7 +940,18 @@ fn main() { } ``` -The reasoning is that turning this into a syntax error is too strict, when it can be auto-fixed by tooling like `rustfmt`. +Reason: turning this into a syntax error is too strict, when it can be auto-fixed by tooling like `rustfmt`: + +```rs +fn main() { + println!(d" + create table student( + id int primary key, + name text + ) + "); +} +``` ## Differences from RFC 3450 From 14f09de3eadc3ecc248aab53943ee2a07a6d9708 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 16:39:23 +0100 Subject: [PATCH 09/43] fix: The arguments to `writeln!` --- text/3830-dedented-string-literals.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 9e9cb2dc0bf..f80519adbc7 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1319,10 +1319,10 @@ The problem being that we have to re-write all of the captured variables to pass {} {} - "#, - rel, - h1, - nav) + "#), + rel, + h1, + nav) ); ``` From 0c7cb800948eef5fea076e5d834fd31be82c5d18 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 16:40:51 +0100 Subject: [PATCH 10/43] fix: use literal escaped `\t` --- 
text/3830-dedented-string-literals.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index f80519adbc7..e8783343b7b 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1557,8 +1557,8 @@ In the following example: ```rs assert_eq!( d" -\thello\n••\n\tworld - ", // note: This is a tab, not 4 spaces. +\thello\n\t\n\tworld +\t", //^^ common leading whitespace (will be removed) "hello\nworld" From 0e368fc9fb466acac072959d686031a9e1c2e2e3 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 16:44:33 +0100 Subject: [PATCH 11/43] Line containing the closing quote may include escaped tab `\t` chars --- text/3830-dedented-string-literals.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index e8783343b7b..dd2bc8755df 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -703,12 +703,6 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ## Algorithm for dedented strings -> [!NOTE] -> -> Whitespace escape characters such as `\t`, `\r` and `\n` are treated as literal code when present in the content of the dedented string, therefore the normal dedentation rules apply to them. -> -> This does not apply to `\n` after the opening quote, nor the `\n` before the line containing the closing quote. In this case escaping the newline is not allowed, it has to be a literal newline. (As described previously.) - 1. The opening line (the line containing the opening quote `"`) - Must only contain a literal newline character after the `"` token - This newline is removed. 
@@ -731,6 +725,12 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` - Only the amount equal to the closing indentation, or less, will be removed. - Never more than the line actually has. +### Treatment of literal escapes: `\t`, `\r` and `\n` + +- Whitespace escape characters such as `\t`, `\r` and `\n` are treated as literal code when present in the content of the dedented string, therefore the normal dedentation rules apply to them. + - This does not apply to `\n` after the opening quote, nor the `\n` before the line containing the closing quote. In this case escaping the newline is not allowed, it has to be a literal newline. (As described previously.) +- The line containing the closing quote `"` can therefore contain `\t` escapes, as they are considered to be literal tabs. + ### Edge Cases > [!NOTE] @@ -754,15 +754,14 @@ assert_eq!( "hello\nworld" ); -// This example has the same whitespace as the previous example. -// However, here we make use of whitespace escape characters +// We make use of whitespace escape characters // // This might make code more confusing, so one of the future-possibilities // is to have a warn-by-default lint to disallow these characters in dedented strings. 
assert_eq!( d" -\thello\n••\n\tworld - ", +\thello\n\t\n\tworld +\t", "hello\nworld" ); From 220fac7d9206cbc62fdfcdc103a4bbf740303257 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 17:03:42 +0100 Subject: [PATCH 12/43] fix: minor spelling --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index dd2bc8755df..373d61a6bfb 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1200,7 +1200,7 @@ Having dedented strings as a language feature, instead of relying on a macro pro ## Use a built-in macro instead -What about using a compiler built-in macro like `dedent!("string")` instead of a language-built in string modifier such as `d"string"`? +What about using a compiler built-in macro like `dedent!("string")` instead of a language built-in string modifier such as `d"string"`? ### Advantages From dfe1430d60700dbf475da2f1418c8c431b11eb13 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 17:06:21 +0100 Subject: [PATCH 13/43] fix: spelling --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 373d61a6bfb..a0deb8f4b1b 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1370,7 +1370,7 @@ error: could not compile `dedented` (bin "dedented") due to 1 previous error A `dedent!()` macro will have the same restriction. -This limits yet again where the dedented strings count be used. +This limits yet again where the dedented strings could be used. 
#### Consistency From 18dd08d4b7a9c64507b5e6c9f9e9b892b5bcb650 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+nik-rev@users.noreply.github.com> Date: Fri, 6 Jun 2025 17:42:39 +0100 Subject: [PATCH 14/43] fix: spelling Co-authored-by: Sabrina Jewson --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index a0deb8f4b1b..3f85549ade7 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -193,7 +193,7 @@ But the improvement in output comes at a cost: As you can see, we have to choose one or the other. In either case we have to give something up. -Sometimes, we are *forced* into the first option - sacrifice readability of the source. +Sometimes, we are *forced* into the first option - sacrificing readability of the source. In some cases, producing excessive whitespace will change meaning of the output. From 4730cc8051f4495520bdd31592aca968696ed31c Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+nik-rev@users.noreply.github.com> Date: Fri, 6 Jun 2025 17:42:53 +0100 Subject: [PATCH 15/43] fix: spelling Co-authored-by: Sabrina Jewson --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 3f85549ade7..59f1331cf1f 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -144,7 +144,7 @@ But the improvement in output comes at a cost: } ``` - This makes it confusing to tell which scope the string belongs to. This is especially true when there are multile scopes involved: + This makes it confusing to tell which scope the string belongs to. 
This is especially true when there are multiple scopes involved: ```rs fn main() { From 80123225ce2ed48ad6562883176f9b2a5265ab7f Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 17:50:49 +0100 Subject: [PATCH 16/43] fix: add a newline --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 59f1331cf1f..3815adc3efa 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -751,7 +751,7 @@ assert_eq!( ", ^^^^ // common leading whitespace (will be removed) - "hello\nworld" + "hello\n\nworld" ); // We make use of whitespace escape characters From 70faac7a4d4715aa960dc64634c3a63bed6544f9 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 17:55:51 +0100 Subject: [PATCH 17/43] Add another "sacrifice readability of source code" example with `concat!` Co-authored-by: SabrinaJewson --- text/3830-dedented-string-literals.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 3815adc3efa..32947f0b5d4 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -193,6 +193,28 @@ But the improvement in output comes at a cost: As you can see, we have to choose one or the other. In either case we have to give something up. +Another way to format the above would be the following: + +```rs +fn main() { + println!(concat!( + "create table student(\n", + " id int primary key,\n", + " name text,\n", + ")\n", + )); +} +``` + +The above: +- Is formatted nicely by `rustfmt` +- Produces the correct output + +However, it looks very noisy. +- Each line ends with an escaped `\n`. +- Requires double-quotes around each line. 
+- This does not allow for interpolations, such as `{variable_name}` as the format string is expanded by a macro. + Sometimes, we are *forced* into the first option - sacrificing readability of the source. In some cases, producing excessive whitespace will change meaning of the output. From 545c941a9773ec1f9a0712ef99086a3bc1634350 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 18:13:11 +0100 Subject: [PATCH 18/43] style: fix formatting for all code examples This now uses the same style as `rustfmt` for string literals inside function and macro invocations Namely, the closing `);` of the `println!(` must always align. --- text/3830-dedented-string-literals.md | 189 ++++++++++++++++---------- 1 file changed, 115 insertions(+), 74 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 32947f0b5d4..e396adbe60f 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -238,7 +238,8 @@ fn main() { id int primary key, name text ) - "); + " + ); ^^^^^^^^ // common leading whitespace (will be removed) } ``` @@ -250,8 +251,7 @@ All of the above problems are gracefully solved: 1. Nicely composes with raw string literal: `dr#"string"#`, in which the first newline *cannot* be escaped. 1. Indentation level of the statement is larger than the `println!` call, making it more obvious that the string is inside the call at a glance. -1. The closing parentheses in the SQL statement aligs with `create table` - and is 1 level larger than `println!`. +1. The closing parenthesis in the SQL statement aligns with `create table`.
Now, consider the example with multiple nested scopes again: @@ -263,14 +263,16 @@ fn main() { id int primary key, name text ) - "); + " + ); } println!(d" create table student( id int primary key, name text ) - "); + " + ); { { println!(d" @@ -278,7 +280,8 @@ fn main() { id int primary key, name text ) - "); + " + ); } } } @@ -305,8 +308,9 @@ fn main() { id int primary key, name text ) -"); +" // no common leading whitespace = nothing to remove + ); } ``` @@ -334,8 +338,9 @@ fn main() { id int primary key, name text ) - "); + " ^^^^ // common leading whitespace (will be removed) + ); } ``` @@ -363,8 +368,9 @@ fn main() { id int primary key, name text ) - "); + " ^^^^^^^^ // common leading whitespace (will be removed) + ); } ``` @@ -402,9 +408,10 @@ fn main() { id int primary key, name text ) - "); + " ^^^^^^^^ // common leading whitespace: 8 spaces ^^^^^^^^^^^^ // closing quote indentation: 12 spaces + ); } // spaces removed from the beginning of each line = min(8, 12) = 8 @@ -417,9 +424,10 @@ fn main() { id int primary key, name text ) - "); + " ^^^^^^^^ // common leading whitespace: 8 spaces ^^^^^^^^^^^^^^^^ // closing quote indentation: 16 spaces + ); } // spaces removed from the beginning of each line = min(8, 16) = 8 ``` @@ -431,9 +439,10 @@ fn main() { id int primary key, name text ) - "); + " ^^^^^^^^ // common leading whitespace: 8 spaces ^^^^^^^^^^^^^^^^^^^^ // closing quote indentation: 20 spaces + ); } // spaces removed from the beginning of each line = min(8, 20) = 8 ``` @@ -466,8 +475,9 @@ fn main() { id int primary key, name text ) - "); + " ^^^^^^^^ // common leading whitespace (will be removed) + ); } ``` @@ -561,8 +571,9 @@ let py = format!(dr#" print("{message}") hello() - "#); + "# //^^ removed +); let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; assert_eq!(py, expected); @@ -580,8 +591,9 @@ let mut py = String::new(); writeln!(py, d" def hello(): - "); + " //^^ removed +); // Note: We want to add 2 newlines here. 
// - `writeln!` adds 1 newline at the end @@ -592,12 +604,14 @@ writeln!(py, d" writeln!(py, dr#" print("{message}") -"#); +"# //^^ kept +); write!(py, d" hello() - "); + " +); //^^^^^^^^^^ No whitespace is removed here. // If the closing quote is after the common indentation // (in this case there is no common indentation at all), @@ -922,7 +936,8 @@ fn main() { id int primary key, name text ) - "); + " + ); } ``` @@ -937,7 +952,8 @@ fn main() { create table student( id int primary key, name text - )"); // ERROR + )" // ERROR + ); } ``` @@ -957,7 +973,8 @@ fn main() { id int primary key, name text ) - "); + " + ); } ``` @@ -970,7 +987,8 @@ fn main() { id int primary key, name text ) - "); + " + ); } ``` @@ -1025,8 +1043,9 @@ Differences: ```rs print!(d" a - "); + " ^^^^ // common leading whitespace (will be removed) + ); ``` Prints: `a` @@ -1039,8 +1058,9 @@ Differences: print!(d" a - "); + " ^^^^ // common leading whitespace (will be removed) + ); ``` The above prints: @@ -1204,11 +1224,13 @@ That's not a problem for *this* example, however with more involved macros such With this RFC, re-implementing the macros is not going to be necessary anymore, as you can just pass in the dedented string literals: ```rs -text!(d" +text!( + d" GET {url} Accept: {mime} -") +" ^^^^ // common leading whitespace (will be removed) +) ``` The language feature works with any user-defined macros that pass their arguments to `format_args!` under the hood. @@ -1243,8 +1265,9 @@ let py = format!(dr#" print("{message}") hello() - "#); + "# //^^ removed +); let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; assert_eq!(py, expected); @@ -1293,21 +1316,26 @@ Implementing dedented string literals as a macro will significantly limit their Consider a conversion from a regular string literal that prints some HTML: ```rust - writeln!(w, " \ - \ - \n\ - \n\ - \n
\ - \n

\ - \n {h1}\ - \n {nav}\ - \n

") +writeln!( + w, + " \ + \ + \n\ + \n\ + \n
\ + \n

\ + \n {h1}\ + \n {nav}\ + \n

" +) ``` Into a dedented string literal: ```rust - writeln!(w, dr#" +writeln!( + w, + dr#" @@ -1316,7 +1344,8 @@ Into a dedented string literal: {h1} {nav} - "#); + "# +); ``` The above conversion is elegant for these reasons: @@ -1330,21 +1359,25 @@ With a dedented string *macro*, it's a much more involved process. The above wil The problem being that we have to re-write all of the captured variables to pass them to the `writeln!` and not the dedented string itself: ```rust - writeln!(w, - dedent!(r#" - - - -
-

- {} - {} -

- "#), - rel, - h1, - nav) - ); +writeln!( + w, + dedent!( + r#" + + + +
+

+ {} + {} +

+ "# + ), + rel, + h1, + nav + ) +); ``` Which is unfortunate. @@ -1418,12 +1451,14 @@ With dedented string literals: ```rs fn main() { - println!(d" - create table student( - id int primary key, - name text - ) - "); + println!( + d" + create table student( + id int primary key, + name text + ) + " + ); } ``` @@ -1432,13 +1467,15 @@ With a `dedent!` built-in macro: ```rs fn main() { println!( - dedent!(" - create table student( - id int primary key, - name text - ) - ") - ); + dedent!( + " + create table student( + id int primary key, + name text + ) + " + ) + ); } ``` @@ -1446,12 +1483,14 @@ With [postfix macros](https://github.com/rust-lang/rfcs/pull/2442), the situatio ```rs fn main() { - println!(" - create table student( - id int primary key, - name text - ) - ".dedent!()); + println!( + " + create table student( + id int primary key, + name text + ) + ".dedent!() + ); } ``` @@ -1544,7 +1583,7 @@ fn main() { id int primary key, name text ) - "); + "); ^^^^ // common leading whitespace (will be removed) } ``` @@ -1553,13 +1592,15 @@ It could be automatically formatted by adding additional leading indentation, in ```rust fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text ) - "); + " ^^^^^^^^ // common leading whitespace (will be removed) + ); } ``` From 8b9bcc3c60d034017c60c022cdca8b23a50741d9 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 18:25:19 +0100 Subject: [PATCH 19/43] Add section explaining how to have a trailing newline --- text/3830-dedented-string-literals.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index e396adbe60f..a1c12f2b406 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -707,6 +707,33 @@ let py = d" // OK ``` +Both outputs will not contain a newline at the end, since 
the literal newline is stripped. + +If you'd like to have a trailing newline, you can insert a literal newline at the end: + +```rust +let py = d" + def hello(): + print('Hello, world!') + + hello() + + "; +// OK +``` + +You can also use an escaped newline. This is fine, because the string still ends with a literal newline (which cannot be escaped): + +```rust +let py = d" + def hello(): + print('Hello, world!') + + hello()\n + "; +// OK +``` + Benefits the above rules bring include: - The above rules make all dedented string literals you'll find in Rust consistent. From 013a68c2e87b8d10403dc0c1447de0644981c5d5 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 18:33:49 +0100 Subject: [PATCH 20/43] Align all opening quotes with closing quotes. Match `rustfmt` This formats all of the examples to be how rustfmt formats strings today. --- text/3830-dedented-string-literals.md | 107 ++++++++++++++++---------- 1 file changed, 67 insertions(+), 40 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index a1c12f2b406..11255a4c5ed 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -233,7 +233,8 @@ This allows us to have a more readable version of the above: ```rust fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -258,7 +259,8 @@ Now, consider the example with multiple nested scopes again: ```rs fn main() { { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -266,7 +268,8 @@ fn main() { " ); } - println!(d" + println!( + d" create table student( id int primary key, name text @@ -275,7 +278,8 @@ fn main() { ); { { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -303,7 +307,8 @@ This allows all lines to have a common indentation. 
```rust fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -333,13 +338,14 @@ In order to strip the first level of indentation, the ending quote is aligned to ```rust fn main() { - println!(d" - create table student( - id int primary key, - name text - ) - " -^^^^ // common leading whitespace (will be removed) + println!( + d" + create table student( + id int primary key, + name text + ) + " +^^^^^^^^ // common leading whitespace (will be removed) ); } ``` @@ -363,7 +369,8 @@ All indentation can be stripped by placing the closing double quote on the same ```rust fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -403,7 +410,8 @@ create table student( ```rs fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -419,7 +427,8 @@ fn main() { ```rs fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -434,7 +443,8 @@ fn main() { ```rs fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -470,7 +480,8 @@ The `format_args!` macro, and by extension all wrapper macros that pass argument fn main() { let table_name = "student"; - println!(d" + println!( + d" create table {table_name}( id int primary key, name text @@ -566,7 +577,8 @@ You can use them in formatting macros, such as `println!`, `write!`, `assert_eq! ```rs let message = "Hello, world!"; -let py = format!(dr#" +let py = format!( + dr#" def hello(): print("{message}") @@ -589,7 +601,9 @@ let mut py = String::new(); // Note: Using `writeln!` because the final newline from dedented strings is removed. (more info later) -writeln!(py, d" +writeln!( + py, + d" def hello(): " //^^ removed @@ -601,14 +615,18 @@ writeln!(py, d" // to insert the 2nd newline // Remember, dedented string literals strip the last newline. 
-writeln!(py, dr#" +writeln!( + py, + dr#" print("{message}") "# //^^ kept ); -write!(py, d" +write!( + py, + d" hello() " ); @@ -959,7 +977,8 @@ Consider the following which is invalid: ```rs fn main() { // ERROR - println!(d"create table student( + println!( + d"create table student( id int primary key, name text ) @@ -975,7 +994,8 @@ The following is also incorrect, as there is no newline before the line containi ```rs fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -995,7 +1015,8 @@ string content is allowed: ```rs fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -1009,7 +1030,8 @@ Reason: turning this into a syntax error is too strict, when it can be auto-fixe ```rs fn main() { - println!(d" + println!( + d" create table student( id int primary key, name text @@ -1068,7 +1090,8 @@ Differences: However, in this RFC the following: ```rs - print!(d" + print!( + d" a " ^^^^ // common leading whitespace (will be removed) @@ -1082,7 +1105,8 @@ Differences: In order to add a newline at the end, you have to add a newline in the source code: ```rs - print!(d" + print!( + d" a " @@ -1287,7 +1311,8 @@ One of the major benefits of having dedented string literals is that you'll be a let message = "Hello, world!"; // `{message}` is interpolated -let py = format!(dr#" +let py = format!( + dr#" def hello(): print("{message}") @@ -1512,11 +1537,11 @@ With [postfix macros](https://github.com/rust-lang/rfcs/pull/2442), the situatio fn main() { println!( " - create table student( - id int primary key, - name text - ) - ".dedent!() + create table student( + id int primary key, + name text + ) + ".dedent!() ); } ``` @@ -1605,13 +1630,15 @@ If indentation of the dedented string does not match the surrounding code: ```rust fn main() { - println!(d" - create table student( - id int primary key, - name text - ) - "); -^^^^ // common leading whitespace (will be removed) + println!( + d" + 
create table student( + id int primary key, + name text + ) + " +^^^^^^^^ // common leading whitespace (will be removed) + ); } ``` From 7a9a58b5960d043a4e88ad6ef1917b85ec2b181b Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 18:56:37 +0100 Subject: [PATCH 21/43] Change behaviour of escaped literals: `\t`, `\r` and `\n` --- text/3830-dedented-string-literals.md | 61 ++++++++++++++------------- 1 file changed, 31 insertions(+), 30 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 11255a4c5ed..68c739213dd 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -808,9 +808,35 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ### Treatment of literal escapes: `\t`, `\r` and `\n` -- Whitespace escape characters such as `\t`, `\r` and `\n` are treated as literal code when present in the content of the dedented string, therefore the normal dedentation rules apply to them. - - This does not apply to `\n` after the opening quote, nor the `\n` before the line containing the closing quote. In this case escaping the newline is not allowed, it has to be a literal newline. (As described previously.) -- The line containing the closing quote `"` can therefore contain `\t` escapes, as they are considered to be literal tabs. +`\t` is allowed in the line which contains the closing quote. Writing it is equivalent to inserting a literal tab. + +The escaped characters `\t`, `\r` and `\n` are treated as regular characters for the purposes of dedentation. + +So the following: + +```rs +println!( + d" + \ta + \tb + \tc + \t" +); +``` + +Prints, with each indentation being **1 tab**: + +``` + a + b + c +``` + +The indentation is not removed, because common indentation in this example is 0. 
+ +Escaped characters at the beginning of the string are interpreted as any other character, and **not** whitespace. + +After the dedentation is calculated, the escapes then expand into their literal counterparts. ### Edge Cases @@ -835,18 +861,6 @@ assert_eq!( "hello\n\nworld" ); -// We make use of whitespace escape characters -// -// This might make code more confusing, so one of the future-possibilities -// is to have a warn-by-default lint to disallow these characters in dedented strings. -assert_eq!( - d" -\thello\n\t\n\tworld -\t", - - "hello\nworld" -); - // line consisting of only spaces is allowed // However, nothing is removed because the: @@ -1668,19 +1682,6 @@ There could be a lint which detects strings which could be written clearer as de ## `rustc` warn-by-default lint to disallow whitespace escape characters -In the following example: - -```rs -assert_eq!( - d" -\thello\n\t\n\tworld -\t", -//^^ common leading whitespace (will be removed) - - "hello\nworld" -); -``` - -Using escaped whitespace characters is the same as if the characters were written literally. (in the *content* of the string. This excludes the requirement of a **literal** newline after the double quote and before the line of the closing quote). +As explained in the [reference level explanation](#reference-level-explanation), the whitespace escapes `\t`, `\n` and `\r` are allowed. -This is confusing, and might not work in the way people expect it to work. A warn-by-default lint could be added to disallow `\n`, `\t` and `\r` in dedented strings. (Or for instance, only allow `\t` anytime after the stripped indentation) +Their behaviour might be surprising, so it is worth considering a warn-by-default lint for them.
From 64e289e7ecf6a771b8d6154c5044a53c3b0dbb9b Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 19:21:56 +0100 Subject: [PATCH 22/43] fix: Use semicolon --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 68c739213dd..420a3732408 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -927,7 +927,7 @@ assert_eq!( // is removed in all cases #[expect_compile_error] let _ = d" - ", + "; ```` # Drawbacks From b4dcfd00eddf8de1c809799170486c0cc4d9a72c Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+nik-rev@users.noreply.github.com> Date: Fri, 6 Jun 2025 19:22:35 +0100 Subject: [PATCH 23/43] fix: Word Co-authored-by: Josh Triplett --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 420a3732408..0c0feed369f 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -227,7 +227,7 @@ But, what if we could have the best of both worlds? In order to solve these problems, the RFC proposes dedented string literals of the form: `d"string"`. -Common leading whitespace on each line after the closing quote in dedented string literals will be stripped at compile-time. +Common leading whitespace on each line after the opening quote in dedented string literals will be stripped at compile-time. 
This allows us to have a more readable version of the above: From c8673ad7a8c05ac002f9ed62ebdec0316e3002d5 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 19:30:39 +0100 Subject: [PATCH 24/43] Disallow whitespace escapes in the closing line Co-authored-by: Jacob Lifshay --- text/3830-dedented-string-literals.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 0c0feed369f..a68d0d28cbf 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -806,9 +806,14 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` - Only the amount equal to the closing indentation, or less, will be removed. - Never more than the line actually has. -### Treatment of literal escapes: `\t`, `\r` and `\n` +### Treatment of literal whitespace escapes: `\t`, `\r` and `\n` -`\t` is allowed in the line which contains the closing quote. Writing it is equivalent to inserting a literal tab. +#### On the line containing the closing quote + +- Only whitespace is allowed before the closing quote. +- Escapes are not permitted even if they are escapes for whitespace (e.g. a tab escape `\t`), because escapes are processed after dedenting, so they are not yet whitespace when the line with the closing quote is processed. + +#### In the content of the string The escaped characters `\t`, `\r` and `\n` are treated as regular characters for the purposes of dedentation. @@ -820,7 +825,7 @@ println!( \ta \tb \tc - \t" + " // the indent here is a tab ); ``` @@ -832,7 +837,7 @@ Prints, with each indentation being **1 tab**: c ``` -The indentation is not removed, because common indentation in this example is 0. +The indentation is not removed, because common indentation in this example is 0. (closing indentation is 1 tab). 
Escaped characters at the beginning of the string are interpreted as any other character, and **not** whitespace. From af7fc31d06f6890839ed60deba59a8e186a3c6bb Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 19:39:45 +0100 Subject: [PATCH 25/43] Fix confusing indentation in example --- text/3830-dedented-string-literals.md | 30 +++++++++++++-------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index a68d0d28cbf..4c0874b6282 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1389,8 +1389,7 @@ Consider a conversion from a regular string literal that prints some HTML: ```rust writeln!( w, - " \ - \ + " \ \n\ \n\ \n
\ @@ -1434,20 +1433,19 @@ writeln!( w, dedent!( r#" - - - -
-

- {} - {} -

- "# - ), - rel, - h1, - nav - ) + + + +
+

+ {} + {} +

+ "# + ), + rel, + h1, + nav ); ``` From 7f9417cbb29cc64cc8cc950b2fd229a1bc2c75aa Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Fri, 6 Jun 2025 19:50:04 +0100 Subject: [PATCH 26/43] Remove incorrect description of the chosen acronym --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 4c0874b6282..a3386558c01 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -961,7 +961,7 @@ The syntax of `d"string"` is chosen for the following reasons: - Fits with existing string modifiers, such as `b"string"`, `r#"string"#"` and `c"string"` - Composes with existing string modifiers: `db"string"`, `dc"string"`, `dr#"string"#`, and `dbr#"string"#`. - Does not introduce a lot of new syntax. Dedented string literals can be explained in terms of existing language features. -- The acronym `d` for `dedent` is both clear, and not taken by any of the other string modifiers. +- The acronym `d` for `dedent` is understandable, and not taken by any of the other string modifiers. - Adding a single letter `d` before a string literal to turn it into a dedented string literal is an incredibly easy modification. - Rust reserves space for additional string modifiers. 
From ae9a66822117d639b7df69a2d9de1a94ec0b084f Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Sat, 7 Jun 2025 14:19:35 +0100 Subject: [PATCH 27/43] Remove note about injected language into string Co-authored-by: Travis Cross --- text/3830-dedented-string-literals.md | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index a3386558c01..2af9173b2bc 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1087,18 +1087,12 @@ Differences: ``` With the `sql` not affecting the output, but can aid in syntax highlighting and such. - - 1. This is not necessary, as at the moment you can add a block comment next to the string, which syntax highlighters can use *today* to inject whatever language is specified. - - ```rs - let sql = /* sql */ "SELECT * FROM table;"; - ``` - 2. Is considered out of scope for this RFC to consider. + 1. Is considered out of scope for this RFC to consider. It would be a backward-compatible change to make for a future RFC, if it's desired. - 3. [Expression attributes](https://github.com/rust-lang/rust/issues/15701) are likely to be more suitable for this purpose. (not part of this RFC) + 1. [Expression attributes](https://github.com/rust-lang/rust/issues/15701) are likely to be more suitable for this purpose. 
(not part of this RFC) ```rs let sql = #[editor::language("sql")] "SELECT * FROM table;"; From b5cdd57852f99663594516fa9f44b9b7fb187e59 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Sat, 7 Jun 2025 15:07:57 +0100 Subject: [PATCH 28/43] Explain why the letter `d` is the choice, rather than other letters --- text/3830-dedented-string-literals.md | 151 ++++++++++++++++++++++++-- 1 file changed, 140 insertions(+), 11 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 2af9173b2bc..b29894497ba 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -954,14 +954,151 @@ let _ = d" ## Design -### The choice of `d"string"` specifically +### The choice of the letter `d` for "dedent" + +When picking a single letter for this feature, we want: + +- A letter that represents a mnemonic +- The mnemonic should make sense +- And be memorable + +The RFC picks `d` as a mnemonic for "dedent". + +- Dedentation is a simple atomic operation which removes the leading indentation of the string +- The transformation is always a dedentation + + If there is no leading indentation, removing it is still accurately described as a "dedentation", even though nothing is removed. +- It might help make the acronym more memorable by thinking about the `d` as "**d**eleting" the leading indentation. + +#### Why not `u` for "unindent" + +Confusion can arise due to the way this string prefix has been used in other languages: + +- In Python 2, `u` is a prefix for Unicode strings +- In C++, `u` is used for UTF-16 strings + +The goal a single-letter acronym hopes to accomplish is to be memorable and make sense. +It can be argued that the word "Unindent" is more complex than the word "Dedent": + +- Unindent contains a negation, consisting of two "parts": **un** + **indent**. Undoing an indentation.
+- Dedent represents an atomic operation, which is removal of indentation and is a synonym of unindent. + +Using a negated word can be considered to be less desirable, because in order to undo the negation we have to perform an extra "step" when thinking about it. + +Consider that instead of a negated `if` condition: + +```rs +if !string.is_empty() { + walk() +} else { + run() +} +``` + +Writing the non-negated version first is often clearer: + +```rs +if string.is_empty() { + run() +} else { + walk() +} +``` + +Using a word with a lower cognitive complexity may make it easier to think about and more memorable. + +#### Why not `i` for "indent" + +Indent is the opposite of dedent. It could make sense, but from a completely different perspective. + +The question is, which one do we value more: + +- A word that describes what the string looks like in the source code. +- A word that describes the transformation that the string goes through when it is evaluated. + +"Indent" describes what the string looks like in the source code: + +```rs +fn main() { + let table_name = "student"; + + println!( + d" + create table {table_name}( + id int primary key, + name text + ) + " + ); +} +``` + +But it does not describe the transformation that it goes through: + +```sh +create table student( + id int primary key, + name text +) +``` + +When the string is evaluated, the leading indentation is removed. It is **dedented**. + +In the source code, the string is **indented**. + +- When viewing the string from the source code, the indentation is obvious. + + However, it is *not* obvious what will happen to the string when it is evaluated. "Dedent" can be clearer in this regard, as we already have 1 piece of information and the word "dedent" brings us the other piece. + +- The string may not always be considered to be indented: + + ```rs + let _ = d" + hello world + "; + ``` + + In the above example, there is no indentation for the strings.
It would be inaccurate to describe the string as having indentation. + + Once the string is evaluated, it is accurate to describe the removal of the non-existing indentation as still "dedenting" the string. + +#### Why not `m` for "multi-line" + +- Dedented string literals do not necessarily represent a multi-line string: + +```rs +let _ = d" +hello world +"; +``` + +The above is equivalent to: + +```rs +let _ = "hello world"; +``` + +Confusion could arise, as people expect it to evaluate to a string spanning multiple lines. + +#### Why not `h` for "heredoc" + +RFC #3450 uses `h` as the modifier instead of `d`, as an acronym for [Here document](https://en.wikipedia.org/wiki/Here_document). + +- The term is likely to be less known, and may raise confusion, especially amongst + those that don't know what it is. +- Here documents are more associated with "code blocks", which may associate an "info string" + with them (such as in markdown). This RFC does not propose an info string. + +While the feature this RFC proposes (dedented string literals) is useful for code +blocks, it is not just for them. + +### The choice of the form `d"string"` The syntax of `d"string"` is chosen for the following reasons: - Fits with existing string modifiers, such as `b"string"`, `r#"string"#"` and `c"string"` - Composes with existing string modifiers: `db"string"`, `dc"string"`, `dr#"string"#`, and `dbr#"string"#`. - Does not introduce a lot of new syntax. Dedented string literals can be explained in terms of existing language features. -- The acronym `d` for `dedent` is understandable, and not taken by any of the other string modifiers. - Adding a single letter `d` before a string literal to turn it into a dedented string literal is an incredibly easy modification. - Rust reserves space for additional string modifiers. @@ -1066,15 +1203,7 @@ The [RFC #3450: Propose code string literals](https://github.com/rust-lang/rfcs/ Differences: -- #3450 uses `h` as the modifier instead of `d`.
- - proposes using `h` as acronym for [Here document](https://en.wikipedia.org/wiki/Here_document). - - The term is likely to be less known, and may raise confusion. - - Additionally, here documents are more associated with "code blocks". While this feature is useful for code blocks, it is not just for them. - - While the `d` mnemonic for **dedent** clearly describes what actually happens to the strings. +- #3450 uses `h` as the modifier instead of `d`. Explained [earlier](#why-not-h-for-heredoc) - #3450 allows to write an *info string*, like in markdown. From e36f8ed956acc8b4a8e817d3595efacd0e62765f Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Sat, 7 Jun 2025 15:29:09 +0100 Subject: [PATCH 29/43] Clarify why the dedented string always ends with a newline --- text/3830-dedented-string-literals.md | 59 +++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index b29894497ba..ee666dff097 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1164,6 +1164,65 @@ fn main() { - In the example above, it is not immediately clear where that would be from. - It easy to modify the common indentation level of the string in the future, as you do not have to create a new line. +### The choice of not ending with a newline + +Dedented string literals do not end with a newline. 
+ +The following: + +```rs +fn main() { + print!( + d" + create table student( + id int primary key, + name text + ) + " + ); +} +``` + +Prints, *without* a newline at the end: + +```sh +create table student( + id int primary key, + name text +) +``` + +In order to add a final newline, an extra blank line needs to be added at the end: + +```rs +fn main() { + print!( + d" + create table student( + id int primary key, + name text + ) + + " + ); +} +``` + +Removing the final newline is consistent with removing the initial newline. + +The line containing the opening quote `"` and the line containing the closing quote `"` can be considered to be fully exempt from the output. + +If this *wasn't* the behaviour: +- It would make less sense to remove the newline from the beginning, but not from the end. +- Dedented strings would always end with a newline +- ..But how do you opt-out of the newline? + + Using a special syntax, like closing with a `-"` (as a different RFC proposes) would be too special-cased, it wouldn't fit in with the rest of the language. + + It would be confusing for those that want to end the dedented string with a `-`. + +Removing *both* the newline at the start and the end is consistent, and allows maximum flexibility whilst not making additional trade-offs such as having to introduce new special syntax to exclude the newline. 
+ ### Allowing the closing line to be indented more than previous lines Having the quote be indented further than the first non-whitespace character in the From c0bddbd72abe43ad39964cb5ab379becbc871ce5 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Sat, 7 Jun 2025 15:32:32 +0100 Subject: [PATCH 30/43] Clarify what is meant by "Whitespace" --- text/3830-dedented-string-literals.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index ee666dff097..77d193ac74e 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -784,6 +784,8 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ## Algorithm for dedented strings +Whitespace is spaces or horizontal tabs. + 1. The opening line (the line containing the opening quote `"`) - Must only contain a literal newline character after the `"` token - This newline is removed. From 8b4422c92cc60d8e1ce1081acd1a8e0537dc0b86 Mon Sep 17 00:00:00 2001 From: Nik Revenco <154856872+NikitaRevenco@users.noreply.github.com> Date: Sat, 7 Jun 2025 15:43:20 +0100 Subject: [PATCH 31/43] Clarify what is meant by an empty line --- text/3830-dedented-string-literals.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 77d193ac74e..b119e36efa6 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -784,7 +784,8 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ## Algorithm for dedented strings -Whitespace is spaces or horizontal tabs. +Whitespace is literal spaces or literal horizontal tabs. +An empty line only consists of literal spaces, literal horizontal tabs or literal newlines 1. 
The opening line (the line containing the opening quote `"`) - Must only contain a literal newline character after the `"` token @@ -799,7 +800,7 @@ Whitespace is spaces or horizontal tabs. It is the largest amount of leading whitespace shared by all non-empty lines. -1. For each non-empty line, remove the smallest amount of leading whitespace that satisfies: +1. For each line, remove the smallest amount of leading whitespace that satisfies: - `min(common indentation, closing indentation)` From acda9b22eb93af0882ac3a74c5a7627143564c7b Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Mon, 9 Jun 2025 21:11:38 +0100 Subject: [PATCH 32/43] Add drawback: large string modifier count can be confusing Co-authored-by: Ed Page --- text/3830-dedented-string-literals.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index b119e36efa6..c8672263f5b 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -941,6 +941,8 @@ let _ = d" # Drawbacks [drawbacks]: #drawbacks +- The more string literal modifiers that are stacked on each other, more work is needed to decipher it and can feel a bit too foreign + - Contributes to the increase of string literal modifiers by adding a new variant. While at the moment the variety of string literal modifiers is small, it is worth to think about the implications of exponential increase of them. 
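The dedent steps touched by the hunks above (remove the opening newline, read the closing indentation, compute the common indentation over non-empty lines, then strip `min(common indentation, closing indentation)` from each line) could be sketched at runtime roughly as follows. This is only an illustration under stated assumptions, not the compile-time implementation the RFC proposes: `dedent`, `is_hws` and `leading_ws` are hypothetical names, the input is assumed to be the raw text between the quotes with `\n` as the only line ending, indentation is counted in characters (mixed tabs and spaces are not treated specially), and escape sequences are not processed at all.

```rust
/// Hypothetical helper: true for the whitespace the algorithm considers
/// (a literal space or a literal horizontal tab).
fn is_hws(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Hypothetical helper: number of leading space/tab characters on a line.
fn leading_ws(line: &str) -> usize {
    line.chars().take_while(|&c| is_hws(c)).count()
}

/// Hypothetical runtime sketch of the dedent steps.
/// `raw` is the text between the quotes; `None` means a rule was violated.
fn dedent(raw: &str) -> Option<String> {
    // The opening line must contain only a literal newline; it is removed.
    let body = raw.strip_prefix('\n')?;

    // Split off the closing line. The newline right before it is removed.
    let (content, closing) = match body.rfind('\n') {
        Some(i) => (Some(&body[..i]), &body[i + 1..]),
        None => (None, body),
    };

    // The closing line may contain only whitespace: the closing indentation.
    if !closing.chars().all(is_hws) {
        return None;
    }
    let closing_indent = closing.len();

    // No content lines at all (the literal was just `d"<newline>"`).
    let Some(content) = content else {
        return Some(String::new());
    };

    // Common indentation: smallest leading whitespace over non-empty lines.
    let common_indent = content
        .split('\n')
        .filter(|line| !line.chars().all(is_hws))
        .map(leading_ws)
        .min()
        .unwrap_or(closing_indent);

    // Remove min(common, closing) from every line, never touching
    // non-whitespace characters (short blank lines lose what they have).
    let remove = common_indent.min(closing_indent);
    let lines: Vec<&str> = content
        .split('\n')
        .map(|line| &line[remove.min(leading_ws(line))..])
        .collect();
    Some(lines.join("\n"))
}
```

In this sketch, moving the closing quote to column 0 makes `remove` zero, so all source indentation is kept, while indenting the closing quote deeper than the content has no further effect because `min` caps the amount at the common indentation.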
From 9de6bcd7c39c0366c66301b0b6679a7b85c29133 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Mon, 9 Jun 2025 21:18:19 +0100 Subject: [PATCH 33/43] Add drawback about `dr"..."` strings Co-authored-by: Ed Page --- text/3830-dedented-string-literals.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index c8672263f5b..631bc7ed12f 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -941,6 +941,8 @@ let _ = d" # Drawbacks [drawbacks]: #drawbacks +- While the reference specifies `r` as ["not processing ay escapes"](https://doc.rust-lang.org/reference/tokens.html#raw-string-literals), users are less likely familiar with the exact definition and more familiar with the name and the affect: it leaves the string as-is. This can feel contradictory to `d` which is a specific form of modifying the string content and so a `dr""` could read as something that should be a compilation error. + - The more string literal modifiers that are stacked on each other, more work is needed to decipher it and can feel a bit too foreign - Contributes to the increase of string literal modifiers by adding a new variant. 
From 7bbf74b1655deb0baa940b5d47579378c315afb7 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Mon, 9 Jun 2025 21:19:33 +0100 Subject: [PATCH 34/43] fix: Spelling --- text/3830-dedented-string-literals.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 631bc7ed12f..2c386763a01 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -941,7 +941,9 @@ let _ = d" # Drawbacks [drawbacks]: #drawbacks -- While the reference specifies `r` as ["not processing ay escapes"](https://doc.rust-lang.org/reference/tokens.html#raw-string-literals), users are less likely familiar with the exact definition and more familiar with the name and the affect: it leaves the string as-is. This can feel contradictory to `d` which is a specific form of modifying the string content and so a `dr""` could read as something that should be a compilation error. +- While the reference specifies `r` as ["not processing any escapes"](https://doc.rust-lang.org/reference/tokens.html#raw-string-literals), users are less likely familiar with the exact definition and more familiar with the name and the effect: it leaves the string as-is. + + This can feel contradictory to `d` which is a specific form of modifying the string content and so a `dr""` could read as something that should be a compilation error. 
- The more string literal modifiers that are stacked on each other, more work is needed to decipher it and can feel a bit too foreign From c27b2b792f241d3824baedfd14231f224b020e90 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Thu, 12 Jun 2025 23:14:48 +0100 Subject: [PATCH 35/43] fix: minor Co-authored-by: DragonDev1906 <8270201+DragonDev1906@users.noreply.github.com> --- text/3830-dedented-string-literals.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 2c386763a01..cf91ae909ca 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -498,8 +498,8 @@ fn main() { Any kind of string literal can turn into a "dedented" string literal if it is prefixed with a `d`: - strings: `"string"` -> `d"string"` -- Raw strings: `r#"string"` -> `dr#"string"` -- Byte strings: `b#"string"` -> `db#"string"` +- Raw strings: `r#"string"#` -> `dr#"string"#` +- Byte strings: `b"string"` -> `db"string"` - ...and others... > [!NOTE] From d0b4c2737dee4209db2e32ce426eaab3dd5fb9e7 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Thu, 12 Jun 2025 23:15:13 +0100 Subject: [PATCH 36/43] fix: wording Co-authored-by: DragonDev1906 <8270201+DragonDev1906@users.noreply.github.com> --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index cf91ae909ca..bbb17738aa6 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -463,7 +463,7 @@ Dedented string literals `d"string"` are a new modifier for strings. They are similar to byte strings `b"string"` and raw strings `r#"string"#`. -They compose with other every other string literal modifier. +They compose with others like every other string literal modifier. 
To be precise, the RFC introduces 6 new types of string literals: From 1f68236355495ccfb9d8577d2bcbae5adf958823 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 17:54:52 +0100 Subject: [PATCH 37/43] Prior art: add `inspect.cleandoc` from Python --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index bbb17738aa6..8bd5c5d9a08 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1782,7 +1782,7 @@ In other languages: - _Scala_ - [multiline strings](https://docs.scala-lang.org/overviews/scala-book/two-notes-about-strings.html) using triple-quotes and `.stripMargin`. - _C#_ - [Raw string literals](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/raw-string) -- _Python_ - [multiline strings](https://docs.python.org/3/library/textwrap.html) using triple-quotes +- _Python_ - [multiline strings](https://docs.python.org/3/library/textwrap.html) using triple-quotes and [`inspect.cleandoc`](https://docs.python.org/3/library/inspect.html#inspect.cleandoc) to avoid escaping and `textwrap.dedent`. - _Jsonnet_ - [text blocks](https://jsonnet.org/learning/tutorial.html) with `|||` as a delimiter. - _Bash_ - [`<<-` Heredocs](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04). 
From 995efe824339769ea5fb4408cb83e3070c211aff Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 18:50:37 +0100 Subject: [PATCH 38/43] Disambiguate "whitespace" and "newline", use more technical terms --- text/3830-dedented-string-literals.md | 213 ++++++++++++++++---------- 1 file changed, 130 insertions(+), 83 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 8bd5c5d9a08..5089542faef 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -75,7 +75,7 @@ Which outputs (using `^` to mark the beginning of a line, and `·` to mark a lea ^ ``` -The output is formatted in an unconventional way, containing excessive leading whitespace. +The output is formatted in an unconventional way, containing excessive leading indentation. The alternative allows for a sane output, but at the cost of making the code less readable: @@ -178,7 +178,7 @@ But the improvement in output comes at a cost: All of the strings end up on the same level, despite them being in different scopes. -3. The closing double-quote must be put at the beginning of the line, in order not to introduce trailing whitespace: +3. The closing double-quote must be put at the beginning of the line, in order not to introduce trailing horizontal whitespace: ```diff fn main() { @@ -227,7 +227,7 @@ But, what if we could have the best of both worlds? In order to solve these problems, the RFC proposes dedented string literals of the form: `d"string"`. -Common leading whitespace on each line after the opening quote in dedented string literals will be stripped at compile-time. +Common leading indentation on each line after the opening quote in dedented string literals will be stripped at compile-time. 
This allows us to have a more readable version of the above: @@ -241,7 +241,7 @@ fn main() { ) " ); -^^^^^^^^ // common leading whitespace (will be removed) +^^^^^^^^ // common leading indentation (will be removed) } ``` @@ -295,7 +295,7 @@ It is immediately more obvious which string belongs to which scope. ## Closing quote controls the removed indentation -All of the common whitespace between each line, which has a higher indentation than the indentation of the line of closing quote (contained in the last line) is stripped. +From the column containing the closing quote `"`, common leading horizontal whitespace is stripped from each line. Here are a few examples to demonstrate. @@ -314,7 +314,7 @@ fn main() { name text ) " -// no common leading whitespace = nothing to remove +// no common leading indentation = nothing to remove ); } ``` @@ -345,7 +345,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace (will be removed) +^^^^^^^^ // common leading indentation (will be removed) ); } ``` @@ -376,12 +376,12 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace (will be removed) +^^^^^^^^ // common leading indentation (will be removed) ); } ``` -The indentation of the ending double quote is 8 spaces. This common prefix of leading whitespace characters will be removed from the beginning of each line. +The indentation of the ending double quote is 8 spaces. This common prefix of leading horizontal whitespace characters will be removed from the beginning of each line. Prints: @@ -395,7 +395,7 @@ create table student( Result: **all indentation from source is stripped**. Indenting the closing double quote further will have zero impact. -The dedentation will never remove non-whitespace characters. +The dedentation will never remove non-horizontal-whitespace characters. 
Each of the following **examples** print: @@ -417,7 +417,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace: 8 spaces +^^^^^^^^ // common leading indentation: 8 spaces ^^^^^^^^^^^^ // closing quote indentation: 12 spaces ); } @@ -434,7 +434,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace: 8 spaces +^^^^^^^^ // common leading indentation: 8 spaces ^^^^^^^^^^^^^^^^ // closing quote indentation: 16 spaces ); } @@ -450,7 +450,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace: 8 spaces +^^^^^^^^ // common leading indentation: 8 spaces ^^^^^^^^^^^^^^^^^^^^ // closing quote indentation: 20 spaces ); } @@ -487,7 +487,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace (will be removed) +^^^^^^^^ // common leading indentation (will be removed) ); } ``` @@ -521,7 +521,7 @@ assert_eq!(regular, "\n I am a regular string literal.\n "); let dedented = d" I am a dedented string literal! "; //^ newline is removed -//^^ whitespace is removed +//^^ indentation is removed assert_eq!(dedented, "I am a dedented string literal!"); ``` @@ -535,8 +535,8 @@ Indentation present *after* the double-quote is kept: let dedented = d" I am a dedented string literal! 
"; //^ newline is removed -//^^ whitespace is removed -// ^^^^ indentation after the double quote is kept +//^^ indentation is removed +// ^^^^ horizontal whitespace after the double quote is kept assert_eq!(dedented, " I am a dedented string literal!"); ``` @@ -591,7 +591,7 @@ let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; assert_eq!(py, expected); ``` -By placing the ending quote earlier than the first non-whitespace character in any of the lines, you can reduce how much space is removed from the beginning of each line: +By placing the closing quote `"` earlier than the first non-horizontal-whitespace character in any of the lines, you can reduce how much indentation is removed from each line: ```rs use std::io::Write as _; @@ -630,10 +630,10 @@ write!( hello() " ); -//^^^^^^^^^^ No whitespace is removed here. +//^^^^^^^^^^ No indentation is removed here. // If the closing quote is after the common indentation // (in this case there is no common indentation at all), -// all of the whitespace is stripped +// all of the common indentation is stripped let expected = "def hello():\n print(\"Hello, world!\")\n\nhello()"; assert_eq!(py, expected); @@ -641,16 +641,16 @@ assert_eq!(py, expected); ## Rules -### Dedented string literals must begin with a newline +### Dedented string literals must begin with an end-of-line character (EOL) -All dedented string literals must begin with a newline. -This newline is removed. +All dedented string literals must begin with an EOL. +This EOL is removed. The following is invalid: ```rust -// ↓ error: expected literal newline. 
-// note: dedented string literals must start with a literal newline +// ↓ error: expected literal EOL +// note: dedented string literals must start with a literal EOL // help: insert a literal newline here: let py = d"def hello(): print('Hello, world!') @@ -659,11 +659,11 @@ let py = d"def hello(): "; ``` -Escape-code newline is not supported, it must be a literal newline: +An escaped EOL, such as an escaped newline (`\n`), is not supported; it must be a literal EOL: ```rust -// ↓ error: expected literal newline, but found escaped newline. -// note: dedented string literals must start with a literal newline +// ↓ error: expected literal EOL, but found escaped newline. +// note: dedented string literals must start with a literal EOL let py = d"\ndef hello(): print('Hello, world!') @@ -683,9 +683,9 @@ let py = d" "; ``` -### Last line must be empty, and preceded by a literal newline +### Last line must be empty, and preceded by a literal EOL -The line which contains the closing quote `"` must be empty, and the character before the last line must be a literal newline character. +The line which contains the closing quote `"` can only contain horizontal whitespace, and the character before the last line must be a literal EOL. This is invalid: @@ -695,12 +695,13 @@ let py = d" print('Hello, world!') hello()"; -// ^ error: expected literal newline +// ^ error: expected literal EOL // note: in dedented string literals, the line -// which contains the closing quote must be empty +// which contains the closing quote can +// only contain horizontal whitespace ``` -Neither is using an escaped newline `\n` instead of the literal newline: +Neither is using an escaped EOL (e.g.
escaped newline `\n`) instead of the literal EOL: ```rust let py = d" @@ -708,9 +709,10 @@ let py = d" print('Hello, world!') hello()\n"; -// ^ error: expected literal newline, but found escaped newline +// ^ error: expected literal EOL, but found escaped newline `\n` // note: in dedented string literals, the line -// which contains the closing quote must be empty +// which contains the closing quote can +// only contain horizontal whitespace ``` This is the correct syntax for the last line: @@ -725,9 +727,9 @@ let py = d" // OK ``` -Both outputs will not contain a newline at the end, since the literal newline is stripped. +Both outputs will not contain an EOL at the end, since the literal EOL is stripped. -If you'd like to have a trailing newline, you can insert a literal newline at the end: +If you'd like to have a trailing EOL, you can insert a literal newline at the end (or any other EOL): ```rust let py = d" @@ -740,7 +742,7 @@ let py = d" // OK ``` -You can also use an escaped newline. This is fine, because the string still ends with a literal newline (which cannot be escaped): +You can also use an escaped newline. This is fine, because the string still ends with a literal EOL (which cannot be escaped): ```rust let py = d" @@ -755,12 +757,23 @@ let py = d" Benefits the above rules bring include: - The above rules make all dedented string literals you'll find in Rust consistent. -- It allows easily changing the indentation level without having to insert a newline sometimes. +- It allows easily changing the indentation level without having to insert an EOL sometimes. - It gives the ability for us to tell a regular string literal from a dedented string literal at a glance. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation +## Terms used + +We use these terms throughout the RFC and they are explained in detail in this section.
+ +- **Whitespace** is as defined in the [reference](https://doc.rust-lang.org/reference/whitespace.html) as any character with the [`Pattern_White_Space`](https://www.unicode.org/reports/tr31/) unicode property +- **Horizontal whitespace** is spaces or tabs. These are the only `Pattern_White_Space` characters that are *horizontal* space per [UAX#31](https://www.unicode.org/reports/tr31/#Contexts_for_Ignorable_Format_Controls) +- **EOL (end-of-line) character** is any "end of line" character as classified in [`UAX#R3a-1`](https://www.unicode.org/reports/tr31/#R3a-1) +- **Indentation** is one or more **horizontal whitespace** at the beginning of a line + +A "newline" is used as an example of a specific EOL character, however any other valid EOL character can be used. + ## String Literals 6 new [string literal](https://doc.rust-lang.org/reference/tokens.html#characters-and-strings) types: @@ -784,23 +797,22 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ## Algorithm for dedented strings -Whitespace is literal spaces or literal horizontal tabs. -An empty line only consists of literal spaces, literal horizontal tabs or literal newlines +An empty line only consists of literal whitespace (*not* escaped whitespace such as `\n`) 1. The opening line (the line containing the opening quote `"`) - - Must only contain a literal newline character after the `"` token - - This newline is removed. + - Must only contain a literal EOL after the `"` token + - This EOL is removed. 1. The closing line (the line containing the closing quote `"`) - - Must contain only whitespace before the closing quote - - This whitespace is the *closing indentation*. + - Must contain only horizontal whitespace before the closing quote + - This horizontal whitespace is the *closing indentation*. - The closing indentation is removed. -1. The character immediately before the closing line must be a literal newline character. - - This newline is removed. +1. 
The character immediately before the closing line must be a literal EOL. + - This EOL is removed. 1. The *common indentation* is calculated. - It is the largest amount of leading whitespace shared by all non-empty lines. + It is the largest amount of leading horizontal whitespace shared by all non-empty lines. -1. For each line, remove the smallest amount of leading whitespace that satisfies: +1. For each line, remove the smallest amount of leading horizontal whitespace that satisfies: - `min(common indentation, closing indentation)` @@ -813,8 +825,8 @@ An empty line only consists of literal spaces, literal horizontal tabs or litera #### On the line containing the closing quote -- Only whitespace is allowed before the closing quote. -- Escapes are not permitted even if they are escapes for whitespace (e.g. a tab escape `\t`), because escapes are processed after dedenting, so they are not yet whitespace when the line with the closing quote is processed. +- Only horizontal whitespace is allowed before the closing quote. +- Escapes are not permitted even if they are escapes for horizontal whitespace (e.g. a tab escape `\t`), because escapes are processed after dedenting, so they are not yet horizontal whitespace when the line with the closing quote is processed. #### In the content of the string @@ -842,7 +854,7 @@ Prints, with each indentation being **1 tab**: The indentation is not removed, because common indentation in this example is 0. (closing indentation is 1 tab). -Escaped characters at the beginning of the string are interpreted as any other character, and **not** whitespace. +Escaped characters at the beginning of the string are interpreted as any other character, and **not** horizontal whitespace. After the dedentation is calculated, the escapes then expand into their literal counterparts. 
@@ -864,7 +876,7 @@ assert_eq!( •• ••••world ", -^^^^ // common leading whitespace (will be removed) +^^^^ // common leading indentation (will be removed) "hello\n\nworld" ); @@ -938,6 +950,41 @@ let _ = d" "; ```` +## Treatment of special unicode characters + +The invisible whitespace characters `U+200E` (left-to-right mark) and `U+200F` (right-to-left mark) cannot appear anywhere inside the indentation to be stripped from a line. + +When the compiler encounters these characters, it offers to place them directly *after* the stripped indentation. + +Invalid example, `◀` represents `U+200F` and `▶` represents `U+200E`: + +```rust +let py = d" + ◀ def hello(): + ▶ print('Hello, world!') + + hello()\n + "; +//^^ error: U+200E cannot appear in the stripped indentation +// help: place them after the stripped indentation +//^^ error: U+200F cannot appear in the leading indentation +// help: place them after the stripped indentation +``` + +It should be fixed as follows: + +```rust +let py = d" + ◀def hello(): + ▶print('Hello, world!') + + hello()\n + "; +// OK +``` + +The above example is valid because the invisible characters `U+200F` and `U+200E` appear after the indentation and will remain in the output, while the indentation of 4 spaces will be stripped from each line. + # Drawbacks [drawbacks]: #drawbacks @@ -973,11 +1020,11 @@ When picking a single letter for this feature, we want: The RFC picks `d` as a mnemonic for "dedent". -- Dedentation is a simple atomic operation which removes the leading indentation of the string +- Dedentation is a simple atomic operation which removes the indentation of the string - The transformation is always a dedentation - If there is no leading indentation, removing the it is still accurately described as a "dedentation" because the nothing is removed. -- It might help make the acronym more memorable by thinking about the `d` as "**d**eleting" the leading indentation.
+ If there is no indentation, removing it is still accurately described as a "dedentation" because nothing is removed. +- It might help make the acronym more memorable by thinking about the `d` as "**d**eleting" the indentation. #### Why not `u` for "unindent" @@ -1051,7 +1098,7 @@ create table student( ) ``` -When the string is evaluated, the leading indentation is removed. It is **dedented**. +When the string is evaluated, the indentation is removed. It is **dedented**. In the source code, the string is **indented**. @@ -1128,7 +1175,7 @@ The first example reads in the most natural manner. The other two don't. But since this is already in the language, we can't change it. --> -### Requirement of first and final newline +### Requirement of first and final EOL As mentioned earlier in the RFC: @@ -1153,9 +1200,9 @@ fn main() { ``` - The `d"` and `create` in the first `d"create` not being separated by whitespace makes it harder to understand where the code begins. They have to be mentally separated. -- Additionally, indentation of the `create` does not align with what it will look like in the output, making it less obvious, which we would like to aviod. Therefore it is a **hard error** to not have a literal newline there. +- Additionally, indentation of the `create` does not align with what it will look like in the output, making it less obvious, which we would like to avoid. Therefore it is a **hard error** to not have a literal EOL. -The following is also incorrect, as there is no newline before the line containing the closing quote: +The following is also incorrect, as there is no EOL before the line containing the closing quote: ```rs fn main() { @@ -1173,9 +1220,9 @@ fn main() { - In the example above, it is not immediately clear where that would be from. - It easy to modify the common indentation level of the string in the future, as you do not have to create a new line.
-### The choice of not ending with a newline +### The choice of not ending with an EOL -Dedented string literals do not end with a newline. +Dedented string literals do not end with an EOL. The following: @@ -1201,7 +1248,7 @@ create table student( ) ``` -In order to add a final newline, an extra blank line needs to be added at the end: +In order to add a final newline, insert a newline (literal "\n" or escaped `\n`) (or any EOL) at the end: ```rs fn main() { @@ -1217,24 +1264,24 @@ fn main() { } ``` -Removing the final newline is consistent with removing the initial newline. +Removing the final EOL is consistent with removing the initial EOL. The line containing the opening quote `"` and the line containing the closing quote `"` can be considered to be fully exempt from the output. If this *wasn't* the behaviour: -- It would make less sense to remove the newline from the beginning, but not from the end. -- Dedented strings would always end with a newline -- ..But how do you opt-out of the newline? +- It would make less sense to remove the EOL from the beginning, but not from the end. +- Dedented strings would always end with a EOL +- ..But how do you opt-out of the EOL? Using a special syntax, like closing with a `-"` (as a different RFC proposes) would be too special-cased, it wouldn't fit in with the rest of the language. It would be confusing for those that want to end the dedented string with a `-`. -Removing *both* the newline at the start and the end is consistent, and allows maximum flexibility whilst not making additional trade-offs such as having to introduce new special syntax to exclude the newline. +Removing *both* the EOL at the start and the end is consistent, and allows maximum flexibility whilst not making additional trade-offs such as having to introduce new special syntax to exclude the EOL. 
### Allowing the closing line to be indented more than previous lines -Having the quote be indented further than the first non-whitespace character in the +Having the quote be indented further than the first non-horizontal-whitespace character in the string content is allowed: ```rs @@ -1295,7 +1342,7 @@ Differences: let sql = #[editor::language("sql")] "SELECT * FROM table;"; ``` -- RFC #3450 makes the "code strings" always end with a newline, with the ability to prepend a minus before the closing quote in order to remove the final newline. +- RFC #3450 makes the "code strings" always end with an EOL, with the ability to prepend a minus before the closing quote in order to remove the final EOL. However, in this RFC the following: @@ -1304,7 +1351,7 @@ Differences: d" a " - ^^^^ // common leading whitespace (will be removed) + ^^^^ // common leading indentation (will be removed) ); ``` @@ -1320,7 +1367,7 @@ Differences: a " - ^^^^ // common leading whitespace (will be removed) + ^^^^ // common leading indentation (will be removed) ); ``` @@ -1389,7 +1436,7 @@ The dedent macros will be possible to replace using the dedented string literals GET {url} Accept: {mime} ", - ^^^^ // common leading whitespace (will be removed) + ^^^^ // common leading indentation (will be removed) url = "http://localhost:8080", mime = "application/json", } @@ -1403,7 +1450,7 @@ The dedent macros will be possible to replace using the dedented string literals GET {url} Accept: {mime} ", - ^^^^ // common leading whitespace (will be removed) + ^^^^ // common leading indentation (will be removed) url = "http://localhost:8080", mime = "application/json", } @@ -1416,7 +1463,7 @@ The dedent macros will be possible to replace using the dedented string literals Accept: application/json ``` - Note that `eprintdoc!` does not remove the final line, that's why we use `eprintln` instead of `eprint`. + Note that `eprintdoc!` does not remove the final EOL, that's why we use `eprintln` instead of `eprint`. 
- `indoc!`: Dedents the passed string. @@ -1429,7 +1476,7 @@ The dedent macros will be possible to replace using the dedented string literals hello() "#} - ^^^^ // common leading whitespace (will be removed) + ^^^^ // common leading indentation (will be removed) ``` With dedented string literals: @@ -1442,7 +1489,7 @@ The dedent macros will be possible to replace using the dedented string literals hello() "# - ^^^^ // common leading whitespace (will be removed) + ^^^^ // common leading indentation (will be removed) ``` Both snippets evaluate to: @@ -1454,7 +1501,7 @@ The dedent macros will be possible to replace using the dedented string literals hello() ``` - Note that `indoc!` does not remove the final line, that's why we add an additional newline after `hello()`. + Note that `indoc!` does not remove the final EOL, that's why we add an additional newline after `hello()`. As a bonus, not only does it unify many macros under a single language feature. @@ -1490,7 +1537,7 @@ text!( GET {url} Accept: {mime} " -^^^^ // common leading whitespace (will be removed) +^^^^ // common leading indentation (will be removed) ) ``` @@ -1611,7 +1658,7 @@ writeln!( The above conversion is elegant for these reasons: - It is a simple modification by prepending `d` before the string literal -- All of the escaped sequences are removed, the whitespace removal is taken care of by the dedented string literal +- All of the escaped sequences are removed, the indentation removal is taken care of by the dedented string literal - Since we can now use a raw string, we no longer have to escape the quotes - Notably: **All of the interpolated variables continue to work as before**. 
@@ -1845,7 +1892,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace (will be removed) +^^^^^^^^ // common leading indentation (will be removed) ); } ``` @@ -1861,7 +1908,7 @@ fn main() { name text ) " -^^^^^^^^ // common leading whitespace (will be removed) +^^^^^^^^ // common leading indentation (will be removed) ); } ``` @@ -1876,6 +1923,6 @@ There could be a lint which detects strings which could be written clearer as de ## `rustc` warn-by-default lint to disallow whitespace escape characters -As explained in the [reference level explanation](#reference-level-explanation), using whitespace escapes `\t`, `\n` and `\r` are allowed. +As explained in the [reference level explanation](#reference-level-explanation), using escapes `\t`, `\n` and `\r` is allowed. Their behaviour might be surprising, so it is worth to consider a warn-by-default lint for them. From d18a31a4b850c34368a202fc1be65dd77496cc47 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 19:05:17 +0100 Subject: [PATCH 39/43] Allow horizontal whitespace characters between the opening quote and closing EOL --- text/3830-dedented-string-literals.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 5089542faef..7a851c79ea4 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -800,6 +800,8 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` An empty line only consists of literal whitespace (*not* escaped whitespace such as `\n`) 1. The opening line (the line containing the opening quote `"`) + - *May* contain 1 or more horizontal whitespace characters. (*trailing* horizontal whitespace) + - These horizontal whitespace characters are removed. - Must only contain a literal EOL after the `"` token - This EOL is removed. 1. 
The closing line (the line containing the closing quote `"`) @@ -1886,7 +1888,7 @@ If indentation of the dedented string does not match the surrounding code: ```rust fn main() { println!( - d" + d" // 4 trailing spaces here create table student( id int primary key, name text @@ -1902,7 +1904,7 @@ It could be automatically formatted by adding additional leading indentation, in ```rust fn main() { println!( - d" + d" // 0 trailing spaces here (stripped) create table student( id int primary key, name text From 0a8c747bb14cb8a4ec48e88c2bcf48b3ef026c48 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 19:25:30 +0100 Subject: [PATCH 40/43] Disallow mixing spaces and tabs --- text/3830-dedented-string-literals.md | 66 ++++++++++++++++++++++----- 1 file changed, 54 insertions(+), 12 deletions(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 7a851c79ea4..783b21ec4dc 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -771,6 +771,7 @@ We use these terms throughout the RFC and they are explained in detail in this s - **Horizontal whitespace** is spaces or tabs. These are the only `Pattern_White_Space` characters that are *horizontal* space per [UAX#31](https://www.unicode.org/reports/tr31/#Contexts_for_Ignorable_Format_Controls) - **EOL (end-of-line) character** is any "end of line" character as classified in [`UAX#R3a-1`](https://www.unicode.org/reports/tr31/#R3a-1) - **Indentation** is one or more **horizontal whitespace** at the beginning of a line +- An **empty line** only consists of literal horizontal whitespace A "newline" is used as an example of a specific EOL character, however any other valid EOL character can be used. @@ -797,8 +798,6 @@ Note: **Literal newlines** (*not* escaped newlines: `\n`) are represented with ` ## Algorithm for dedented strings -An empty line only consists of literal whitespace (*not* escaped whitespace such as `\n`) - 1. 
The opening line (the line containing the opening quote `"`) - *May* contain 1 or more horizontal whitespace characters. (*trailing* horizontal whitespace) - These horizontal whitespace characters are removed. @@ -987,6 +986,58 @@ let py = d" The above example is valid because the invisible characters `U+200F` and `U+200E` after the indentation which will be remain in the output, while the indentation of 4 spaces will be stripped from each line. +## Mixed spaces and tabs + +In all examples of this RFC, we only assume that the common indentation of each line (to be stripped) and indentation of the closing quote of the dedented string uses the same character (either literal tabs, or literal spaces) + +Mixing these character in a way that is ambiguous is disallowed, and will error. For instance, in the following example with literal tabs (represented by `→`) and literal spaces (represented by `•`) mixed together: + +```rust +// error: ambiguous spaces mixed with tabs +let py = d" +→→→→def hello(): +→→→→••••print('Hello, world!') + +•→••hello() +→→••"; +``` + +The above program is rejected due to ambiguity. The leading indentation must pick a single character. + +Choose either **only spaces**: + +```rust +let py = d" +••••def hello(): +••••••••print('Hello, world!') + +••••hello() +••••"; +``` + +Or **only tabs**: + +```rust +let py = d" +→→→→def hello(): +→→→→→→→→print('Hello, world!') + +→→→→hello() +→→→→"; +``` + +Both of the above valid examples would be the same as: + +```rust +let py = "\ +def hello(): +→→→→print('Hello, world!') + +hello()"; +``` + +Empty lines can safely be mixed with either spaces or tabs, as they do not count for the purposes of dedentation + # Drawbacks [drawbacks]: #drawbacks @@ -1857,16 +1908,7 @@ In the Rust ecosystem: # Unresolved questions [unresolved-questions]: #unresolved-questions -What should happen if we have tabs (represented by `→`) and literal spaces (represented by `•`) mixed together? 
- -```rust -let py = d" -→→→→def hello(): -→→→→••••print('Hello, world!') - -•→••hello() -→→••"; -``` +None # Future possibilities [future-possibilities]: #future-possibilities From 371910dfa15d178841ed5ff1fa49880e14d470f2 Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 21:13:28 +0100 Subject: [PATCH 41/43] fix: indent only with spaces can't give you tabs Co-authored-by: Jacob Lifshay --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index 783b21ec4dc..b02420f2a7f 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1031,7 +1031,7 @@ Both of the above valid examples would be the same as: ```rust let py = "\ def hello(): -→→→→print('Hello, world!') +••••print('Hello, world!') hello()"; ``` From a3f9a195784c0ed2a9587278ae2ee52ba48f47cc Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 21:13:45 +0100 Subject: [PATCH 42/43] fix: replace tabs with spaces Co-authored-by: Jacob Lifshay --- text/3830-dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3830-dedented-string-literals.md b/text/3830-dedented-string-literals.md index b02420f2a7f..c2f3c378a3c 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1020,7 +1020,7 @@ Or **only tabs**: ```rust let py = d" →→→→def hello(): -→→→→→→→→print('Hello, world!') +→→→→••••print('Hello, world!') →→→→hello() →→→→"; From 2eb90a1049fdafe667499f0792392056df84d7ca Mon Sep 17 00:00:00 2001 From: Nik Revenco Date: Fri, 13 Jun 2025 21:22:41 +0100 Subject: [PATCH 43/43] Allow mixing spaces and tabs* *As long as the level of each indentation is consistent --- text/3830-dedented-string-literals.md | 33 ++++++++++----------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/text/3830-dedented-string-literals.md 
b/text/3830-dedented-string-literals.md index c2f3c378a3c..aa61cc4fa62 100644 --- a/text/3830-dedented-string-literals.md +++ b/text/3830-dedented-string-literals.md @@ -1002,41 +1002,32 @@ let py = d" →→••"; ``` -The above program is rejected due to ambiguity. The leading indentation must pick a single character. +The above program is rejected due to ambiguity. There is no single "common indentation" that is the same on each line. -Choose either **only spaces**: +Mixing spaces and tabs is allowed as long as the common indentation is identical on every line, *even if* that indentation consists of both spaces and tabs: ```rust let py = d" -••••def hello(): -••••••••print('Hello, world!') +→••→•def hello(): +→••→•••••print('Hello, world!') -••••hello() -••••"; +→••→hello() +→••→"; ``` -Or **only tabs**: - -```rust -let py = d" -→→→→def hello(): -→→→→••••print('Hello, world!') - -→→→→hello() -→→→→"; -``` - -Both of the above valid examples would be the same as: +The above is equivalent to: ```rust let py = "\ -def hello(): -••••print('Hello, world!') +•def hello(): +•••••print('Hello, world!') hello()"; ``` -Empty lines can safely be mixed with either spaces or tabs, as they do not count for the purposes of dedentation +Common indentation is `→••→`, which is stripped from each line. + +Empty lines can safely be mixed with either spaces or tabs, as they do not count for the purposes of dedentation. # Drawbacks [drawbacks]: #drawbacks
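
The mixed spaces-and-tabs rule from the last patch can be illustrated with a small runtime sketch. This is not the compiler's implementation: `dedent`, `indentation` and `is_horizontal_whitespace` are hypothetical helpers, and the sketch assumes one possible reading of "common indentation", namely that the shortest indentation among the non-empty content lines and the closing line must be an exact prefix of every other participating line. It also assumes that empty lines come out as empty strings and that lines are joined with `\n`.

```rust
/// Horizontal whitespace as the RFC defines it: a space or a tab.
fn is_horizontal_whitespace(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Leading run of horizontal whitespace on `line`.
fn indentation(line: &str) -> &str {
    let end = line
        .find(|c: char| !is_horizontal_whitespace(c))
        .unwrap_or(line.len());
    &line[..end]
}

/// Hypothetical helper: dedents the content lines of a `d"..."` literal,
/// given the indentation that precedes the closing quote.
fn dedent(content: &[&str], closing_indent: &str) -> Result<String, String> {
    // Empty lines (only spaces/tabs) do not take part in the computation.
    let non_empty: Vec<&str> = content
        .iter()
        .copied()
        .filter(|line| indentation(line).len() != line.len())
        .collect();

    // Candidate: the shortest indentation among content and closing line.
    let common = non_empty
        .iter()
        .map(|line| indentation(line))
        .chain(std::iter::once(closing_indent))
        .min_by_key(|indent| indent.len())
        .unwrap_or("");

    // Assumed rule: every participating line starts with exactly that
    // sequence of characters; tabs and spaces are never interchangeable.
    let consistent = non_empty.iter().all(|line| line.starts_with(common))
        && closing_indent.starts_with(common);
    if !consistent {
        return Err("ambiguous spaces mixed with tabs".to_string());
    }

    let dedented: Vec<&str> = content
        .iter()
        .copied()
        .map(|line: &str| {
            if indentation(line).len() == line.len() {
                "" // assumption: empty lines become truly empty
            } else {
                &line[common.len()..]
            }
        })
        .collect();
    Ok(dedented.join("\n"))
}

fn main() {
    // The accepted example: common indentation is tab, two spaces, tab.
    let accepted = [
        "\t  \t def hello():",
        "\t  \t     print('Hello, world!')",
        "",
        "\t  \thello()",
    ];
    println!("{:?}", dedent(&accepted, "\t  \t"));

    // The rejected example: no indentation shared by every line.
    let rejected = [
        "\t\t\t\tdef hello():",
        "\t\t\t\t    print('Hello, world!')",
        "",
        " \t  hello()",
    ];
    println!("{:?}", dedent(&rejected, "\t\t  "));
}
```

Comparing prefixes character by character, rather than counting columns, is what lets the sketch stay independent of any tab width.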