From 816158ac67dc6dc338764aecd8033303bba6a401 Mon Sep 17 00:00:00 2001
From: Eric Huss
Date: Mon, 29 Sep 2025 02:11:02 -0700
Subject: [PATCH] Clarify proc-macro token tree

This attempts to clarify some details about proc-macro tokens, in
particular the differences in handling `_` from `macro_rules`.

This includes several changes:

- Removed the wordy description of a `macro_rules` TokenTree. This is
  already defined in the grammar in more detail. This does remove the
  sentence discussing metavariable substitutions. I would prefer to
  defer this to a more detailed description in the `macro_rules` chapter
  on how metavariable substitutions work.
- Removed the English description of a proc-macro token tree, and
  instead write it as a grammar production.

Closes https://github.com/rust-lang/reference/issues/2020
Closes https://github.com/rust-lang/reference/issues/1236
---
 src/procedural-macros.md | 96 ++++++++++++++++++++++++++--------------
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/src/procedural-macros.md b/src/procedural-macros.md
index a603d5ac50..af91e4033e 100644
--- a/src/procedural-macros.md
+++ b/src/procedural-macros.md
@@ -318,45 +318,75 @@ r[macro.proc.token]
 ## Declarative macro tokens and procedural macro tokens
 
 r[macro.proc.token.intro]
-Declarative `macro_rules` macros and procedural macros use similar, but
-different definitions for tokens (or rather [`TokenTree`s].)
-
-r[macro.proc.token.macro_rules]
-Token trees in `macro_rules` (corresponding to `tt` matchers) are defined as
-- Delimited groups (`(...)`, `{...}`, etc)
-- All operators supported by the language, both single-character and
-  multi-character ones (`+`, `+=`).
-  - Note that this set doesn't include the single quote `'`.
-- Literals (`"string"`, `1`, etc)
-  - Note that negation (e.g. `-1`) is never a part of such literal tokens,
-    but a separate operator token.
-- Identifiers, including keywords (`ident`, `r#ident`, `fn`)
-- Lifetimes (`'ident`)
-- Metavariable substitutions in `macro_rules` (e.g. `$my_expr` in
-  `macro_rules! mac { ($my_expr: expr) => { $my_expr } }` after the `mac`'s
-  expansion, which will be considered a single token tree regardless of the
-  passed expression)
+Declarative `macro_rules` macros and procedural macros use similar but different definitions for tokens. Token trees in `macro_rules` (corresponding to `tt` matchers) are defined as the [TokenTree] production. Token trees for procedural macros are described below.
 
 r[macro.proc.token.tree]
-Token trees in procedural macros are defined as
-- Delimited groups (`(...)`, `{...}`, etc)
-- All punctuation characters used in operators supported by the language (`+`,
-  but not `+=`), and also the single quote `'` character (typically used in
-  lifetimes, see below for lifetime splitting and joining behavior)
-- Literals (`"string"`, `1`, etc)
-  - Negation (e.g. `-1`) is supported as a part of integer
-    and floating point literals.
-- Identifiers, including keywords (`ident`, `r#ident`, `fn`)
+Token trees in procedural macros are defined as:
+
+```grammar,macros
+@root ProcMacroTokenTree -> ProcMacroToken | ProcMacroDelimTokenTree
+
+ProcMacroDelimTokenTree ->
+      `(` ProcMacroTokenTree* `)`
+    | `[` ProcMacroTokenTree* `]`
+    | `{` ProcMacroTokenTree* `}`
+
+ProcMacroToken ->
+      ProcMacroTokenIdent
+    | ProcMacroTokenPunct
+    | ProcMacroTokenLiteral
+
+ProcMacroTokenIdent -> IDENTIFIER_OR_KEYWORD | RAW_IDENTIFIER | `_`
+
+ProcMacroTokenPunct ->
+      `=`
+    | `<`
+    | `>`
+    | `!`
+    | `~`
+    | `+`
+    | `-`
+    | `*`
+    | `/`
+    | `%`
+    | `^`
+    | `&`
+    | `|`
+    | `@`
+    | `.`
+    | `,`
+    | `;`
+    | `:`
+    | `#`
+    | `$`
+    | `?`
+    | `'`
+
+ProcMacroTokenLiteral ->
+      CHAR_LITERAL
+    | STRING_LITERAL
+    | RAW_STRING_LITERAL
+    | BYTE_LITERAL
+    | BYTE_STRING_LITERAL
+    | RAW_BYTE_STRING_LITERAL
+    | C_STRING_LITERAL
+    | RAW_C_STRING_LITERAL
+    | INTEGER_LITERAL
+    | FLOAT_LITERAL
+    | `-` INTEGER_LITERAL
+    | `-` FLOAT_LITERAL
+```
 
 r[macro.proc.token.conversion.intro]
-Mismatches between these two definitions are accounted for when token streams
-are passed to and from procedural macros. \
+Mismatches between the `macro_rules` and proc-macro tokens are accounted for when token streams are passed to and from procedural macros. \
 Note that the conversions below may happen lazily, so they might not happen if
 the tokens are not actually inspected.
 
 r[macro.proc.token.conversion.to-proc_macro]
-When passed to a proc-macro
-- All multi-character operators are broken into single characters.
+When tokens are passed to a proc-macro:
+
+- All multi-character punctuation is broken into single characters.
+- `_` is treated as an identifier.
 - Lifetimes are broken into a `'` character and an identifier.
 - The keyword metavariable [`$crate`] is passed as a single identifier.
 - All other metavariable substitutions are represented as their underlying
@@ -368,13 +398,15 @@ When passed to a proc-macro
     always represented as their underlying token trees.
 
 r[macro.proc.token.conversion.from-proc_macro]
-When emitted from a proc macro
+When tokens are emitted from a proc macro:
+
 - Punctuation characters are glued into multi-character operators
   when applicable.
 - Single quotes `'` joined with identifiers are glued into lifetimes.
 - Negative literals are converted into two tokens (the `-` and the literal)
   possibly wrapped into a delimited group ([`Group`]) with implicit delimiters
   ([`Delimiter::None`]) when it's necessary for preserving parsing priorities.
+- The `_` identifier is interpreted as the `_` punctuation.
 
 r[macro.proc.token.doc-comment]
 Note that neither declarative nor procedural macros support doc comment tokens