96 changes: 64 additions & 32 deletions src/procedural-macros.md
@@ -318,45 +318,75 @@ r[macro.proc.token]
## Declarative macro tokens and procedural macro tokens

r[macro.proc.token.intro]
Declarative `macro_rules` macros and procedural macros use similar, but
different definitions for tokens (or rather [`TokenTree`s].)

r[macro.proc.token.macro_rules]
Token trees in `macro_rules` (corresponding to `tt` matchers) are defined as
- Delimited groups (`(...)`, `{...}`, etc)
- All operators supported by the language, both single-character and
multi-character ones (`+`, `+=`).
- Note that this set doesn't include the single quote `'`.
Review comment (Contributor):

I wrote the English text to highlight the differences between `macro_rules` and proc macro tokens, such as multi-character operators (which do not exist in proc macros), lifetimes, and negative literals.
These differences are very hard to see from the formal grammar.

- Literals (`"string"`, `1`, etc)
- Note that negation (e.g. `-1`) is never a part of such literal tokens,
but a separate operator token.
- Identifiers, including keywords (`ident`, `r#ident`, `fn`)
- Lifetimes (`'ident`)
- Metavariable substitutions in `macro_rules` (e.g. `$my_expr` in
`macro_rules! mac { ($my_expr: expr) => { $my_expr } }` after the `mac`'s
expansion, which will be considered a single token tree regardless of the
passed expression)
Declarative `macro_rules` macros and procedural macros use similar but different definitions for tokens. Token trees in `macro_rules` (corresponding to `tt` matchers) are defined as the [TokenTree] production. Token trees for procedural macros are described below.
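
The `macro_rules` side of this comparison can be observed without writing a procedural macro. The following is a minimal sketch, not part of the PR: `count_tts` and `forward_expr` are hypothetical helper names introduced only for illustration. It counts how many `tt` fragments a given input is split into.

```rust
// Hypothetical helper: counts the token trees in its input by
// peeling off one `tt` at a time.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Hypothetical helper: forwards an already-matched `expr` metavariable
// to `count_tts!` to show that it travels as a single token tree.
macro_rules! forward_expr {
    ($e:expr) => { count_tts!($e) };
}

fn main() {
    // A multi-character operator is one token tree in `macro_rules`.
    assert_eq!(count_tts!(+=), 1);
    // A negative number is the `-` operator followed by a literal.
    assert_eq!(count_tts!(-1), 2);
    // A lifetime is one token tree.
    assert_eq!(count_tts!('a), 1);
    // A delimited group is one token tree, whatever it contains.
    assert_eq!(count_tts!((a b c)), 1);
    // A substituted `expr` metavariable is one token tree.
    assert_eq!(forward_expr!(1 + 2), 1);
}
```

Each assertion corresponds to one item of the removed prose list, which is the kind of difference the review comment says is hard to read off a formal grammar.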

r[macro.proc.token.tree]
Token trees in procedural macros are defined as
- Delimited groups (`(...)`, `{...}`, etc)
- All punctuation characters used in operators supported by the language (`+`,
but not `+=`), and also the single quote `'` character (typically used in
lifetimes, see below for lifetime splitting and joining behavior)
- Literals (`"string"`, `1`, etc)
- Negation (e.g. `-1`) is supported as a part of integer
and floating point literals.
- Identifiers, including keywords (`ident`, `r#ident`, `fn`)
Token trees in procedural macros are defined as:

```grammar,macros
@root ProcMacroTokenTree -> ProcMacroToken | ProcMacroDelimTokenTree

ProcMacroDelimTokenTree ->
`(` ProcMacroTokenTree* `)`
| `[` ProcMacroTokenTree* `]`
| `{` ProcMacroTokenTree* `}`

ProcMacroToken ->
ProcMacroTokenIdent
| ProcMacroTokenPunct
| ProcMacroTokenLiteral

ProcMacroTokenIdent -> IDENTIFIER_OR_KEYWORD | RAW_IDENTIFIER | `_`
Review comment (Contributor):

Doesn't the Reference consider `_` a keyword?
I think it should: lexically it is an identifier, and it is reserved, so it is a keyword (that is, a reserved identifier).


ProcMacroTokenPunct ->
`=`
| `<`
| `>`
| `!`
| `~`
| `+`
| `-`
| `*`
| `/`
| `%`
| `^`
| `&`
| `|`
| `@`
| `.`
| `,`
| `;`
| `:`
| `#`
| `$`
| `?`
| `'`

ProcMacroTokenLiteral ->
CHAR_LITERAL
| STRING_LITERAL
| RAW_STRING_LITERAL
| BYTE_LITERAL
| BYTE_STRING_LITERAL
| RAW_BYTE_STRING_LITERAL
| C_STRING_LITERAL
| RAW_C_STRING_LITERAL
| INTEGER_LITERAL
| FLOAT_LITERAL
| `-` INTEGER_LITERAL
| `-` FLOAT_LITERAL
```
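
As a reading aid, the `ProcMacroTokenPunct` production above can be transcribed into a lookup table. This is only a sketch of the proposed grammar as plain Rust: `PROC_MACRO_PUNCT`, `is_proc_macro_punct`, and `is_all_punct` are hypothetical names and are not part of the `proc_macro` API.

```rust
// The 22 single characters listed in the `ProcMacroTokenPunct` production.
const PROC_MACRO_PUNCT: [char; 22] = [
    '=', '<', '>', '!', '~', '+', '-', '*', '/', '%', '^',
    '&', '|', '@', '.', ',', ';', ':', '#', '$', '?', '\'',
];

// Hypothetical helper: true if `c` is one of the listed characters.
fn is_proc_macro_punct(c: char) -> bool {
    PROC_MACRO_PUNCT.contains(&c)
}

// Hypothetical helper: true if every character of `op` is in the table,
// meaning the operator could be written as a run of single-character
// punctuation tokens.
fn is_all_punct(op: &str) -> bool {
    !op.is_empty() && op.chars().all(is_proc_macro_punct)
}

fn main() {
    assert!(is_proc_macro_punct('+'));
    assert!(is_proc_macro_punct('\''));
    // `_` is listed under `ProcMacroTokenIdent`, not under punctuation.
    assert!(!is_proc_macro_punct('_'));
    // `+=` is not a single punctuation token, but both halves are.
    assert!(is_all_punct("+="));
    assert!(!is_all_punct("_"));
}
```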

r[macro.proc.token.conversion.intro]
Mismatches between these two definitions are accounted for when token streams
are passed to and from procedural macros. \
Mismatches between the `macro_rules` and proc-macro tokens are accounted for when token streams are passed to and from procedural macros. \
Note that the conversions below may happen lazily, so they might not happen if
the tokens are not actually inspected.

r[macro.proc.token.conversion.to-proc_macro]
When passed to a proc-macro
- All multi-character operators are broken into single characters.
When tokens are passed to a proc-macro:

- All multi-character punctuation is broken into single characters.
- `_` is treated as an identifier.
Review comment (Contributor):

This is misleading: `_` is a (reserved) identifier.

- Lifetimes are broken into a `'` character and an identifier.
- The keyword metavariable [`$crate`] is passed as a single identifier.
- All other metavariable substitutions are represented as their underlying
@@ -368,13 +398,15 @@ When passed to a proc-macro
always represented as their underlying token trees.

r[macro.proc.token.conversion.from-proc_macro]
When emitted from a proc macro
When tokens are emitted from a proc macro:

- Punctuation characters are glued into multi-character operators
when applicable.
- Single quotes `'` joined with identifiers are glued into lifetimes.
- Negative literals are converted into two tokens (the `-` and the literal)
possibly wrapped into a delimited group ([`Group`]) with implicit delimiters
([`Delimiter::None`]) when it's necessary for preserving parsing priorities.
- The `_` identifier is interpreted as the `_` punctuation.
Review comment (Contributor):

I don't think the Reference should ever call `_` punctuation. It is a (reserved) identifier in every sense except one: the `ident` matcher in `macro_rules` does not accept it, for backward-compatibility reasons (though that could likely be changed over an edition).
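
The splitting and gluing rules for punctuation and lifetimes in the two conversion lists can be modeled in ordinary Rust. The sketch below is a simplified stand-in, not the compiler's implementation. `Tok`, `Spacing`, `split_operator`, `split_lifetime`, and `glue` are hypothetical names, with `Spacing` loosely mirroring `proc_macro::Spacing`. It assumes each operator stands alone, and it glues any joint run without checking that the result is a real operator.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Spacing {
    Joint, // immediately followed by the next token
    Alone, // not joined to what follows
}

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Punct(char, Spacing),
    Ident(String),
}

// To a proc macro: a multi-character operator becomes single characters.
fn split_operator(op: &str) -> Vec<Tok> {
    let chars: Vec<char> = op.chars().collect();
    let last = chars.len().saturating_sub(1);
    chars
        .iter()
        .enumerate()
        .map(|(i, &c)| {
            let spacing = if i < last { Spacing::Joint } else { Spacing::Alone };
            Tok::Punct(c, spacing)
        })
        .collect()
}

// To a proc macro: a lifetime becomes a `'` character and an identifier.
fn split_lifetime(lt: &str) -> Vec<Tok> {
    let name = lt.strip_prefix('\'').expect("lifetime must start with a single quote");
    vec![Tok::Punct('\'', Spacing::Joint), Tok::Ident(name.to_string())]
}

// From a proc macro: joint punctuation is glued back together, and a
// joint `'` followed by an identifier becomes a lifetime.
fn glue(tokens: &[Tok]) -> Vec<String> {
    let mut out = Vec::new();
    let mut pending = String::new(); // joint punctuation collected so far
    for tok in tokens {
        match tok {
            Tok::Punct(c, Spacing::Joint) => pending.push(*c),
            Tok::Punct(c, Spacing::Alone) => {
                pending.push(*c);
                out.push(std::mem::take(&mut pending));
            }
            Tok::Ident(name) => {
                if pending == "'" {
                    out.push(format!("'{name}"));
                    pending.clear();
                } else {
                    if !pending.is_empty() {
                        out.push(std::mem::take(&mut pending));
                    }
                    out.push(name.clone());
                }
            }
        }
    }
    if !pending.is_empty() {
        out.push(pending);
    }
    out
}

fn main() {
    let plus_eq = split_operator("+=");
    assert_eq!(
        plus_eq,
        vec![Tok::Punct('+', Spacing::Joint), Tok::Punct('=', Spacing::Alone)]
    );
    assert_eq!(glue(&plus_eq), vec!["+=".to_string()]);

    let lifetime = split_lifetime("'a");
    assert_eq!(glue(&lifetime), vec!["'a".to_string()]);
}
```

The model leaves out negative literals and the handling of `_`, since the latter is exactly what the review comments above dispute.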


r[macro.proc.token.doc-comment]
Note that neither declarative nor procedural macros support doc comment tokens