Commit 8bbe1ff

start working through discussion
1 parent ee0059a commit 8bbe1ff

1 file changed

+51
-198
lines changed

src/macro-expansion.md

Lines changed: 51 additions & 198 deletions
@@ -229,136 +229,69 @@ only within the macro (i.e. it should not be visible outside the macro).
229229
230230
This section is about how that context is tracked.
231231
232-
233232
[code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe
234233
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
235234
[code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
236235
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_tt.html
237236
[parsing]: ./the-parser.html
238237
238+
## Notes from petrochenkov discussion
239+
240+
Where to find the code:
241+
- librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context)
242+
- librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs
243+
- librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness.
244+
- librustc_ast/config.rs - implementation of cfg/cfg_attr (they are treated specially compared to other macros), should probably be moved into librustc_ast/ext.
245+
- librustc_ast/tokenstream.rs + librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, and token streams.
246+
- librustc_ast/ext - various expansion-related stuff
247+
- librustc_ast/ext/base.rs - basic structures used by expansion
248+
- librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code - collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST
249+
- librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST"; basically, a "placeholder" is a temporary AST node replaced with macro expansion result nodes
250+
- librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days
251+
- librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through C ABI
252+
- librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this
253+
- librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors
254+
- librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths"
255+
256+
Primary structures:
257+
- HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context
258+
- ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context)
259+
- ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call available through global data
260+
- SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds)
261+
- SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways
262+
- Span - a code location + SyntaxContext
263+
- Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data
264+
- TokenStream - a collection of TokenTrees
265+
- TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{})
266+
- SyntaxExtension - a lowered macro representation, contains its expander function transforming a tokenstream or AST into tokenstream or AST + some additional data like stability, or a list of unstable features allowed inside the macro.
267+
- SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc), this is an enum that lists them
268+
- ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent)
269+
- Resolver - a trait used to break crate dependencies (so resolver services can be used in librustc_ast, despite librustc_resolve and pretty much everything else depending on librustc_ast)
270+
- ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work
271+
- AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, like e.g. a list of items)
272+
- Annotatable - a piece of AST that can be an attribute target, almost the same thing as AstFragment, except for types and patterns, which can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment)
273+
- MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.)
274+
- Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc.
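
As a rough illustration of how the hygiene-related structures in this list fit together, here is a minimal sketch. The types are simplified stand-ins, not the compiler's definitions: the field names and the `hygienic_eq` helper are assumptions made for this example only.

```rust
// Simplified stand-ins for the rustc types listed above. Field names and
// `hygienic_eq` are illustrative assumptions, not the compiler's definitions.

/// ID of a chain of expansions; 0 plays the role of `SyntaxContext::empty()`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SyntaxContext(pub u32);

/// Interned string: comparing two symbols is an integer comparison.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(pub u32);

/// A code location plus a `SyntaxContext`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
    pub ctxt: SyntaxContext,
}

/// A string with attached hygiene data.
#[derive(Clone, Copy, Debug)]
pub struct Ident {
    pub name: Symbol,
    pub span: Span,
}

impl Ident {
    /// Hypothetical helper: same text and same context, ignoring location.
    pub fn hygienic_eq(&self, other: &Ident) -> bool {
        self.name == other.name && self.span.ctxt == other.span.ctxt
    }
}
```

In this model, two identifiers with the same symbol written at different locations compare equal only if they also carry the same context, which is the sense in which an `Ident` is "a string with attached hygiene data".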
275+
276+
TODO: how a crate transitions from the state "macros exist as written in source" to "all macros are expanded"
277+
278+
Expansion Hierarchies and Syntax Context
279+
- Many AST nodes have some sort of syntax context, especially nodes from macros. The context consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node.
280+
- There are 3 expansion hierarchies
281+
- They all start at ExpnId::root, which is its own parent
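
The "root is its own parent" convention can be sketched with a small parent table. This models a single parent-linked hierarchy only; `ExpnTable`, `fresh`, and `chain_to_root` are hypothetical names invented for the example, and the real compiler stores this information differently.

```rust
/// Simplified model of an expansion ID; index 0 is the root.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExpnId(pub usize);

impl ExpnId {
    pub const ROOT: ExpnId = ExpnId(0);
}

/// Hypothetical table storing one parent per expansion.
pub struct ExpnTable {
    parents: Vec<ExpnId>,
}

impl ExpnTable {
    pub fn new() -> Self {
        // The root is recorded as its own parent.
        ExpnTable { parents: vec![ExpnId::ROOT] }
    }

    /// Registers a new expansion under an already known parent.
    pub fn fresh(&mut self, parent: ExpnId) -> ExpnId {
        assert!(parent.0 < self.parents.len(), "unknown parent expansion");
        self.parents.push(parent);
        ExpnId(self.parents.len() - 1)
    }

    pub fn parent(&self, id: ExpnId) -> ExpnId {
        self.parents[id.0]
    }

    /// Walks from `id` up to the root, inclusive. The walk terminates
    /// because every non-root parent has a smaller index than its child.
    pub fn chain_to_root(&self, id: ExpnId) -> Vec<ExpnId> {
        let mut chain = vec![id];
        let mut cur = id;
        while cur != ExpnId::ROOT {
            cur = self.parent(cur);
            chain.push(cur);
        }
        chain
    }
}
```

Making the root its own parent means `parent` needs no `Option`: every walk simply stops when it reaches `ExpnId::ROOT`.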
282+
283+
284+
285+
286+
287+
239288
240289
# Discussion about hygiene
241290
242-
The rest of this chapter is a dump of a discussion between `mark-i-m` and
243-
`petrochenkov` about Macro Expansion and Hygiene. I am pasting it here so that
244-
it never gets lost until we can make it into a proper chapter.
245291
246292
```txt
247293
248-
Vadim Petrochenkov: Here's some preliminary data I prepared.
249-
250-
Vadim Petrochenkov: Below I'll assume #62771 and #62086 has landed.
251-
252-
Vadim Petrochenkov: Where to find the code: librustc_span/hygiene.rs -
253-
structures related to hygiene and expansion that are kept in global data (can
254-
be accessed from any Ident without any context) librustc_span/lib.rs - some
255-
secondary methods like macro backtrace using primary methods from hygiene.rs
256-
librustc_builtin_macros - implementations of built-in macros (including macro attributes
257-
and derives) and some other early code generation facilities like injection of
258-
standard library imports or generation of test harness. librustc_ast/config.rs -
259-
implementation of cfg/cfg_attr (they treated specially from other macros),
260-
should probably be moved into librustc_ast/ext. librustc_ast/tokenstream.rs +
261-
librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees,
262-
and token streams. librustc_ast/ext - various expansion-related stuff
263-
librustc_ast/ext/base.rs - basic structures used by expansion
264-
librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion
265-
infrastructure code - collecting macro invocations, calling into resolve for
266-
them, calling their expanding functions, and integrating the results back into
267-
AST librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for
268-
"integrating the results back into AST" basicallly, "placeholder" is a
269-
temporary AST node replaced with macro expansion result nodes
270-
librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros
271-
in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably
272-
be moved into librustc_builtin_macros these days librustc_ast/ext/proc_macro.rs +
273-
librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the
274-
stable proc_macro library, converting tokens and token streams between the two
275-
representations and sending them through C ABI librustc_ast/ext/tt -
276-
implementation of macro_rules, turns macro_rules DSL into something with
277-
signature Fn(TokenStream) -> TokenStream that can eat and produce tokens,
278-
@mark-i-m knows more about this librustc_resolve/macros.rs - resolving macro
279-
paths, validating those resolutions, reporting various "not found"/"found, but
280-
it's unstable"/"expected x, found y" errors librustc_middle/hir/map/def_collector.rs +
281-
librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly
282-
expanded from a macro into various parent/child structures like module
283-
hierarchy or "definition paths"
284-
285-
Primary structures: HygieneData - global piece of data containing hygiene and
286-
expansion info that can be accessed from any Ident without any context ExpnId -
287-
ID of a macro call or desugaring (and also expansion of that call/desugaring,
288-
depending on context) ExpnInfo/InternalExpnData - a subset of properties from
289-
both macro definition and macro call available through global data
290-
SyntaxContext - ID of a chain of nested macro definitions (identified by
291-
ExpnIds) SyntaxContextData - data associated with the given SyntaxContext,
292-
mostly a cache for results of filtering that chain in different ways Span - a
293-
code location + SyntaxContext Ident - interned string (Symbol) + Span, i.e. a
294-
string with attached hygiene data TokenStream - a collection of TokenTrees
295-
TokenTree - a token (punctuation, identifier, or literal) or a delimited group
296-
(anything inside ()/[]/{}) SyntaxExtension - a lowered macro representation,
297-
contains its expander function transforming a tokenstream or AST into
298-
tokenstream or AST + some additional data like stability, or a list of unstable
299-
features allowed inside the macro. SyntaxExtensionKind - expander functions
300-
may have several different signatures (take one token stream, or two, or a
301-
piece of AST, etc), this is an enum that lists them
302-
ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing
303-
the expander signatures (TODO: change and rename the signatures into something
304-
more consistent) trait Resolver - a trait used to break crate dependencies (so
305-
resolver services can be used in librustc_ast, despite librustc_resolve and pretty
306-
much everything else depending on librustc_ast) ExtCtxt/ExpansionData - various
307-
intermediate data kept and used by expansion infra in the process of its work
308-
AstFragment - a piece of AST that can be produced by a macro (may include
309-
multiple homogeneous AST nodes, like e.g. a list of items) Annotatable - a
310-
piece of AST that can be an attribute target, almost same thing as AstFragment
311-
except for types and patterns that can be produced by macros but cannot be
312-
annotated with attributes (TODO: Merge into AstFragment) trait MacResult - a
313-
"polymorphic" AST fragment, something that can turn into a different
314-
AstFragment depending on its context (aka AstFragmentKind - item, or
315-
expression, or pattern etc.) Invocation/InvocationKind - a structure describing
316-
a macro call, these structures are collected by the expansion infra
317-
(InvocationCollector), queued, resolved, expanded when resolved, etc.
318-
319-
Primary algorithms / actions: TODO
320-
321-
mark-i-m: Very useful :+1:
322-
323-
mark-i-m: @Vadim Petrochenkov Zulip doesn't have an indication of typing, so
324-
I'm not sure if you are waiting for me or not
325-
326-
Vadim Petrochenkov: The TODO part should be about how a crate transitions from
327-
the state "macros exist as written in source" to "all macros are expanded", but
328-
I didn't write it yet.
329-
330-
Vadim Petrochenkov: (That should probably better happen off-line.)
331-
332-
Vadim Petrochenkov: Now, if you have any questions?
333-
334-
mark-i-m: Thanks :)
335-
336-
mark-i-m: /me is still reading :P
337-
338-
mark-i-m: Ok
339-
340-
mark-i-m: So I guess my first question is about hygiene, since that remains the
341-
most mysterious to me... My understanding is that the parser outputs AST nodes,
342-
where each node has a Span
343-
344-
mark-i-m: In the absence of macros and desugaring, what does the syntax context
345-
of an AST node look like?
346-
347-
mark-i-m: @Vadim Petrochenkov
348-
349-
Vadim Petrochenkov: Not each node, but many of them. When a node is not
350-
macro-expanded, its context is 0.
351-
352-
Vadim Petrochenkov: aka SyntaxContext::empty()
353-
354-
Vadim Petrochenkov: it's a chain that consists of one expansion - expansion 0
355-
aka ExpnId::root.
356-
357-
mark-i-m: Do all expansions start at root?
358-
359-
Vadim Petrochenkov: Also, SyntaxContext:empty() is its own father.
360-
361-
mark-i-m: Is this actually stored somewhere or is it a logical value?
294+
362295
363296
Vadim Petrochenkov: All expansion hierarchies (there are several of them) start
364297
at ExpnId::root.
@@ -368,12 +301,8 @@ expn_id == 0.
368301
369302
Vadim Petrochenkov: I don't think anyone looks into them much though.
370303
371-
mark-i-m: Ok
372-
373304
Vadim Petrochenkov: Speaking of multiple hierarchies...
374305
375-
mark-i-m: Go ahead :)
376-
377306
Vadim Petrochenkov: One is parent (expn_id1) -> parent(expn_id2) -> ...
378307
379308
Vadim Petrochenkov: This is the order in which macros are expanded.
@@ -429,8 +358,6 @@ Sorry, what is outer_expns?
429358
430359
Vadim Petrochenkov: SyntaxContextData::outer_expn
431360
432-
mark-i-m: Thanks :) Please continue
433-
434361
Vadim Petrochenkov: ...which means a token produced by a built-in macro (which
435362
is defined in the root effectively).
436363
@@ -470,8 +397,6 @@ mark-i-m: I see, but this pattern is only used for built-ins, right?
470397
471398
Vadim Petrochenkov: And also all stable proc macros, see the comments above.
472399
473-
mark-i-m: Got it
474-
475400
Vadim Petrochenkov: The third hierarchy is call-site hierarchy.
476401
477402
Vadim Petrochenkov: If foo!(bar!(ident)) expands into ident
@@ -507,30 +432,6 @@ generally.)
507432
508433
Vadim Petrochenkov: Yes.
509434
510-
mark-i-m: Got it :)
511-
512-
mark-i-m: It looks like we have ~5 minutes left. This has been very helpful
513-
already, but I also have more questions. Shall we try to schedule another
514-
meeting in the future?
515-
516-
Vadim Petrochenkov: Sure, why not.
517-
518-
Vadim Petrochenkov: A thread for offline questions-answers would be good too.
519-
520-
mark-i-m:
521-
522-
A thread for offline questions-answers would be good too.
523-
524-
I don't mind using this thread, since it already has a lot of info in it. We
525-
also plan to summarize the info from this thread into the rustc-dev-guide.
526-
527-
Sure, why not.
528-
529-
Unfortunately, I'm unavailable for a few weeks. Would August 21-ish work for
530-
you (and @WG-learning )?
531-
532-
mark-i-m: @Vadim Petrochenkov Thanks very much for your time and knowledge!
533-
534435
mark-i-m: One last question: are there more hierarchies?
535436
536437
Vadim Petrochenkov: Not that I know of. Three + the context transplantation
@@ -539,37 +440,8 @@ hack is already more complex than I'd like.
539440
mark-i-m: Yes, one wonders what it would be like if one also had to think about
540441
eager expansion...
541442
542-
Santiago Pastorino: sorry but I couldn't follow that much today, will read it
543-
when I have some time later
544-
545-
Santiago Pastorino: btw https://github.com/rust-lang/rustc-dev-guide/issues/398
546-
547-
mark-i-m: @Vadim Petrochenkov Would 7pm UTC on August 21 work for a followup?
548-
549-
Vadim Petrochenkov: Tentatively yes.
550-
551-
mark-i-m: @Vadim Petrochenkov @WG-learning Does this still work for everyone?
552-
553-
Vadim Petrochenkov: August 21 is still ok.
554-
555-
mark-i-m: @WG-learning @Vadim Petrochenkov We will start in ~30min
556-
557-
Vadim Petrochenkov: Oh. Thanks for the reminder, I forgot about this entirely.
558-
559-
mark-i-m: Hello!
560-
561-
Vadim Petrochenkov: (I'll be here in a couple of minutes.)
562-
563-
Vadim Petrochenkov: Ok, I'm here.
564-
565-
mark-i-m: Hi :)
566-
567-
Vadim Petrochenkov: Hi.
568-
569443
mark-i-m: so last time, we talked about the 3 context hierarchies
570444
571-
Vadim Petrochenkov: Right.
572-
573445
mark-i-m: Was there anything you wanted to add to that? If not, I think it
574446
would be good to get a big-picture... Given some piece of rust code, how do we
575447
get to the point where things are expanded and hygiene context is computed?
@@ -728,8 +600,6 @@ imports in that module.
728600
729601
For macro and import names this happens during expansions and integrations.
730602
731-
mark-i-m: Makes sense
732-
733603
Vadim Petrochenkov: For all other names we certainly know whether a name is
734604
resolved successfully or not on the first attempt, because no new names can
735605
appear.
@@ -791,21 +661,4 @@ m!(foo);
791661
Vadim Petrochenkov: foo has context ROOT -> id(n) and bar has context ROOT ->
792662
id(m) -> id(n) after all the expansions.
793663
794-
mark-i-m: Cool :)
795-
796-
mark-i-m: It looks like we are out of time
797-
798-
mark-i-m: Is there anything you wanted to add?
799-
800-
mark-i-m: We can schedule another meeting if you would like
801-
802-
Vadim Petrochenkov: Yep, 23.06 already. No, I think this is an ok point to
803-
stop.
804-
805-
mark-i-m: :+1:
806-
807-
mark-i-m: Thanks @Vadim Petrochenkov ! This was very helpful
808-
809-
Vadim Petrochenkov: Yeah, we can schedule another one. So far it's been like 1
810-
hour of meetings per month? Certainly not a big burden.
811664
```
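
The context chains quoted at the end of the transcript (`ROOT -> id(n)` and `ROOT -> id(m) -> id(n)`) can be modeled as interned linked lists, where each entry records its outermost expansion (compare `SyntaxContextData::outer_expn` in the discussion) and the rest of the chain. This is a sketch under assumptions: the `Contexts` table, its linear-search interning, and the numeric IDs chosen for `m` and `n` are invented for illustration, and the real `SyntaxContextData` carries more fields.

```rust
/// Simplified expansion ID; 0 stands for the root expansion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExpnId(pub u32);

/// Simplified context ID; 0 stands for the empty context.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SyntaxContext(pub usize);

/// One link of a chain: the outermost expansion plus the rest of the chain.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SyntaxContextData {
    outer_expn: ExpnId,
    parent: SyntaxContext,
}

/// Hypothetical global table of contexts.
pub struct Contexts {
    data: Vec<SyntaxContextData>,
}

impl Contexts {
    pub fn new() -> Self {
        // Entry 0 is the empty context, which is its own parent.
        let root = SyntaxContextData { outer_expn: ExpnId(0), parent: SyntaxContext(0) };
        Contexts { data: vec![root] }
    }

    /// Returns the context `ctxt -> expn`, reusing an existing entry if any.
    pub fn apply_mark(&mut self, ctxt: SyntaxContext, expn: ExpnId) -> SyntaxContext {
        let wanted = SyntaxContextData { outer_expn: expn, parent: ctxt };
        // Skip entry 0 so that the empty context is never returned for a mark.
        if let Some(i) = self.data.iter().skip(1).position(|d| *d == wanted) {
            return SyntaxContext(i + 1);
        }
        self.data.push(wanted);
        SyntaxContext(self.data.len() - 1)
    }

    /// Lists the expansions of a context, starting from the root.
    pub fn chain(&self, mut ctxt: SyntaxContext) -> Vec<ExpnId> {
        let mut out = Vec::new();
        while ctxt != SyntaxContext(0) {
            let d = self.data[ctxt.0];
            out.push(d.outer_expn);
            ctxt = d.parent;
        }
        out.push(ExpnId(0));
        out.reverse();
        out
    }
}
```

With `m = ExpnId(1)` and `n = ExpnId(2)`, applying `n` to the empty context yields the chain for `foo`, while applying `m` and then `n` yields the chain for `bar`. The two results are distinct contexts, so identifiers carrying them would not be confused even if they had the same name.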
