AST for macro writers and lowering

There's some thought-provoking discussion going on at https://github.com/c42f/JuliaLowering.jl/pull/71, but I want to pull some of it out into an issue rather than lose it in a PR.

@mlechu made a particularly insightful comment (https://github.com/c42f/JuliaLowering.jl/pull/71#issuecomment-3255121555)

> I think that some piece of code should be responsible for producing the most useful possible AST for macro writers. I also think we're at risk of failing to assign this responsibility to anything.

I think this is right and important to think about. I still feel a lot of weird design tension between `GreenNode` and `SyntaxNode`. `GreenNode` is required to be very close to the source and the parser. The only real choice we have in the green tree is which optional internal nodes are emitted. For example, `K"parens"` for grouping parentheses wasn't originally part of the tree at all. We have progressively gone in the direction of "all delimited syntactic structures should be in the green tree", so we now have `K"cmdstring"` to contain the backticks, `K"var"` for the syntax trivia around weird identifiers, and `K"parens"` for grouping parentheses. At this point I think the design of the green tree is becoming relatively clear.

However, it's never been very clear what `SyntaxNode` is for and what its design goals are. Originally I thought of it as a more convenient version of `GreenNode`: something with exactly the same structure but without the syntax trivia, so that the indices of children are meaningful. But it's grown some other strange divergences from the green tree, such as not representing `K"parens"` nodes by default (grouping parentheses can occur almost anywhere and make working with the tree structure very difficult).

In JuliaLowering we have `SyntaxTree`, which started out as a more flexible `SyntaxNode` and shares its tree structure. But changes like https://github.com/c42f/JuliaLowering.jl/pull/71 now make it clearer that the aims of `GreenNode` and `SyntaxTree` can be misaligned: there are some things which `GreenNode` should faithfully represent about the source (such as `K"macro_name"`) which should ideally be invisible to macros. In this case, if we'd like to deprecate the syntax `@A.x` in favor of `A.@x`, it would be ideal to abstract this away before macros can see it.
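For comparison, here is a quick check of how the existing `Expr` front end treats the two spellings. It uses only `Meta.parse` from Base and says nothing about what `SyntaxTree` does or should do. As far as I know, both spellings already normalize to the same `Expr`:

```julia
# Both spellings of a qualified macro call, parsed with Base's `Meta.parse`.
# Parsing does not require module `A` or macro `@x` to exist.
old_style = Meta.parse("@A.x y")
new_style = Meta.parse("A.@x y")

# The callee of the macro call, as seen by anything consuming the `Expr`.
println(new_style.args[1])

# If the two trees compare equal, a consumer of `Expr` cannot tell which
# spelling the user wrote.
println(old_style == new_style)
```

So "abstracting this away before macros see it" would match what `Expr`-based macros experience today.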

This is related to some design unease I've had with cleanups @Keno wants to make on the JuliaSyntax side. There are some good arguments for doing those cleanups, and extra invariants about `RawGreenNode` are great to have. But on the other hand they add complexity to the tree API and expose internals which macro writers shouldn't care about or have to deal with. (For example, deleting `remap_kind` and `bump_glue()` exposes internals of how tokenization ambiguities are resolved in the parser. I *really* just want to see `K"Identifier"` most of the time, and I don't want to have to deal with the fact that tokens like `K"outer"` are identifiers when they're nontrivia. The parser already knows this information, and it seems very unfortunate to throw it out.)

With all this in mind, perhaps it's just time to allow the green tree and the AST used by macros to diverge further (and perhaps to delete `SyntaxNode` which is a weird middle ground).

### What is the AST for macros and lowering?

As a data structure the AST can be represented using `SyntaxTree` but let's talk about "the Julia AST" here, meaning "the tree structure we want macros and lowering to use".  (`SyntaxTree` is flexible enough as a data structure to hold the green tree - in fact we could replace the `GreenNode` data structure with a `SyntaxTree` and it'd probably be more compact.)

Some things which seem clear about the AST:

* We can't assume it's backed by a green tree or any source code because it could be programmatically generated, it could arise from `Expr` conversion, or the source could be stripped for minification/deployment.
* Provenance information should be present wherever possible as a link to the tree it was derived from or the source code.
* Not all information in the green tree is useful for macros.
* The `head` of a node defines the semantics of its children (currently this includes the flags).
* Having children strictly in source order is *greatly clarifying* in nearly all cases - there is never a question about the possible order of children: it's just what you see every day reading the source.

What are some examples of differences and potential future differences between Julia AST and green tree?
* We exclude all of what we currently tag with `TRIVIA_FLAG`. This is necessary to make the indices of children meaningful at all.
* Some "syntactic container" nodes from the green tree make the AST harder to work with and aren't that useful: `K"char"`, `K"cmdstring"`, `K"parens"` and `K"var"` at least. (`K"string"` is necessary because string interpolations are handled by the parser.)
* `K"macro_name"` changes the semantics of one of the `K"Identifier"` nodes inside it and this nonlocality is awkward.
* `K"StrMacroName"` and `K"CmdMacroName"` are oddities.
* The ordering of the children of `K"call"` is troubling given infix vs prefix function calls. However, changing this would make it the one exception to the rule that children are source ordered and that seems like a huge invariant to give up. (Can pattern matching rescue us from this problem?)
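As a reference point for several of the items above, this is how the existing `Expr` representation handles the same constructs. It's only a comparison using `Meta.parse` from Base, not a proposal: `Expr` takes the opposite trade-off, putting the callee first regardless of source order and flattening the container-like syntax away.

```julia
infix  = Meta.parse("x + y")
prefix = Meta.parse("+(x, y)")

# `Expr` stores the callee first for both spellings, so the children of an
# infix call are not in source order...
@show infix.head infix.args
# ...and the two spellings cannot be told apart afterwards.
@show infix == prefix

# Container-like syntax is flattened away in `Expr`:
@show Meta.parse("(x)") == :x          # grouping parentheses leave no trace
@show Meta.parse("var\"x\"") == :x     # var"..." becomes a plain Symbol

# A string macro becomes an ordinary macro call on the `_str`-suffixed name.
@show Meta.parse("mac\"blah\"") == Meta.parse("@mac_str \"blah\"")
```

A source-ordered AST would differ from this in the first two checks, which is exactly where the `K"call"` ordering question bites.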

For several of these I think it's important to keep the information that they are present without necessarily having them as nodes in the tree structure. Generally, I feel that macros should be easily able to see "most things which are important visual cues" about the source, as long as:
* That "visually important information" is consistent with the Julia parsing rules. (Counterexample: indentation is visually important but never relevant to Julia parsing, so macros shouldn't have easy access to that information.)
* The information can be represented in a way that keeps macros composable with each other.
 
Some examples of information which makes things visually distinct and I feel should be in the AST:
* The presence of parentheses (in function calls, macro calls, tuples, etc).  `+(x,y)` is very visually different from `x + y` and the most compelling solutions to https://github.com/JuliaLang/julia/pull/24990 call for knowing about parentheses.  We already provide for this with a syntax flag.
* `@mac_str "blah"` is very visually distinct from `str"blah"` and it may be nice to have these distinguishable without it being intrusive on the tree structure. Likewise for `K"var"`.
* Some macro-implemented DSLs may want to know about the presence of `K"parens"` nodes. This could be provided with a nesting depth attribute (or flag if we considered only zero or one parens) instead of in the tree structure.
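To make the nesting-depth idea concrete, here is a minimal sketch on toy data structures. Everything in it is hypothetical: `ToyGreen`, `ToyAst`, `to_ast`, the `paren_depth` field and the kind names are invented for illustration and are not the JuliaSyntax or JuliaLowering API. It drops trivia, keeps children in source order, and folds `:parens` wrappers into an attribute on the wrapped node instead of a node in the tree.

```julia
# Hypothetical toy types for illustration only.
struct ToyGreen
    kind::Symbol
    trivia::Bool
    text::String                  # non-empty only for leaf tokens
    children::Vector{ToyGreen}
end
token(kind, text; trivia=false) = ToyGreen(kind, trivia, text, ToyGreen[])
inner(kind, children...) = ToyGreen(kind, false, "", ToyGreen[children...])

struct ToyAst
    kind::Symbol
    text::String
    paren_depth::Int              # number of grouping parens that wrapped this node
    children::Vector{ToyAst}
end

# Convert a green node to an AST node: skip trivia children, and replace
# each `:parens` wrapper by a +1 on the depth of the expression it wraps.
function to_ast(g::ToyGreen, depth::Int=0)
    if g.kind == :parens
        body = [c for c in g.children if !c.trivia]
        # Toy assumption: grouping parens wrap exactly one expression.
        return to_ast(only(body), depth + 1)
    end
    kids = ToyAst[to_ast(c) for c in g.children if !c.trivia]
    return ToyAst(g.kind, g.text, depth, kids)
end

# Hand-built green tree for `((a)) + b`, children in source order.
lp() = token(:lparen, "("; trivia=true)
rp() = token(:rparen, ")"; trivia=true)
ws() = token(:whitespace, " "; trivia=true)
green = inner(:call,
    inner(:parens, lp(), inner(:parens, lp(), token(:Identifier, "a"), rp()), rp()),
    ws(), token(:Operator, "+"), ws(), token(:Identifier, "b"))

ast = to_ast(green)
println([c.text for c in ast.children])         # child texts, in source order
println([c.paren_depth for c in ast.children])  # parens depth per child
```

With this shape, the indices of the children of the call stay stable no matter how many grouping parentheses the user writes, while a DSL that cares can still read `paren_depth`. Collapsing the depth to a single flag would be the zero-or-one variant.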
