AST for macro writers and lowering #77

@c42f

There's some thought-provoking discussion going on at #71, but I want to pull some of it out into an issue rather than lose it in a PR.

@mlechu made a particularly insightful comment (#71 (comment)):

"I think that some piece of code should be responsible for producing the most useful possible AST for macro writers. I also think we're at risk of failing to assign this responsibility to anything."

I think this is right and important to think about. I still feel a lot of weird design tension between GreenNode and SyntaxNode. GreenNode is required to be very close to the source and the parser. The only real choice we have in the green tree is which optional internal nodes are emitted. For example, K"parens" for grouping parentheses wasn't originally part of the tree at all. We have progressively gone in the direction of "all delimited syntactic structures should be in the green tree". Thus we have things like K"cmdstring" to contain the backticks, K"var" for the syntax trivia around weird identifiers, and K"parens" for grouping parentheses. At this point I think the design of the green tree is becoming relatively clear.

However, it's never been very clear what SyntaxNode is for and what its design goals are. Originally I thought of it as a more convenient version of GreenNode - something with exactly the same structure but without the syntax trivia, so that the indices of children are meaningful. But it's grown some other strange divergences from GreenNode, such as not representing K"parens" nodes by default (grouping parentheses can occur almost anywhere and make working with the tree structure very difficult).

In JuliaLowering we have SyntaxTree, which started out as a more flexible SyntaxNode and shares its tree structure. But changes like #71 now make it clearer that the aims of GreenNode and SyntaxTree can be misaligned: there are some things which GreenNode should faithfully represent about the source (such as K"macro_name") which should ideally be invisible to macros. In this case - if we'd like to deprecate the syntax @A.x in favor of the alternative A.@x - it would be ideal to abstract this away before macros can see it.
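For comparison, the existing Expr output already hides this particular distinction from macros. A minimal sketch using only Base (Meta.parse); the names A, x and y are placeholders and nothing is evaluated, so they don't need to exist:

```julia
# Parse both spellings of a module-qualified macro call.
old_style = Meta.parse("@A.x y")
new_style = Meta.parse("A.@x y")

# The macro name is the first argument of the :macrocall expression.
macro_name(ex::Expr) = ex.args[1]

println(macro_name(old_style))
println(macro_name(new_style))

# Both spellings are normalized to the same qualified name in Expr form.
println(macro_name(old_style) == macro_name(new_style))
```

This only shows what Expr does today; whether and where the new AST should perform the same normalization is the open question here.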

This is related to some design unease I've had with cleanups @Keno wants to make on the JuliaSyntax side. There are some good arguments for doing those cleanups, and extra invariants about RawGreenNode are great to have. But on the other hand they add complexity to the tree API and expose internals which macro writers shouldn't care about or have to deal with. (For example, deleting remap_kind and bump_glue() exposes internals of how tokenization ambiguities are resolved in the parser. I really just want to see K"Identifier" most of the time, and I don't want to have to deal with the fact that tokens like K"outer" are identifiers when they're nontrivia. The parser already knows this information, and it seems very unfortunate to throw it out.)

With all this in mind, perhaps it's just time to allow the green tree and the AST used by macros to diverge further (and perhaps to delete SyntaxNode which is a weird middle ground).

What is the AST for macros and lowering?

As a data structure the AST can be represented using SyntaxTree but let's talk about "the Julia AST" here, meaning "the tree structure we want macros and lowering to use". (SyntaxTree is flexible enough as a data structure to hold the green tree - in fact we could replace the GreenNode data structure with a SyntaxTree and it'd probably be more compact.)

Some things which seem clear about the AST:

  • We can't assume it's backed by a green tree or any source code because it could be programmatically generated, it could arise from Expr conversion, or the source could be stripped for minification/deployment.
  • Provenance information should be present wherever possible as a link to the tree it was derived from or the source code.
  • Not all information in the green tree is useful for macros.
  • The head of a node defines the semantics of its children (currently this includes the flags).
  • Having children strictly in source order is greatly clarifying in nearly all cases - there is never a question about the possible order of children: it's just what you see every day reading the source.

What are some examples of differences and potential future differences between Julia AST and green tree?

  • We exclude all of what we currently tag with TRIVIA_FLAG. This is necessary to make the indices of children meaningful at all.
  • Some "syntactic container" nodes from the green tree make the AST harder to work with and aren't that useful: at least K"char", K"cmdstring", K"parens" and K"var". (K"string" is necessary because string interpolations are handled by the parser.)
  • K"macro_name" changes the semantics of one of the K"Identifier" nodes inside it, and this nonlocality is awkward.
  • K"StrMacroName" and K"CmdMacroName" are oddities.
  • The ordering of the children of K"call" is troubling given infix vs prefix function calls. However, changing this would make it the one exception to the rule that children are source ordered, and that seems like a huge invariant to give up. (Can pattern matching rescue us from this problem?)
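To make the K"call" trade-off concrete: the existing Expr representation takes the opposite approach, always putting the function first. Infix and prefix calls then become indistinguishable, and the children of an infix call are no longer in source order. A small illustration using only Base:

```julia
infix  = Meta.parse("x + y")
prefix = Meta.parse("+(x, y)")

# Both parse to Expr(:call, :+, :x, :y): the operator comes first even though
# it sits between its operands in the infix source text.
println(infix.args)
println(infix == prefix)

# Grouping parentheses leave no trace in Expr either.
println(Meta.parse("(x)") === :x)
```

Keeping children source ordered avoids the reordering, at the cost of the function not sitting at a fixed child index.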

For several of these I think it's important to keep some of the information that they are present without necessarily having them as nodes in the tree structure. Generally, I feel that macros should be easily able to see "most things which are important visual cues" about the source, as long as

  • That "visually important information" is consistent with the Julia parsing rules. (Counterexample: indentation is visually important but never relevant to Julia parsing, and macros shouldn't have easy access to that information.)
  • The information can be represented in a way that keeps macros composable with each other

Some examples of information which makes things visually distinct and I feel should be in the AST:

  • The presence of parentheses (in function calls, macro calls, tuples, etc). +(x,y) is very visually different from x + y, and the most compelling solutions to JuliaLang/julia#24990 (RFC: curry underscore arguments to create anonymous functions) call for knowing about parentheses. We already provide for this with a syntax flag.
  • @mac_str "blah" is very visually distinct from str"blah" and it may be nice to have these distinguishable without it being intrusive on the tree structure. Likewise for K"var".
  • Some macro-implemented DSLs may want to know about the presence of K"parens" nodes. This could be provided with a nesting depth attribute (or a flag, if we only considered zero or one level of parentheses) instead of in the tree structure.
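As a rough sketch of the nesting depth idea, here is a toy model that drops trivia, splices out K"parens"-like container nodes, and records how many were removed as an attribute. ToyGreen, ToyAst, to_ast, paren_depth and the kind names are all hypothetical and exist only for this illustration; they are not the JuliaSyntax or JuliaLowering data structures.

```julia
# Toy model only: hypothetical types, not the JuliaSyntax/JuliaLowering API.
struct ToyGreen
    kind::Symbol
    text::String                 # source text; empty for interior nodes
    is_trivia::Bool
    children::Vector{ToyGreen}
end

leaf(kind, text; trivia=false) = ToyGreen(kind, text, trivia, ToyGreen[])
interior(kind, children...) = ToyGreen(kind, "", false, ToyGreen[children...])

struct ToyAst
    kind::Symbol
    text::String
    paren_depth::Int             # grouping parentheses removed around this node
    children::Vector{ToyAst}
end

function to_ast(g::ToyGreen, paren_depth::Int=0)
    if g.kind == :parens
        # Assumes a :parens node has exactly one nontrivia child.
        inner = only(filter(c -> !c.is_trivia, g.children))
        return to_ast(inner, paren_depth + 1)
    end
    # Trivia is dropped; remaining children keep their source order.
    kids = ToyAst[to_ast(c) for c in g.children if !c.is_trivia]
    return ToyAst(g.kind, g.text, paren_depth, kids)
end

# Hand-built toy green tree for the source text `((a)) + b`.
green = interior(:call,
    interior(:parens,
        leaf(:lparen, "("; trivia=true),
        interior(:parens,
            leaf(:lparen, "("; trivia=true),
            leaf(:Identifier, "a"),
            leaf(:rparen, ")"; trivia=true)),
        leaf(:rparen, ")"; trivia=true)),
    leaf(:Whitespace, " "; trivia=true),
    leaf(:Identifier, "+"),
    leaf(:Whitespace, " "; trivia=true),
    leaf(:Identifier, "b"))

ast = to_ast(green)
for (i, child) in enumerate(ast.children)
    println(i, ": ", child.text, " (paren_depth = ", child.paren_depth, ")")
end
```

The resulting node has three children in source order (a, +, b) with stable indices, and a macro that cares about the parentheses can read paren_depth on the first child instead of walking through extra tree levels.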
