@mlechu
Collaborator

@mlechu mlechu commented Sep 3, 2025

Update to the latest dev version of JuliaSyntax, where the only notable change was JuliaLang/JuliaSyntax.jl#572 minus changes from JuliaLang/JuliaSyntax.jl#583. Not sure how I feel about the change in general (in favour of making identifiers K"Identifier"s, against nodes that don't change semantics, in favour of cleaning up the parser), but I certainly want the latest JuliaSyntax and JuliaLowering to work together. This change continues to put identifier normalization in lowering.

Some fixes were needed on the JuliaSyntax end, see JuliaLang/JuliaSyntax.jl#595.

@mlechu mlechu requested a review from c42f September 3, 2025 22:14
@c42f
Owner

c42f commented Sep 4, 2025

> Not sure how I feel about the change in general (in favour of making identifiers K"Identifier"s, against nodes that don't change semantics, in favour of cleaning up the parser)

Ah yeah some of the changes you're forced to make here make me wonder whether the JuliaSyntax changes were good tradeoffs. But I'm not sure I know exactly what you meant with this comment - could you expand?

@mlechu
Collaborator Author

mlechu commented Sep 4, 2025

I think that some piece of code should be responsible for producing the most useful possible AST for macro writers. (This also probably means a good AST for lowering, JETLS, and other consumers that don't use the green tree much.) I also think we're at risk of failing to assign this responsibility to anything.

What we currently do is pretty close to ideal! To me, "useful" should also include:

  • Not producing container nodes that don't change semantics and hinder pattern-matching (parens, var)
    • macro_name too, though right now it does change semantics of some contained identifier
  • Normalizing identifiers early—e.g. a K"Identifier" with name @mac makes sense in any context: as the first child of a K"macrocall", as an argument to an import, or as a leaf in isolation when scanning the tree for identifiers.
  • Keeping semantics out of the syntax flags, OR putting much more in the syntax flags and defining a good API that macro authors can use.
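To make the first two bullets concrete, here is a small sketch using only Base's `Meta.parse`, so it shows the existing `Expr` AST rather than `SyntaxNode` or `SyntaxTree`. It assumes a recent Julia version, and the `symbols` helper is hypothetical, written just for this illustration.

```julia
# Parens and var"..." are purely syntactic wrappers. Expr drops them,
# so all three spellings produce the same tree.
plain  = Meta.parse("f(x)")
parens = Meta.parse("(f)(x)")
quoted = Meta.parse("var\"f\"(x)")
@assert plain == parens == quoted

# Hypothetical helper: collect every Symbol leaf of an Expr tree.
function symbols(x)
    x isa Symbol && return [x]
    x isa Expr || return Symbol[]
    return mapreduce(symbols, vcat, x.args; init=Symbol[])
end

# With the `@` folded into the identifier, the macro name is found the
# same way in a macro call and in an import statement.
mac = Symbol("@mac")
@assert mac in symbols(Meta.parse("@mac x"))
@assert mac in symbols(Meta.parse("import A: @mac"))
```

A tree scan like `symbols` needs no special case for macro names here, which is the kind of convenience the list above is asking the new AST to preserve.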

I agree that lowering-like transforms shouldn't be the job of a parser, but I can't think of any better place to produce the most-useful-AST than JuliaSyntax. We could assign that responsibility to a new pre-macro-expansion pass in lowering, but then packages wouldn't benefit, and tooling would see a different AST than macros. The best solution I have is to consider the green tree -> SyntaxNode (SyntaxTree soon 🙂) transformation as something external to "parsing" and make it responsible for producing the most useful AST.

Side question: What was the original reason for having the K"MacroName" terminal? I could be missing something. Also cc @Keno in case he has thoughts once he's back; I assumed the macro_name change was largely for parser improvement reasons, but I could be missing your use case.

@mlechu
Collaborator Author

mlechu commented Sep 4, 2025

As for this PR, it does make our implementation more complex, and I'm OK with closing it and using the diff as reason to change JuliaSyntax instead. I'm also OK with merging in the meantime and undoing it later, since we do need to bump our JuliaSyntax version at some point.

@c42f
Owner

c42f commented Sep 6, 2025

> the most useful possible AST for macro writers
> I also think we're at risk of failing to assign this responsibility to anything

Excellent points! I think there's two possible answers:

  1. Like you said, transform the data structure itself in a simplification pass prior to macro expansion. We already do this by removing parentheses.
  2. A pattern matching system (or other AST access API) which can do that simplification on the fly

In either case, I strongly believe the underlying green tree or equivalent information should still be accessible to macros and lowering. For example, the difference between +(a,b) and a + b is quite important in the most compelling proposals in the infamous "what to do about underscores" debate JuliaLang/julia#24990.
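For context, the current `Expr` AST cannot express this difference at all, which is part of why the extra information has to come from somewhere else. A minimal check using only Base's `Meta.parse`:

```julia
# Prefix-call and infix spellings of the same operation.
prefix = Meta.parse("+(a, b)")
infix  = Meta.parse("a + b")

# Both collapse to the same call expression, so an old-style macro
# receiving either one cannot tell which form the user wrote.
@assert prefix == infix == Expr(:call, :+, :a, :b)
```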

> What was the original reason for having the K"MacroName" terminal

The reason is that @ can be separated from the macro name - we unfortunately have A.@asdf and @A.asdf both meaning (. A @asdf). If Julia only allowed the more sane syntax A.@asdf I would have parsed @asdf as K"Identifier", including the @ symbol.

The old parser did this by rewriting the asdf symbol to @asdf immediately as a normal identifier but we don't have this luxury because we're trying to produce a green tree. So in the green tree I chose to have a special terminal without the @. Conversely, K"macro_name" goes in the other direction and represents the @ where it actually occurs in the user's source.
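The equivalence of the two spellings can be checked against the `Expr` form with Base alone. This sketch only shows the `Expr` result, where the `@` has been folded into the symbol; it says nothing about how the green tree represents either form.

```julia
at_inside  = Meta.parse("A.@asdf x")   # `@` attached to the macro name
at_outside = Meta.parse("@A.asdf x")   # `@` separated from the macro name

# Both spellings produce the same macrocall expression.
@assert at_inside == at_outside

# The callee is the dotted path (. A @asdf), with `@asdf` as one symbol.
@assert at_inside.args[1] == Expr(:., :A, QuoteNode(Symbol("@asdf")))
```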

@mlechu
Collaborator Author

mlechu commented Sep 6, 2025

> I think there's two possible answers:

Can we have both? :) I think pattern matching is good regardless, and careful AST design makes it even better.

> The old parser did this by rewriting the asdf symbol to @asdf immediately as a normal identifier but we don't have this luxury because we're trying to produce a green tree.

Oh, that makes sense. I realize our GreenNode->SyntaxNode transformation uses a different system from our lowering passes.

> I strongly believe the underlying green tree or equivalent information should still be accessible to macros and lowering

That's a good example and I think I agree. Is your plan to have green tree access go through the SyntaxTree provenance system? That might solve the need for macro_name/MacroName.

An alternative (though opt-in) solution to macro_name would be waiting for a syntax evolution system and deprecating the @A.asdf form. Making the AST easier to define does bring us closer to having a system for syntax evolution (not that I would be working on anything of the sort, and hence biased...)

@c42f
Owner

c42f commented Sep 7, 2025

> Can we have both? :)

Yes of course :-D All the fiddly differences in conversion to Expr are attempts to improve the AST.

> Is your plan to have green tree access go through the SyntaxTree provenance system?

After thinking a little more about it, no. We can't rely on GreenNode (or anything which overlays SourceFile) because we do have a lot of ASTs which are programmatically generated. We'll need any "macro relevant" information to be present in the tree or its attributes with the provenance strictly reserved for solving "where did this originate?"

I've got a lot to write about this, I'll try to capture it into an issue.

> deprecating the @A.asdf form

If we magically had syntax evolution this is one of the first things I'd deprecate! It's awkward to deal with and adds nothing of value over treating the @ as part of the identifier. With that in mind, I'd argue this should be seen as syntactic trivia and it shouldn't be easy for new-style macros to observe. (Yes, perhaps it should be invisible, except in the green tree.)

@mlechu
Collaborator Author

mlechu commented Sep 8, 2025

> I've got a lot to write about this, I'll try to capture it into an issue.

Thanks for writing things out!

> After thinking a little more about it, no. We can't rely on GreenNode (or anything which overlays SourceFile) because we do have a lot of ASTs which are programmatically generated. We'll need any "macro relevant" information to be present in the tree or its attributes with the provenance strictly reserved for solving "where did this originate?"

I don't quite follow. If a macro needs to tell whether its input was +(a,b), a + b, or "generated AST with no backing green tree," wouldn't that information be available whether we store it in the provenance system or put it in some new attribute?

Putting this information in the tree or syntax flags seems like it would erase the third option and make it equivalent to one of the others. Maybe that's desirable (pre-handles an edge case)? I would be 100% OK with telling authors of macros that peek at the green tree to handle the case where there is no green tree, since this way they can choose the behaviour they want.

@c42f
Owner

c42f commented Sep 12, 2025

> Is your plan to have green tree access go through the SyntaxTree provenance system?

> After thinking a little more about it, no. We can't rely on GreenNode (or anything which overlays SourceFile) because we do have a lot of ASTs which are programmatically generated. We'll need any "macro relevant" information to be present in the tree or its attributes with the provenance strictly reserved for solving "where did this originate?"

> I don't quite follow. If a macro needs to tell whether its input was +(a,b), a + b, or "generated AST with no backing green tree," wouldn't that information be available whether we store it in the provenance system or put it in some new attribute?

Ah, I think I misunderstood your question.

  • If we did give access to the green tree it can go through the provenance system, absolutely. This is the natural place for it - it's there in SourceRef now.
  • However, I'm fairly sure the green tree shouldn't be a standard part of the API macros are encouraged to use directly:
    • It should be possible to programmatically generate AST which macros see as equivalent to any parsed AST. I feel this is necessary if we want macro expansion to be composable.
    • Some people may want to strip the provenance information after compilation, so it may not even be present. But we should be able to retain the difference between +(a,b) and a + b regardless of the SourceRef being replaced with nothing

Basically, I see provenance as a separate concern from the data which macros are "allowed" to inspect when generating output AST.

(Put another way, if a rogue macro wants to dig into the SourceRef implementation and peek at the green tree and use the amount of whitespace semantically ... that would be a pretty funny hack. But that macro will be brittle and non-composable with other macro expansion.)
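As a rough analogy in today's `Expr` world, `LineNumberNode`s play the role of provenance: they record where code came from, and stripping them leaves a tree equal to one built by hand. The sketch below uses only Base (`Meta.parse` and the unexported `Base.remove_linenums!`) and is an illustration of the separation of concerns, not of how `SourceRef` works.

```julia
# Parsed from text: the block carries a LineNumberNode (provenance).
parsed = Meta.parse("begin\n    a + b\nend")

# Generated programmatically: no provenance at all.
generated = Expr(:block, Expr(:call, :+, :a, :b))

# The two differ only in provenance...
@assert parsed != generated

# ...and become equal once the provenance is stripped.
stripped = Base.remove_linenums!(deepcopy(parsed))
@assert stripped == generated
```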

@mlechu
Collaborator Author

mlechu commented Sep 12, 2025

> Ah, I think I misunderstood your question.

No worries, I should have been more clear with "green tree access through the provenance system." What I meant is representing the green tree as a SyntaxTree and adding it to our existing chain of these things, so we have green_st -> parsed_st0 -> desugared_st1 -> ... lowered_st5. This would mean removing the green tree from SourceRef and no longer using GreenNode. I think it would be a clean representation, and could solve the need for MacroName in our "useful AST" since green_st -> parsed_st0 could perform arbitrary transformations like our lowering passes (of course the transformations wouldn't be as complex).
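A toy model of such a chain, assuming each node simply points at the node it was derived from in the previous tree. The `Node` type, its fields, and `origin` are all hypothetical and are not JuliaLowering's `SyntaxTree` API.

```julia
# Hypothetical node type, for illustration only.
struct Node
    kind::Symbol
    children::Vector{Node}
    source::Union{Node,Nothing}   # node in the previous tree, if any
end

# Walk the provenance chain back to the first tree in the chain.
function origin(n::Node)
    while n.source !== nothing
        n = n.source
    end
    return n
end

# green_st -> parsed_st0 -> lowered: the first step rewrites the
# macro_name container into a plain identifier, keeping a back-reference.
green   = Node(:macro_name, Node[], nothing)
parsed  = Node(:Identifier, Node[], green)
lowered = Node(:Identifier, Node[], parsed)

@assert origin(lowered) === green
@assert origin(lowered).kind === :macro_name
```

A programmatically generated node would have `source === nothing`, so `origin` returns the node itself and callers can detect that no green tree backs it.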

> Basically, I see provenance as a separate concern from the data which macros are "allowed" to inspect when generating output AST.

That's fair. My motivation in suggesting that macros be allowed to use the green tree is so we don't have to hand-select the set of "not semantically important but maybe important to a few macros" syntax. If we manage to standardize the green tree, it would be worth looking into making a good API, but for now I agree it would be brittle. I'm OK with hand-selecting these things and putting them in the syntax flags.

Owner

@c42f c42f left a comment

Let's do it

test/macros.jl Outdated
  sprint(showerror, exc)
  end == """
- MacroExpansionError while expanding @oldstyle_error in module Main.macro_test:
+ MacroExpansionError while expanding (macro_name oldstyle_error) in module Main.macro_test:
Owner

Let's convert to Expr and use the pretty printing from there rather than showing the sexpr form to users? (to be fair this was always a problem, it's just more obvious with the latest changes)

mac_name = string(e.args[1])
mac_name = mac_name == "@__dot__" ? "." : mac_name[2:end]
child_exprs[1] = Symbol(mac_name)
end
Owner

Oof, this back and forth conversion is ugly ... at face value it makes me worry the upstream changes weren't a good tradeoff.

I guess we could delete all this if we gave up on trying to reproduce all the macro name kinds. They are mainly present to represent the source text as it is parsed ... it might be fine if they're normalized to K"Identifier" when converting Expr back to SyntaxTree?

On the other hand, it seems bad to have more cases when SyntaxTree->Expr->SyntaxTree gives a different expression given that macros can observe the difference. Hmm.
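The string handling in the hunk above can be tried in isolation. This is a sketch with a hypothetical function name, not the code in the PR, and it assumes the input symbol starts with an ASCII `@`.

```julia
# Hypothetical helper: drop the leading `@` from an Expr-style macro name.
function strip_macro_at(name::Symbol)
    s = string(name)
    # `@__dot__` is the macro behind the `@.` syntax, so it maps to "."
    return s == "@__dot__" ? Symbol(".") : Symbol(s[2:end])
end

@assert strip_macro_at(Symbol("@asdf")) === :asdf
@assert strip_macro_at(Symbol("@__dot__")) === Symbol(".")
```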

@c42f
Owner

c42f commented Sep 16, 2025

Ok, I've cleaned up the macro name thing. It turns out that pretty printing macro names is surprisingly messy but it's possible to co-opt the Expr pretty printing machinery to do a decent enough job. (Or we could call sourcetext()? Maybe that's ok but I realized it can misrepresent synthetically constructed macro names. In fact we may have a general problem with showing compound mixed-provenance expressions. So I left that as a comment in the source for now.)

@c42f c42f merged commit 134d4ad into main Sep 16, 2025
2 checks passed
@c42f c42f deleted the ec/js-bump branch September 16, 2025 02:45

3 participants