Tweak macro name representation #572

Keno · 2025-07-09T05:23:31Z

The current representation for macro names is a bit peculiar. When the parser encounters @a, it treats @ as notation for the macrocall and then reset_node!'s (which itself may be considerd a bit of a code smell) the a to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the @. Things get even more preculiar with @var"a" where only the token inside the string macro gets reset.

One particular consequence of this is JuliaLang/julia#58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does).

A second issue is that we currently parse @A.b.c and A.b.@c to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like @) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g. @A.b.$ is an error, but A.B.@$ is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face.

So this PR tries to clean that all up a bit by:

Replacing the terminal K"MacroName" by a non-terminal K"macro_name". With this form, @A.c parses as (macro_name (. A c)) while A.@c parses as (. A (macro_name c)).
(In particular the @ notation is now always associated with the macro_name).
Emitting the dots in @.. and @... as direct identifier tokens rather than having to reset them back.
Adjusting everything else accordingly.

Partially written by Claude Code, though it had some trouble with the actual code changes.

The current representation for macro names is a bit peculiar. When the parser encounters `@a`, it treats `@` as notation for the macrocall and then `reset_node!`'s (which itself may be considerd a bit of a code smell) the `a` to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the `@`. Things get even more preculiar with `@var"a"` where only the token inside the string macro gets reset. One particular consequence of this is JuliaLang/julia#58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does). A second issue is that we currently parse `@A.b.c` and `A.b.@c` to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like `@`) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g. `@A.b.$` is an error, but `A.B.@$` is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face. So this PR tries to clean that all up a bit by: - Replacing the terminal K"MacroName" by a non-terminal K"macro_name". With this form, `@A.c` parses as `(macro_name (. A c))` while `A.@c` parses as `(. A (macro_name c))`. - (In particular the `@` notation is now always associated with the macro_name). - Emitting the dots in `@..` and `@...` as direct identifier tokens rather than having to reset them back. - Adjusting everything else accordingly. Partially written by Claude Code, though it had some trouble with the actual code changes.

c42f

Looks like the right approach.

There's some broken cases still in the logic around is_macrocall_on_entry, for example, there's two macro_name nodes in the following:

julia> parsestmt(SyntaxNode, "@a[b][c]", ignore_warnings=true)
SyntaxNode:
[ref]
  [macro_name]
    [macrocall]
      [macro_name]
        a                                :: Identifier
      [vect]
        b                                :: Identifier
  c                                      :: Identifier

and similarly for curlies:

julia> parsestmt(SyntaxNode, "@a{b}{c}", ignore_warnings=true)
SyntaxNode:
[curly]
  [macro_name]
    [macrocall]
      [macro_name]
        a                                :: Identifier
      [braces]
        b                                :: Identifier
  c                                      :: Identifier

Very optionally, there's also the opportunity to simplify macro_name_position down to just tracking the orig_kind of the last identifier, now that fix_macro_name_kind!() is gone.

src/julia/parser.jl

test/parser.jl

src/integration/expr.jl

test/expr.jl

src/julia/parser.jl

mlechu

Seems like an upgrade to me

c42f

LGTM

The current representation for macro names is a bit peculiar. When the parser encounters `@a`, it treats `@` as notation for the macrocall and then `reset_node!`'s (which itself may be considered a bit of a code smell) the `a` to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the `@`. Things get even more peculiar with `@var"a"` where only the token inside the string macro gets reset. One particular consequence of this is #58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does). A second issue is that we currently parse `@A.b.c` and `A.b.@c` to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like `@`) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g. `@A.b.$` is an error, but `A.B.@$` is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face. So this PR tries to clean that all up a bit by: - Replacing the terminal K"MacroName" by a non-terminal K"macro_name". With this form, `@A.c` parses as `(macro_name (. A c))` while `A.@c` parses as `(. A (macro_name c))`. - (In particular the `@` notation is now always associated with the macro_name). - Emitting the dots in `@..` and `@...` as direct identifier tokens rather than having to reset them back. - Adjusting everything else accordingly. Partially written by Claude Code, though it had some trouble with the actual code changes.

Keno requested review from c42f and mlechu July 9, 2025 05:23

Keno mentioned this pull request Jul 9, 2025

JuliaSyntax misparses @var"\"" JuliaLang/julia#58885

Open

c42f reviewed Jul 9, 2025

View reviewed changes

mlechu reviewed Jul 9, 2025

View reviewed changes

Keno added 4 commits July 10, 2025 02:25

Adjust comments

56448f4

Fix initial macrocall handling

02c709b

Tweak the error recovery for [email protected]

498fd6d

Update comment

6b3b85a

c42f approved these changes Jul 10, 2025

View reviewed changes

Keno merged commit 63bee39 into main Jul 10, 2025
36 checks passed

Keno deleted the kf/macroname branch July 10, 2025 18:08

Keno mentioned this pull request Jul 11, 2025

Parser Failed with @(var"which") #441

Closed

c42f mentioned this pull request Aug 6, 2025

macro_name_str should not wrap around full macro path #582

Closed

mlechu mentioned this pull request Sep 3, 2025

Adapt to JuliaSyntax MacroName change c42f/JuliaLowering.jl#71

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Tweak macro name representation #572

Tweak macro name representation #572

Uh oh!

Keno commented Jul 9, 2025

Uh oh!

c42f left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mlechu left a comment

Uh oh!

c42f left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Tweak macro name representation #572

Tweak macro name representation #572

Uh oh!

Conversation

Keno commented Jul 9, 2025

Uh oh!

c42f left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mlechu left a comment

Choose a reason for hiding this comment

Uh oh!

c42f left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants