Skip to content

Conversation

@Keno
Copy link
Member

@Keno Keno commented Jul 9, 2025

The current representation for macro names is a bit peculiar. When the parser encounters @a, it treats @ as notation for the macrocall and then reset_node!'s (which itself may be considerd a bit of a code smell) the a to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the @. Things get even more preculiar with @var"a" where only the token inside the string macro gets reset.

One particular consequence of this is JuliaLang/julia#58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does).

A second issue is that we currently parse @A.b.c and A.b.@c to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like @) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g. @A.b.$ is an error, but A.B.@$ is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face.

So this PR tries to clean that all up a bit by:

  • Replacing the terminal K"MacroName" by a non-terminal K"macro_name". With this form, @A.c parses as (macro_name (. A c)) while A.@c parses as (. A (macro_name c)).
  • (In particular the @ notation is now always associated with the macro_name).
  • Emitting the dots in @.. and @... as direct identifier tokens rather than having to reset them back.
  • Adjusting everything else accordingly.

Partially written by Claude Code, though it had some trouble with the actual code changes.

The current representation for macro names is a bit peculiar. When
the parser encounters `@a`, it treats `@` as notation for the
macrocall and then `reset_node!`'s (which itself may be considerd
a bit of a code smell) the `a` to a special MacroName
token kind that (when translated back to julia Expr) implicitly
adds back the `@`. Things get even more preculiar with `@var"a"`
where only the token inside the string macro gets reset.

One particular consequence of this is JuliaLang/julia#58885,
because our translation back to Expr does not check the RAW_STRING_FLAG
(whereas the translation for K"Identifier" does).

A second issue is that we currently parse `@A.b.c` and `A.b.@c` to the
same SyntaxTree (of course the green tree is different). We aren't
currently being super precise about the required invariants for syntax
trees, but in general it would be desirable for non-trivia notation
(like `@`) to be precisely recoverable from the tree, which is not
the case here. This is especially annoying because there are syntax
cases that are errors for one of these, but not the other (e.g.
`@A.b.$` is an error, but `A.B.@$` is allowed). Now, I think the wisdom
of some of those syntax choices can be debated, but that is the situation
we face.

So this PR tries to clean that all up a bit by:
- Replacing the terminal K"MacroName" by a non-terminal K"macro_name".
  With this form, `@A.c` parses as `(macro_name (. A c))` while
  `A.@c` parses as `(. A (macro_name c))`.
- (In particular the `@` notation is now always associated with the
   macro_name).
- Emitting the dots in `@..` and `@...` as direct identifier tokens
  rather than having to reset them back.
- Adjusting everything else accordingly.

Partially written by Claude Code, though it had some trouble with
the actual code changes.
Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the right approach.

There's some broken cases still in the logic around is_macrocall_on_entry, for example, there's two macro_name nodes in the following:

julia> parsestmt(SyntaxNode, "@a[b][c]", ignore_warnings=true)
SyntaxNode:
[ref]
  [macro_name]
    [macrocall]
      [macro_name]
        a                                :: Identifier
      [vect]
        b                                :: Identifier
  c                                      :: Identifier

and similarly for curlies:

julia> parsestmt(SyntaxNode, "@a{b}{c}", ignore_warnings=true)
SyntaxNode:
[curly]
  [macro_name]
    [macrocall]
      [macro_name]
        a                                :: Identifier
      [braces]
        b                                :: Identifier
  c                                      :: Identifier

Very optionally, there's also the opportunity to simplify macro_name_position down to just tracking the orig_kind of the last identifier, now that fix_macro_name_kind!() is gone.

Copy link
Member

@mlechu mlechu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an upgrade to me

Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Keno Keno merged commit 63bee39 into main Jul 10, 2025
36 checks passed
@Keno Keno deleted the kf/macroname branch July 10, 2025 18:08
c42f pushed a commit to JuliaLang/julia that referenced this pull request Oct 17, 2025
The current representation for macro names is a bit peculiar. When
the parser encounters `@a`, it treats `@` as notation for the
macrocall and then `reset_node!`'s (which itself may be considered
a bit of a code smell) the `a` to a special MacroName
token kind that (when translated back to julia Expr) implicitly
adds back the `@`. Things get even more peculiar with `@var"a"`
where only the token inside the string macro gets reset.

One particular consequence of this is #58885,
because our translation back to Expr does not check the RAW_STRING_FLAG
(whereas the translation for K"Identifier" does).

A second issue is that we currently parse `@A.b.c` and `A.b.@c` to the
same SyntaxTree (of course the green tree is different). We aren't
currently being super precise about the required invariants for syntax
trees, but in general it would be desirable for non-trivia notation
(like `@`) to be precisely recoverable from the tree, which is not
the case here. This is especially annoying because there are syntax
cases that are errors for one of these, but not the other (e.g.
`@A.b.$` is an error, but `A.B.@$` is allowed). Now, I think the wisdom
of some of those syntax choices can be debated, but that is the situation
we face.

So this PR tries to clean that all up a bit by:
- Replacing the terminal K"MacroName" by a non-terminal K"macro_name".
  With this form, `@A.c` parses as `(macro_name (. A c))` while
  `A.@c` parses as `(. A (macro_name c))`.
- (In particular the `@` notation is now always associated with the
   macro_name).
- Emitting the dots in `@..` and `@...` as direct identifier tokens
  rather than having to reset them back.
- Adjusting everything else accordingly.

Partially written by Claude Code, though it had some trouble with
the actual code changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants