SyntaxGraph: Usability and performance tweaks #48

mlechu · 2025-08-25T19:42:15Z

Several small SyntaxGraph tweaks. Some of these have been split out from #35 since they are tiny improvements that don't need to wait for data size experimentation.

Changes worth discussing:

Do not coerce attributes to NamedTuple unnecessarily

It doesn't make sense to freeze Dict-attributes to NamedTuple after an ensure_attributes or delete_attributes call. I assume this freezing was unintentional and not detected because the attribute storage type abstraction is implemented quite well.

Note that this change increases precompile time and decreases test time. Below are the outputs of `@time using JuliaLowering` (first line of each pair) and `@time include("test/runtests.jl")` (second line).

Before:

45.395172 seconds (927.27 k allocations: 57.289 MiB, 0.08% gc time, 0.17% compilation time)
24.653554 seconds (78.07 M allocations: 4.079 GiB, 4.25% gc time, 95.33% compilation time: <1% of which was recompilation)

After:

59.053901 seconds (1.12 M allocations: 70.427 MiB, 0.06% gc time, 0.13% compilation time)
11.100739 seconds (35.53 M allocations: 1.789 GiB, 4.09% gc time, 90.51% compilation time: 2% of which was recompilation)

(Our precompile statements could probably use some tweaking still)
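
A minimal sketch of the type-preserving behaviour described in this section: Dict storage goes in and comes out as a Dict, and NamedTuple storage stays a NamedTuple. It assumes each attribute is a per-node `Dict` keyed by an integer node id; `ensure_attr_sketch` is a hypothetical name, not the package's `ensure_attributes`.

```julia
# Hypothetical illustration only; not the JuliaLowering implementation.

# Dict storage in, Dict storage out (the input is left unmodified).
function ensure_attr_sketch(attrs::Dict{Symbol,Any}, name::Symbol, ::Type{T}) where {T}
    out = copy(attrs)
    haskey(out, name) || (out[name] = Dict{Int,T}())
    return out
end

# NamedTuple storage in, NamedTuple storage out.
function ensure_attr_sketch(attrs::NamedTuple, name::Symbol, ::Type{T}) where {T}
    haskey(attrs, name) && return attrs
    return merge(attrs, NamedTuple{(name,)}((Dict{Int,T}(),)))
end

dict_attrs = Dict{Symbol,Any}(:kind => Dict{Int,Symbol}())
frozen_attrs = (kind = Dict{Int,Symbol}(),)

dict_out = ensure_attr_sketch(dict_attrs, :name_val, String)
frozen_out = ensure_attr_sketch(frozen_attrs, :name_val, String)
```

Dispatching on the storage type keeps the "freeze or not" decision with the caller instead of making it a side effect of adding an attribute.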

Delete ineffective freeze_attrs call

This post-parsing freeze call wasn't doing anything. The freezing we got was actually from ensure_attributes. The output of lowering now has Dict-attributes, which I think are fine, but if we want to freeze post-lowering we can do that. Making the call effective has a bad effect on the test time above.

Print more information when node does not have attribute (and no default is provided)

This is an unrecoverable error anyway, so print a bit more information about the node we tried to access. Before:

julia> st = jlower("function foo end")
julia> st.value
ERROR: Property `value[33]` not found

After:

julia> st.value
ERROR: Property `value[33]` not found. Available attributes:
  kind = code_info,
  is_toplevel_thunk = true,
  source = 21,
  slots = JuliaLowering.Slot[]
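
A rough sketch of how such an error message could be assembled, assuming attributes live in a `Dict{Symbol,Any}` whose values are per-node dictionaries. `getattr_sketch` and the toy data are hypothetical and only mimic the shape of the output above.

```julia
# Hypothetical illustration only; not the JuliaLowering implementation.
function getattr_sketch(attrs::Dict{Symbol,Any}, id::Int, name::Symbol)
    col = get(attrs, name, nothing)
    if col === nothing || !haskey(col, id)
        # List every attribute that *is* defined for this node.
        available = String["  $(k) = $(repr(v[id]))" for (k, v) in attrs if haskey(v, id)]
        error("Property `$(name)[$(id)]` not found. Available attributes:\n" *
              join(sort(available), ",\n"))
    end
    return col[id]
end

attrs = Dict{Symbol,Any}(
    :kind => Dict{Int,Symbol}(33 => :code_info),
    :source => Dict{Int,Int}(33 => 21),
)

# Capture the message instead of letting the error propagate.
msg = try
    getattr_sketch(attrs, 33, :value)
catch err
    sprint(showerror, err)
end
```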


aviatesk · 2025-08-26T07:46:41Z

> Note that this change increases precompile time and decreases test time.

It's good that we seemingly have better lowering runtime performance now.

I don't think this will resolve the compile-time bottleneck entirely, but I think forcing the type of unfrozen attributes to Dict{Symbol,Any} should improve type stability (and hopefully compile time too) to some extent.
The compiler can't know whether a dict is empty, and splatting an empty collection ends up calling `Dict()`, which is a Dict{Any,Any}. So the Dict(pairs(attributes)...) pattern used in the current unfreeze_attrs implementation may force inference of the Dict{Any,Any} case as well, which could be causing longer compile times:

julia> Base.infer_return_type(x->Dict(pairs(x)...), (Dict{Symbol,Any},))
Union{Dict{Any, Any}, Dict{Symbol, Any}}

julia> Base.infer_return_type(x->Dict{Symbol,Any}(pairs(x)...), (Dict{Symbol,Any},))
Dict{Symbol, Any}

In practice, I think the actual type of attributes is always Dict{Symbol,Any}, so it might be better to enforce this at both the type definition level and the call site level.
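
As a small illustration of enforcing the type at the call site, here is a sketch that builds the result with an explicit `Dict{Symbol,Any}` so the empty case cannot fall back to `Dict{Any,Any}`. `unfreeze_sketch` is a hypothetical stand-in, not the actual `unfreeze_attrs`.

```julia
# Hypothetical illustration only; not the JuliaLowering implementation.
function unfreeze_sketch(attrs::NamedTuple)
    # The explicit type parameters fix the result type regardless of
    # how many fields `attrs` has.
    d = Dict{Symbol,Any}()
    for (name, val) in pairs(attrs)
        d[name] = val
    end
    return d
end

frozen = (kind = [:call, :block], name_val = ["f", "g"])
unfrozen = unfreeze_sketch(frozen)
empty_unfrozen = unfreeze_sketch(NamedTuple())
```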


mlechu · 2025-08-27T16:17:25Z

Excellent suggestion. One day I hope to have an abstract interpreter running in my head as you do.

julia> @time using JuliaLowering
Precompiling JuliaLowering finished.
  1 dependency successfully precompiled in 27 seconds. 1 already precompiled.
 27.361220 seconds (722.49 k allocations: 45.832 MiB, 0.11% gc time, 0.25% compilation time)

julia> @time include("test/runtests.jl");
Test Summary:    | Pass  Broken  Total   Time
JuliaLowering.jl | 1165       1   1166  11.5s
 11.730933 seconds (47.43 M allocations: 2.423 GiB, 4.35% gc time, 91.44% compilation time: 4% of which was recompilation)

aviatesk · 2025-08-27T18:36:11Z

Wow, I didn't expect it to be that effective. That's great.

In my opinion, I think we could be even more aggressive:

mutable struct SyntaxGraph{Attrs <: Union{Dict{Symbol,Any},NamedTuple}}
    edge_ranges::Vector{UnitRange{Int}}
    edges::Vector{NodeId}
    attributes::Attrs
end

But this is a somewhat restrictive change, so we might want to hear @c42f's opinion.
Probably the type instability of unfreeze_attrs is what matters most in the lowering pipeline, and I don't think this change will reduce type instability that much. The Dict{Symbol,Any} information from unfreeze_attrs (call site) should theoretically propagate throughout the lowering pipeline. But if there is type instability in other places, having such restrictive type definitions can minimize the damage. Genericity is sacrificed, though.
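
To make the trade-off concrete, here is a toy struct with the same bound on its type parameter. `ToyGraph` is hypothetical and carries only the attribute field; it shows that any storage type outside the union is rejected at construction.

```julia
# Hypothetical illustration only; the real SyntaxGraph has more fields.
struct ToyGraph{Attrs <: Union{Dict{Symbol,Any},NamedTuple}}
    attributes::Attrs
end

g_dict = ToyGraph(Dict{Symbol,Any}())
g_frozen = ToyGraph((kind = Dict{Int,Symbol}(),))

# A Dict with any other key/value types no longer fits the bound.
rejected = try
    ToyGraph(Dict{Any,Any}())
    false
catch
    true
end
```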

c42f

Does this mean we're now using Dict-based attributes throughout lowering?

If using Dict everywhere rather than NamedTuple is a runtime improvement, that's a huge performance debugging TODO, and honestly quite alarming. If the frozen attributes were working as expected, we should have things like ex.name_val type inferred as a String; overall the design goal was for SyntaxGraph to support type stable arbitrary attributes.

If you're sure Dict is an improvement we can merge this for now but it's clear that attribute storage performance needs revisiting.

Related - I'm currently working on some changes to make setattr! fully inferrable.

c42f · 2025-08-28T13:02:27Z

src/ast.jl

+any other attribute.
 """
-function copy_ast(ctx, ex)
+function copy_ast(ctx, ex; copy_source=true)


c42f · 2025-08-28

When would we ever want copy_source=false?

mlechu · 2025-08-28

I needed it for copying a tree into its own graph (I'll add a check_same_graph here). There should be a way of doing this without recursing on .source (which probably reaches everything in the graph), though I'm open to having it be a separate function if there's a reasonable name we can give it.

Tangential, but I'm now realizing we should fix the way we copy .source too. We're recursing on every child or source reference, where most nodes are referenced twice in this way, causing an explosion.

julia> st = jlower("function foo(;a=1); end")
SyntaxTree with attributes scope_type,lambda_bindings,name_val,syntax_flags,meta,scope_layer,mod,kind,value,var_id,id,is_toplevel_thunk,source,slots
<ast snip>

julia> length(st._graph.edge_ranges)
549

julia> @time JL.copy_ast(st._graph, st)
 11.903939 seconds (65.51 M allocations: 3.234 GiB, 27.79% gc time)
<ast snip>

julia> length(st._graph.edge_ranges)
4761563

I think we get lucky by never calling copy_ast this late in the lowering pipeline where nontrivial .source chains show up.

c42f · 2025-08-29

Right, copy_ast has only really been used for macro expansion.

I forgot the source handling didn't memoize anything (can't remember exactly but I think that code was part of some fairly early prototyping ... oops!)
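
The blowup and the effect of memoizing can be reproduced on a toy graph. The sketch below is not the `copy_ast` fix from this PR: `ToyNode`, `count_naive`, and `copy_memoized!` are hypothetical, and the graph is assumed to be acyclic. Each node references its predecessor twice (once as a child, once as its source), mirroring the pattern described above.

```julia
# Hypothetical illustration only; SyntaxGraph is laid out differently.
struct ToyNode
    children::Vector{Int}  # ids of child nodes
    source::Int            # id of the node this one derives from, 0 if none
end

# Number of node visits when every child/source reference is followed blindly.
function count_naive(src::Vector{ToyNode}, id::Int)
    node = src[id]
    n = 1
    for c in node.children
        n += count_naive(src, c)
    end
    node.source != 0 && (n += count_naive(src, node.source))
    return n
end

# Copy `id` and everything reachable from it into `dst`, copying each node
# at most once by recording old id => new id in `seen`. Assumes no cycles.
function copy_memoized!(dst::Vector{ToyNode}, src::Vector{ToyNode}, id::Int,
                        seen::Dict{Int,Int}=Dict{Int,Int}())
    haskey(seen, id) && return seen[id]
    node = src[id]
    new_children = Int[copy_memoized!(dst, src, c, seen) for c in node.children]
    new_source = node.source == 0 ? 0 : copy_memoized!(dst, src, node.source, seen)
    push!(dst, ToyNode(new_children, new_source))
    seen[id] = length(dst)
    return seen[id]
end

# Chain of 10 nodes; node i has node i-1 as both child and source.
src = [ToyNode(Int[], 0)]
for i in 2:10
    push!(src, ToyNode([i - 1], i - 1))
end

naive_visits = count_naive(src, 10)
dst = ToyNode[]
root = copy_memoized!(dst, src, 10)
```

With two references per node the naive traversal doubles at every level, while the memoized copy stays linear in the number of nodes.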

src/syntax_graph.jl

c42f · 2025-08-28T14:15:43Z

> In my opinion, I think we could be even more aggressive:
> mutable struct SyntaxGraph{Attrs <: Union{Dict{Symbol,Any},NamedTuple}}

It would be interesting if this helps, but surely that would be a sign that things aren't being inferred as expected, which should instead be fixed with one or two well-placed type asserts?


mlechu · 2025-08-29T00:18:51Z

> Does this mean we're now using Dict-based attributes throughout lowering?

Yes, although the type is no longer converted to anything after each pass, so either attribute type will come out of lowering unchanged, and it would be easy to switch the default back.

> If using Dict everywhere rather than NamedTuple is a runtime improvement, that's a huge performance debugging TODO, and honestly quite alarming.

I believe the runtime of the compiled code is unchanged (~1.95 seconds to run all tests). The numbers I gave were for the first time running tests, which show a large percentage of compile time.

> If you're sure Dict is an improvement we can merge this for now but it's clear that attribute storage performance needs revisiting.

Sounds good; I'll merge this given the speedup and the easiness of switching the default back. I agree we do need to go back and look into performance further.

c42f · 2025-08-29T03:21:06Z

> Sounds good; I'll merge this given the speedup and the easiness of switching the default back. I agree we do need to go back and look into performance further.

Sounds reasonable. Hopefully my attempts to make the NamedTuple version better will help ... we'll see.

mlechu added 8 commits August 25, 2025 10:01

copy_ast: Add option to not recurse on .source, clarify docs

6e07e7d

ensure_attributes!: Make signature reflect function body

253fa1d

(passing the NamedTuple variant is an error here, and can't be mutated anyway)

Add graph utils: unfreeze_attrs, attrtypes

59fff26

Missing utilities in line with existing ones (`freeze_attrs`, `attrnames`)

Print more information when node does not have attribute

a75065e

Fix printing for identifier-like kinds String/Cmd MacroName

26b81e2

Do not coerce attrs to NamedTuple unnecessarily

be71982

For `ensure_attributes` and `delete_attributes`, the output graph's `.attributes` now have the same type (`Dict` or `NamedTuple`) as the input. Add `delete_attributes!` defined only on dict-attrs to be consistent with `ensure_attributes!`

Remove ineffective call to freeze_attrs converting from SyntaxNode

0520c94

Funny to realize it wasn't doing anything. If we want freezing, it should go after lowering anyway. Also clarify the `SyntaxTree(graph, syntaxnode)` signature.

Test ensure, delete attrs, attrtypes

356d61a

mlechu requested a review from c42f August 25, 2025 19:51

unfreeze_attrs: produce Dict{Symbol, Any} instead of Dict

06ba024

Co-authored-by: Shuhei Kadowaki <[email protected]>

mlechu changed the title ~~SyntaxGraph: Several usability tweaks~~ SyntaxGraph: Usability and performance tweaks Aug 27, 2025

c42f approved these changes Aug 28, 2025


mlechu added a commit that referenced this pull request Aug 28, 2025

Fix copy_ast copying too much

8906957

#48

Fix copy_ast copying too much

668021c

#48 (comment)

mlechu force-pushed the ec/graph-tweaks branch from 5312d93 to cf0cd4f Compare August 28, 2025 20:11

Apply suggestions from code review

7e28e09

Co-authored-by: Claire Foster <[email protected]>

mlechu force-pushed the ec/graph-tweaks branch from cf0cd4f to 7e28e09 Compare August 28, 2025 20:16

Add tests for copy_ast

18d8d1e

mlechu merged commit 2d2d677 into main Aug 29, 2025
2 checks passed

mlechu deleted the ec/graph-tweaks branch August 29, 2025 00:26


