@mlechu mlechu commented Aug 25, 2025

Several small SyntaxGraph tweaks. Some of these have been split out from #35 since they are tiny improvements that don't need to wait for data size experimentation.

Changes worth discussing:

Do not coerce attributes to NamedTuple unnecessarily

It doesn't make sense to freeze Dict-attributes to NamedTuple after an ensure_attributes or delete_attributes call. I assume this freezing was unintentional and not detected because the attribute storage type abstraction is implemented quite well.
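A minimal sketch of the "same storage type in, same storage type out" behaviour described above. The function name `ensure_attrs_sketch` and the column layout (`Dict{Int,T}` per attribute) are assumptions made for illustration; this is not the JuliaLowering implementation.

```julia
# Hypothetical stand-in for unfrozen attribute storage.
const DictAttrs = Dict{Symbol,Any}

# Dict storage: add any missing attribute columns and stay a Dict.
function ensure_attrs_sketch(attrs::DictAttrs; kws...)
    out = copy(attrs)
    for (name, T) in kws
        get!(out, name, Dict{Int,T}())
    end
    return out
end

# NamedTuple storage: add any missing attribute columns and stay a NamedTuple.
function ensure_attrs_sketch(attrs::NamedTuple; kws...)
    out = attrs
    for (name, T) in kws
        if !haskey(out, name)
            out = merge(out, NamedTuple{(name,)}((Dict{Int,T}(),)))
        end
    end
    return out
end

dict_result = ensure_attrs_sketch(DictAttrs(); kind=Symbol)
frozen_result = ensure_attrs_sketch((;); kind=Symbol)
```

Dispatching on the storage type keeps the decision to freeze or unfreeze with the caller instead of making it a side effect of adding an attribute.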

Note that this change increases precompile time and decreases test time. Below are the outputs of @time using JuliaLowering and @time include("test/runtests.jl"), respectively.

Before:

  • 45.395172 seconds (927.27 k allocations: 57.289 MiB, 0.08% gc time, 0.17% compilation time)
  • 24.653554 seconds (78.07 M allocations: 4.079 GiB, 4.25% gc time, 95.33% compilation time: <1% of which was recompilation)

After:

  • 59.053901 seconds (1.12 M allocations: 70.427 MiB, 0.06% gc time, 0.13% compilation time)
  • 11.100739 seconds (35.53 M allocations: 1.789 GiB, 4.09% gc time, 90.51% compilation time: 2% of which was recompilation)

(Our precompile statements could probably use some tweaking still)

Delete ineffective freeze_attrs call

This post-parsing freeze call wasn't doing anything. The freezing we got was actually from ensure_attributes. The output of lowering now has Dict-attributes, which I think are fine, but if we want to freeze post-lowering we can do that. Making the call effective worsens the test time reported above.

Print more information when node does not have attribute (and no default is provided)

This is an unrecoverable error anyway, so print a bit more information about the node we tried to access. Before:

julia> st = jlower("function foo end")
julia> st.value
ERROR: Property `value[33]` not found

After:

julia> st.value
ERROR: Property `value[33]` not found. Available attributes:
  kind = code_info,
  is_toplevel_thunk = true,
  source = 21,
  slots = JuliaLowering.Slot[]
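A rough sketch of how an error like the one above could be assembled. The function `getattr_sketch` and the storage layout (attribute name to a per-node `Dict`) are hypothetical, chosen only to illustrate listing whatever attributes the node does have.

```julia
# Hypothetical lookup: attrs maps attribute name => Dict(node id => value).
function getattr_sketch(attrs::Dict{Symbol,Any}, name::Symbol, id::Int)
    col = get(attrs, name, nothing)
    if col !== nothing && haskey(col, id)
        return col[id]
    end
    # Collect every attribute that is defined for this node.
    available = [string("  ", k, " = ", repr(v[id])) for (k, v) in attrs if haskey(v, id)]
    error("Property `$(name)[$(id)]` not found. Available attributes:\n",
          join(sort(available), ",\n"))
end

toy_attrs = Dict{Symbol,Any}(
    :kind => Dict(33 => :code_info),
    :is_toplevel_thunk => Dict(33 => true),
)

message = try
    getattr_sketch(toy_attrs, :value, 33)
    ""
catch err
    sprint(showerror, err)
end
```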

mlechu added 8 commits August 25, 2025 10:01
(passing the NamedTuple variant is an error here, and can't be mutated anyway)
Missing utilities in line with existing ones (`freeze_attrs`, `attrnames`)
For `ensure_attributes` and `delete_attributes`, the output graph's `.attributes` now has the same type (`Dict` or `NamedTuple`) as the input.

Add `delete_attributes!` defined only on dict-attrs to be consistent with `ensure_attributes!`
Funny to realize it wasn't doing anything. If we want freezing, it should go after lowering anyway.

Also clarify the `SyntaxTree(graph, syntaxnode)` signature.
@mlechu mlechu requested a review from c42f August 25, 2025 19:51
@aviatesk
Collaborator

> Note that this change increases precompile time and decreases test time.

It's good that we seemingly have better lowering runtime performance now.

I don't think this will resolve the compile-time bottleneck entirely, but forcing the type of unfrozen attributes to Dict{Symbol,Any} should improve type stability (and hopefully compile time too) to some extent.
The compiler has no knowledge of whether a dict is empty, so the Dict(pairs(attributes)...) pattern used in the current unfreeze_attrs implementation may force it to infer the Dict{Any,Any} case as well, which could be what lengthens compile times:

julia> Base.infer_return_type(x->Dict(pairs(x)...), (Dict{Symbol,Any},))
Union{Dict{Any, Any}, Dict{Symbol, Any}}

julia> Base.infer_return_type(x->Dict{Symbol,Any}(pairs(x)...), (Dict{Symbol,Any},))
Dict{Symbol, Any}

In practice, I think the actual type of attributes is always Dict{Symbol,Any}, so it might be better to enforce this at both the type definition level and the call site level.
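The empty-dict case behind the inferred `Union` can also be observed at runtime. The helper names `unfreeze_loose` and `unfreeze_strict` below are made up for this sketch and are not the package's `unfreeze_attrs`.

```julia
# Hypothetical helpers contrasting the two constructor patterns.
unfreeze_loose(attrs) = Dict(pairs(attrs)...)                # key/value types chosen from the pairs
unfreeze_strict(attrs) = Dict{Symbol,Any}(pairs(attrs)...)   # key/value types fixed up front

empty_attrs = Dict{Symbol,Any}()
full_attrs = Dict{Symbol,Any}(:kind => "call", :value => 1)

# With no pairs to inspect, the untyped constructor falls back to Dict{Any,Any}.
loose_empty = unfreeze_loose(empty_attrs)
strict_empty = unfreeze_strict(empty_attrs)
strict_full = unfreeze_strict(full_attrs)
```

Since the number of splatted pairs is only known at runtime, inference has to account for the zero-argument call, which is where the `Dict{Any,Any}` branch comes from.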

@mlechu
Collaborator Author

mlechu commented Aug 27, 2025

Excellent suggestion. One day I hope to have an abstract interpreter running in my head as you do.

julia> @time using JuliaLowering
Precompiling JuliaLowering finished.
  1 dependency successfully precompiled in 27 seconds. 1 already precompiled.
 27.361220 seconds (722.49 k allocations: 45.832 MiB, 0.11% gc time, 0.25% compilation time)

julia> @time include("test/runtests.jl");
Test Summary:    | Pass  Broken  Total   Time
JuliaLowering.jl | 1165       1   1166  11.5s
 11.730933 seconds (47.43 M allocations: 2.423 GiB, 4.35% gc time, 91.44% compilation time: 4% of which was recompilation)

@mlechu mlechu changed the title from "SyntaxGraph: Several usability tweaks" to "SyntaxGraph: Usability and performance tweaks" Aug 27, 2025
@aviatesk
Collaborator

Wow, I didn't expect it to be that effective. That's great.

In my opinion, I think we could be even more aggressive:

mutable struct SyntaxGraph{Attrs <: Union{Dict{Symbol,Any},NamedTuple}}
    edge_ranges::Vector{UnitRange{Int}}
    edges::Vector{NodeId}
    attributes::Attrs
end

But this is a somewhat restrictive change, so we might want to hear @c42f's opinion.
Probably the type instability of unfreeze_attrs is the most important one in the lowering pipeline, and I don't think this change will reduce type instability much further. The Dict{Symbol,Any} information from unfreeze_attrs (the call site) should theoretically propagate throughout the lowering pipeline. But if there is type instability elsewhere, such a restrictive type definition can limit the damage. Genericity is sacrificed, though.
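To make the trade-off concrete, here is a small sketch of what the bounded type parameter accepts and rejects. The struct is renamed `ConstrainedGraph` to mark it as illustrative, and `NodeId = Int` is an assumption of this sketch.

```julia
const NodeId = Int  # assumption: node ids are plain integers here

mutable struct ConstrainedGraph{Attrs <: Union{Dict{Symbol,Any},NamedTuple}}
    edge_ranges::Vector{UnitRange{Int}}
    edges::Vector{NodeId}
    attributes::Attrs
end

# Both supported storage types construct fine.
dict_graph = ConstrainedGraph(UnitRange{Int}[], NodeId[], Dict{Symbol,Any}())
frozen_graph = ConstrainedGraph(UnitRange{Int}[], NodeId[], (;))

# Any other storage type, such as Dict{Any,Any}, is rejected at construction.
rejected = try
    ConstrainedGraph(UnitRange{Int}[], NodeId[], Dict{Any,Any}())
    false
catch
    true
end
```

The bound turns a stray `Dict{Any,Any}` into an immediate error at construction time, at the cost of ruling out any other attribute container.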

Owner

@c42f c42f left a comment

Does this mean we're now using Dict-based attributes throughout lowering?

If using Dict everywhere rather than NamedTuple is a runtime improvement, that's a huge performance debugging TODO, and honestly quite alarming. If the frozen attributes were working as expected, we should have things like ex.name_val type inferred as a String; overall the design goal was for SyntaxGraph to support type stable arbitrary attributes.
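A toy illustration of the design goal mentioned above, assuming each attribute is stored as a per-node `Dict{Int,T}` column (an assumption of this sketch, as are the helper names). With NamedTuple storage the column type is part of the container type, so an access like `name_val` can be inferred; with `Dict{Symbol,Any}` it cannot.

```julia
# Frozen storage: column types are visible in the NamedTuple type.
frozen = (name_val = Dict{Int,String}(1 => "foo"),)
# Unfrozen storage: every column is hidden behind `Any`.
unfrozen = Dict{Symbol,Any}(:name_val => Dict{Int,String}(1 => "foo"))

name_from_frozen(attrs, id) = attrs.name_val[id]
name_from_unfrozen(attrs, id) = attrs[:name_val][id]

# Ask inference what each accessor returns for these argument types.
frozen_rt = only(Base.return_types(name_from_frozen, (typeof(frozen), Int)))
unfrozen_rt = only(Base.return_types(name_from_unfrozen, (typeof(unfrozen), Int)))
```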

If you're sure Dict is an improvement we can merge this for now but it's clear that attribute storage performance needs revisiting.

Related - I'm currently working on some changes to make setattr! fully inferrable.

src/ast.jl Outdated
 any other attribute.
 """
-function copy_ast(ctx, ex)
+function copy_ast(ctx, ex; copy_source=true)
Owner

When would we ever want copy_source=false?

Collaborator Author

@mlechu mlechu Aug 28, 2025

I needed it for copying a tree into its own graph (I'll add a check_same_graph here). There should be a way of doing this without recursing on .source (which probably reaches everything in the graph), though I'm open to having it be a separate function if there's a reasonable name we can give it.

Tangential, but I'm now realizing we should fix the way we copy .source too. We recurse on every child and every source reference, and since most nodes are referenced twice this way, the number of copied nodes explodes.

julia> st = jlower("function foo(;a=1); end")
SyntaxTree with attributes scope_type,lambda_bindings,name_val,syntax_flags,meta,scope_layer,mod,kind,value,var_id,id,is_toplevel_thunk,source,slots
<ast snip>

julia> length(st._graph.edge_ranges)
549

julia> @time JL.copy_ast(st._graph, st)
 11.903939 seconds (65.51 M allocations: 3.234 GiB, 27.79% gc time)
<ast snip>

julia> length(st._graph.edge_ranges)
4761563

I think we get lucky by never calling copy_ast this late in the lowering pipeline where nontrivial .source chains show up.

Owner

Right, copy_ast has only really been used for macro expansion.

I forgot the source handling didn't memoize anything (can't remember exactly but I think that code was part of some fairly early prototyping ... oops!)
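A self-contained toy showing why memoization matters when nodes are reachable through both children and `.source`. `ToyNode`, `naive_copy`, and `memo_copy` are hypothetical names; this is a sketch of the general technique, not of `copy_ast`.

```julia
# Toy node: reachable from its parent both as a child and as a source.
mutable struct ToyNode
    name::String
    children::Vector{ToyNode}
    source::Union{Nothing,ToyNode}
end

# Each level references the previous node twice (child and source).
function build_chain(depth::Int)
    node = ToyNode("n0", ToyNode[], nothing)
    for i in 1:depth
        node = ToyNode("n$i", ToyNode[node], node)
    end
    return node
end

# Naive copy: follows every reference, so shared nodes are copied repeatedly.
function naive_copy(n::ToyNode, counter::Ref{Int})
    counter[] += 1
    src = n.source === nothing ? nothing : naive_copy(n.source, counter)
    return ToyNode(n.name, ToyNode[naive_copy(c, counter) for c in n.children], src)
end

# Memoized copy: each original node is copied exactly once.
function memo_copy(n::ToyNode, memo::IdDict{ToyNode,ToyNode}, counter::Ref{Int})
    haskey(memo, n) && return memo[n]
    counter[] += 1
    out = ToyNode(n.name, ToyNode[], nothing)
    memo[n] = out  # register before recursing so shared references reuse it
    for c in n.children
        push!(out.children, memo_copy(c, memo, counter))
    end
    if n.source !== nothing
        out.source = memo_copy(n.source, memo, counter)
    end
    return out
end

root = build_chain(10)
naive_visits = Ref(0)
naive_copy(root, naive_visits)
memo_visits = Ref(0)
copied = memo_copy(root, IdDict{ToyNode,ToyNode}(), memo_visits)
```

In this toy chain the naive version visits `2^(depth+1) - 1` nodes while the memoized one visits `depth + 1`, and the memoized copy also preserves sharing between the child and source references.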

@c42f
Owner

c42f commented Aug 28, 2025

> In my opinion, I think we could be even more aggressive:
>
> mutable struct SyntaxGraph{Attrs <: Union{Dict{Symbol,Any},NamedTuple}}

It would be interesting if this helps, but surely it would be a sign that things aren't type inferred as expected, and that should instead be fixed by one or two well-placed type asserts?

mlechu added a commit that referenced this pull request Aug 28, 2025
@mlechu
Collaborator Author

mlechu commented Aug 29, 2025

> Does this mean we're now using Dict-based attributes throughout lowering?

Yes, although the type is no longer converted to anything after each pass, so either attribute type will come out of lowering unchanged, and it would be easy to switch the default back.

> If using Dict everywhere rather than NamedTuple is a runtime improvement, that's a huge performance debugging TODO, and honestly quite alarming.

I believe the runtime of the compiled code is unchanged (~1.95 seconds to run all tests). The numbers I gave were for the first time running tests, which show a large percentage of compile time.

> If you're sure Dict is an improvement we can merge this for now but it's clear that attribute storage performance needs revisiting.

Sounds good; I'll merge this given the speedup and the ease of switching the default back. I agree we do need to go back and look into performance further.

@mlechu mlechu merged commit 2d2d677 into main Aug 29, 2025
2 checks passed
@mlechu mlechu deleted the ec/graph-tweaks branch August 29, 2025 00:26
@c42f
Owner

c42f commented Aug 29, 2025

> Sounds good; I'll merge this given the speedup and the ease of switching the default back. I agree we do need to go back and look into performance further.

Sounds reasonable. Hopefully my attempts to make the NamedTuple version better will help ... we'll see.
