@lkdvos lkdvos commented Jul 1, 2025

This is a PR that changes multiple things, after a bunch of discussions and brainstorming with @ogauthe over the past weeks.

Firstly, I simplified the AbelianTreeTransformer by only storing the relevant information, i.e. this object no longer needs to hold on to the entire tensor fusion-tree structure and simply keeps the required data in a Vector. This also changes the implementation from "struct-of-arrays" to "array-of-structs", which should slightly improve data locality (although this is probably negligible).
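
To make the layout change concrete, here is a minimal sketch of the two storage strategies. The type and field names (`BlockEntry`, `TransformerSoA`, `TransformerAoS`, the offsets) are hypothetical and chosen only for illustration; they are not the types used in this PR.

```julia
# Hypothetical per-block record: a coefficient plus where the block sits
# in the source and destination data (illustrative fields only).
struct BlockEntry
    coeff::Float64
    src_offset::Int
    dst_offset::Int
end

# "struct-of-arrays": one vector per field, so the three pieces of
# information for block i live in three separate allocations.
struct TransformerSoA
    coeffs::Vector{Float64}
    src_offsets::Vector{Int}
    dst_offsets::Vector{Int}
end

# "array-of-structs": a single vector of small immutable records, so
# everything needed for block i is stored next to each other.
struct TransformerAoS
    entries::Vector{BlockEntry}
end

soa = TransformerSoA([0.5, -1.0], [0, 4], [4, 0])
aos = TransformerAoS([BlockEntry(c, s, d)
                      for (c, s, d) in zip(soa.coeffs, soa.src_offsets, soa.dst_offsets)])
```

Because `BlockEntry` is an immutable struct of plain bits, the `Vector{BlockEntry}` stores its elements inline, which is where the (probably small) locality benefit would come from.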

Secondly, motivated by @ogauthe, I experimented with grouping the GenericTreeTransformer over the different subblocks with the same uncoupled charges. These subblocks have the same sizes, and as a result it is possible to concatenate them into a contiguous buffer to perform the recoupling through a BLAS mul! call. If this transformation is indeed given by a dense matrix of N x N coefficients with subblocks of length L, this changes the implementation from:

  • previously: N^2 calls to tensoradd!, i.e. N^2 scaled permutations of L elements
  • now: N almost contiguous copies of L elements, one BLAS call costing ~ N^2 L operations, and N permutations of L elements.

The main idea is that at least for the dominant part of this algorithm, the data is kept contiguous and we can make use of efficient BLAS routines, effectively changing the number of non-optimal access operations from ~ N^2 to ~ N.
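
The two strategies can be compared in a simplified sketch. This is not the PR's implementation: blocks are plain vectors, the permutation step is replaced by a straight copy, and the function names `recouple_pairwise!` and `recouple_buffered!` are made up for illustration. Only `mul!`, `copyto!`, `view`, and `transpose` are real library calls.

```julia
using LinearAlgebra

# Old strategy (sketch): N^2 scaled additions, one per coefficient U[i, j].
function recouple_pairwise!(dst_blocks, src_blocks, U)
    N = length(src_blocks)
    for i in 1:N
        fill!(dst_blocks[i], 0)
        for j in 1:N
            # stands in for one scaled permutation of L elements
            dst_blocks[i] .+= U[i, j] .* src_blocks[j]
        end
    end
    return dst_blocks
end

# New strategy (sketch): N copies in, one matrix product, N copies out.
function recouple_buffered!(dst_blocks, src_blocks, U)
    N = length(src_blocks)
    L = length(first(src_blocks))
    buf_src = Matrix{eltype(U)}(undef, L, N)
    buf_dst = similar(buf_src)
    for j in 1:N
        copyto!(view(buf_src, :, j), src_blocks[j])  # gather into a contiguous buffer
    end
    # buf_dst[:, i] = sum_j U[i, j] * buf_src[:, j], done in a single BLAS call
    mul!(buf_dst, buf_src, transpose(U))
    for i in 1:N
        copyto!(dst_blocks[i], view(buf_dst, :, i))  # scatter (a permutation in the real code)
    end
    return dst_blocks
end

N, L = 3, 4
U = [1.0 2.0 0.0; 0.0 1.0 -1.0; 0.5 0.0 1.0]
src = [collect(Float64, (j - 1) * L .+ (1:L)) for j in 1:N]
a = recouple_pairwise!([zeros(L) for _ in 1:N], src, U)
b = recouple_buffered!([zeros(L) for _ in 1:N], src, U)
```

Both functions compute the same result; the buffered version touches the non-contiguous data only 2N times and leaves the N^2 L arithmetic to the matrix product.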

Finally, I also went ahead and actually implemented the multi-threading, motivated by the implementation with an atomic Int in #100. This seems like a very lightweight implementation that balances loads quite well, and has no external dependencies.
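
As a rough illustration of the atomic-counter pattern (not the code from this PR or from #100), each task repeatedly claims the next unprocessed index from a shared `Threads.Atomic{Int}` until the work runs out. The helper name `foreach_dynamic` and its `ntasks` keyword are hypothetical.

```julia
using Base.Threads: Atomic, atomic_add!, @spawn, nthreads

# Apply `f` to every element of `items`, letting tasks pull work dynamically.
function foreach_dynamic(f, items; ntasks = nthreads())
    counter = Atomic{Int}(1)   # index of the next item nobody has claimed yet
    n = length(items)
    @sync for _ in 1:ntasks
        @spawn while true
            i = atomic_add!(counter, 1)   # returns the value before the increment
            i > n && break
            f(items[i])
        end
    end
    return nothing
end

out = zeros(Int, 100)
foreach_dynamic(i -> (out[i] = i^2), 1:100)
```

Since every index is claimed exactly once, tasks that finish cheap items early simply pick up more work, which is how the load gets balanced without any scheduler beyond the counter.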

I think the major remaining components are to benchmark these implementations, presumably in real-world scenarios such as MPSKit and PEPSKit, and possibly find some heuristic to disable the multi-threading for tensors that are too small for the benefits to outweigh the overhead.
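
One possible shape for such a heuristic, purely as an assumption: limit the number of tasks so that each gets at least some minimum amount of data. The function `choose_ntasks` and the cutoff `MIN_ELEMENTS_PER_TASK` are hypothetical; a sensible value would have to come out of the benchmarks.

```julia
# Hypothetical cutoff; the PR does not fix a value.
const MIN_ELEMENTS_PER_TASK = 2^14

# Use one task for small tensors and at most `max_tasks` for large ones.
function choose_ntasks(total_elements::Integer;
                       max_tasks::Integer = Threads.nthreads(),
                       min_per_task::Integer = MIN_ELEMENTS_PER_TASK)
    return clamp(total_elements ÷ min_per_task, 1, max_tasks)
end
```

With this rule, a tensor below the cutoff is always processed by a single task, so the threading overhead is not paid where it cannot be recovered.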

!!! note
Before merging I'd like to add @ogauthe to the co-author list of this PR, given his involvement with figuring this out.

codecov bot commented Jul 1, 2025

Codecov Report

Attention: Patch coverage is 84.65347% with 31 lines in your changes missing coverage. Please review.

Project coverage is 82.90%. Comparing base (ffa6cf9) to head (1016b5a).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/tensors/indexmanipulations.jl 75.26% 23 Missing ⚠️
src/TensorKit.jl 14.28% 6 Missing ⚠️
src/tensors/treetransformers.jl 97.67% 2 Missing ⚠️
@@            Coverage Diff             @@
##           master     #251      +/-   ##
==========================================
- Coverage   83.08%   82.90%   -0.19%     
==========================================
  Files          44       44              
  Lines        5647     5749     +102     
==========================================
+ Hits         4692     4766      +74     
- Misses        955      983      +28     

@lkdvos lkdvos requested a review from Jutho July 3, 2025 15:12

@Jutho Jutho left a comment

This is a great PR, and I am fully supportive of the new strategy, which is clearly (in hindsight) much more sensible. I left a few comments, but I don't think any of them is really essential.

lkdvos commented Jul 7, 2025

Regarding the sorting and the cost model of the steps: given the current multi-threading implementation and the fact that the buffer sizes are determined independently, these are no longer really relevant. I'm happy to leave them in, in case anyone wants to experiment with different threading strategies, but otherwise I don't think they have any meaningful impact.

lkdvos commented Jul 8, 2025

So, after running some benchmarks, the results are in. This is clearly an improvement, but it is also important to note that it only starts making a difference once the number of legs becomes large; for the MPO contractions it doesn't seem to be as relevant.

These are the results for running everything single-threaded.

MPO contraction
julia> group = "mpo";

julia> comparison = collect(judge(result_est_new[group], result_est_main[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
32-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", "[160, 5, 3]", 2), TrialJudgement(+6.98% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[200, 20, 20]", 2), TrialJudgement(-8.49% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[2560, 5, 3]", 2), TrialJudgement(-1.18% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[40, 5, 3]", 2), TrialJudgement(+0.16% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[400, 20, 20]", 2), TrialJudgement(+3.42% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[400, 40, 40]", 2), TrialJudgement(+2.80% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6120, 5, 3]", 2), TrialJudgement(+0.35% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[640, 5, 3]", 2), TrialJudgement(-10.36% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[10, 4, 3]", "nothing"), TrialJudgement(-1.89% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[100, 10, 10]", "nothing"), TrialJudgement(-4.63% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[160, 4, 3]", "nothing"), TrialJudgement(-6.55% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[200, 10, 10]", "nothing"), TrialJudgement(-2.63% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[2560, 4, 3]", "nothing"), TrialJudgement(-0.02% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[300, 20, 20]", "nothing"), TrialJudgement(-1.21% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[40, 4, 3]", "nothing"), TrialJudgement(-5.92% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[640, 4, 3]", "nothing"), TrialJudgement(-0.38% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[160, 5, 3]", 0.5), TrialJudgement(-13.16% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[200, 20, 20]", 0.5), TrialJudgement(-8.85% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[2560, 5, 3]", 0.5), TrialJudgement(+0.55% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[40, 5, 3]", 0.5), TrialJudgement(-5.27% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[400, 20, 20]", 0.5), TrialJudgement(-4.36% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[400, 40, 40]", 0.5), TrialJudgement(-0.82% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6120, 5, 3]", 0.5), TrialJudgement(+0.78% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[640, 5, 3]", 0.5), TrialJudgement(-3.15% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[10, 4, 4]", 0.5), TrialJudgement(-4.24% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[100, 10, 10]", 0.5), TrialJudgement(-11.83% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[160, 4, 4]", 0.5), TrialJudgement(-18.93% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[200, 10, 10]", 0.5), TrialJudgement(-6.15% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[2560, 4, 4]", 0.5), TrialJudgement(+0.11% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[300, 20, 20]", 0.5), TrialJudgement(-6.32% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[40, 4, 4]", 0.5), TrialJudgement(-4.59% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[640, 4, 4]", 0.5), TrialJudgement(-3.05% => invariant))

MERA contraction
julia> group = "mera";

julia> comparison = collect(judge(result_est_new[group], result_est_main[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
24-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", 4, 2.0), TrialJudgement(-70.38% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 8, 2.0), TrialJudgement(-86.74% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 12, 2.0), TrialJudgement(-83.67% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 16, 2.0), TrialJudgement(-92.40% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 22, 2.0), TrialJudgement(-87.01% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 28, 2.0), TrialJudgement(-81.28% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", 2, "nothing"), TrialJudgement(-1.19% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 3, "nothing"), TrialJudgement(-3.45% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 4, "nothing"), TrialJudgement(-11.41% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 8, "nothing"), TrialJudgement(-1.69% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 12, "nothing"), TrialJudgement(+1.75% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 16, "nothing"), TrialJudgement(+1.55% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 4, 0.5), TrialJudgement(-1.29% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 8, 0.5), TrialJudgement(-4.32% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 12, 0.5), TrialJudgement(+4.31% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 16, 0.5), TrialJudgement(+1.63% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 22, 0.5), TrialJudgement(+2.38% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 28, 0.5), TrialJudgement(-0.75% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 2, 0.5), TrialJudgement(-2.61% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 4, 0.5), TrialJudgement(-2.52% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 8, 0.5), TrialJudgement(-1.75% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 12, 0.5), TrialJudgement(-2.78% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 16, 0.5), TrialJudgement(+1.90% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 20, 0.5), TrialJudgement(+1.64% => invariant))

PEPO contraction
julia> group = "pepo";

julia> comparison = collect(judge(result_est_new[group], result_est_main[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
32-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", "[10, 2, 2, 50]", 2.0), TrialJudgement(-90.86% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[10, 3, 2, 100]", 2.0), TrialJudgement(-87.67% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[4, 2, 2, 100]", 2.0), TrialJudgement(-81.93% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[4, 4, 4, 200]", 2.0), TrialJudgement(-88.23% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6, 2, 2, 100]", 2.0), TrialJudgement(-78.96% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6, 3, 4, 200]", 2.0), TrialJudgement(-73.29% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[8, 2, 2, 100]", 2.0), TrialJudgement(-87.60% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[8, 2, 4, 200]", 2.0), TrialJudgement(-87.39% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", "[3, 2, 2, 50]", "nothing"), TrialJudgement(-5.06% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[3, 3, 3, 100]", "nothing"), TrialJudgement(+2.74% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[4, 2, 2, 50]", "nothing"), TrialJudgement(+3.78% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[4, 3, 3, 100]", "nothing"), TrialJudgement(+5.92% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[5, 2, 2, 50]", "nothing"), TrialJudgement(-1.71% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[5, 2, 3, 100]", "nothing"), TrialJudgement(+1.51% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[6, 2, 2, 50]", "nothing"), TrialJudgement(-1.55% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[6, 3, 2, 100]", "nothing"), TrialJudgement(-0.11% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[10, 2, 2, 50]", 0.5), TrialJudgement(+10.12% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[10, 3, 2, 100]", 0.5), TrialJudgement(-2.37% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[4, 2, 2, 100]", 0.5), TrialJudgement(-1.15% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[4, 4, 4, 200]", 0.5), TrialJudgement(-1.30% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6, 2, 2, 100]", 0.5), TrialJudgement(-7.47% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6, 3, 4, 200]", 0.5), TrialJudgement(+2.63% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[8, 2, 2, 100]", 0.5), TrialJudgement(-1.32% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[8, 2, 4, 200]", 0.5), TrialJudgement(-2.61% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[4, 2, 2, 50]", 0.5), TrialJudgement(-9.19% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[4, 4, 4, 100]", 0.5), TrialJudgement(+2.24% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[5, 2, 2, 50]", 0.5), TrialJudgement(-2.64% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[5, 3, 4, 100]", 0.5), TrialJudgement(+2.56% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[6, 2, 2, 50]", 0.5), TrialJudgement(+5.36% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[6, 2, 4, 100]", 0.5), TrialJudgement(+0.42% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[8, 2, 2, 50]", 0.5), TrialJudgement(+1.03% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[8, 3, 2, 100]", 0.5), TrialJudgement(+3.10% => invariant))

For completeness I did a similar experiment, enabling only the transformer multithreading with 2 and 4 threads, without the limitation on the data sizes, to get an idea of the point at which threading starts helping. There are definitely gains to be had, but it would be interesting to experiment with distributing threads among the different drivers in real-world settings to actually gauge the effect.

2 threads vs 1 thread
julia> group = "mpo";

julia> comparison = collect(judge(result_est_new2[group], result_est_new[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
32-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", "[160, 5, 3]", 2), TrialJudgement(+31.86% => regression))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[200, 20, 20]", 2), TrialJudgement(-1.91% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[2560, 5, 3]", 2), TrialJudgement(-13.55% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[40, 5, 3]", 2), TrialJudgement(+60.35% => regression))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[400, 20, 20]", 2), TrialJudgement(-12.53% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[400, 40, 40]", 2), TrialJudgement(-19.24% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6120, 5, 3]", 2), TrialJudgement(-13.69% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[640, 5, 3]", 2), TrialJudgement(+0.17% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[10, 4, 3]", "nothing"), TrialJudgement(+18.60% => regression))
 Pair{Any, Any}(("Float64", "Trivial", "[100, 10, 10]", "nothing"), TrialJudgement(-0.72% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[160, 4, 3]", "nothing"), TrialJudgement(-2.88% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[200, 10, 10]", "nothing"), TrialJudgement(+11.17% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[2560, 4, 3]", "nothing"), TrialJudgement(+0.06% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[300, 20, 20]", "nothing"), TrialJudgement(-1.11% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[40, 4, 3]", "nothing"), TrialJudgement(+7.79% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[640, 4, 3]", "nothing"), TrialJudgement(-0.92% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[160, 5, 3]", 0.5), TrialJudgement(+7.49% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[200, 20, 20]", 0.5), TrialJudgement(-8.89% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[2560, 5, 3]", 0.5), TrialJudgement(-1.87% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[40, 5, 3]", 0.5), TrialJudgement(+107.91% => regression))
 Pair{Any, Any}(("Float64", "U1Irrep", "[400, 20, 20]", 0.5), TrialJudgement(-6.33% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[400, 40, 40]", 0.5), TrialJudgement(-7.37% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6120, 5, 3]", 0.5), TrialJudgement(-0.76% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[640, 5, 3]", 0.5), TrialJudgement(-4.32% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[10, 4, 4]", 0.5), TrialJudgement(+249.25% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[100, 10, 10]", 0.5), TrialJudgement(-3.48% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[160, 4, 4]", 0.5), TrialJudgement(+6.42% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[200, 10, 10]", 0.5), TrialJudgement(-11.77% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[2560, 4, 4]", 0.5), TrialJudgement(-12.16% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[300, 20, 20]", 0.5), TrialJudgement(-22.22% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[40, 4, 4]", 0.5), TrialJudgement(+87.99% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[640, 4, 4]", 0.5), TrialJudgement(-10.93% => invariant))

julia> group = "mera";

julia> comparison = collect(judge(result_est_new2[group], result_est_new[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
24-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", 4, 2.0), TrialJudgement(-0.35% => invariant))
 Pair{Any, Any}(("Float64", "SU2Irrep", 8, 2.0), TrialJudgement(-34.05% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 12, 2.0), TrialJudgement(-39.23% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 16, 2.0), TrialJudgement(-46.35% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 22, 2.0), TrialJudgement(-35.32% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 28, 2.0), TrialJudgement(-37.79% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", 2, "nothing"), TrialJudgement(+6.86% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 3, "nothing"), TrialJudgement(+6.61% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 4, "nothing"), TrialJudgement(+13.17% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 8, "nothing"), TrialJudgement(+0.89% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 12, "nothing"), TrialJudgement(-2.70% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 16, "nothing"), TrialJudgement(-0.93% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 4, 0.5), TrialJudgement(+131.32% => regression))
 Pair{Any, Any}(("Float64", "U1Irrep", 8, 0.5), TrialJudgement(+18.74% => regression))
 Pair{Any, Any}(("Float64", "U1Irrep", 12, 0.5), TrialJudgement(-24.76% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", 16, 0.5), TrialJudgement(-16.73% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", 22, 0.5), TrialJudgement(-15.23% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", 28, 0.5), TrialJudgement(-10.36% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 2, 0.5), TrialJudgement(+122.77% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", 4, 0.5), TrialJudgement(+53.27% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", 8, 0.5), TrialJudgement(+1.62% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 12, 0.5), TrialJudgement(-17.29% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", 16, 0.5), TrialJudgement(-21.12% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", 20, 0.5), TrialJudgement(-17.54% => improvement))

julia> group = "pepo";

julia> comparison = collect(judge(result_est_new2[group], result_est_new[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
32-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", "[10, 2, 2, 50]", 2.0), TrialJudgement(-48.27% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[10, 3, 2, 100]", 2.0), TrialJudgement(-41.85% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[4, 2, 2, 100]", 2.0), TrialJudgement(-42.04% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[4, 4, 4, 200]", 2.0), TrialJudgement(-42.23% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6, 2, 2, 100]", 2.0), TrialJudgement(-41.14% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6, 3, 4, 200]", 2.0), TrialJudgement(-39.20% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[8, 2, 2, 100]", 2.0), TrialJudgement(-42.91% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[8, 2, 4, 200]", 2.0), TrialJudgement(-37.54% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", "[3, 2, 2, 50]", "nothing"), TrialJudgement(+6.42% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[3, 3, 3, 100]", "nothing"), TrialJudgement(-7.24% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[4, 2, 2, 50]", "nothing"), TrialJudgement(+0.42% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[4, 3, 3, 100]", "nothing"), TrialJudgement(-5.51% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[5, 2, 2, 50]", "nothing"), TrialJudgement(-4.61% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[5, 2, 3, 100]", "nothing"), TrialJudgement(-5.79% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[6, 2, 2, 50]", "nothing"), TrialJudgement(-1.53% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[6, 3, 2, 100]", "nothing"), TrialJudgement(-0.97% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[10, 2, 2, 50]", 0.5), TrialJudgement(-25.02% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", "[10, 3, 2, 100]", 0.5), TrialJudgement(-14.86% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[4, 2, 2, 100]", 0.5), TrialJudgement(-10.95% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[4, 4, 4, 200]", 0.5), TrialJudgement(-3.78% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6, 2, 2, 100]", 0.5), TrialJudgement(-8.11% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6, 3, 4, 200]", 0.5), TrialJudgement(-16.78% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", "[8, 2, 2, 100]", 0.5), TrialJudgement(-13.30% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[8, 2, 4, 200]", 0.5), TrialJudgement(-10.33% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[4, 2, 2, 50]", 0.5), TrialJudgement(-9.33% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[4, 4, 4, 100]", 0.5), TrialJudgement(-20.25% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[5, 2, 2, 50]", 0.5), TrialJudgement(-5.60% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[5, 3, 4, 100]", 0.5), TrialJudgement(-21.14% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[6, 2, 2, 50]", 0.5), TrialJudgement(-23.69% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[6, 2, 4, 100]", 0.5), TrialJudgement(-20.99% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[8, 2, 2, 50]", 0.5), TrialJudgement(-16.59% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[8, 3, 2, 100]", 0.5), TrialJudgement(-22.42% => improvement))

4 threads vs 1 thread
julia> group = "mpo";

julia> comparison = collect(judge(result_est_new4[group], result_est_new[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
32-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", "[160, 5, 3]", 2), TrialJudgement(+19.33% => regression))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[200, 20, 20]", 2), TrialJudgement(-29.03% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[2560, 5, 3]", 2), TrialJudgement(-17.28% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[40, 5, 3]", 2), TrialJudgement(+53.86% => regression))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[400, 20, 20]", 2), TrialJudgement(-30.31% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[400, 40, 40]", 2), TrialJudgement(-26.26% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6120, 5, 3]", 2), TrialJudgement(-17.71% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[640, 5, 3]", 2), TrialJudgement(-15.14% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", "[10, 4, 3]", "nothing"), TrialJudgement(+1.95% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[100, 10, 10]", "nothing"), TrialJudgement(-3.29% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[160, 4, 3]", "nothing"), TrialJudgement(-4.33% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[200, 10, 10]", "nothing"), TrialJudgement(-2.75% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[2560, 4, 3]", "nothing"), TrialJudgement(+0.27% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[300, 20, 20]", "nothing"), TrialJudgement(-0.13% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[40, 4, 3]", "nothing"), TrialJudgement(-10.46% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[640, 4, 3]", "nothing"), TrialJudgement(-1.99% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[160, 5, 3]", 0.5), TrialJudgement(+2.99% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[200, 20, 20]", 0.5), TrialJudgement(-12.24% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[2560, 5, 3]", 0.5), TrialJudgement(-1.93% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[40, 5, 3]", 0.5), TrialJudgement(+80.98% => regression))
 Pair{Any, Any}(("Float64", "U1Irrep", "[400, 20, 20]", 0.5), TrialJudgement(-8.21% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[400, 40, 40]", 0.5), TrialJudgement(-4.58% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6120, 5, 3]", 0.5), TrialJudgement(+0.90% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[640, 5, 3]", 0.5), TrialJudgement(-5.74% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[10, 4, 4]", 0.5), TrialJudgement(+298.92% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[100, 10, 10]", 0.5), TrialJudgement(-14.76% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[160, 4, 4]", 0.5), TrialJudgement(-7.60% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[200, 10, 10]", 0.5), TrialJudgement(-19.96% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[2560, 4, 4]", 0.5), TrialJudgement(-11.07% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[300, 20, 20]", 0.5), TrialJudgement(-26.18% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[40, 4, 4]", 0.5), TrialJudgement(+90.16% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[640, 4, 4]", 0.5), TrialJudgement(-14.73% => invariant))

julia> group = "mera";

julia> comparison = collect(judge(result_est_new4[group], result_est_new[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
24-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", 4, 2.0), TrialJudgement(-49.64% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 8, 2.0), TrialJudgement(-64.71% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 12, 2.0), TrialJudgement(-56.83% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 16, 2.0), TrialJudgement(-67.05% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 22, 2.0), TrialJudgement(-54.39% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", 28, 2.0), TrialJudgement(-53.22% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", 2, "nothing"), TrialJudgement(+1.30% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 3, "nothing"), TrialJudgement(+14.43% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 4, "nothing"), TrialJudgement(+3.81% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 8, "nothing"), TrialJudgement(+2.96% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 12, "nothing"), TrialJudgement(-0.11% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", 16, "nothing"), TrialJudgement(+0.75% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", 4, 0.5), TrialJudgement(+171.77% => regression))
 Pair{Any, Any}(("Float64", "U1Irrep", 8, 0.5), TrialJudgement(+19.75% => regression))
 Pair{Any, Any}(("Float64", "U1Irrep", 12, 0.5), TrialJudgement(-23.47% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", 16, 0.5), TrialJudgement(-21.67% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", 22, 0.5), TrialJudgement(-18.58% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", 28, 0.5), TrialJudgement(-15.78% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", 2, 0.5), TrialJudgement(+178.65% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", 4, 0.5), TrialJudgement(+75.30% => regression))
 Pair{Any, Any}(("Float64", "Z2Irrep", 8, 0.5), TrialJudgement(-14.56% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", 12, 0.5), TrialJudgement(-23.38% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", 16, 0.5), TrialJudgement(-25.95% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", 20, 0.5), TrialJudgement(-24.06% => improvement))

julia> group = "pepo";

julia> comparison = collect(judge(result_est_new4[group], result_est_new[group]));

julia> comparison = sort!(comparison, by=x -> (first(x)[2], first(x)[3]))
32-element Vector{Any}:
 Pair{Any, Any}(("Float64", "SU2Irrep", "[10, 2, 2, 50]", 2.0), TrialJudgement(-68.36% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[10, 3, 2, 100]", 2.0), TrialJudgement(-64.31% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[4, 2, 2, 100]", 2.0), TrialJudgement(-62.56% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[4, 4, 4, 200]", 2.0), TrialJudgement(-61.43% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6, 2, 2, 100]", 2.0), TrialJudgement(-56.69% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[6, 3, 4, 200]", 2.0), TrialJudgement(-53.83% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[8, 2, 2, 100]", 2.0), TrialJudgement(-64.40% => improvement))
 Pair{Any, Any}(("Float64", "SU2Irrep", "[8, 2, 4, 200]", 2.0), TrialJudgement(-58.21% => improvement))
 Pair{Any, Any}(("Float64", "Trivial", "[3, 2, 2, 50]", "nothing"), TrialJudgement(+8.14% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[3, 3, 3, 100]", "nothing"), TrialJudgement(-4.44% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[4, 2, 2, 50]", "nothing"), TrialJudgement(+0.33% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[4, 3, 3, 100]", "nothing"), TrialJudgement(-1.24% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[5, 2, 2, 50]", "nothing"), TrialJudgement(-5.95% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[5, 2, 3, 100]", "nothing"), TrialJudgement(-1.74% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[6, 2, 2, 50]", "nothing"), TrialJudgement(+1.60% => invariant))
 Pair{Any, Any}(("Float64", "Trivial", "[6, 3, 2, 100]", "nothing"), TrialJudgement(+0.09% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[10, 2, 2, 50]", 0.5), TrialJudgement(-23.18% => improvement))
 Pair{Any, Any}(("Float64", "U1Irrep", "[10, 3, 2, 100]", 0.5), TrialJudgement(-12.33% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[4, 2, 2, 100]", 0.5), TrialJudgement(-6.54% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[4, 4, 4, 200]", 0.5), TrialJudgement(-2.42% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6, 2, 2, 100]", 0.5), TrialJudgement(-8.01% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[6, 3, 4, 200]", 0.5), TrialJudgement(-13.01% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[8, 2, 2, 100]", 0.5), TrialJudgement(-9.83% => invariant))
 Pair{Any, Any}(("Float64", "U1Irrep", "[8, 2, 4, 200]", 0.5), TrialJudgement(-7.28% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[4, 2, 2, 50]", 0.5), TrialJudgement(-13.39% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[4, 4, 4, 100]", 0.5), TrialJudgement(-26.58% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[5, 2, 2, 50]", 0.5), TrialJudgement(-12.03% => invariant))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[5, 3, 4, 100]", 0.5), TrialJudgement(-30.31% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[6, 2, 2, 50]", 0.5), TrialJudgement(-29.04% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[6, 2, 4, 100]", 0.5), TrialJudgement(-27.33% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[8, 2, 2, 50]", 0.5), TrialJudgement(-22.15% => improvement))
 Pair{Any, Any}(("Float64", "Z2Irrep", "[8, 3, 2, 100]", 0.5), TrialJudgement(-26.79% => improvement))

In summary though, I think it is safe to conclude that this is an overall improvement, and we can safely merge this.
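
To put the percentages in the tables above in perspective, they can be converted into speedup factors. The sketch below assumes each percentage is the relative change in estimated run time, (t_new - t_old) / t_old; the helper `speedup` is made up for illustration.

```julia
# Speedup factor t_old / t_new from a relative time change given in percent.
# Assumption: percent_change = (t_new - t_old) / t_old * 100.
speedup(percent_change) = 1 / (1 + percent_change / 100)

s_pepo = speedup(-90.86)   # PEPO, SU2Irrep "[10, 2, 2, 50]", new vs main: roughly 11x faster
s_mera = speedup(-92.40)   # MERA, SU2Irrep 16, new vs main: roughly 13x faster
s_mpo  = speedup(298.92)   # MPO, Z2Irrep "[10, 4, 4]", 4 threads vs 1: roughly 4x slower
```

Read this way, the single-threaded SU2Irrep gains for MERA and PEPO are about an order of magnitude, while the threaded regressions are confined to the smallest tensors.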

As a fair warning again: I want to add the Co-authored-by tag for Olivier, so if you approve, I'll gladly do the merge myself tomorrow to make sure this is done.

@lkdvos lkdvos changed the title TreeTransformer refactor + multithreading [Performance] TreeTransformer refactor + multithreading Jul 8, 2025

lkdvos commented Jul 8, 2025

As discussed before, I changed the last things concerning the interaction of copyto! and copy! with Strided. I'll let the tests finish to make sure I have everything right now, and then I'll merge.

@lkdvos lkdvos merged commit d68876b into master Jul 8, 2025
8 checks passed
@lkdvos lkdvos deleted the buffer-permute branch July 8, 2025 21:51