[Perf] EnlargedCorner optimizations by lkdvos · Pull Request #214 · QuantumKitHub/PEPSKit.jl

lkdvos · 2025-06-09T20:47:55Z

Fixes #213.

This manually fixes the contraction order for the enlarged corners. While it would be great to avoid having to manually check the optimal orders every time, at least for now it seems reasonable to manually fix some of them.

In particular, `@autoopt` currently has no way of taking into account that contraction orders with equal leading cost can have different subleading costs because of the permutations they require. I'm not entirely sure how to fix that.
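The tie between orders can be illustrated with a naive multiplication-count model. The sketch below is Python, not PEPSKit or TensorOperations code. The index labels, the small dimensions and the helper `contraction_cost` are all hypothetical and chosen only for illustration. It counts scalar multiplications for a left-to-right pairwise contraction of a corner, two edges, a ket and a bra, and shows that "ket then bra" and "bra then ket" get the same cost under such a model, so any difference in runtime has to come from something the model does not see, such as permutations.

```python
from math import prod

# Made-up dimensions: physical d, PEPS bond D, environment bond chi.
d, D, chi = 2, 3, 9

# Hypothetical index labels for one enlarged corner.
tensors = {
    "C": ("c_w", "c_n"),
    "E_west": ("c_w", "w_ket", "w_bra", "chi_s"),
    "E_north": ("c_n", "n_ket", "n_bra", "chi_e"),
    "ket": ("p", "n_ket", "e_ket", "s_ket", "w_ket"),
    "bra": ("p", "n_bra", "e_bra", "s_bra", "w_bra"),
}

def dim_of(label):
    """Dimension of an index, inferred from the (assumed) naming scheme."""
    if label == "p":
        return d
    if label.startswith("c"):  # c_w, c_n, chi_s, chi_e
        return chi
    return D

def contraction_cost(order):
    """Multiplication count and open indices for a left-to-right pairwise contraction.

    Assumes every index appears on at most two tensors, so shared indices are summed.
    """
    current = set(tensors[order[0]])
    cost = 0
    for name in order[1:]:
        nxt = set(tensors[name])
        # Each pairwise step loops over every index present on either tensor.
        cost += prod(dim_of(i) for i in current | nxt)
        # Indices shared by both tensors are contracted away.
        current = current ^ nxt
    return cost, current

ket_first = contraction_cost(["C", "E_west", "E_north", "ket", "bra"])
bra_first = contraction_cost(["C", "E_west", "E_north", "bra", "ket"])
print(ket_first[0], bra_first[0], ket_first[0] == bra_first[0])
```

Because the bra and ket have identical dimensions, the two orders are indistinguishable to a cost function of this kind. Breaking the tie would need an extra term, for instance an estimate of how many elements each intermediate permutation has to move, which this sketch deliberately leaves out.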

Additionally, this fixes something that has been bothering me for a while: the enlarged corners now keep track of which corner they are, so `TensorMap(Q::EnlargedCorner)` no longer needs an additional argument.

Performance-wise, I tried a number of orders and found that contracting the edges first, then the bra, and then the ket consistently comes out on top. With the SU(2) spaces linked in the parent issue, I get:

# original:
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 16.443 s (0.05% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 95320 allocations.

# ket then bra:
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 6.969 s (0.11% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 94124 allocations.

# bra then ket:
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 6.612 s (0.09% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 88989 allocations.

I did verify that these results are consistent for the other sizes as well.

codecov · 2025-06-09T20:57:30Z

Codecov Report

Attention: Patch coverage is 42.85714% with 16 lines in your changes missing coverage. Please review.

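| Files with missing lines                            | Patch % | Lines         |
|:----------------------------------------------------|:-------:|:-------------:|
| src/algorithms/contractions/ctmrg_contractions.jl   | 20.00%  | 16 Missing ⚠️ |

| Files with missing lines                            | Coverage Δ                      |
|:----------------------------------------------------|:--------------------------------|
| src/algorithms/ctmrg/sequential.jl                  | `98.36% <100.00%> (ø)`          |
| src/algorithms/ctmrg/simultaneous.jl                | `98.27% <100.00%> (ø)`          |
| src/algorithms/ctmrg/sparse_environments.jl         | `30.76% <100.00%> (ø)`          |
| src/algorithms/contractions/ctmrg_contractions.jl   | `56.10% <20.00%> (-2.00%)` ⬇️   |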

lkdvos · 2025-06-10T01:35:53Z

Update: it seems that the intermediate permutations actually make a huge difference. I found a different order (technically due to @ogauthe) that is another factor of 2 faster. I'll look into how this can be implemented, and also how it could be automated in the future.

# PR state now
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 6.661 s (0.12% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 88989 allocations.

# PR state with updated intermediate permutations
BenchmarkTools.Trial: 2 samples with 1 evaluation per sample.
 Range (min … max):  2.990 s …   3.086 s  ┊ GC (min … max): 1.56% … 4.75%
 Time  (median):     3.038 s              ┊ GC (median):    3.18%
 Time  (mean ± σ):   3.038 s ± 68.183 ms  ┊ GC (mean ± σ):  3.18% ± 2.25%

  █                                                       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.99 s         Histogram: frequency by time        3.09 s <

 Memory estimate: 6.49 GiB, allocs estimate: 21237.

pbrehmer

Thanks a lot for taking care of this! I really wasn't aware of the big difference the intermediate permutations make. Let me know when I should review again or if I can help anywhere.

lkdvos · 2025-06-10T12:28:56Z

This seems to only matter for non-abelian tensors, however, so there is still some interesting interplay going on here. I'll keep you posted.

ogauthe · 2025-06-10T14:05:20Z

I'm glad you found the explanation.

I expect this effect to matter for all tensors, but indeed it will be more important for non-abelian ones. On my benchmark, frostspin was slightly faster in the Trivial case. With SU(2), reaching the asymptotic behavior where contraction dominates requires very high bond dimensions.

lkdvos · 2025-06-12T21:25:02Z

I have now updated all the PEPS corner contractions appropriately.
On my machine, I get the following updated timings for contracting the corners (quite impressive actually):

| sym/(dir, d, D, chi)  | master            | dirty             | master / dirty |
|:----------------------|:-----------------:|:-----------------:|:--------------:|
| su2/(1, 4, 11, 121)   | 0.638 ± 0.059 s   | 0.174 ± 0.035 s   | 3.67 ± 0.81    |
| su2/(1, 4, 16, 255)   | 16.8 s            | 3.02 ± 0.013 s    | 5.56           |
| su2/(1, 4, 4, 16)     | 4.52 ± 0.05 ms    | 2.61 ± 0.035 ms   | 1.73 ± 0.03    |
| su2/(1, 4, 7, 49)     | 19.5 ± 1.8 ms     | 7.97 ± 0.79 ms    | 2.44 ± 0.33    |
| su2/(2, 4, 11, 121)   | 0.555 ± 0.028 s   | 0.173 ± 0.0092 s  | 3.21 ± 0.24    |
| su2/(2, 4, 16, 255)   | 16.4 s            | 2.97 ± 0.1 s      | 5.51           |
| su2/(2, 4, 4, 16)     | 4.53 ± 0.037 ms   | 2.62 ± 0.054 ms   | 1.73 ± 0.038   |
| su2/(2, 4, 7, 49)     | 17.5 ± 0.88 ms    | 7.88 ± 0.98 ms    | 2.22 ± 0.3     |
| su2/(3, 4, 11, 121)   | 0.528 ± 0.0044 s  | 0.174 ± 0.0092 s  | 3.04 ± 0.16    |
| su2/(3, 4, 16, 255)   | 15.9 s            | 2.9 ± 0.15 s      | 5.48           |
| su2/(3, 4, 4, 16)     | 4.58 ± 0.15 ms    | 2.55 ± 0.024 ms   | 1.8 ± 0.061    |
| su2/(3, 4, 7, 49)     | 17.4 ± 0.97 ms    | 7.67 ± 0.68 ms    | 2.27 ± 0.24    |
| su2/(4, 4, 11, 121)   | 0.539 ± 0.021 s   | 0.177 ± 0.021 s   | 3.04 ± 0.38    |
| su2/(4, 4, 16, 255)   | 15.6 s            | 2.93 ± 0.094 s    | 5.32           |
| su2/(4, 4, 4, 16)     | 4.52 ± 0.025 ms   | 2.59 ± 0.024 ms   | 1.74 ± 0.019   |
| su2/(4, 4, 7, 49)     | 17.5 ± 1.3 ms     | 7.89 ± 1.3 ms     | 2.22 ± 0.4     |
| trivial/(1, 4, 4, 16) | 2.02 ± 0.28 ms    | 1.58 ± 0.27 ms    | 1.28 ± 0.28    |
| trivial/(1, 4, 5, 25) | 12.3 ± 3.1 ms     | 9.53 ± 2.2 ms     | 1.29 ± 0.44    |
| trivial/(1, 4, 6, 36) | 0.0579 ± 0.0027 s | 0.0382 ± 0.0024 s | 1.52 ± 0.12    |
| trivial/(1, 4, 7, 49) | 0.243 ± 0.012 s   | 0.143 ± 0.015 s   | 1.7 ± 0.2      |
| trivial/(1, 4, 8, 64) | 0.892 ± 0.1 s     | 0.593 ± 0.089 s   | 1.5 ± 0.29     |
| trivial/(2, 4, 4, 16) | 1.85 ± 0.45 ms    | 1.6 ± 0.13 ms     | 1.16 ± 0.3     |
| trivial/(2, 4, 5, 25) | 12.6 ± 3.9 ms     | 9.75 ± 2.3 ms     | 1.29 ± 0.51    |
| trivial/(2, 4, 6, 36) | 0.0572 ± 0.0012 s | 0.0408 ± 0.0031 s | 1.4 ± 0.11     |
| trivial/(2, 4, 7, 49) | 0.245 ± 0.0069 s  | 0.147 ± 0.01 s    | 1.67 ± 0.12    |
| trivial/(2, 4, 8, 64) | 0.87 ± 0.11 s     | 0.603 ± 0.091 s   | 1.44 ± 0.29    |
| trivial/(3, 4, 4, 16) | 1.52 ± 0.32 ms    | 1.6 ± 0.36 ms     | 0.948 ± 0.29   |
| trivial/(3, 4, 5, 25) | 11.7 ± 2.9 ms     | 9.84 ± 2.3 ms     | 1.19 ± 0.4     |
| trivial/(3, 4, 6, 36) | 0.0548 ± 0.003 s  | 0.0386 ± 0.004 s  | 1.42 ± 0.17    |
| trivial/(3, 4, 7, 49) | 0.247 ± 0.016 s   | 0.148 ± 0.013 s   | 1.67 ± 0.18    |
| trivial/(3, 4, 8, 64) | 0.884 ± 0.1 s     | 0.622 ± 0.09 s    | 1.42 ± 0.26    |
| trivial/(4, 4, 4, 16) | 1.96 ± 0.37 ms    | 1.61 ± 0.3 ms     | 1.22 ± 0.32    |
| trivial/(4, 4, 5, 25) | 12 ± 3.3 ms       | 9.58 ± 2.3 ms     | 1.26 ± 0.46    |
| trivial/(4, 4, 6, 36) | 0.0567 ± 0.0017 s | 0.0406 ± 0.0048 s | 1.4 ± 0.17     |
| trivial/(4, 4, 7, 49) | 0.244 ± 0.0054 s  | 0.147 ± 0.0089 s  | 1.66 ± 0.11    |
| trivial/(4, 4, 8, 64) | 0.898 ± 0.14 s    | 0.565 ± 0.11 s    | 1.59 ± 0.38    |
| u1/(1, 4, 11, 121)    | 1.99 ± 0.14 s     | 1.32 ± 0.082 s    | 1.51 ± 0.14    |
| u1/(1, 4, 4, 16)      | 0.921 ± 0.084 ms  | 0.854 ± 0.033 ms  | 1.08 ± 0.11    |
| u1/(1, 4, 7, 49)      | 0.0351 ± 0.0013 s | 24.2 ± 1.3 ms     | 1.45 ± 0.095   |
| u1/(2, 4, 11, 121)    | 1.83 ± 0.11 s     | 1.34 ± 0.091 s    | 1.37 ± 0.13    |
| u1/(2, 4, 4, 16)      | 0.923 ± 0.059 ms  | 0.86 ± 0.042 ms   | 1.07 ± 0.087   |
| u1/(2, 4, 7, 49)      | 0.0356 ± 0.0033 s | 23.9 ± 1.6 ms     | 1.49 ± 0.17    |
| u1/(3, 4, 11, 121)    | 1.92 ± 0.14 s     | 1.29 ± 0.13 s     | 1.49 ± 0.19    |
| u1/(3, 4, 4, 16)      | 0.925 ± 0.063 ms  | 0.866 ± 0.046 ms  | 1.07 ± 0.092   |
| u1/(3, 4, 7, 49)      | 0.0348 ± 0.001 s  | 24.6 ± 2.1 ms     | 1.42 ± 0.13    |
| u1/(4, 4, 11, 121)    | 1.87 ± 0.14 s     | 1.27 ± 0.013 s    | 1.47 ± 0.11    |
| u1/(4, 4, 4, 16)      | 0.92 ± 0.1 ms     | 0.86 ± 0.044 ms   | 1.07 ± 0.13    |
| u1/(4, 4, 7, 49)      | 0.0358 ± 0.0028 s | 24.1 ± 1.1 ms     | 1.49 ± 0.13    |
| time_to_load          | 1.44 ± 0.086 s    | 1.47 ± 0.039 s    | 0.979 ± 0.064  |
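As a reading aid for the last column: the quoted uncertainties on `master / dirty` appear consistent with first-order error propagation for a ratio, where the relative errors of the two timings are added in quadrature. That is an inference from the numbers in the table, not a statement about how the benchmark tooling computes them. A minimal Python sketch, with the hypothetical helper name `ratio_with_uncertainty`, checked against the first row:

```python
from math import sqrt

def ratio_with_uncertainty(a, sigma_a, b, sigma_b):
    """Return a / b and its first-order propagated uncertainty.

    Assumes the two measurements are independent, so relative errors add in quadrature.
    """
    r = a / b
    sigma_r = r * sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return r, sigma_r

# First row of the table: su2/(1, 4, 11, 121), master vs dirty, in seconds.
r, sigma_r = ratio_with_uncertainty(0.638, 0.059, 0.174, 0.035)
print(f"{r:.3g} ± {sigma_r:.2g}")  # prints 3.67 ± 0.81
```

Rows where `master` has no quoted spread (a single sample) have no uncertainty on the ratio either, which matches the bare values such as 5.56 in the table.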

pbrehmer

Really impressive improvement! Thanks also for including the benchmark study - that might become useful in the future in case we want to set up a proper benchmark suite for PEPSKit.

* Store `dir` in `EnlargedCorner`
* Manually fix enlarged corner contractions
* Update contractions

---

Co-authored-by: Olivier Gauthe <olivier.gauthe.2011+github@polytechnique.org>

lkdvos added 2 commits June 9, 2025 15:53

Store dir in EnlargedCorner

3bbd078

Manually fix enlarged corner contractions

9d4f2e6

lkdvos force-pushed the performance branch from 06afc7d to 9d4f2e6 Compare June 9, 2025 20:50

lkdvos requested a review from pbrehmer June 9, 2025 21:15

lkdvos enabled auto-merge (squash) June 9, 2025 21:15

lkdvos marked this pull request as draft June 10, 2025 01:34

auto-merge was automatically disabled June 10, 2025 01:34
Pull request was converted to draft

pbrehmer reviewed Jun 10, 2025

Update contractions

797d224

lkdvos marked this pull request as ready for review June 12, 2025 21:22

lkdvos requested a review from pbrehmer June 12, 2025 21:26

lkdvos enabled auto-merge (squash) June 12, 2025 21:27

lkdvos and others added 2 commits June 12, 2025 17:27

Merge branch 'master' into performance

382d85a

Merge branch 'master' into performance

d9486d7

pbrehmer approved these changes Jun 13, 2025

lkdvos merged commit 5a28693 into master Jun 13, 2025
44 of 45 checks passed

lkdvos deleted the performance branch June 13, 2025 11:36

This was referenced Jun 13, 2025

Projectors performance improvements #220

Closed

Expectation value performance improvements #221

Closed

lkdvos merged 5 commits into master from performance
