[MachinePipeliner] Limit the number of stores in BB #154940

kasuga-fj · 2025-08-22T13:07:26Z

The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is O(N^2), where N is the number of store instructions. As a result, compilation time can grow significantly when a block contains many store instructions.

To mitigate this, the patch adds logic that counts the store instructions at the beginning of the pipeliner and bails out if the count exceeds a threshold. The default value of the threshold should be large enough that, in most practical cases where the pipeliner is beneficial, this patch does not cause any performance regression.

Related issue: #150262
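
The bail-out described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the patch: `Instr`, `hasTooManyStores`, `numStorePairs`, and the limit value `kMaxNumStores` are hypothetical names and numbers chosen for the example, and no LLVM APIs are used.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction. Only the property that
// matters for this sketch (whether the instruction may store) is modeled.
struct Instr {
  bool MayStore;
};

// Hypothetical limit, picked only for illustration. It is an assumption,
// not the default value used by the patch.
constexpr std::size_t kMaxNumStores = 200;

// Number of unordered store pairs that a pairwise dependency check visits.
// This is the quadratic term: N * (N - 1) / 2.
constexpr std::size_t numStorePairs(std::size_t N) {
  return N < 2 ? 0 : N * (N - 1) / 2;
}

// Returns true as soon as the block is known to contain more stores than
// Limit, so the caller can give up on pipelining before any pairwise
// analysis is started. The scan itself is linear in the block size.
bool hasTooManyStores(const std::vector<Instr> &Block,
                      std::size_t Limit = kMaxNumStores) {
  std::size_t NumStores = 0;
  for (const Instr &I : Block) {
    if (I.MayStore && ++NumStores > Limit)
      return true;
  }
  return false;
}
```

A caller would check `hasTooManyStores(Block)` once per candidate loop body and skip the loop when it returns true. With a limit of 200, the pairwise check is capped at `numStorePairs(200)` pairs per block, while a block with thousands of stores is rejected after a single linear scan.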

github-actions · 2025-08-22T13:09:55Z

✅ With the latest revision this PR passed the C/C++ code formatter.

kasuga-fj · 2025-08-22T13:24:04Z

TODO: Add some comments and tests

@aankit-ca Could you please run your benchmark with this change? I believe this is the simplest solution for #150262. If you're okay with it, I'd like to proceed with this approach. Here are some notes:

I have confirmed that this patch resolves the specific case raised in #150262 (hexagon compiler runs "forever" on matrix-spec-types at O2), but this change may reduce optimization opportunities in other cases.
The default value of SwpMaxNumStores is arbitrary. Please let me know if you have any preferences.
If we're very lucky, this might also improve compilation time in other cases, without causing any performance regressions.

Thanks in advance!

aankit-ca · 2025-08-22T17:23:51Z

Thanks for looking into this issue, @kasuga-fj. I'm on vacation right now and will be back on Sep 2. I'll verify this once I'm back!

kasuga-fj · 2025-08-22T17:26:50Z

Ah, thank you for replying during your time off. Any time that works for you is fine with me.

kasuga-fj · 2025-09-25T06:27:50Z

Gentle ping (sorry, I completely forgot this one)

aankit-ca · 2025-09-29T17:19:15Z

@kasuga-fj I did see some regressions with this patch. Shall I try generating a reproducer for you?

kasuga-fj · 2025-09-30T07:59:07Z

@aankit-ca Ah, I see. Thanks for checking. I don't need a reproducer for this case; I'll consider a different approach. Instead, if possible, could you share the number of load and store instructions separately for each regression case?

aankit-ca · 2025-10-07T18:40:35Z

I'm re-running the tests to get the load and store counts.

aankit-ca · 2025-10-08T20:40:12Z

@kasuga-fj The benchmark that showed the regression had only 2 stores in the innermost loop, so your patch should not have caused the regression. I didn't even see the "Too many stores" message in the debug logs. I don't want to block the merge any longer.

I think the patch is good, and the default store limit is already high enough not to cause regressions in most practical use cases. Thanks for fixing the issue.

aankit-ca · 2025-10-08T20:48:04Z

Once the changes are merged, can you also cherry-pick the change onto the 21.x branch?

kasuga-fj · 2025-10-09T09:43:22Z

Thanks for checking!

Yes, I'll do it.

aankit-ca · 2025-10-10T21:47:37Z

/cherry-pick 22b79fb

llvmbot · 2025-10-10T21:53:55Z

Failed to cherry-pick: 22b79fb

https://github.com/llvm/llvm-project/actions/runs/18419192219

Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request

kasuga-fj · 2025-10-10T21:59:54Z

Ah, I already did that. #162639

aankit-ca · 2025-10-10T22:08:47Z

Oh cool. Thanks!

llvmbot added the llvm:codegen label Aug 22, 2025

[MachinePipeliner] Limit the number of stores in BB

67a4459

kasuga-fj force-pushed the pipeliner-limit-num-stores branch from 17d1f31 to 67a4459 Compare August 22, 2025 13:17

kasuga-fj mentioned this pull request Oct 7, 2025

hexagon compiler runs "forever" on matrix-spec-types at O2 #150262

Closed

aankit-ca approved these changes Oct 8, 2025

Merge branch 'main' into pipeliner-limit-num-stores

ba08f45

kasuga-fj enabled auto-merge (squash) October 9, 2025 09:43

kasuga-fj merged commit 22b79fb into llvm:main Oct 9, 2025
9 checks passed

kasuga-fj deleted the pipeliner-limit-num-stores branch October 9, 2025 10:19

aankit-ca added this to the LLVM 21.x Release milestone Oct 10, 2025

github-project-automation bot added this to LLVM Release Status Oct 10, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status Oct 10, 2025

llvmbot added the release:cherry-pick-failed label Oct 10, 2025

c-rhodes moved this from Needs Triage to Done in LLVM Release Status Oct 13, 2025

kasuga-fj mentioned this pull request Oct 14, 2025

[MachinePipeliner] Add test missed in #154940 (NFC) #163350

Open
