
kasuga-fj
Contributor

@kasuga-fj kasuga-fj commented Aug 22, 2025

The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is O(N^2), where N is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions.

To mitigate this, the patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if the count exceeds a threshold. The default value of the threshold should be large enough that, in most practical cases where the pipeliner is beneficial, this patch does not cause any performance regression.

Related issue: #150262
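The early bail-out described above can be sketched in C++ as follows. This is a simplified, standalone illustration under stated assumptions, not the code in MachinePipeliner: the `Instr` struct and the `countStores`, `numStorePairs`, and `tooManyStores` helpers are hypothetical, and `MaxNumStores` only stands in for the `SwpMaxNumStores` threshold (no particular default value is assumed).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction; the real
// pass works on LLVM MachineInstrs, which are not modeled here.
struct Instr {
  bool MayStore; // true if the instruction may write to memory
};

// One linear pass over the block: O(N) in the number of instructions.
std::size_t countStores(const std::vector<Instr> &Block) {
  std::size_t N = 0;
  for (const Instr &I : Block)
    if (I.MayStore)
      ++N;
  assert(N <= Block.size() && "cannot have more stores than instructions");
  return N;
}

// Number of unordered store pairs a pairwise dependence check visits:
// N * (N - 1) / 2, i.e. O(N^2).
std::size_t numStorePairs(std::size_t N) {
  return N < 2 ? 0 : N * (N - 1) / 2;
}

// Hypothetical early bail-out: skip pipelining when the store count exceeds
// the threshold (MaxNumStores plays the role of SwpMaxNumStores).
bool tooManyStores(const std::vector<Instr> &Block, std::size_t MaxNumStores) {
  return countStores(Block) > MaxNumStores;
}
```

Counting stores is a single linear pass, so checking the threshold up front is cheap compared with the pairwise analysis it guards: with 1,000 stores, `numStorePairs(1000)` is 499,500 pairs.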


github-actions bot commented Aug 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kasuga-fj kasuga-fj force-pushed the pipeliner-limit-num-stores branch from 17d1f31 to 67a4459 on August 22, 2025 13:17
@kasuga-fj
Contributor Author

TODO: Add some comments and tests

@aankit-ca Could you please run your benchmark with this change? I believe this is the simplest solution for #150262. If you're okay with it, I'd like to proceed with this approach. Here are some notes:

  • I have confirmed that this patch resolves the specific case raised in #150262 (hexagon compiler runs "forever" on matrix-spec-types at O2), but it may reduce optimization opportunities in other cases.
  • The default value of SwpMaxNumStores is arbitrary. Please let me know if you have any preferences.
  • If we're very lucky, this might also improve compilation time in other cases, without causing any performance regressions.

Thanks in advance!

@aankit-ca
Contributor

Thanks for looking into this issue, @kasuga-fj. I'm on vacation right now and will be back on Sep 2. I'll verify this once I'm back!

@kasuga-fj
Contributor Author

Ah, thank you for replying during your time off. Any time works for me.

@kasuga-fj
Contributor Author

kasuga-fj commented Sep 25, 2025

Gentle ping (sorry, I completely forgot this one)

@aankit-ca
Contributor

@kasuga-fj I saw some regressions with this patch. Should I try generating a reproducer for you?

@kasuga-fj
Contributor Author

@aankit-ca Ah, I see. Thanks for checking. I don't need a reproducer for this case; I'll consider a different approach. Instead, if possible, could you share the number of load and store instructions separately for each regression case?

@aankit-ca
Contributor

I'm re-running the tests to get the load and store counts.

@aankit-ca
Contributor

aankit-ca commented Oct 8, 2025

@kasuga-fj The benchmark that showed the regression had only 2 stores in the innermost loop, so your patch should not have caused it. I didn't even see the "Too many stores" message in the debug logs. I don't want to block the merge any longer.

I think the patch is good, and the default store limit is already high enough to avoid regressions in most practical use cases. Thanks for fixing the issue!

@aankit-ca
Contributor

Once the changes are merged, could you also cherry-pick them onto the 21.x branch?

@kasuga-fj
Contributor Author

Thanks for checking!

Yes, I'll cherry-pick it onto the 21.x branch once it's merged.

@kasuga-fj kasuga-fj enabled auto-merge (squash) October 9, 2025 09:43
@kasuga-fj kasuga-fj merged commit 22b79fb into llvm:main Oct 9, 2025
9 checks passed
@kasuga-fj kasuga-fj deleted the pipeliner-limit-num-stores branch October 9, 2025 10:19
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
The dependency analysis in MachinePipeliner checks dependencies for
every pair of store instructions in the target basic block. This means
the time complexity of the analysis is `O(N^2)`, where `N` is the number
of store instructions. Therefore, compilation time can become
significantly long when there are too many store instructions.

To mitigate this, the patch introduces logic to count the number of
store instructions at the beginning of the pipeliner and bail out if the
count exceeds a threshold. The default value of the threshold should be
large enough that, in most practical cases where the pipeliner is
beneficial, this patch does not cause any performance regression.

Related issue: #150262
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
@aankit-ca aankit-ca added this to the LLVM 21.x Release milestone Oct 10, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in LLVM Release Status Oct 10, 2025
@aankit-ca
Contributor

/cherry-pick 22b79fb

@llvmbot
Member

llvmbot commented Oct 10, 2025

Failed to cherry-pick: 22b79fb

https://github.com/llvm/llvm-project/actions/runs/18419192219

Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request

@kasuga-fj
Contributor Author

Ah, I already did that. #162639

@aankit-ca
Contributor

Oh cool. Thanks!

@c-rhodes c-rhodes moved this from Needs Triage to Done in LLVM Release Status Oct 13, 2025
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025