-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[MachinePipeliner] Limit the number of stores in BB #154940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
17d1f31
to
67a4459
Compare
TODO: Add some comments and tests @aankit-ca Could you please run your benchmark with this change? I believe this is the simplest solution for #150262. If you're okay with it, I'd like to proceed with this approach. Here are some notes:
Thanks in advance! |
Thanks for looking into this issue @kasuga-fj . I'm on a vacation right now and will be back on Sep 2. I'll verify this once I'm back! |
Ah, thank you for reaching out during your time off. I’m fine with anytime if that works for you. |
Gentle ping (sorry, I completely forgot this one) |
@kasuga-fj I did some regressions with this patch. I'll try generating a reproducer for you? |
@aankit-ca Ah, I see. Thanks for checking. I don't need a reproducer for this case, I'll consider a different approach. Instead, if possible, could you share the number of load and store instructions separately for each regression case? |
I'm re-running the tests to get the load-store numbers |
@kasuga-fj The benchmark that showed the regression had only 2 stores in the innermost loop and your patch should not have caused the regression. I didn't even see the "Too many stores" in the debug logs. I don't want to block the merging for more time. I feel the patch is good and the default store limit is pretty high already to not cause regressions for most practical usecases. Thanks for fixing the issue |
Once the changes are merged, can you also cherry-pick the change on 21.x branch? |
Thanks for the checking!
Yes, I'll do it. |
The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is `O(N^2)`, where `N` is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions. To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression. Related issue: #150262
The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is `O(N^2)`, where `N` is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions. To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression. Related issue: llvm#150262
/cherry-pick 22b79fb |
Failed to cherry-pick: 22b79fb https://github.com/llvm/llvm-project/actions/runs/18419192219 Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request |
Ah, I already did that. #162639 |
Oh cool. Thanks! |
The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is `O(N^2)`, where `N` is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions. To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression. Related issue: llvm#150262
The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is `O(N^2)`, where `N` is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions. To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression. Related issue: llvm#150262
The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is
O(N^2)
, whereN
is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions.To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression.
Related issue: #150262