-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[SROA] Refactor rewritePartition alloca type selection process #167771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SROA] Refactor rewritePartition alloca type selection process #167771
Conversation
|
@llvm/pr-subscribers-llvm-transforms Author: Yonah Goldberg (YonahGoldberg) ChangesWhen an aggregate alloca has mixed scalar type uses (e.g., store i32 and load float), SROA needs to find a common type for the partition.
The problem occurs in step 2: For example, given: Previously, SROA would:
This PR modifies step 3 to also trigger when After this change, the example is optimized to: The alloca is completely eliminated. I read through the discussion on the commit that implemented some of this logic and I'm wondering if there's a larger refactoring that should happen. For example this comment: > Agreed. But until LLVM removes pointer sub-types it's convenient to get the alloca type right to avoid bitcast on every access anyway. Is some of the code that exists here not useful anymore now that we have opaque pointers? I guess it doesn't matter as much what the type of the alloca is. Full diff: https://github.com/llvm/llvm-project/pull/167771.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 5c60fad6f91aa..7905cfe95336d 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -5234,7 +5234,9 @@ AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
SliceTy = TypePartitionTy;
// If still not, can we use the largest bitwidth integer type used?
- if (!SliceTy && CommonUseTy.second)
+ // If SliceTy is a non-promotable aggregate, prefer to represent as an integer type
+ // because it's more likely to be promotable.
+ if ((!SliceTy || !SliceTy->isSingleValueType()) && CommonUseTy.second)
if (DL.getTypeAllocSize(CommonUseTy.second).getFixedValue() >= P.size()) {
SliceTy = CommonUseTy.second;
SliceVecTy = dyn_cast<VectorType>(SliceTy);
diff --git a/llvm/test/Transforms/SROA/prefer-integer-partition.ll b/llvm/test/Transforms/SROA/prefer-integer-partition.ll
new file mode 100644
index 0000000000000..3606af8debd69
--- /dev/null
+++ b/llvm/test/Transforms/SROA/prefer-integer-partition.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=sroa -S | FileCheck %s
+
+; Ensure that the [2 x half] alloca is spanned by an i32 partition.
+
+define void @test() {
+; CHECK-LABEL: @test(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32 42 to float
+; CHECK-NEXT: ret void
+;
+entry:
+ %alloca = alloca [2 x half]
+ store i32 42, ptr %alloca
+ %val = load float, ptr %alloca
+ ret void
+}
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
This is related to issue: #164308, but it doesn't seem like it fixes it, so I will investigate this one as well. |
|
Updated the PR to fix the Julia issue as well. |
|
FYI I am looking into a larger refactor to simplify the type selection process in |
79388aa to
ee7fb53
Compare
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
|
Both of the arm changes are due to the early exit I added if isVectorPromotionViable returns a vector of floats. The neon testcase has the following pattern: alloca i64 Originally we prefer to promote via integer widening, now we prefer to promote via vector promotion. I think this is a win because if you're loading as a float type its probably being treated as a float. The bf16 testcase has a alloca <1 x double> that we now promote via vector promotion instead of integer widening, which causes the small diff. Again, I think this is fine. |
This change reverts 2572512, which landed on Aug 8, 2022. This change addressed the problem that if you have IR that looks something like: ``` %alloca = alloca <4 x float> store <4 x float> %data, ptr %alloca %load = load half, ptr %alloca ``` `getCommonType` would return `<4 x float>` because the `load half` isn't to the entire partition, so we skip the first `getTypePartition` check. 2572512 added a later check that sees that `<4 x float>` is not vector promotable because of the `load half`, and then calls `getTypePartition`, which changes the `sliceTy` to `< 8 x half>`, which is vector promotable because the store can be changed to `store <8 x half>`. So we set the `sliceTy` to `<8 x half>`, we can promote the alloca, and everyone is happy. This code became unnecessary after 529eafd, which landed ~3 months later, which fixes the issue in a different way. `isVectorPromotionViable` was already smart enough to try `<8 x half>` as a type candidate because it sees the `load half`. However, this candidate didn't work because it conflicts with `store <4 x float>`. This commit added logic to try integer-ifying candidates if there is no common type. So the `<8 x half>` candidate gets converted to `<8 x i16>`, which works because we can convert the alloca to `alloca <8 x i16>` and the load to `load i16`, allowing promotion. After 529eafd, the original commit is pointless. It tries to refine the `SliceTy`, but if `isVectorPromotionViable` succeeds, it returns a new type to use and we will ignore the `SliceTy`. This change is my first patch to try to simplify the type selection process in rewritePartition. I had some other ideas that I tried in #167771 and #168796, but they need refinement.
…ss (#169106) This change reverts llvm/llvm-project@2572512, which landed on Aug 8, 2022. This change addressed the problem that if you have IR that looks something like: ``` %alloca = alloca <4 x float> store <4 x float> %data, ptr %alloca %load = load half, ptr %alloca ``` `getCommonType` would return `<4 x float>` because the `load half` isn't to the entire partition, so we skip the first `getTypePartition` check. llvm/llvm-project@2572512 added a later check that sees that `<4 x float>` is not vector promotable because of the `load half`, and then calls `getTypePartition`, which changes the `sliceTy` to `< 8 x half>`, which is vector promotable because the store can be changed to `store <8 x half>`. So we set the `sliceTy` to `<8 x half>`, we can promote the alloca, and everyone is happy. This code became unnecessary after llvm/llvm-project@529eafd, which landed ~3 months later, which fixes the issue in a different way. `isVectorPromotionViable` was already smart enough to try `<8 x half>` as a type candidate because it sees the `load half`. However, this candidate didn't work because it conflicts with `store <4 x float>`. This commit added logic to try integer-ifying candidates if there is no common type. So the `<8 x half>` candidate gets converted to `<8 x i16>`, which works because we can convert the alloca to `alloca <8 x i16>` and the load to `load i16`, allowing promotion. After llvm/llvm-project@529eafd, the original commit is pointless. It tries to refine the `SliceTy`, but if `isVectorPromotionViable` succeeds, it returns a new type to use and we will ignore the `SliceTy`. This change is my first patch to try to simplify the type selection process in rewritePartition. I had some other ideas that I tried in llvm/llvm-project#167771 and llvm/llvm-project#168796, but they need refinement.
When an aggregate alloca has mixed scalar type uses (e.g., store i32 and load float), SROA needs to find a common type for the partition.
SROA's type selection logic works as follows:
getTypePartitionThe problem occurs in step 2:
getTypePartitioncan return an aggregate type (like[2 x half]) that spans the partition, but aggregate types are not promotable to SSA values (they are not single-value types). This prevents SROA from eliminating the alloca.For example, given:
Previously, SROA would:
i32andfloatgetTypePartitionwhich returns[2 x half]alloca [2 x half]This PR modifies step 3 to also trigger when
getTypePartitionreturns a non-promotable aggregate (checked via!isSingleValueType()). This causes SROA to prefer the integer type (i32) over the aggregate type, allowing the alloca to be fully promoted and eliminated.After this change, the example is optimized to:
The alloca is completely eliminated.
This PR also allows
[2 x float]allocas to get treated as ai64partition, which allows promotion. I guess the point of not allowing this before was that you'd have to have an extra bitcast potentially to float, but with opaque pointers this isn't true anymore. This resolves #164308.I read through the discussion on the commit that implemented some of this logic and I'm wondering if there's a larger refactoring that should happen. For example this comment:
Is some of the code that exists here not useful anymore now that we have opaque pointers? I guess it doesn't matter as much what the type of the alloca is.