Skip to content

Conversation

@Chengjunp
Copy link
Contributor

@Chengjunp Chengjunp commented Oct 10, 2025

The change fixes a bug in the SROA where tree-structured merge optimization was incorrectly applied when the size of the stored value was not a multiple of the new allocated element type size. The original change is #152793. A simple repro would be

define <1 x i32> @foo(<1 x i16> %a, <1 x i16> %b) {
entry:
  %alloca = alloca [1 x i32]

  %ptr0 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 0
  store <1 x i16> %a, ptr %ptr0

  %ptr1 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 1
  store <1 x i16> %b, ptr %ptr1

  %result = load <1 x i32>, ptr %alloca
  ret <1 x i32> %result
}

Currently, this will lead to a compile time crash.

In this change, we will skip the tree-structured merge for this case and fall back to normal SROA.

@Chengjunp Chengjunp requested a review from Prince781 October 10, 2025 21:12
@Chengjunp Chengjunp self-assigned this Oct 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Chengjun (Chengjunp)

Changes

The change fixes a bug in the SROA where tree-structured merge optimization was incorrectly applied when the size of the stored value was not a multiple of the new allocated element type size. A simple repro would be

define &lt;1 x i32&gt; @<!-- -->foo(&lt;1 x i16&gt; %a, &lt;1 x i16&gt; %b) {
entry:
  %alloca = alloca [1 x i32]

  %ptr0 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 0
  store &lt;1 x i16&gt; %a, ptr %ptr0

  %ptr1 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 1
  store &lt;1 x i16&gt; %b, ptr %ptr1

  %result = load &lt;1 x i32&gt;, ptr %alloca
  ret &lt;1 x i32&gt; %result
}

Currently, this will lead to a compile time crash.

In this change, we will skip the tree-structured merge for this case and fall back to normal SROA.


Full diff: https://github.com/llvm/llvm-project/pull/162921.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SROA.cpp (+8)
  • (modified) llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll (+14)
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 45d3d493a9e68..dc24adab42a3b 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -2961,6 +2961,7 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
         isa<FixedVectorType>(NewAI.getAllocatedType())
             ? cast<FixedVectorType>(NewAI.getAllocatedType())->getElementType()
             : Type::getInt8Ty(NewAI.getContext());
+    unsigned AllocatedEltTySize = DL.getTypeSizeInBits(AllocatedEltTy);
 
     // Helper to check if a type is
     //  1. A fixed vector type
@@ -2991,9 +2992,16 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
         // Do not handle the case if
         //   1. The store does not meet the conditions in the helper function
         //   2. The store is volatile
+        //   3. The store value type size is less than the allocated element
+        //   type size
         if (!IsTypeValidForTreeStructuredMerge(
                 SI->getValueOperand()->getType()) ||
             SI->isVolatile())
+        return std::nullopt;
+        auto *VecTy = cast<FixedVectorType>(SI->getValueOperand()->getType());
+        unsigned NumElts = VecTy->getNumElements();
+        unsigned EltSize = DL.getTypeSizeInBits(VecTy->getElementType());
+        if (NumElts * EltSize % AllocatedEltTySize != 0)
           return std::nullopt;
         StoreInfos.emplace_back(SI, S.beginOffset(), S.endOffset(),
                                 SI->getValueOperand());
diff --git a/llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll b/llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
index c858d071451e8..ead6e027ed37c 100644
--- a/llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
+++ b/llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
@@ -219,4 +219,18 @@ entry:
 
 }
 
+define <1 x i32> @test_store_value_size_not_multiple_of_allocated_element_type_size(<1 x i16> %a, <1 x i16> %b) {
+entry:
+  %alloca = alloca [2 x i16]
+
+  %ptr0 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 0
+  store <1 x i16> %a, ptr %ptr0
+
+  %ptr1 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 1
+  store <1 x i16> %b, ptr %ptr1
+
+  %result = load <1 x i32>, ptr %alloca
+  ret <1 x i32> %result
+}
+
 declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

@github-actions
Copy link

github-actions bot commented Oct 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Chengjunp Chengjunp requested a review from mjulian31 October 10, 2025 21:16
Copy link
Contributor

@mjulian31 mjulian31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Chengjunp Chengjunp enabled auto-merge (squash) October 10, 2025 23:53
@Chengjunp Chengjunp merged commit 8faeed0 into llvm:main Oct 11, 2025
7 of 9 checks passed
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
…2921)

The change fixes a bug in the SROA where tree-structured merge
optimization was incorrectly applied when the size of the stored value
was not a multiple of the new allocated element type size. The original
change is llvm#152793. A simple
repro would be

```
define <1 x i32> @foo(<1 x i16> %a, <1 x i16> %b) {
entry:
  %alloca = alloca [1 x i32]

  %ptr0 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 0
  store <1 x i16> %a, ptr %ptr0

  %ptr1 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 1
  store <1 x i16> %b, ptr %ptr1

  %result = load <1 x i32>, ptr %alloca
  ret <1 x i32> %result
}
```

Currently, this will lead to a compile time crash.

In this change, we will skip the tree-structured merge for this case and
fall back to normal SROA.
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
…2921)

The change fixes a bug in the SROA where tree-structured merge
optimization was incorrectly applied when the size of the stored value
was not a multiple of the new allocated element type size. The original
change is llvm#152793. A simple
repro would be

```
define <1 x i32> @foo(<1 x i16> %a, <1 x i16> %b) {
entry:
  %alloca = alloca [1 x i32]

  %ptr0 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 0
  store <1 x i16> %a, ptr %ptr0

  %ptr1 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 1
  store <1 x i16> %b, ptr %ptr1

  %result = load <1 x i32>, ptr %alloca
  ret <1 x i32> %result
}
```

Currently, this will lead to a compile time crash.

In this change, we will skip the tree-structured merge for this case and
fall back to normal SROA.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants