Skip to content

Conversation

@juliannagele
Copy link
Member

@juliannagele juliannagele commented Oct 22, 2025

This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with #164682, which can then potentially further scalarize, provided there is no freeze.

alive2 proof: https://alive2.llvm.org/ce/z/W-GD88

@juliannagele juliannagele changed the title [VecorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. [VectorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. Oct 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Julian Nagele (juliannagele)

Changes

This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with #164682, which can then potentially further scalarize, provided there is no freeze.


Full diff: https://github.com/llvm/llvm-project/pull/164683.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+18-2)
  • (modified) llvm/test/Transforms/VectorCombine/AArch64/ext-extract.ll (+34)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index d6eb00da11dc8..4a0c98e03203d 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -2017,8 +2017,24 @@ bool VectorCombine::scalarizeExtExtract(Instruction &I) {
 
   Value *ScalarV = Ext->getOperand(0);
   if (!isGuaranteedNotToBePoison(ScalarV, &AC, dyn_cast<Instruction>(ScalarV),
-                                 &DT))
-    ScalarV = Builder.CreateFreeze(ScalarV);
+                                 &DT)) {
+    // Check if all lanes are extracted and all extracts trigger UB on poison.
+    // If so, we do not need to insert a freeze.
+    SmallDenseSet<uint64_t, 8> ExtractedLanes;
+    bool AllExtractsHaveUB = true;
+    for (User *U : Ext->users()) {
+      auto *Extract = cast<ExtractElementInst>(U);
+      uint64_t Idx =
+          cast<ConstantInt>(Extract->getIndexOperand())->getZExtValue();
+      ExtractedLanes.insert(Idx);
+      if (!programUndefinedIfPoison(Extract)) {
+        AllExtractsHaveUB = false;
+        break;
+      }
+    }
+    if (!AllExtractsHaveUB || ExtractedLanes.size() != SrcTy->getNumElements())
+      ScalarV = Builder.CreateFreeze(ScalarV);
+  }
   ScalarV = Builder.CreateBitCast(
       ScalarV,
       IntegerType::get(SrcTy->getContext(), DL->getTypeSizeInBits(SrcTy)));
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/ext-extract.ll b/llvm/test/Transforms/VectorCombine/AArch64/ext-extract.ll
index 60700412686ea..29d4ddd3d0ac8 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/ext-extract.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/ext-extract.ll
@@ -346,3 +346,37 @@ entry:
   call void @use.i32(i32 %ext.3)
   ret void
 }
+
+define noundef i32 @zext_v4i8_all_lanes_used_no_freeze(<4 x i8> %src) {
+; CHECK-LABEL: define noundef i32 @zext_v4i8_all_lanes_used_no_freeze(
+; CHECK-SAME: <4 x i8> [[SRC:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i8> [[SRC]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[TMP0]], 24
+; CHECK-NEXT:    [[TMP2:%.*]] = lshr i32 [[TMP0]], 16
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], 255
+; CHECK-NEXT:    [[TMP4:%.*]] = lshr i32 [[TMP0]], 8
+; CHECK-NEXT:    [[TMP5:%.*]] = and i32 [[TMP4]], 255
+; CHECK-NEXT:    [[TMP6:%.*]] = and i32 [[TMP0]], 255
+; CHECK-NEXT:    [[EXT:%.*]] = zext nneg <4 x i8> [[SRC]] to <4 x i32>
+; CHECK-NEXT:    [[EXT_0:%.*]] = extractelement <4 x i32> [[EXT]], i64 0
+; CHECK-NEXT:    [[EXT_1:%.*]] = extractelement <4 x i32> [[EXT]], i64 1
+; CHECK-NEXT:    [[EXT_2:%.*]] = extractelement <4 x i32> [[EXT]], i64 2
+; CHECK-NEXT:    [[EXT_3:%.*]] = extractelement <4 x i32> [[EXT]], i64 3
+; CHECK-NEXT:    [[ADD1:%.*]] = add i32 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add i32 [[ADD1]], [[TMP3]]
+; CHECK-NEXT:    [[ADD3:%.*]] = add i32 [[ADD2]], [[TMP1]]
+; CHECK-NEXT:    ret i32 [[ADD3]]
+;
+entry:
+  %ext = zext nneg <4 x i8> %src to <4 x i32>
+  %ext.0 = extractelement <4 x i32> %ext, i64 0
+  %ext.1 = extractelement <4 x i32> %ext, i64 1
+  %ext.2 = extractelement <4 x i32> %ext, i64 2
+  %ext.3 = extractelement <4 x i32> %ext, i64 3
+
+  %add1 = add i32 %ext.0, %ext.1
+  %add2 = add i32 %add1, %ext.2
+  %add3 = add i32 %add2, %ext.3
+  ret i32 %add3
+}

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an alive2 proof to the description?

Comment on lines 2021 to 2034
// Check if all lanes are extracted and all extracts trigger UB on poison.
// If so, we do not need to insert a freeze.
SmallDenseSet<uint64_t, 8> ExtractedLanes;
bool AllExtractsHaveUB = true;
for (User *U : Ext->users()) {
auto *Extract = cast<ExtractElementInst>(U);
uint64_t Idx =
cast<ConstantInt>(Extract->getIndexOperand())->getZExtValue();
ExtractedLanes.insert(Idx);
if (!programUndefinedIfPoison(Extract)) {
AllExtractsHaveUB = false;
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth to see if this logic can be moved to ValueTracking?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably make sense to first add an initial implementation here, and then check if there are other cases where this would help via ValueTracking as follow-up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I've addressed the comments in VectorCombine for now. I think I have a proposal on how to move the logic to ValueTracking, will post it as a follow-up.

uint64_t Idx =
cast<ConstantInt>(Extract->getIndexOperand())->getZExtValue();
ExtractedLanes.insert(Idx);
if (!programUndefinedIfPoison(Extract)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is under the assumption that the extract will be executed. The extract might be conditionally executed in a different block. I think the rest of the code does not exclude that possibility?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least initially, it should probably be enough to restrict this to cases where the extracts are in the same block as the extend, and they are guaranteed to execute, if the extend executes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks! I added checks that the extracts are in the same block as the extend and that they are executed, and tests showing the freeze will still be inserted if those checks fail.

…extract if all extracts would lead to UB on poison.
…extend-extract if all extracts would lead to UB on poison.
…rizing extend-extract if all extracts would lead to UB on poison.
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Comment on lines 2023 to 2024
// are guaranteed to execute if Ext executes.
// If so, we do not need to insert a freeze.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// are guaranteed to execute if Ext executes.
// If so, we do not need to insert a freeze.
// are guaranteed to execute if Ext executes. If so, we do not need to insert a freeze.

…n scalarizing extend-extract if all extracts would lead to UB on poison.
@juliannagele juliannagele merged commit 28a20b4 into llvm:main Nov 4, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 4, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-mlir-rhel-clang running on ppc64le-mlir-rhel-test while building llvm at step 6 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/32398

Here is the relevant piece of the build log for the reference
Step 6 (test-build-check-mlir-build-only-check-mlir) failure: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
...
PASS: MLIR :: Pass/pipeline-parsing.mlir (3622 of 3633)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/12/22 (3623 of 3633)
PASS: MLIR :: mlir-tblgen/op-interface.td (3624 of 3633)
PASS: MLIR :: mlir-translate/import-diagnostics.ll (3625 of 3633)
PASS: MLIR :: mlir-tblgen/llvm-intrinsics.td (3626 of 3633)
PASS: MLIR :: mlir-reduce/dce-test.mlir (3627 of 3633)
PASS: MLIR :: mlir-runner/simple.mlir (3628 of 3633)
PASS: MLIR :: Pass/pipeline-options-parsing.mlir (3629 of 3633)
PASS: MLIR :: mlir-tblgen/rewriter-errors.td (3630 of 3633)
PASS: MLIR :: mlir-tblgen/op-error.td (3631 of 3633)
command timed out: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1987.543915

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants