[VectorCombine] support mismatching extract/insert indices for foldInsExtFNeg #126408

ParkHanbum · 2025-02-09T08:24:51Z

insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

Previously, the above transform was only possible if the extract and insert
indices were the same; this patch makes the transform possible even when
the two indices differ.

Proof: https://alive2.llvm.org/ce/z/aDfdyG
Fixes: #125675
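To see why the two-shuffle form is equivalent even when ExtIdx != InsIdx, the sketch below models the IR on plain C++ vectors. It is a hedged sketch under stated assumptions:

- Each lane is a std::optional<double>, with std::nullopt standing in for poison.
- kPoisonMaskElem is a local stand-in for LLVM's PoisonMaskElem.
- beforeFold and afterFold are hypothetical names.

The mask arithmetic follows the diff below. Fast-math flags and NaN sign handling are not modelled, so this is an illustration and not a substitute for the Alive2 proof.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <vector>

// Toy model (assumption): a lane is a double, or std::nullopt for poison.
using Lane = std::optional<double>;
using Vec = std::vector<Lane>;

// Local stand-in for LLVM's PoisonMaskElem.
constexpr int kPoisonMaskElem = -1;

// shufflevector semantics: mask value i < |V1| reads V1[i], otherwise
// V2[i - |V1|]; a poison mask element yields a poison lane.
Vec shuffle(const Vec &V1, const Vec &V2, const std::vector<int> &Mask) {
  Vec R;
  for (int M : Mask) {
    if (M == kPoisonMaskElem)
      R.push_back(std::nullopt);
    else if (static_cast<std::size_t>(M) < V1.size())
      R.push_back(V1[M]);
    else
      R.push_back(V2[static_cast<std::size_t>(M) - V1.size()]);
  }
  return R;
}

// Lane-wise fneg; poison stays poison.
Vec fneg(const Vec &V) {
  Vec R;
  for (const Lane &L : V)
    R.push_back(L ? Lane(-*L) : Lane());
  return R;
}

// Before: insertelt Dst, (fneg (extractelt Src, ExtIdx)), InsIdx
Vec beforeFold(Vec Dst, const Vec &Src, unsigned ExtIdx, unsigned InsIdx) {
  const Lane &E = Src[ExtIdx];
  Dst[InsIdx] = E ? Lane(-*E) : Lane();
  return Dst;
}

// After: shuffle Dst, (shuffle (fneg Src), poison, SrcMask), Mask
// Assumes ExtIdx < |Src| and InsIdx < |Dst| (the fold bails out otherwise).
Vec afterFold(const Vec &Dst, const Vec &Src, unsigned ExtIdx,
              unsigned InsIdx) {
  const unsigned NumDstElts = static_cast<unsigned>(Dst.size());
  std::vector<int> Mask(NumDstElts);
  std::iota(Mask.begin(), Mask.end(), 0);
  Mask[InsIdx] = static_cast<int>(ExtIdx % NumDstElts + NumDstElts);

  Vec VecFNeg = fneg(Src);
  if (Src.size() != NumDstElts) {
    // Length-changing shuffle: move lane ExtIdx to lane ExtIdx % NumDstElts.
    std::vector<int> SrcMask(NumDstElts, kPoisonMaskElem);
    SrcMask[ExtIdx % NumDstElts] = static_cast<int>(ExtIdx);
    VecFNeg = shuffle(VecFNeg, Vec(Src.size()), SrcMask);
  }
  return shuffle(Dst, VecFNeg, Mask);
}
```

For example, negating lane 1 of a 2-element source into lane 0 of a 4-element destination uses Mask = {5, 1, 2, 3}. Both forms then produce the same vector.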

llvmbot · 2025-02-09T08:25:24Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: hanbeom (ParkHanbum)

Full diff: https://github.com/llvm/llvm-project/pull/126408.diff

4 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+26-25)
(modified) llvm/test/Transforms/PhaseOrdering/X86/addsub-inseltpoison.ll (+2-4)
(modified) llvm/test/Transforms/PhaseOrdering/X86/addsub.ll (+2-4)
(modified) llvm/test/Transforms/VectorCombine/X86/extract-fneg-insert.ll (+41-34)

diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 746742e14d080e6..969e569a6f84959 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -663,11 +663,11 @@ bool VectorCombine::foldExtractExtract(Instruction &I) {
 /// shuffle.
 bool VectorCombine::foldInsExtFNeg(Instruction &I) {
   // Match an insert (op (extract)) pattern.
-  Value *DestVec;
-  uint64_t Index;
+  Value *DstVec;
+  uint64_t ExtIdx, InsIdx;
   Instruction *FNeg;
-  if (!match(&I, m_InsertElt(m_Value(DestVec), m_OneUse(m_Instruction(FNeg)),
-                             m_ConstantInt(Index))))
+  if (!match(&I, m_InsertElt(m_Value(DstVec), m_OneUse(m_Instruction(FNeg)),
+                             m_ConstantInt(InsIdx))))
     return false;
 
   // Note: This handles the canonical fneg instruction and "fsub -0.0, X".
@@ -675,48 +675,49 @@ bool VectorCombine::foldInsExtFNeg(Instruction &I) {
   Instruction *Extract;
   if (!match(FNeg, m_FNeg(m_CombineAnd(
                        m_Instruction(Extract),
-                       m_ExtractElt(m_Value(SrcVec), m_SpecificInt(Index))))))
+                       m_ExtractElt(m_Value(SrcVec), m_ConstantInt(ExtIdx))))))
     return false;
 
-  auto *VecTy = cast<FixedVectorType>(I.getType());
-  auto *ScalarTy = VecTy->getScalarType();
+  auto *DstVecTy = cast<FixedVectorType>(DstVec->getType());
+  auto *DstVecScalarTy = DstVecTy->getScalarType();
   auto *SrcVecTy = dyn_cast<FixedVectorType>(SrcVec->getType());
-  if (!SrcVecTy || ScalarTy != SrcVecTy->getScalarType())
+  if (!SrcVecTy || DstVecScalarTy != SrcVecTy->getScalarType())
     return false;
 
   // Ignore bogus insert/extract index.
-  unsigned NumElts = VecTy->getNumElements();
-  if (Index >= NumElts)
+  unsigned NumDstElts = DstVecTy->getNumElements();
+  unsigned NumSrcElts = SrcVecTy->getNumElements();
+  if (InsIdx >= NumDstElts || ExtIdx >= NumSrcElts || NumDstElts == 1)
     return false;
 
   // We are inserting the negated element into the same lane that we extracted
   // from. This is equivalent to a select-shuffle that chooses all but the
   // negated element from the destination vector.
-  SmallVector<int> Mask(NumElts);
+  SmallVector<int> Mask(NumDstElts);
   std::iota(Mask.begin(), Mask.end(), 0);
-  Mask[Index] = Index + NumElts;
+  Mask[InsIdx] = (ExtIdx % NumDstElts) + NumDstElts;
   InstructionCost OldCost =
-      TTI.getArithmeticInstrCost(Instruction::FNeg, ScalarTy, CostKind) +
-      TTI.getVectorInstrCost(I, VecTy, CostKind, Index);
+      TTI.getArithmeticInstrCost(Instruction::FNeg, DstVecScalarTy, CostKind) +
+      TTI.getVectorInstrCost(I, DstVecTy, CostKind, InsIdx);
 
   // If the extract has one use, it will be eliminated, so count it in the
   // original cost. If it has more than one use, ignore the cost because it will
   // be the same before/after.
   if (Extract->hasOneUse())
-    OldCost += TTI.getVectorInstrCost(*Extract, VecTy, CostKind, Index);
+    OldCost += TTI.getVectorInstrCost(*Extract, SrcVecTy, CostKind, ExtIdx);
 
   InstructionCost NewCost =
-      TTI.getArithmeticInstrCost(Instruction::FNeg, VecTy, CostKind) +
-      TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, VecTy, Mask,
+      TTI.getArithmeticInstrCost(Instruction::FNeg, SrcVecTy, CostKind) +
+      TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, DstVecTy, Mask,
                          CostKind);
 
-  bool NeedLenChg = SrcVecTy->getNumElements() != NumElts;
+  bool NeedLenChg = SrcVecTy->getNumElements() != NumDstElts;
   // If the lengths of the two vectors are not equal,
   // we need to add a length-change vector. Add this cost.
   SmallVector<int> SrcMask;
   if (NeedLenChg) {
-    SrcMask.assign(NumElts, PoisonMaskElem);
-    SrcMask[Index] = Index;
+    SrcMask.assign(NumDstElts, PoisonMaskElem);
+    SrcMask[(ExtIdx % NumDstElts)] = ExtIdx;
     NewCost += TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
                                   SrcVecTy, SrcMask, CostKind);
   }
@@ -725,15 +726,15 @@ bool VectorCombine::foldInsExtFNeg(Instruction &I) {
     return false;
 
   Value *NewShuf;
-  // insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
+  // insertelt DstVec, (fneg (extractelt SrcVec, Index)), Index
   Value *VecFNeg = Builder.CreateFNegFMF(SrcVec, FNeg);
   if (NeedLenChg) {
-    // shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask
+    // shuffle DstVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask
     Value *LenChgShuf = Builder.CreateShuffleVector(VecFNeg, SrcMask);
-    NewShuf = Builder.CreateShuffleVector(DestVec, LenChgShuf, Mask);
+    NewShuf = Builder.CreateShuffleVector(DstVec, LenChgShuf, Mask);
   } else {
-    // shuffle DestVec, (fneg SrcVec), Mask
-    NewShuf = Builder.CreateShuffleVector(DestVec, VecFNeg, Mask);
+    // shuffle DstVec, (fneg SrcVec), Mask
+    NewShuf = Builder.CreateShuffleVector(DstVec, VecFNeg, Mask);
   }
 
   replaceValue(I, *NewShuf);
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/addsub-inseltpoison.ll b/llvm/test/Transforms/PhaseOrdering/X86/addsub-inseltpoison.ll
index a3af048c4e442f9..1603ee1a6a301de 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/addsub-inseltpoison.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/addsub-inseltpoison.ll
@@ -104,11 +104,9 @@ define void @add_aggregate_store(<2 x float> %a0, <2 x float> %a1, <2 x float> %
 ; PR58139
 define <2 x double> @_mm_complexmult_pd_naive(<2 x double> %a, <2 x double> %b) {
 ; SSE-LABEL: @_mm_complexmult_pd_naive(
-; SSE-NEXT:    [[B1:%.*]] = extractelement <2 x double> [[B:%.*]], i64 1
-; SSE-NEXT:    [[TMP1:%.*]] = fneg double [[B1]]
 ; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <2 x double> [[A:%.*]], <2 x double> poison, <2 x i32> <i32 1, i32 1>
-; SSE-NEXT:    [[TMP3:%.*]] = shufflevector <2 x double> [[B]], <2 x double> poison, <2 x i32> <i32 poison, i32 0>
-; SSE-NEXT:    [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[TMP1]], i64 0
+; SSE-NEXT:    [[TMP3:%.*]] = fneg <2 x double> [[B:%.*]]
+; SSE-NEXT:    [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[B]], <2 x i32> <i32 1, i32 2>
 ; SSE-NEXT:    [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
 ; SSE-NEXT:    [[TMP6:%.*]] = shufflevector <2 x double> [[A]], <2 x double> poison, <2 x i32> zeroinitializer
 ; SSE-NEXT:    [[TMP7:%.*]] = tail call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP6]], <2 x double> [[B]], <2 x double> [[TMP5]])
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll b/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll
index 40dc2aaeced57a3..e228d4dae202df3 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll
@@ -104,11 +104,9 @@ define void @add_aggregate_store(<2 x float> %a0, <2 x float> %a1, <2 x float> %
 ; PR58139
 define <2 x double> @_mm_complexmult_pd_naive(<2 x double> %a, <2 x double> %b) {
 ; SSE-LABEL: @_mm_complexmult_pd_naive(
-; SSE-NEXT:    [[B1:%.*]] = extractelement <2 x double> [[B:%.*]], i64 1
-; SSE-NEXT:    [[TMP1:%.*]] = fneg double [[B1]]
 ; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <2 x double> [[A:%.*]], <2 x double> poison, <2 x i32> <i32 1, i32 1>
-; SSE-NEXT:    [[TMP3:%.*]] = shufflevector <2 x double> [[B]], <2 x double> poison, <2 x i32> <i32 poison, i32 0>
-; SSE-NEXT:    [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[TMP1]], i64 0
+; SSE-NEXT:    [[TMP3:%.*]] = fneg <2 x double> [[B:%.*]]
+; SSE-NEXT:    [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[B]], <2 x i32> <i32 1, i32 2>
 ; SSE-NEXT:    [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
 ; SSE-NEXT:    [[TMP6:%.*]] = shufflevector <2 x double> [[A]], <2 x double> poison, <2 x i32> zeroinitializer
 ; SSE-NEXT:    [[TMP7:%.*]] = tail call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP6]], <2 x double> [[B]], <2 x double> [[TMP5]])
diff --git a/llvm/test/Transforms/VectorCombine/X86/extract-fneg-insert.ll b/llvm/test/Transforms/VectorCombine/X86/extract-fneg-insert.ll
index cd2bc757eb9d27d..a77a66549704203 100644
--- a/llvm/test/Transforms/VectorCombine/X86/extract-fneg-insert.ll
+++ b/llvm/test/Transforms/VectorCombine/X86/extract-fneg-insert.ll
@@ -47,9 +47,9 @@ define <4 x float> @ext2_v4f32(<4 x float> %x, <4 x float> %y) {
 
 define <4 x float> @ext2_v2f32v4f32(<2 x float> %x, <4 x float> %y) {
 ; CHECK-LABEL: @ext2_v2f32v4f32(
-; CHECK-NEXT:    [[TMP1:%.*]] = fneg <2 x float> [[X:%.*]]
-; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 2, i32 poison>
-; CHECK-NEXT:    [[R:%.*]] = shufflevector <4 x float> [[Y:%.*]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
+; CHECK-NEXT:    [[E:%.*]] = extractelement <2 x float> [[X:%.*]], i32 2
+; CHECK-NEXT:    [[N:%.*]] = fneg float [[E]]
+; CHECK-NEXT:    [[R:%.*]] = insertelement <4 x float> [[Y:%.*]], float [[N]], i32 2
 ; CHECK-NEXT:    ret <4 x float> [[R]]
 ;
   %e = extractelement <2 x float> %x, i32 2
@@ -73,17 +73,11 @@ define <2 x double> @ext1_v2f64(<2 x double> %x, <2 x double> %y) {
 }
 
 define <4 x double> @ext1_v2f64v4f64(<2 x double> %x, <4 x double> %y) {
-; SSE-LABEL: @ext1_v2f64v4f64(
-; SSE-NEXT:    [[E:%.*]] = extractelement <2 x double> [[X:%.*]], i32 1
-; SSE-NEXT:    [[N:%.*]] = fneg nsz double [[E]]
-; SSE-NEXT:    [[R:%.*]] = insertelement <4 x double> [[Y:%.*]], double [[N]], i32 1
-; SSE-NEXT:    ret <4 x double> [[R]]
-;
-; AVX-LABEL: @ext1_v2f64v4f64(
-; AVX-NEXT:    [[TMP1:%.*]] = fneg nsz <2 x double> [[X:%.*]]
-; AVX-NEXT:    [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <4 x i32> <i32 poison, i32 1, i32 poison, i32 poison>
-; AVX-NEXT:    [[R:%.*]] = shufflevector <4 x double> [[Y:%.*]], <4 x double> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
-; AVX-NEXT:    ret <4 x double> [[R]]
+; CHECK-LABEL: @ext1_v2f64v4f64(
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg nsz <2 x double> [[X:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <4 x i32> <i32 poison, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT:    [[R:%.*]] = shufflevector <4 x double> [[Y:%.*]], <4 x double> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
+; CHECK-NEXT:    ret <4 x double> [[R]]
 ;
   %e = extractelement <2 x double> %x, i32 1
   %n = fneg nsz double %e
@@ -105,9 +99,9 @@ define <8 x float> @ext7_v8f32(<8 x float> %x, <8 x float> %y) {
 
 define <8 x float> @ext7_v4f32v8f32(<4 x float> %x, <8 x float> %y) {
 ; CHECK-LABEL: @ext7_v4f32v8f32(
-; CHECK-NEXT:    [[E:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3
-; CHECK-NEXT:    [[N:%.*]] = fneg float [[E]]
-; CHECK-NEXT:    [[R:%.*]] = insertelement <8 x float> [[Y:%.*]], float [[N]], i32 7
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg <4 x float> [[X:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <8 x i32> <i32 poison, i32 poison, i32 poison, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[R:%.*]] = shufflevector <8 x float> [[Y:%.*]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 11>
 ; CHECK-NEXT:    ret <8 x float> [[R]]
 ;
   %e = extractelement <4 x float> %x, i32 3
@@ -141,12 +135,20 @@ define <8 x float> @ext7_v8f32_use1(<8 x float> %x, <8 x float> %y) {
 }
 
 define <8 x float> @ext7_v4f32v8f32_use1(<4 x float> %x, <8 x float> %y) {
-; CHECK-LABEL: @ext7_v4f32v8f32_use1(
-; CHECK-NEXT:    [[E:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3
-; CHECK-NEXT:    call void @use(float [[E]])
-; CHECK-NEXT:    [[N:%.*]] = fneg float [[E]]
-; CHECK-NEXT:    [[R:%.*]] = insertelement <8 x float> [[Y:%.*]], float [[N]], i32 3
-; CHECK-NEXT:    ret <8 x float> [[R]]
+; SSE-LABEL: @ext7_v4f32v8f32_use1(
+; SSE-NEXT:    [[E:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3
+; SSE-NEXT:    call void @use(float [[E]])
+; SSE-NEXT:    [[TMP1:%.*]] = fneg <4 x float> [[X]]
+; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <8 x i32> <i32 poison, i32 poison, i32 poison, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT:    [[R:%.*]] = shufflevector <8 x float> [[Y:%.*]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 4, i32 5, i32 6, i32 7>
+; SSE-NEXT:    ret <8 x float> [[R]]
+;
+; AVX-LABEL: @ext7_v4f32v8f32_use1(
+; AVX-NEXT:    [[E:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3
+; AVX-NEXT:    call void @use(float [[E]])
+; AVX-NEXT:    [[N:%.*]] = fneg float [[E]]
+; AVX-NEXT:    [[R:%.*]] = insertelement <8 x float> [[Y:%.*]], float [[N]], i32 3
+; AVX-NEXT:    ret <8 x float> [[R]]
 ;
   %e = extractelement <4 x float> %x, i32 3
   call void @use(float %e)
@@ -220,9 +222,8 @@ define <4 x double> @ext_index_var_v2f64v4f64(<2 x double> %x, <4 x double> %y,
 
 define <2 x double> @ext1_v2f64_ins0(<2 x double> %x, <2 x double> %y) {
 ; CHECK-LABEL: @ext1_v2f64_ins0(
-; CHECK-NEXT:    [[E:%.*]] = extractelement <2 x double> [[X:%.*]], i32 1
-; CHECK-NEXT:    [[N:%.*]] = fneg nsz double [[E]]
-; CHECK-NEXT:    [[R:%.*]] = insertelement <2 x double> [[Y:%.*]], double [[N]], i32 0
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg nsz <2 x double> [[X:%.*]]
+; CHECK-NEXT:    [[R:%.*]] = shufflevector <2 x double> [[Y:%.*]], <2 x double> [[TMP1]], <2 x i32> <i32 3, i32 1>
 ; CHECK-NEXT:    ret <2 x double> [[R]]
 ;
   %e = extractelement <2 x double> %x, i32 1
@@ -234,9 +235,9 @@ define <2 x double> @ext1_v2f64_ins0(<2 x double> %x, <2 x double> %y) {
 ; Negative test - extract from an index greater than the vector width of the destination
 define <2 x double> @ext3_v4f64v2f64(<4 x double> %x, <2 x double> %y) {
 ; CHECK-LABEL: @ext3_v4f64v2f64(
-; CHECK-NEXT:    [[E:%.*]] = extractelement <4 x double> [[X:%.*]], i32 3
-; CHECK-NEXT:    [[N:%.*]] = fneg nsz double [[E]]
-; CHECK-NEXT:    [[R:%.*]] = insertelement <2 x double> [[Y:%.*]], double [[N]], i32 1
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg nsz <4 x double> [[X:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x double> [[TMP1]], <4 x double> poison, <2 x i32> <i32 poison, i32 3>
+; CHECK-NEXT:    [[R:%.*]] = shufflevector <2 x double> [[Y:%.*]], <2 x double> [[TMP2]], <2 x i32> <i32 0, i32 3>
 ; CHECK-NEXT:    ret <2 x double> [[R]]
 ;
   %e = extractelement <4 x double> %x, i32 3
@@ -246,11 +247,17 @@ define <2 x double> @ext3_v4f64v2f64(<4 x double> %x, <2 x double> %y) {
 }
 
 define <4 x double> @ext1_v2f64v4f64_ins0(<2 x double> %x, <4 x double> %y) {
-; CHECK-LABEL: @ext1_v2f64v4f64_ins0(
-; CHECK-NEXT:    [[E:%.*]] = extractelement <2 x double> [[X:%.*]], i32 1
-; CHECK-NEXT:    [[N:%.*]] = fneg nsz double [[E]]
-; CHECK-NEXT:    [[R:%.*]] = insertelement <4 x double> [[Y:%.*]], double [[N]], i32 0
-; CHECK-NEXT:    ret <4 x double> [[R]]
+; SSE-LABEL: @ext1_v2f64v4f64_ins0(
+; SSE-NEXT:    [[TMP1:%.*]] = fneg nsz <2 x double> [[X:%.*]]
+; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <4 x i32> <i32 poison, i32 1, i32 poison, i32 poison>
+; SSE-NEXT:    [[R:%.*]] = shufflevector <4 x double> [[Y:%.*]], <4 x double> [[TMP2]], <4 x i32> <i32 5, i32 1, i32 2, i32 3>
+; SSE-NEXT:    ret <4 x double> [[R]]
+;
+; AVX-LABEL: @ext1_v2f64v4f64_ins0(
+; AVX-NEXT:    [[E:%.*]] = extractelement <2 x double> [[X:%.*]], i32 1
+; AVX-NEXT:    [[N:%.*]] = fneg nsz double [[E]]
+; AVX-NEXT:    [[R:%.*]] = insertelement <4 x double> [[Y:%.*]], double [[N]], i32 0
+; AVX-NEXT:    ret <4 x double> [[R]]
 ;
   %e = extractelement <2 x double> %x, i32 1
   %n = fneg nsz double %e
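The shuffle masks in the updated CHECK lines can be reproduced by hand from the two formulas in the diff:

- Mask[InsIdx] = ExtIdx % NumDstElts + NumDstElts
- SrcMask[ExtIdx % NumDstElts] = ExtIdx

A minimal C++ sketch follows. computeMasks and FoldMasks are hypothetical names, and -1 stands in for a poison mask element.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Local stand-in for LLVM's PoisonMaskElem.
constexpr int kPoisonMaskElem = -1;

struct FoldMasks {
  std::vector<int> Mask;    // select-shuffle over (DstVec, negated source)
  std::vector<int> SrcMask; // length-change shuffle; empty if sizes match
};

// Mirrors the mask arithmetic of the patched foldInsExtFNeg.
FoldMasks computeMasks(unsigned NumSrcElts, unsigned NumDstElts,
                       unsigned ExtIdx, unsigned InsIdx) {
  FoldMasks M;
  M.Mask.resize(NumDstElts);
  std::iota(M.Mask.begin(), M.Mask.end(), 0); // identity: keep DstVec lanes
  M.Mask[InsIdx] = static_cast<int>(ExtIdx % NumDstElts + NumDstElts);
  if (NumSrcElts != NumDstElts) {
    M.SrcMask.assign(NumDstElts, kPoisonMaskElem);
    M.SrcMask[ExtIdx % NumDstElts] = static_cast<int>(ExtIdx);
  }
  return M;
}
```

For example, computeMasks(4, 2, 3, 1) yields Mask {0, 3} and SrcMask {poison, 3}. These are the masks expected in @ext3_v4f64v2f64.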

ParkHanbum · 2025-02-13T18:07:33Z

@RKSimon I need a review. Please help

RKSimon · 2025-02-27T09:36:54Z

@ParkHanbum Sorry I missed this - maybe time for you to join the LLVM project so you can add reviewers?

RKSimon · 2025-02-27T09:53:21Z

llvm/lib/Transforms/Vectorize/VectorCombine.cpp


  // We are inserting the negated element into the same lane that we extracted
  // from. This is equivalent to a select-shuffle that chooses all but the
  // negated element from the destination vector.


Update the comment

RKSimon · 2025-02-27T09:55:39Z

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

-    SrcMask.assign(NumElts, PoisonMaskElem);
-    SrcMask[Index] = Index;
+    SrcMask.assign(NumDstElts, PoisonMaskElem);
+    SrcMask[(ExtIdx % NumDstElts)] = ExtIdx;


remove unnecessary ()

RKSimon · 2025-02-27T09:57:11Z

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

  }

+  if (LenChgShuf)
+    Worklist.pushValue(LenChgShuf);



ParkHanbum · 2025-02-27T18:51:46Z

@RKSimon Yes I would be very grateful if you could do that!

…sExtFNeg insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask In previous patches, the above transform was only possible if the Extract/Insert Index was the same; this patch makes the above transform possible even if the two indexes are different. Proof: https://alive2.llvm.org/ce/z/aDfdyG Fixes: llvm#125675

github-actions · 2025-11-05T02:46:41Z

✅ With the latest revision this PR passed the C/C++ code formatter.

ParkHanbum · 2025-11-05T04:52:32Z

@RKSimon Would you be able to review this PR when you have time?

RKSimon · 2025-11-05T15:52:45Z

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

+  // one element
+  unsigned NumDstElts = DstVecTy->getNumElements();
+  unsigned NumSrcElts = SrcVecTy->getNumElements();
+  if (ExtIdx > NumSrcElts || InsIdx >= NumDstElts || NumDstElts == 1)


ExtIdx >= NumSrcElts ?

When the vector has only one element, the only valid extract index is 0, so allowing ExtIdx == NumSrcElts could read past the end of SrcVec.
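The final guard in the diff above is `InsIdx >= NumDstElts || ExtIdx >= NumSrcElts || NumDstElts == 1`. The sketch below pulls it into a standalone predicate so the off-by-one can be checked in isolation. indicesAreValid is a hypothetical helper, not a function in VectorCombine.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the fold may proceed.
// Valid lanes are 0..N-1, so an index equal to N is already out of bounds.
bool indicesAreValid(std::uint64_t ExtIdx, std::uint64_t InsIdx,
                     unsigned NumSrcElts, unsigned NumDstElts) {
  return ExtIdx < NumSrcElts && InsIdx < NumDstElts && NumDstElts != 1;
}
```

With the earlier `ExtIdx > NumSrcElts` check, a call such as indicesAreValid(1, 0, 1, 2) would have been accepted. The same applies to @ext2_v2f32v4f32, which extracts lane 2 of a <2 x float>.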

RKSimon · 2025-11-05T15:55:44Z

llvm/test/Transforms/VectorCombine/X86/extract-fneg-insert.ll

  %r = insertelement <2 x double> %y, double %n, i32 1
  ret <2 x double> %r
 }



do we have a test case where the dst vector is larger than the src vector?

I'll update it ASAP

RKSimon

LGTM

…sExtFNeg (llvm#126408) insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask Previously, the above transform was only possible if the Extract/Insert Index was the same; this patch makes the above transform possible even if the two indexes are different. Proof: https://alive2.llvm.org/ce/z/aDfdyG Fixes: llvm#125675

llvmbot added vectorizers llvm:transforms labels Feb 9, 2025

ParkHanbum force-pushed the i125675 branch from 7ea2d27 to f27edf3 February 26, 2025 20:02

RKSimon requested review from RKSimon and davemgreen February 26, 2025 20:37

RKSimon reviewed Feb 27, 2025

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

ParkHanbum requested a review from RKSimon February 27, 2025 20:42

llvmbot added the llvm:vectorcombine label Nov 4, 2025

ParkHanbum added 8 commits November 5, 2025 10:20

add debug message

ea149f5

add new instructions to worklist

05d486f

update comment

b15fc1f

remove unnecessary parentheses

1a7640c

Fix miss & misused Worklist

c315e4e

fix wrong boundary check

6e08857

Apply the modified shufflecost parameter

87f432c

ParkHanbum force-pushed the i125675 branch from da1978a to 87f432c November 5, 2025 02:44

formatting

61bc3e0

ParkHanbum mentioned this pull request Nov 5, 2025

[x86][reg][performance] addsubpd not generated in complex multiplication since LLVM 13 #58139

Open

RKSimon requested changes Nov 5, 2025

Add tests for vectors that NumElement of Dst is bigger than Src

4ca25c8

ParkHanbum force-pushed the i125675 branch from 4f859bd to 4ca25c8 November 6, 2025 05:53

ParkHanbum requested a review from RKSimon November 6, 2025 06:38

RKSimon approved these changes Nov 7, 2025

Merge branch 'main' into i125675

7ab8bd8

RKSimon enabled auto-merge (squash) November 7, 2025 18:06

RKSimon merged commit 50ba89a into llvm:main Nov 7, 2025
7 of 9 checks passed

nigham mentioned this pull request Nov 10, 2025

[libc] Implement fchown #167286

Merged
