[RISCV] Fold (vslide1up undef, v, (extract_elt x, 0)) into (vslideup x, v, 1) #154847
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

For a slide1up, if the scalar value we're sliding in was extracted from the first element of a vector, we can use a normal vslideup of 1 instead, with that vector as the passthru. This eliminates an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).

Stacked on top of #154450 (mostly reusing its tests).

We might be able to do a similar thing for vslide1down / vslidedown -- for constant VL, at least. In that case the new vslidedown would have a VL one less than the original, and the mask would also need to be constant. But I haven't seen cases like that in the wild.

Patch is 21.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154847.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4a1db80076530..12a9c57ac15ae 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4512,33 +4512,88 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
"Illegal type which will result in reserved encoding");
const unsigned Policy = RISCVVType::TAIL_AGNOSTIC | RISCVVType::MASK_AGNOSTIC;
+ auto getVSlide = [&](bool SlideUp, EVT ContainerVT, SDValue Passthru,
+ SDValue Vec, SDValue Offset, SDValue Mask,
+ SDValue VL) -> SDValue {
+ if (SlideUp)
+ return getVSlideup(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+ Mask, VL, Policy);
+ return getVSlidedown(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+ Mask, VL, Policy);
+ };
+
+ // General case: splat the first operand and slide other operands down one
+  // by one to form a vector. Alternatively, if the last operand is an
+  // extraction from element 0 of a vector, we can use that vector as the start
+  // value and slide up instead of slide down, so that we can avoid the splat.
+ SmallVector<SDValue> Operands(Op->op_begin(), Op->op_end());
+ SDValue EVec;
+ bool SlideUp = false;
+  // Find the first non-undef operand from the tail.
+ auto ItLastNonUndef = find_if(Operands.rbegin(), Operands.rend(),
+ [](SDValue V) { return !V.isUndef(); });
+ if (ItLastNonUndef != Operands.rend()) {
+ using namespace SDPatternMatch;
+ // Check if the last non-undef operand was an extraction.
+ SlideUp = sd_match(*ItLastNonUndef, m_ExtractElt(m_Value(EVec), m_Zero()));
+ }
+
+ if (SlideUp) {
+ MVT EVecContainerVT = EVec.getSimpleValueType();
+ // Make sure the original vector has scalable vector type.
+ if (EVecContainerVT.isFixedLengthVector()) {
+ EVecContainerVT =
+ getContainerForFixedLengthVector(DAG, EVecContainerVT, Subtarget);
+ EVec = convertToScalableVector(EVecContainerVT, EVec, DAG, Subtarget);
+ }
+
+ // Adapt EVec's type into ContainerVT.
+ if (EVecContainerVT.getVectorMinNumElements() <
+ ContainerVT.getVectorMinNumElements())
+ EVec = DAG.getInsertSubvector(DL, DAG.getUNDEF(ContainerVT), EVec, 0);
+ else
+ EVec = DAG.getExtractSubvector(DL, ContainerVT, EVec, 0);
+
+ // Reverse the elements as we're going to slide up from the last element.
+ std::reverse(Operands.begin(), Operands.end());
+ }
SDValue Vec;
UndefCount = 0;
- for (SDValue V : Op->ops()) {
+ for (SDValue V : Operands) {
if (V.isUndef()) {
UndefCount++;
continue;
}
- // Start our sequence with a TA splat in the hopes that hardware is able to
-  // recognize there's no dependency on the prior value of our temporary
-  // register.
+  // Start our sequence with either a TA splat or extract source in the
+  // hopes that hardware is able to recognize there's no dependency on the
+  // prior value of our temporary register.
if (!Vec) {
- Vec = DAG.getSplatVector(VT, DL, V);
- Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+ if (SlideUp) {
+ Vec = EVec;
+ } else {
+ Vec = DAG.getSplatVector(VT, DL, V);
+ Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+ }
+
UndefCount = 0;
continue;
}
if (UndefCount) {
const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
- Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
- Vec, Offset, Mask, VL, Policy);
+ Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+ Offset, Mask, VL);
UndefCount = 0;
}
- auto OpCode =
- VT.isFloatingPoint() ? RISCVISD::VFSLIDE1DOWN_VL : RISCVISD::VSLIDE1DOWN_VL;
+
+ unsigned OpCode;
+ if (VT.isFloatingPoint())
+ OpCode = SlideUp ? RISCVISD::VFSLIDE1UP_VL : RISCVISD::VFSLIDE1DOWN_VL;
+ else
+ OpCode = SlideUp ? RISCVISD::VSLIDE1UP_VL : RISCVISD::VSLIDE1DOWN_VL;
+
if (!VT.isFloatingPoint())
V = DAG.getNode(ISD::ANY_EXTEND, DL, Subtarget.getXLenVT(), V);
Vec = DAG.getNode(OpCode, DL, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
@@ -4546,8 +4601,8 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
}
if (UndefCount) {
const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
- Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
- Vec, Offset, Mask, VL, Policy);
+ Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+ Offset, Mask, VL);
}
return convertFromScalableVector(VT, Vec, DAG, Subtarget);
}
@@ -21054,6 +21109,37 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
return N->getOperand(0);
break;
}
+ case RISCVISD::VSLIDE1UP_VL:
+ case RISCVISD::VFSLIDE1UP_VL: {
+ using namespace SDPatternMatch;
+ SDValue SrcVec;
+ SDLoc DL(N);
+ MVT VT = N->getSimpleValueType(0);
+ // If the scalar we're sliding in was extracted from the first element of a
+ // vector, we can use that vector as the passthru in a normal slideup of 1.
+ // This saves us an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
+ if (N->getOperand(0).isUndef() &&
+ sd_match(
+ N->getOperand(2),
+ m_OneUse(m_AnyOf(m_ExtractElt(m_Value(SrcVec), m_Zero()),
+ m_Node(RISCVISD::VMV_X_S, m_Value(SrcVec)))))) {
+ MVT SrcVecVT = SrcVec.getSimpleValueType();
+ // Adapt the value type of source vector.
+ if (SrcVecVT.isFixedLengthVector()) {
+ SrcVecVT = getContainerForFixedLengthVector(SrcVecVT);
+ SrcVec = convertToScalableVector(SrcVecVT, SrcVec, DAG, Subtarget);
+ }
+ if (SrcVecVT.getVectorMinNumElements() < VT.getVectorMinNumElements())
+ SrcVec = DAG.getInsertSubvector(DL, DAG.getUNDEF(VT), SrcVec, 0);
+ else
+ SrcVec = DAG.getExtractSubvector(DL, VT, SrcVec, 0);
+
+ return getVSlideup(DAG, Subtarget, DL, VT, SrcVec, N->getOperand(1),
+ DAG.getConstant(1, DL, XLenVT), N->getOperand(3),
+ N->getOperand(4));
+ }
+ break;
+ }
}
return SDValue();
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
index 3c3e08d387faa..b62d0607d048c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -1828,3 +1828,126 @@ define <8 x double> @buildvec_v8f64_zvl512(double %e0, double %e1, double %e2, d
%v7 = insertelement <8 x double> %v6, double %e7, i64 7
ret <8 x double> %v7
}
+
+define <8 x double> @buildvec_slideup(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e6) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa6
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa5
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa4
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa3
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa2
+; CHECK-NEXT: vfslide1up.vf v12, v10, fa1
+; CHECK-NEXT: vfslide1up.vf v8, v12, fa0
+; CHECK-NEXT: ret
+ %v0 = insertelement <8 x double> poison, double %e0, i64 0
+ %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+ %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+ %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+ %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+ %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+ %v6 = insertelement <8 x double> %v5, double %e6, i64 6
+ %e7 = extractelement <4 x double> %v, i64 0
+ %v7 = insertelement <8 x double> %v6, double %e7, i64 7
+ ret <8 x double> %v7
+}
+
+define <8 x double> @buildvec_slideup_trailing_undef(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup_trailing_undef:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa4
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa3
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa2
+; CHECK-NEXT: vfslide1up.vf v12, v10, fa1
+; CHECK-NEXT: vfslide1up.vf v8, v12, fa0
+; CHECK-NEXT: ret
+ %v0 = insertelement <8 x double> poison, double %e0, i64 0
+ %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+ %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+ %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+ %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+ %e5 = extractelement <4 x double> %v, i64 0
+ %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+ %v6 = insertelement <8 x double> %v5, double poison, i64 6
+ %v7 = insertelement <8 x double> %v6, double poison, i64 7
+ ret <8 x double> %v7
+}
+
+; Negative test for slideup lowering where the extract_element was not build_vector's last operand.
+define <8 x double> @buildvec_slideup_not_last_element(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e7) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup_not_last_element:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT: vfmv.f.s ft0, v8
+; CHECK-NEXT: vfmv.v.f v8, fa0
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa1
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa2
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa3
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa4
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa5
+; CHECK-NEXT: vfslide1down.vf v8, v8, ft0
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa6
+; CHECK-NEXT: ret
+ %v0 = insertelement <8 x double> poison, double %e0, i64 0
+ %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+ %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+ %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+ %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+ %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+ %e6 = extractelement <4 x double> %v, i64 0
+ %v6 = insertelement <8 x double> %v5, double %e6, i64 6
+ %v7 = insertelement <8 x double> %v6, double %e7, i64 7
+ ret <8 x double> %v7
+}
+
+define <4 x float> @buildvec_vfredusum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredusum:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vfmv.s.f v16, fa0
+; CHECK-NEXT: vfredusum.vs v8, v8, v16
+; CHECK-NEXT: vfredusum.vs v9, v10, v16
+; CHECK-NEXT: vfredusum.vs v10, v12, v16
+; CHECK-NEXT: vfredusum.vs v11, v14, v16
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT: vslideup.vi v10, v11, 1
+; CHECK-NEXT: vslideup.vi v9, v10, 1
+; CHECK-NEXT: vslideup.vi v8, v9, 1
+; CHECK-NEXT: ret
+ %247 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+ %248 = insertelement <4 x float> poison, float %247, i64 0
+ %250 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+ %251 = insertelement <4 x float> %248, float %250, i64 1
+ %252 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+ %253 = insertelement <4 x float> %251, float %252, i64 2
+ %254 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+ %255 = insertelement <4 x float> %253, float %254, i64 3
+ ret <4 x float> %255
+}
+
+define <4 x float> @buildvec_vfredosum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredosum:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vfmv.s.f v16, fa0
+; CHECK-NEXT: vfredosum.vs v8, v8, v16
+; CHECK-NEXT: vfredosum.vs v9, v10, v16
+; CHECK-NEXT: vfredosum.vs v10, v12, v16
+; CHECK-NEXT: vfredosum.vs v11, v14, v16
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT: vslideup.vi v10, v11, 1
+; CHECK-NEXT: vslideup.vi v9, v10, 1
+; CHECK-NEXT: vslideup.vi v8, v9, 1
+; CHECK-NEXT: ret
+ %247 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+ %248 = insertelement <4 x float> poison, float %247, i64 0
+ %250 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+ %251 = insertelement <4 x float> %248, float %250, i64 1
+ %252 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+ %253 = insertelement <4 x float> %251, float %252, i64 2
+ %254 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+ %255 = insertelement <4 x float> %253, float %254, i64 3
+ ret <4 x float> %255
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index d9bb007a10f71..6183996579949 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -3416,5 +3416,186 @@ define <4 x i1> @buildvec_i1_splat(i1 %e1) {
ret <4 x i1> %v4
}
+define <4 x i32> @buildvec_vredsum(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredsum:
+; RV32: # %bb.0:
+; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT: vmv.s.x v16, zero
+; RV32-NEXT: vredsum.vs v8, v8, v16
+; RV32-NEXT: vredsum.vs v9, v10, v16
+; RV32-NEXT: vredsum.vs v10, v12, v16
+; RV32-NEXT: vredsum.vs v11, v14, v16
+; RV32-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV32-NEXT: vslideup.vi v10, v11, 1
+; RV32-NEXT: vslideup.vi v9, v10, 1
+; RV32-NEXT: vslideup.vi v8, v9, 1
+; RV32-NEXT: ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredsum:
+; RV64V-ONLY: # %bb.0:
+; RV64V-ONLY-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT: vmv.s.x v16, zero
+; RV64V-ONLY-NEXT: vredsum.vs v8, v8, v16
+; RV64V-ONLY-NEXT: vredsum.vs v9, v10, v16
+; RV64V-ONLY-NEXT: vredsum.vs v10, v12, v16
+; RV64V-ONLY-NEXT: vredsum.vs v11, v14, v16
+; RV64V-ONLY-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV64V-ONLY-NEXT: vslideup.vi v10, v11, 1
+; RV64V-ONLY-NEXT: vslideup.vi v9, v10, 1
+; RV64V-ONLY-NEXT: vslideup.vi v8, v9, 1
+; RV64V-ONLY-NEXT: ret
+;
+; RVA22U64-LABEL: buildvec_vredsum:
+; RVA22U64: # %bb.0:
+; RVA22U64-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT: vmv.s.x v16, zero
+; RVA22U64-NEXT: vredsum.vs v8, v8, v16
+; RVA22U64-NEXT: vredsum.vs v9, v10, v16
+; RVA22U64-NEXT: vredsum.vs v10, v12, v16
+; RVA22U64-NEXT: vredsum.vs v11, v14, v16
+; RVA22U64-NEXT: vmv.x.s a0, v8
+; RVA22U64-NEXT: vmv.x.s a1, v9
+; RVA22U64-NEXT: vmv.x.s a2, v10
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a0, a0, a1
+; RVA22U64-NEXT: vmv.x.s a1, v11
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a1, a2, a1
+; RVA22U64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT: vmv.v.x v8, a0
+; RVA22U64-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT: ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredsum:
+; RVA22U64-PACK: # %bb.0:
+; RVA22U64-PACK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT: vmv.s.x v16, zero
+; RVA22U64-PACK-NEXT: vredsum.vs v8, v8, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v9, v10, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v10, v12, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v11, v14, v16
+; RVA22U64-PACK-NEXT: vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT: vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT: pack a0, a0, a1
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT: pack a1, a2, a1
+; RVA22U64-PACK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT: vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT: ret
+;
+; RV64ZVE32-LABEL: buildvec_vredsum:
+; RV64ZVE32: # %bb.0:
+; RV64ZVE32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT: vmv.s.x v16, zero
+; RV64ZVE32-NEXT: vredsum.vs v8, v8, v16
+; RV64ZVE32-NEXT: vredsum.vs v9, v10, v16
+; RV64ZVE32-NEXT: vredsum.vs v10, v12, v16
+; RV64ZVE32-NEXT: vredsum.vs v11, v14, v16
+; RV64ZVE32-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV64ZVE32-NEXT: vslideup.vi v10, v11, 1
+; RV64ZVE32-NEXT: vslideup.vi v9, v10, 1
+; RV64ZVE32-NEXT: vslideup.vi v8, v9, 1
+; RV64ZVE32-NEXT: ret
+ %247 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg0)
+ %248 = insertelement <4 x i32> poison, i32 %247, i64 0
+ %250 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg1)
+ %251 = insertelement <4 x i32> %248, i32 %250, i64 1
+ %252 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg2)
+ %253 = insertelement <4 x i32> %251, i32 %252, i64 2
+ %254 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg3)
+ %255 = insertelement <4 x i32> %253, i32 %254, i64 3
+ ret <4 x i32> %255
+}
+
+define <4 x i32> @buildvec_vredmax(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredmax:
+; RV32: # %bb.0:
+; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT: vredmaxu.vs v8, v8, v8
+; RV32-NEXT: vredmaxu.vs v9, v10, v10
+; RV32-NEXT: vredmaxu.vs v10, v12, v12
+; RV32-NEXT: vredmaxu.vs v11, v14, v14
+; RV32-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV32-NEXT: vslideup.vi v10, v11, 1
+; RV32-NEXT: vslideup.vi v9, v10, 1
+; RV32-NEXT: vslideup.vi v8, v9, 1
+; RV32-NEXT: ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredmax:
+; RV64V-ONLY: # %bb.0:
+; RV64V-ONLY-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT: vredmaxu.vs v8, v8, v8
+; RV64V-ONLY-NEXT: vredmaxu.vs v9, v10, v10
+; RV64V-ONLY-NEXT: vredmaxu.vs v10, v12, v12
+; RV64V-ONLY-NEXT: vredmaxu.vs v11, v14, v14
+; RV64V-ONLY-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV64V-ONLY-NEXT: vslideup.vi v10, v11, 1
+; RV64V-ONLY-NEXT: vslideup.vi v9, v10, 1
+; RV64V-ONLY-NEXT: vslideup.vi v8, v9, 1
+; RV64V-ONLY-NEXT: ret
+;
+; RVA22U64-LABEL: buildvec_vredmax:
+; RVA22U64: # %bb.0:
+; RVA22U64-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT: vredmaxu.vs v8, v8, v8
+; RVA22U64-NEXT: vredmaxu.vs v9, v10, v10
+; RVA22U64-NEXT: vredmaxu.vs v10, v12, v12
+; RVA22U64-NEXT: vredmaxu.vs v11, v14, v14
+; RVA22U64-NEXT: vmv.x.s a0, v8
+; RVA22U64-NEXT: vmv.x.s a1, v9
+; RVA22U64-NEXT: vmv.x.s a2, v10
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a0, a0, a1
+; RVA22U64-NEXT: vmv.x.s a1, v11
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a1, a2, a1
+; RVA22U64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT: vmv.v.x v8, a0
+; RVA22U64-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT: ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredmax:
+; RVA22U64-PACK: # %bb.0:
+; RVA22U64-PACK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT: vredmaxu.vs v8, v8, v8
+; RVA22U64-PACK-NEXT: vredmaxu.vs v9, v10, v10
+; RVA22U64-PACK-NEXT: vredmaxu.vs v10, v12, v12
+; RVA22U64-PACK-NEXT: vredmaxu.vs v11, v14, v14
+; RVA22U64-PACK-NEXT: vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT: vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT: pack a0, a0, a1
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT: pack a1, a2, a1
+; RVA22U64-PACK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT: vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT: ret
+;
+; RV64ZVE32-LABEL: buildvec_vredmax:
+; RV64ZVE32: # %bb.0:
+; RV64ZVE32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT: vredmaxu.vs v8, v8, v8
+; RV64ZVE32-NEXT: vredmaxu.vs v9, v10, v10
+; RV64ZVE...
[truncated]
Force-pushed eea9382 to 5a4ba8d
…x, v, 1) (Co-authored-by: Craig Topper <[email protected]>)
Force-pushed 5a4ba8d to 2a33925
    // This saves us an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
    if (N->getOperand(0).isUndef() &&
        sd_match(N->getOperand(2),
                 m_AnyOf(m_ExtractElt(m_Value(SrcVec), m_Zero()),
Is it possible to extract from a SrcVec whose element size is smaller? IIRC extractelt also extends the result if needed.
Can we get an extending extractelt from LLVM IR via an extractelement + zext?
Does extractelt have an implicit zext/sext like loads do? If it's extracting from, say, an i32 vector to an i64 scalar, I think an additional zext SDNode will be applied to that i64 before any use.
> say, an i32 vector to an i64 scalar, I think an additional zext SDNode will be applied to that i64 before any use.

Well, I was only half-right about that: for an extractelt generated from IR or in any other normal way, that's the case. But for an extractelt generated during legalization, no additional zext/sext is added, because it will be lowered into vmv.x.s, which sign-extends its result.
I therefore added a check to make sure the element type of SrcVec is the same as that of the vslide1up.
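A minimal sketch of the kind of guard being described, reusing the SrcVec and VT names from the combine above (the exact form and placement in the final revision may differ):

    // Hypothetical sketch: bail out of the combine unless the element types
    // match, so the implicit sign extension performed by vmv.x.s (or an
    // extending extract) cannot be silently dropped.
    if (SrcVec.getSimpleValueType().getVectorElementType() !=
        VT.getVectorElementType())
      break;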
LGTM
LGTM
For a slide1up, if the scalar value we're sliding in was extracted from the first element of a vector, we can use a normal vslideup of 1 instead, with that vector as the passthru. This eliminates an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
Stacked on top of #154450 (mostly reusing its tests).
We might be able to do a similar thing for vslide1down / vslidedown -- for constant VL, at least. In that case the new vslidedown would have a VL one less than the original, and the mask would also need to be constant. But I haven't seen cases like that in the wild.
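For intuition about why the fold is sound, here is a standalone C++ model of the slide semantics involved. It is illustrative only: slide1up, slideup1, and the sample values are made up for this sketch, and tail/mask policies are ignored.

    // Standalone model of the RVV slide semantics this fold relies on.
    // Not LLVM code; helper names and values are hypothetical.
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // vslide1up: vd[0] = s, vd[i] = vs[i - 1] for i >= 1.
    std::vector<int> slide1up(const std::vector<int> &vs, int s) {
      std::vector<int> vd(vs.size());
      vd[0] = s;
      for (std::size_t i = 1; i < vd.size(); ++i)
        vd[i] = vs[i - 1];
      return vd;
    }

    // vslideup by 1 with an explicit passthru: element 0 is taken from the
    // passthru, vd[i] = vs[i - 1] for i >= 1.
    std::vector<int> slideup1(const std::vector<int> &passthru,
                              const std::vector<int> &vs) {
      std::vector<int> vd(passthru);
      for (std::size_t i = 1; i < vd.size(); ++i)
        vd[i] = vs[i - 1];
      return vd;
    }

    int main() {
      std::vector<int> x = {42, 7, 8, 9}; // vector the scalar is extracted from
      std::vector<int> v = {1, 2, 3, 4};  // vector being slid
      // (vslide1up undef, v, (extract_elt x, 0)) == (vslideup x, v, 1)
      assert(slide1up(v, x[0]) == slideup1(x, v));
      return 0;
    }

The key point is that a vslideup of 1 leaves element 0 of the destination (i.e. the passthru) untouched, which is exactly where the extracted scalar already lives.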