
Conversation

@mshockwave (Member) commented Aug 21, 2025

For a vslide1up, if the scalar value we're sliding in was extracted from the first element of a vector, we can instead use a normal vslideup of 1 with that vector as the passthru. This can eliminate an extract_element instruction (i.e. a vfmv.f.s or vmv.x.s).


Stacked on top of #154450 (mostly reusing its tests)

We might be able to do a similar thing for vslide1down / vslidedown -- at least for constant VL, in which case the new vslidedown would use a VL one less than the original; the mask would also need to be constant. But I haven't seen cases like that in the wild.
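
A minimal, standalone C++ sketch of why the rewrite is sound (this is not LLVM code; the helper names and element values are made up for illustration): it models vslide1up and a vslideup-by-1 with a passthru on plain arrays, and checks that they agree whenever the slid-in scalar equals element 0 of the passthru vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of vslide1up.vx: Res[0] = Scalar, Res[i] = Src[i-1] for i >= 1.
static std::vector<int> slide1up(const std::vector<int> &Src, int Scalar) {
  std::vector<int> Res(Src.size());
  Res[0] = Scalar;
  for (size_t I = 1; I < Src.size(); ++I)
    Res[I] = Src[I - 1];
  return Res;
}

// Model of vslideup.vi with offset 1: elements below the offset come from the
// passthru, the rest are Src shifted up by one.
static std::vector<int> slideupBy1(const std::vector<int> &Passthru,
                                   const std::vector<int> &Src) {
  std::vector<int> Res(Src.size());
  Res[0] = Passthru[0];
  for (size_t I = 1; I < Src.size(); ++I)
    Res[I] = Src[I - 1];
  return Res;
}

int main() {
  std::vector<int> Vec = {10, 20, 30, 40}; // vector the scalar came from
  std::vector<int> Src = {1, 2, 3, 4};     // vector being slid
  int Scalar = Vec[0];                     // extract_element(Vec, 0)

  // slide1up(Src, extract(Vec, 0)) == slideup-by-1 of Src with Vec as the
  // passthru, so the extract (vfmv.f.s / vmv.x.s) is unnecessary.
  assert(slide1up(Src, Scalar) == slideupBy1(Vec, Src));
  return 0;
}
```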

@llvmbot (Member) commented Aug 21, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

For a vslide1up, if the scalar value we're sliding in was extracted from the first element of a vector, we can instead use a normal vslideup of 1 with that vector as the passthru. This can eliminate an extract_element instruction (i.e. a vfmv.f.s or vmv.x.s).


Stacked on top of #154450 (mostly reusing its tests)

We might be able to do a similar thing for vslide1down / vslidedown -- at least for constant VL, in which case the new vslidedown would use a VL one less than the original; the mask would also need to be constant. But I haven't seen cases like that in the wild.


Patch is 21.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154847.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+98-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll (+123)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+181)
  • (modified) llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll (+3-4)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4a1db80076530..12a9c57ac15ae 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4512,33 +4512,88 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
          "Illegal type which will result in reserved encoding");
 
   const unsigned Policy = RISCVVType::TAIL_AGNOSTIC | RISCVVType::MASK_AGNOSTIC;
+  auto getVSlide = [&](bool SlideUp, EVT ContainerVT, SDValue Passthru,
+                       SDValue Vec, SDValue Offset, SDValue Mask,
+                       SDValue VL) -> SDValue {
+    if (SlideUp)
+      return getVSlideup(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+                         Mask, VL, Policy);
+    return getVSlidedown(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+                         Mask, VL, Policy);
+  };
+
+  // General case: splat the first operand and slide other operands down one
+  // by one to form a vector. Alternatively, if the last operand is an
+  // extraction from element 0 of a vector, we can use that vector as the
+  // start value and slide up instead of slide down, which avoids the splat.
+  SmallVector<SDValue> Operands(Op->op_begin(), Op->op_end());
+  SDValue EVec;
+  bool SlideUp = false;
+  // Find the first non-undef operand from the tail.
+  auto ItLastNonUndef = find_if(Operands.rbegin(), Operands.rend(),
+                                [](SDValue V) { return !V.isUndef(); });
+  if (ItLastNonUndef != Operands.rend()) {
+    using namespace SDPatternMatch;
+    // Check if the last non-undef operand was an extraction.
+    SlideUp = sd_match(*ItLastNonUndef, m_ExtractElt(m_Value(EVec), m_Zero()));
+  }
+
+  if (SlideUp) {
+    MVT EVecContainerVT = EVec.getSimpleValueType();
+    // Make sure the original vector has scalable vector type.
+    if (EVecContainerVT.isFixedLengthVector()) {
+      EVecContainerVT =
+          getContainerForFixedLengthVector(DAG, EVecContainerVT, Subtarget);
+      EVec = convertToScalableVector(EVecContainerVT, EVec, DAG, Subtarget);
+    }
+
+    // Adapt EVec's type into ContainerVT.
+    if (EVecContainerVT.getVectorMinNumElements() <
+        ContainerVT.getVectorMinNumElements())
+      EVec = DAG.getInsertSubvector(DL, DAG.getUNDEF(ContainerVT), EVec, 0);
+    else
+      EVec = DAG.getExtractSubvector(DL, ContainerVT, EVec, 0);
+
+    // Reverse the elements as we're going to slide up from the last element.
+    std::reverse(Operands.begin(), Operands.end());
+  }
 
   SDValue Vec;
   UndefCount = 0;
-  for (SDValue V : Op->ops()) {
+  for (SDValue V : Operands) {
     if (V.isUndef()) {
       UndefCount++;
       continue;
     }
 
-    // Start our sequence with a TA splat in the hopes that hardware is able to
-    // recognize there's no dependency on the prior value of our temporary
-    // register.
+    // Start our sequence with either a TA splat or extract source in the
+    // hopes that hardware is able to recognize there's no dependency on the
+    // prior value of our temporary register.
     if (!Vec) {
-      Vec = DAG.getSplatVector(VT, DL, V);
-      Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+      if (SlideUp) {
+        Vec = EVec;
+      } else {
+        Vec = DAG.getSplatVector(VT, DL, V);
+        Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+      }
+
       UndefCount = 0;
       continue;
     }
 
     if (UndefCount) {
       const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
-      Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
-                          Vec, Offset, Mask, VL, Policy);
+      Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+                      Offset, Mask, VL);
       UndefCount = 0;
     }
-    auto OpCode =
-      VT.isFloatingPoint() ? RISCVISD::VFSLIDE1DOWN_VL : RISCVISD::VSLIDE1DOWN_VL;
+
+    unsigned OpCode;
+    if (VT.isFloatingPoint())
+      OpCode = SlideUp ? RISCVISD::VFSLIDE1UP_VL : RISCVISD::VFSLIDE1DOWN_VL;
+    else
+      OpCode = SlideUp ? RISCVISD::VSLIDE1UP_VL : RISCVISD::VSLIDE1DOWN_VL;
+
     if (!VT.isFloatingPoint())
       V = DAG.getNode(ISD::ANY_EXTEND, DL, Subtarget.getXLenVT(), V);
     Vec = DAG.getNode(OpCode, DL, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
@@ -4546,8 +4601,8 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
   }
   if (UndefCount) {
     const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
-    Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
-                        Vec, Offset, Mask, VL, Policy);
+    Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+                    Offset, Mask, VL);
   }
   return convertFromScalableVector(VT, Vec, DAG, Subtarget);
 }
@@ -21054,6 +21109,37 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
       return N->getOperand(0);
     break;
   }
+  case RISCVISD::VSLIDE1UP_VL:
+  case RISCVISD::VFSLIDE1UP_VL: {
+    using namespace SDPatternMatch;
+    SDValue SrcVec;
+    SDLoc DL(N);
+    MVT VT = N->getSimpleValueType(0);
+    // If the scalar we're sliding in was extracted from the first element of a
+    // vector, we can use that vector as the passthru in a normal slideup of 1.
+    // This saves us an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
+    if (N->getOperand(0).isUndef() &&
+        sd_match(
+            N->getOperand(2),
+            m_OneUse(m_AnyOf(m_ExtractElt(m_Value(SrcVec), m_Zero()),
+                             m_Node(RISCVISD::VMV_X_S, m_Value(SrcVec)))))) {
+      MVT SrcVecVT = SrcVec.getSimpleValueType();
+      // Adapt the value type of source vector.
+      if (SrcVecVT.isFixedLengthVector()) {
+        SrcVecVT = getContainerForFixedLengthVector(SrcVecVT);
+        SrcVec = convertToScalableVector(SrcVecVT, SrcVec, DAG, Subtarget);
+      }
+      if (SrcVecVT.getVectorMinNumElements() < VT.getVectorMinNumElements())
+        SrcVec = DAG.getInsertSubvector(DL, DAG.getUNDEF(VT), SrcVec, 0);
+      else
+        SrcVec = DAG.getExtractSubvector(DL, VT, SrcVec, 0);
+
+      return getVSlideup(DAG, Subtarget, DL, VT, SrcVec, N->getOperand(1),
+                         DAG.getConstant(1, DL, XLenVT), N->getOperand(3),
+                         N->getOperand(4));
+    }
+    break;
+  }
   }
 
   return SDValue();
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
index 3c3e08d387faa..b62d0607d048c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -1828,3 +1828,126 @@ define <8 x double> @buildvec_v8f64_zvl512(double %e0, double %e1, double %e2, d
   %v7 = insertelement <8 x double> %v6, double %e7, i64 7
   ret <8 x double> %v7
 }
+
+define <8 x double> @buildvec_slideup(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e6) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT:    vfslide1up.vf v10, v8, fa6
+; CHECK-NEXT:    vfslide1up.vf v8, v10, fa5
+; CHECK-NEXT:    vfslide1up.vf v10, v8, fa4
+; CHECK-NEXT:    vfslide1up.vf v8, v10, fa3
+; CHECK-NEXT:    vfslide1up.vf v10, v8, fa2
+; CHECK-NEXT:    vfslide1up.vf v12, v10, fa1
+; CHECK-NEXT:    vfslide1up.vf v8, v12, fa0
+; CHECK-NEXT:    ret
+  %v0 = insertelement <8 x double> poison, double %e0, i64 0
+  %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+  %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+  %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+  %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+  %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+  %v6 = insertelement <8 x double> %v5, double %e6, i64 6
+  %e7 = extractelement <4 x double> %v, i64 0
+  %v7 = insertelement <8 x double> %v6, double %e7, i64 7
+  ret <8 x double> %v7
+}
+
+define <8 x double> @buildvec_slideup_trailing_undef(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup_trailing_undef:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT:    vfslide1up.vf v10, v8, fa4
+; CHECK-NEXT:    vfslide1up.vf v8, v10, fa3
+; CHECK-NEXT:    vfslide1up.vf v10, v8, fa2
+; CHECK-NEXT:    vfslide1up.vf v12, v10, fa1
+; CHECK-NEXT:    vfslide1up.vf v8, v12, fa0
+; CHECK-NEXT:    ret
+  %v0 = insertelement <8 x double> poison, double %e0, i64 0
+  %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+  %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+  %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+  %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+  %e5 = extractelement <4 x double> %v, i64 0
+  %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+  %v6 = insertelement <8 x double> %v5, double poison, i64 6
+  %v7 = insertelement <8 x double> %v6, double poison, i64 7
+  ret <8 x double> %v7
+}
+
+; Negative test for slideup lowering where the extract_element was not build_vector's last operand.
+define <8 x double> @buildvec_slideup_not_last_element(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e7) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup_not_last_element:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT:    vfmv.f.s ft0, v8
+; CHECK-NEXT:    vfmv.v.f v8, fa0
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa1
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa2
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa3
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa4
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa5
+; CHECK-NEXT:    vfslide1down.vf v8, v8, ft0
+; CHECK-NEXT:    vfslide1down.vf v8, v8, fa6
+; CHECK-NEXT:    ret
+  %v0 = insertelement <8 x double> poison, double %e0, i64 0
+  %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+  %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+  %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+  %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+  %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+  %e6 = extractelement <4 x double> %v, i64 0
+  %v6 = insertelement <8 x double> %v5, double %e6, i64 6
+  %v7 = insertelement <8 x double> %v6, double %e7, i64 7
+  ret <8 x double> %v7
+}
+
+define <4 x float> @buildvec_vfredusum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredusum:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfmv.s.f v16, fa0
+; CHECK-NEXT:    vfredusum.vs v8, v8, v16
+; CHECK-NEXT:    vfredusum.vs v9, v10, v16
+; CHECK-NEXT:    vfredusum.vs v10, v12, v16
+; CHECK-NEXT:    vfredusum.vs v11, v14, v16
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT:    vslideup.vi v10, v11, 1
+; CHECK-NEXT:    vslideup.vi v9, v10, 1
+; CHECK-NEXT:    vslideup.vi v8, v9, 1
+; CHECK-NEXT:    ret
+  %247 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+  %248 = insertelement <4 x float> poison, float %247, i64 0
+  %250 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+  %251 = insertelement <4 x float> %248, float %250, i64 1
+  %252 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+  %253 = insertelement <4 x float> %251, float %252, i64 2
+  %254 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+  %255 = insertelement <4 x float> %253, float %254, i64 3
+  ret <4 x float> %255
+}
+
+define <4 x float> @buildvec_vfredosum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredosum:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfmv.s.f v16, fa0
+; CHECK-NEXT:    vfredosum.vs v8, v8, v16
+; CHECK-NEXT:    vfredosum.vs v9, v10, v16
+; CHECK-NEXT:    vfredosum.vs v10, v12, v16
+; CHECK-NEXT:    vfredosum.vs v11, v14, v16
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT:    vslideup.vi v10, v11, 1
+; CHECK-NEXT:    vslideup.vi v9, v10, 1
+; CHECK-NEXT:    vslideup.vi v8, v9, 1
+; CHECK-NEXT:    ret
+  %247 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+  %248 = insertelement <4 x float> poison, float %247, i64 0
+  %250 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+  %251 = insertelement <4 x float> %248, float %250, i64 1
+  %252 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+  %253 = insertelement <4 x float> %251, float %252, i64 2
+  %254 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+  %255 = insertelement <4 x float> %253, float %254, i64 3
+  ret <4 x float> %255
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index d9bb007a10f71..6183996579949 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -3416,5 +3416,186 @@ define <4 x i1> @buildvec_i1_splat(i1 %e1) {
   ret <4 x i1> %v4
 }
 
+define <4 x i32> @buildvec_vredsum(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredsum:
+; RV32:       # %bb.0:
+; RV32-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT:    vmv.s.x v16, zero
+; RV32-NEXT:    vredsum.vs v8, v8, v16
+; RV32-NEXT:    vredsum.vs v9, v10, v16
+; RV32-NEXT:    vredsum.vs v10, v12, v16
+; RV32-NEXT:    vredsum.vs v11, v14, v16
+; RV32-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; RV32-NEXT:    vslideup.vi v10, v11, 1
+; RV32-NEXT:    vslideup.vi v9, v10, 1
+; RV32-NEXT:    vslideup.vi v8, v9, 1
+; RV32-NEXT:    ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredsum:
+; RV64V-ONLY:       # %bb.0:
+; RV64V-ONLY-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT:    vmv.s.x v16, zero
+; RV64V-ONLY-NEXT:    vredsum.vs v8, v8, v16
+; RV64V-ONLY-NEXT:    vredsum.vs v9, v10, v16
+; RV64V-ONLY-NEXT:    vredsum.vs v10, v12, v16
+; RV64V-ONLY-NEXT:    vredsum.vs v11, v14, v16
+; RV64V-ONLY-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; RV64V-ONLY-NEXT:    vslideup.vi v10, v11, 1
+; RV64V-ONLY-NEXT:    vslideup.vi v9, v10, 1
+; RV64V-ONLY-NEXT:    vslideup.vi v8, v9, 1
+; RV64V-ONLY-NEXT:    ret
+;
+; RVA22U64-LABEL: buildvec_vredsum:
+; RVA22U64:       # %bb.0:
+; RVA22U64-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT:    vmv.s.x v16, zero
+; RVA22U64-NEXT:    vredsum.vs v8, v8, v16
+; RVA22U64-NEXT:    vredsum.vs v9, v10, v16
+; RVA22U64-NEXT:    vredsum.vs v10, v12, v16
+; RVA22U64-NEXT:    vredsum.vs v11, v14, v16
+; RVA22U64-NEXT:    vmv.x.s a0, v8
+; RVA22U64-NEXT:    vmv.x.s a1, v9
+; RVA22U64-NEXT:    vmv.x.s a2, v10
+; RVA22U64-NEXT:    slli a1, a1, 32
+; RVA22U64-NEXT:    add.uw a0, a0, a1
+; RVA22U64-NEXT:    vmv.x.s a1, v11
+; RVA22U64-NEXT:    slli a1, a1, 32
+; RVA22U64-NEXT:    add.uw a1, a2, a1
+; RVA22U64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT:    vmv.v.x v8, a0
+; RVA22U64-NEXT:    vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT:    ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredsum:
+; RVA22U64-PACK:       # %bb.0:
+; RVA22U64-PACK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT:    vmv.s.x v16, zero
+; RVA22U64-PACK-NEXT:    vredsum.vs v8, v8, v16
+; RVA22U64-PACK-NEXT:    vredsum.vs v9, v10, v16
+; RVA22U64-PACK-NEXT:    vredsum.vs v10, v12, v16
+; RVA22U64-PACK-NEXT:    vredsum.vs v11, v14, v16
+; RVA22U64-PACK-NEXT:    vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT:    vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT:    vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT:    pack a0, a0, a1
+; RVA22U64-PACK-NEXT:    vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT:    pack a1, a2, a1
+; RVA22U64-PACK-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT:    vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT:    vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT:    ret
+;
+; RV64ZVE32-LABEL: buildvec_vredsum:
+; RV64ZVE32:       # %bb.0:
+; RV64ZVE32-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT:    vmv.s.x v16, zero
+; RV64ZVE32-NEXT:    vredsum.vs v8, v8, v16
+; RV64ZVE32-NEXT:    vredsum.vs v9, v10, v16
+; RV64ZVE32-NEXT:    vredsum.vs v10, v12, v16
+; RV64ZVE32-NEXT:    vredsum.vs v11, v14, v16
+; RV64ZVE32-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; RV64ZVE32-NEXT:    vslideup.vi v10, v11, 1
+; RV64ZVE32-NEXT:    vslideup.vi v9, v10, 1
+; RV64ZVE32-NEXT:    vslideup.vi v8, v9, 1
+; RV64ZVE32-NEXT:    ret
+  %247 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg0)
+  %248 = insertelement <4 x i32> poison, i32 %247, i64 0
+  %250 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg1)
+  %251 = insertelement <4 x i32> %248, i32 %250, i64 1
+  %252 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg2)
+  %253 = insertelement <4 x i32> %251, i32 %252, i64 2
+  %254 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg3)
+  %255 = insertelement <4 x i32> %253, i32 %254, i64 3
+  ret <4 x i32> %255
+}
+
+define <4 x i32> @buildvec_vredmax(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredmax:
+; RV32:       # %bb.0:
+; RV32-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT:    vredmaxu.vs v8, v8, v8
+; RV32-NEXT:    vredmaxu.vs v9, v10, v10
+; RV32-NEXT:    vredmaxu.vs v10, v12, v12
+; RV32-NEXT:    vredmaxu.vs v11, v14, v14
+; RV32-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; RV32-NEXT:    vslideup.vi v10, v11, 1
+; RV32-NEXT:    vslideup.vi v9, v10, 1
+; RV32-NEXT:    vslideup.vi v8, v9, 1
+; RV32-NEXT:    ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredmax:
+; RV64V-ONLY:       # %bb.0:
+; RV64V-ONLY-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT:    vredmaxu.vs v8, v8, v8
+; RV64V-ONLY-NEXT:    vredmaxu.vs v9, v10, v10
+; RV64V-ONLY-NEXT:    vredmaxu.vs v10, v12, v12
+; RV64V-ONLY-NEXT:    vredmaxu.vs v11, v14, v14
+; RV64V-ONLY-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
+; RV64V-ONLY-NEXT:    vslideup.vi v10, v11, 1
+; RV64V-ONLY-NEXT:    vslideup.vi v9, v10, 1
+; RV64V-ONLY-NEXT:    vslideup.vi v8, v9, 1
+; RV64V-ONLY-NEXT:    ret
+;
+; RVA22U64-LABEL: buildvec_vredmax:
+; RVA22U64:       # %bb.0:
+; RVA22U64-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT:    vredmaxu.vs v8, v8, v8
+; RVA22U64-NEXT:    vredmaxu.vs v9, v10, v10
+; RVA22U64-NEXT:    vredmaxu.vs v10, v12, v12
+; RVA22U64-NEXT:    vredmaxu.vs v11, v14, v14
+; RVA22U64-NEXT:    vmv.x.s a0, v8
+; RVA22U64-NEXT:    vmv.x.s a1, v9
+; RVA22U64-NEXT:    vmv.x.s a2, v10
+; RVA22U64-NEXT:    slli a1, a1, 32
+; RVA22U64-NEXT:    add.uw a0, a0, a1
+; RVA22U64-NEXT:    vmv.x.s a1, v11
+; RVA22U64-NEXT:    slli a1, a1, 32
+; RVA22U64-NEXT:    add.uw a1, a2, a1
+; RVA22U64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT:    vmv.v.x v8, a0
+; RVA22U64-NEXT:    vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT:    ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredmax:
+; RVA22U64-PACK:       # %bb.0:
+; RVA22U64-PACK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT:    vredmaxu.vs v8, v8, v8
+; RVA22U64-PACK-NEXT:    vredmaxu.vs v9, v10, v10
+; RVA22U64-PACK-NEXT:    vredmaxu.vs v10, v12, v12
+; RVA22U64-PACK-NEXT:    vredmaxu.vs v11, v14, v14
+; RVA22U64-PACK-NEXT:    vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT:    vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT:    vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT:    pack a0, a0, a1
+; RVA22U64-PACK-NEXT:    vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT:    pack a1, a2, a1
+; RVA22U64-PACK-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT:    vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT:    vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT:    ret
+;
+; RV64ZVE32-LABEL: buildvec_vredmax:
+; RV64ZVE32:       # %bb.0:
+; RV64ZVE32-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT:    vredmaxu.vs v8, v8, v8
+; RV64ZVE32-NEXT:    vredmaxu.vs v9, v10, v10
+; RV64ZVE...
[truncated]

@mshockwave force-pushed the patch/rvv/vslide1up-simplify branch from 5a4ba8d to 2a33925 on September 2, 2025 17:20
    // This saves us an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
    if (N->getOperand(0).isUndef() &&
        sd_match(N->getOperand(2),
                 m_AnyOf(m_ExtractElt(m_Value(SrcVec), m_Zero()),
A reviewer (Contributor) commented:

Is it possible to extract from a srcvec where the element size is smaller? IIRC extractelt also extends the result if needed.

Can we get an extending extractelt from LLVM IR via an extractelement + zext?

@mshockwave (Member, Author) replied:

Does extractelt have implicit zext/sext like loads? If it's extracting from, say, an i32 vector to an i64 scalar, I think an additional zext SDNode will be applied to that i64 before any use.

@mshockwave (Member, Author) added:

> say, an i32 vector to an i64 scalar, I think an additional zext SDNode will be applied to that i64 before any use.

Well, I was only half-right about that: for an extractelt generated from IR or in any other normal way, that's the case. But for an extractelt generated during legalization, no additional zext/sext is added, because it will be lowered into vmv.x.s, which sign-extends its result.
I therefore added a check to make sure the element type of SrcVec is the same as that of the vslide1up.
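
As an aside, here is a small, self-contained C++ sketch of why the element types have to match (again not LLVM code, just an illustration): vmv.x.s sign-extends element 0 to XLEN, which is not interchangeable with an IR-level extractelement whose result is widened by an explicit zext.

```cpp
#include <cassert>
#include <cstdint>

// Model of vmv.x.s on RV64 with SEW=32: element 0 is sign-extended to XLEN.
static int64_t vmv_x_s(int32_t Elt0) { return static_cast<int64_t>(Elt0); }

// Model of the IR path: extractelement of an i32 followed by an explicit
// zext to i64.
static int64_t extract_then_zext(int32_t Elt0) {
  return static_cast<int64_t>(static_cast<uint32_t>(Elt0));
}

int main() {
  // The two paths agree for non-negative values...
  assert(vmv_x_s(42) == extract_then_zext(42));
  // ...but differ for negative ones, so treating the scalar as "element 0 of
  // SrcVec" is only safe when the element types match exactly.
  assert(vmv_x_s(-1) == INT64_C(-1));
  assert(extract_then_zext(-1) == INT64_C(0xFFFFFFFF));
  return 0;
}
```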

@preames (Collaborator) left a comment:

LGTM

@lukel97 (Contributor) left a comment:

LGTM

@mshockwave merged commit a4104ab into llvm:main on Sep 4, 2025 (9 checks passed).
@mshockwave deleted the patch/rvv/vslide1up-simplify branch on September 4, 2025 04:20.