[RISCV] Fold (vslide1up undef, v, (extract_elt x, 0)) into (vslideup x, v, 1) #154847
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

For a slide1up, if the scalar value we're sliding in was extracted from the first element of a vector, we can use a normal vslideup of 1 instead, with that vector as the passthru. This eliminates an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).

Stacked on top of #154450 (mostly reusing its tests).

We might be able to do a similar thing for vslide1down / vslidedown -- for constant VL, at least. In that case the new vslidedown would have a VL one less than the original, and the mask would also need to be constant. But I haven't seen cases like that in the wild.

Patch is 21.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154847.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4a1db80076530..12a9c57ac15ae 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4512,33 +4512,88 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
"Illegal type which will result in reserved encoding");
const unsigned Policy = RISCVVType::TAIL_AGNOSTIC | RISCVVType::MASK_AGNOSTIC;
+ auto getVSlide = [&](bool SlideUp, EVT ContainerVT, SDValue Passthru,
+ SDValue Vec, SDValue Offset, SDValue Mask,
+ SDValue VL) -> SDValue {
+ if (SlideUp)
+ return getVSlideup(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+ Mask, VL, Policy);
+ return getVSlidedown(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+ Mask, VL, Policy);
+ };
+
+ // General case: splat the first operand and slide other operands down one
+  // by one to form a vector. Alternatively, if the last operand is an
+  // extraction from element 0 of a vector, we can use that vector as the start
+  // value and slide up instead of slide down, so that we can avoid the splat.
+ SmallVector<SDValue> Operands(Op->op_begin(), Op->op_end());
+ SDValue EVec;
+ bool SlideUp = false;
+  // Find the first non-undef operand from the tail.
+ auto ItLastNonUndef = find_if(Operands.rbegin(), Operands.rend(),
+ [](SDValue V) { return !V.isUndef(); });
+ if (ItLastNonUndef != Operands.rend()) {
+ using namespace SDPatternMatch;
+ // Check if the last non-undef operand was an extraction.
+ SlideUp = sd_match(*ItLastNonUndef, m_ExtractElt(m_Value(EVec), m_Zero()));
+ }
+
+ if (SlideUp) {
+ MVT EVecContainerVT = EVec.getSimpleValueType();
+ // Make sure the original vector has scalable vector type.
+ if (EVecContainerVT.isFixedLengthVector()) {
+ EVecContainerVT =
+ getContainerForFixedLengthVector(DAG, EVecContainerVT, Subtarget);
+ EVec = convertToScalableVector(EVecContainerVT, EVec, DAG, Subtarget);
+ }
+
+ // Adapt EVec's type into ContainerVT.
+ if (EVecContainerVT.getVectorMinNumElements() <
+ ContainerVT.getVectorMinNumElements())
+ EVec = DAG.getInsertSubvector(DL, DAG.getUNDEF(ContainerVT), EVec, 0);
+ else
+ EVec = DAG.getExtractSubvector(DL, ContainerVT, EVec, 0);
+
+ // Reverse the elements as we're going to slide up from the last element.
+ std::reverse(Operands.begin(), Operands.end());
+ }
SDValue Vec;
UndefCount = 0;
- for (SDValue V : Op->ops()) {
+ for (SDValue V : Operands) {
if (V.isUndef()) {
UndefCount++;
continue;
}
- // Start our sequence with a TA splat in the hopes that hardware is able to
-  // recognize there's no dependency on the prior value of our temporary
-  // register.
+  // Start our sequence with either a TA splat or extract source in the
+  // hopes that hardware is able to recognize there's no dependency on the
+  // prior value of our temporary register.
if (!Vec) {
- Vec = DAG.getSplatVector(VT, DL, V);
- Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+ if (SlideUp) {
+ Vec = EVec;
+ } else {
+ Vec = DAG.getSplatVector(VT, DL, V);
+ Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+ }
+
UndefCount = 0;
continue;
}
if (UndefCount) {
const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
- Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
- Vec, Offset, Mask, VL, Policy);
+ Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+ Offset, Mask, VL);
UndefCount = 0;
}
- auto OpCode =
- VT.isFloatingPoint() ? RISCVISD::VFSLIDE1DOWN_VL : RISCVISD::VSLIDE1DOWN_VL;
+
+ unsigned OpCode;
+ if (VT.isFloatingPoint())
+ OpCode = SlideUp ? RISCVISD::VFSLIDE1UP_VL : RISCVISD::VFSLIDE1DOWN_VL;
+ else
+ OpCode = SlideUp ? RISCVISD::VSLIDE1UP_VL : RISCVISD::VSLIDE1DOWN_VL;
+
if (!VT.isFloatingPoint())
V = DAG.getNode(ISD::ANY_EXTEND, DL, Subtarget.getXLenVT(), V);
Vec = DAG.getNode(OpCode, DL, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
@@ -4546,8 +4601,8 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
}
if (UndefCount) {
const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
- Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
- Vec, Offset, Mask, VL, Policy);
+ Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+ Offset, Mask, VL);
}
return convertFromScalableVector(VT, Vec, DAG, Subtarget);
}
@@ -21054,6 +21109,37 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
return N->getOperand(0);
break;
}
+ case RISCVISD::VSLIDE1UP_VL:
+ case RISCVISD::VFSLIDE1UP_VL: {
+ using namespace SDPatternMatch;
+ SDValue SrcVec;
+ SDLoc DL(N);
+ MVT VT = N->getSimpleValueType(0);
+ // If the scalar we're sliding in was extracted from the first element of a
+ // vector, we can use that vector as the passthru in a normal slideup of 1.
+ // This saves us an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
+ if (N->getOperand(0).isUndef() &&
+ sd_match(
+ N->getOperand(2),
+ m_OneUse(m_AnyOf(m_ExtractElt(m_Value(SrcVec), m_Zero()),
+ m_Node(RISCVISD::VMV_X_S, m_Value(SrcVec)))))) {
+ MVT SrcVecVT = SrcVec.getSimpleValueType();
+ // Adapt the value type of source vector.
+ if (SrcVecVT.isFixedLengthVector()) {
+ SrcVecVT = getContainerForFixedLengthVector(SrcVecVT);
+ SrcVec = convertToScalableVector(SrcVecVT, SrcVec, DAG, Subtarget);
+ }
+ if (SrcVecVT.getVectorMinNumElements() < VT.getVectorMinNumElements())
+ SrcVec = DAG.getInsertSubvector(DL, DAG.getUNDEF(VT), SrcVec, 0);
+ else
+ SrcVec = DAG.getExtractSubvector(DL, VT, SrcVec, 0);
+
+ return getVSlideup(DAG, Subtarget, DL, VT, SrcVec, N->getOperand(1),
+ DAG.getConstant(1, DL, XLenVT), N->getOperand(3),
+ N->getOperand(4));
+ }
+ break;
+ }
}
return SDValue();
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
index 3c3e08d387faa..b62d0607d048c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -1828,3 +1828,126 @@ define <8 x double> @buildvec_v8f64_zvl512(double %e0, double %e1, double %e2, d
%v7 = insertelement <8 x double> %v6, double %e7, i64 7
ret <8 x double> %v7
}
+
+define <8 x double> @buildvec_slideup(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e6) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa6
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa5
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa4
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa3
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa2
+; CHECK-NEXT: vfslide1up.vf v12, v10, fa1
+; CHECK-NEXT: vfslide1up.vf v8, v12, fa0
+; CHECK-NEXT: ret
+ %v0 = insertelement <8 x double> poison, double %e0, i64 0
+ %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+ %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+ %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+ %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+ %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+ %v6 = insertelement <8 x double> %v5, double %e6, i64 6
+ %e7 = extractelement <4 x double> %v, i64 0
+ %v7 = insertelement <8 x double> %v6, double %e7, i64 7
+ ret <8 x double> %v7
+}
+
+define <8 x double> @buildvec_slideup_trailing_undef(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup_trailing_undef:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa4
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa3
+; CHECK-NEXT: vfslide1up.vf v10, v8, fa2
+; CHECK-NEXT: vfslide1up.vf v12, v10, fa1
+; CHECK-NEXT: vfslide1up.vf v8, v12, fa0
+; CHECK-NEXT: ret
+ %v0 = insertelement <8 x double> poison, double %e0, i64 0
+ %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+ %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+ %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+ %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+ %e5 = extractelement <4 x double> %v, i64 0
+ %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+ %v6 = insertelement <8 x double> %v5, double poison, i64 6
+ %v7 = insertelement <8 x double> %v6, double poison, i64 7
+ ret <8 x double> %v7
+}
+
+; Negative test for slideup lowering where the extract_element was not build_vector's last operand.
+define <8 x double> @buildvec_slideup_not_last_element(<4 x double> %v, double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e7) vscale_range(4, 128) {
+; CHECK-LABEL: buildvec_slideup_not_last_element:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e64, m2, ta, ma
+; CHECK-NEXT: vfmv.f.s ft0, v8
+; CHECK-NEXT: vfmv.v.f v8, fa0
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa1
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa2
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa3
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa4
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa5
+; CHECK-NEXT: vfslide1down.vf v8, v8, ft0
+; CHECK-NEXT: vfslide1down.vf v8, v8, fa6
+; CHECK-NEXT: ret
+ %v0 = insertelement <8 x double> poison, double %e0, i64 0
+ %v1 = insertelement <8 x double> %v0, double %e1, i64 1
+ %v2 = insertelement <8 x double> %v1, double %e2, i64 2
+ %v3 = insertelement <8 x double> %v2, double %e3, i64 3
+ %v4 = insertelement <8 x double> %v3, double %e4, i64 4
+ %v5 = insertelement <8 x double> %v4, double %e5, i64 5
+ %e6 = extractelement <4 x double> %v, i64 0
+ %v6 = insertelement <8 x double> %v5, double %e6, i64 6
+ %v7 = insertelement <8 x double> %v6, double %e7, i64 7
+ ret <8 x double> %v7
+}
+
+define <4 x float> @buildvec_vfredusum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredusum:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vfmv.s.f v16, fa0
+; CHECK-NEXT: vfredusum.vs v8, v8, v16
+; CHECK-NEXT: vfredusum.vs v9, v10, v16
+; CHECK-NEXT: vfredusum.vs v10, v12, v16
+; CHECK-NEXT: vfredusum.vs v11, v14, v16
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT: vslideup.vi v10, v11, 1
+; CHECK-NEXT: vslideup.vi v9, v10, 1
+; CHECK-NEXT: vslideup.vi v8, v9, 1
+; CHECK-NEXT: ret
+ %247 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+ %248 = insertelement <4 x float> poison, float %247, i64 0
+ %250 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+ %251 = insertelement <4 x float> %248, float %250, i64 1
+ %252 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+ %253 = insertelement <4 x float> %251, float %252, i64 2
+ %254 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+ %255 = insertelement <4 x float> %253, float %254, i64 3
+ ret <4 x float> %255
+}
+
+define <4 x float> @buildvec_vfredosum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredosum:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vfmv.s.f v16, fa0
+; CHECK-NEXT: vfredosum.vs v8, v8, v16
+; CHECK-NEXT: vfredosum.vs v9, v10, v16
+; CHECK-NEXT: vfredosum.vs v10, v12, v16
+; CHECK-NEXT: vfredosum.vs v11, v14, v16
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT: vslideup.vi v10, v11, 1
+; CHECK-NEXT: vslideup.vi v9, v10, 1
+; CHECK-NEXT: vslideup.vi v8, v9, 1
+; CHECK-NEXT: ret
+ %247 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+ %248 = insertelement <4 x float> poison, float %247, i64 0
+ %250 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+ %251 = insertelement <4 x float> %248, float %250, i64 1
+ %252 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+ %253 = insertelement <4 x float> %251, float %252, i64 2
+ %254 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+ %255 = insertelement <4 x float> %253, float %254, i64 3
+ ret <4 x float> %255
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index d9bb007a10f71..6183996579949 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -3416,5 +3416,186 @@ define <4 x i1> @buildvec_i1_splat(i1 %e1) {
ret <4 x i1> %v4
}
+define <4 x i32> @buildvec_vredsum(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredsum:
+; RV32: # %bb.0:
+; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT: vmv.s.x v16, zero
+; RV32-NEXT: vredsum.vs v8, v8, v16
+; RV32-NEXT: vredsum.vs v9, v10, v16
+; RV32-NEXT: vredsum.vs v10, v12, v16
+; RV32-NEXT: vredsum.vs v11, v14, v16
+; RV32-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV32-NEXT: vslideup.vi v10, v11, 1
+; RV32-NEXT: vslideup.vi v9, v10, 1
+; RV32-NEXT: vslideup.vi v8, v9, 1
+; RV32-NEXT: ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredsum:
+; RV64V-ONLY: # %bb.0:
+; RV64V-ONLY-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT: vmv.s.x v16, zero
+; RV64V-ONLY-NEXT: vredsum.vs v8, v8, v16
+; RV64V-ONLY-NEXT: vredsum.vs v9, v10, v16
+; RV64V-ONLY-NEXT: vredsum.vs v10, v12, v16
+; RV64V-ONLY-NEXT: vredsum.vs v11, v14, v16
+; RV64V-ONLY-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV64V-ONLY-NEXT: vslideup.vi v10, v11, 1
+; RV64V-ONLY-NEXT: vslideup.vi v9, v10, 1
+; RV64V-ONLY-NEXT: vslideup.vi v8, v9, 1
+; RV64V-ONLY-NEXT: ret
+;
+; RVA22U64-LABEL: buildvec_vredsum:
+; RVA22U64: # %bb.0:
+; RVA22U64-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT: vmv.s.x v16, zero
+; RVA22U64-NEXT: vredsum.vs v8, v8, v16
+; RVA22U64-NEXT: vredsum.vs v9, v10, v16
+; RVA22U64-NEXT: vredsum.vs v10, v12, v16
+; RVA22U64-NEXT: vredsum.vs v11, v14, v16
+; RVA22U64-NEXT: vmv.x.s a0, v8
+; RVA22U64-NEXT: vmv.x.s a1, v9
+; RVA22U64-NEXT: vmv.x.s a2, v10
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a0, a0, a1
+; RVA22U64-NEXT: vmv.x.s a1, v11
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a1, a2, a1
+; RVA22U64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT: vmv.v.x v8, a0
+; RVA22U64-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT: ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredsum:
+; RVA22U64-PACK: # %bb.0:
+; RVA22U64-PACK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT: vmv.s.x v16, zero
+; RVA22U64-PACK-NEXT: vredsum.vs v8, v8, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v9, v10, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v10, v12, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v11, v14, v16
+; RVA22U64-PACK-NEXT: vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT: vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT: pack a0, a0, a1
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT: pack a1, a2, a1
+; RVA22U64-PACK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT: vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT: ret
+;
+; RV64ZVE32-LABEL: buildvec_vredsum:
+; RV64ZVE32: # %bb.0:
+; RV64ZVE32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT: vmv.s.x v16, zero
+; RV64ZVE32-NEXT: vredsum.vs v8, v8, v16
+; RV64ZVE32-NEXT: vredsum.vs v9, v10, v16
+; RV64ZVE32-NEXT: vredsum.vs v10, v12, v16
+; RV64ZVE32-NEXT: vredsum.vs v11, v14, v16
+; RV64ZVE32-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV64ZVE32-NEXT: vslideup.vi v10, v11, 1
+; RV64ZVE32-NEXT: vslideup.vi v9, v10, 1
+; RV64ZVE32-NEXT: vslideup.vi v8, v9, 1
+; RV64ZVE32-NEXT: ret
+ %247 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg0)
+ %248 = insertelement <4 x i32> poison, i32 %247, i64 0
+ %250 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg1)
+ %251 = insertelement <4 x i32> %248, i32 %250, i64 1
+ %252 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg2)
+ %253 = insertelement <4 x i32> %251, i32 %252, i64 2
+ %254 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg3)
+ %255 = insertelement <4 x i32> %253, i32 %254, i64 3
+ ret <4 x i32> %255
+}
+
+define <4 x i32> @buildvec_vredmax(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredmax:
+; RV32: # %bb.0:
+; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT: vredmaxu.vs v8, v8, v8
+; RV32-NEXT: vredmaxu.vs v9, v10, v10
+; RV32-NEXT: vredmaxu.vs v10, v12, v12
+; RV32-NEXT: vredmaxu.vs v11, v14, v14
+; RV32-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV32-NEXT: vslideup.vi v10, v11, 1
+; RV32-NEXT: vslideup.vi v9, v10, 1
+; RV32-NEXT: vslideup.vi v8, v9, 1
+; RV32-NEXT: ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredmax:
+; RV64V-ONLY: # %bb.0:
+; RV64V-ONLY-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT: vredmaxu.vs v8, v8, v8
+; RV64V-ONLY-NEXT: vredmaxu.vs v9, v10, v10
+; RV64V-ONLY-NEXT: vredmaxu.vs v10, v12, v12
+; RV64V-ONLY-NEXT: vredmaxu.vs v11, v14, v14
+; RV64V-ONLY-NEXT: vsetivli zero, 4, e32, m1, tu, ma
+; RV64V-ONLY-NEXT: vslideup.vi v10, v11, 1
+; RV64V-ONLY-NEXT: vslideup.vi v9, v10, 1
+; RV64V-ONLY-NEXT: vslideup.vi v8, v9, 1
+; RV64V-ONLY-NEXT: ret
+;
+; RVA22U64-LABEL: buildvec_vredmax:
+; RVA22U64: # %bb.0:
+; RVA22U64-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT: vredmaxu.vs v8, v8, v8
+; RVA22U64-NEXT: vredmaxu.vs v9, v10, v10
+; RVA22U64-NEXT: vredmaxu.vs v10, v12, v12
+; RVA22U64-NEXT: vredmaxu.vs v11, v14, v14
+; RVA22U64-NEXT: vmv.x.s a0, v8
+; RVA22U64-NEXT: vmv.x.s a1, v9
+; RVA22U64-NEXT: vmv.x.s a2, v10
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a0, a0, a1
+; RVA22U64-NEXT: vmv.x.s a1, v11
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a1, a2, a1
+; RVA22U64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT: vmv.v.x v8, a0
+; RVA22U64-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT: ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredmax:
+; RVA22U64-PACK: # %bb.0:
+; RVA22U64-PACK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT: vredmaxu.vs v8, v8, v8
+; RVA22U64-PACK-NEXT: vredmaxu.vs v9, v10, v10
+; RVA22U64-PACK-NEXT: vredmaxu.vs v10, v12, v12
+; RVA22U64-PACK-NEXT: vredmaxu.vs v11, v14, v14
+; RVA22U64-PACK-NEXT: vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT: vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT: pack a0, a0, a1
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT: pack a1, a2, a1
+; RVA22U64-PACK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT: vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT: ret
+;
+; RV64ZVE32-LABEL: buildvec_vredmax:
+; RV64ZVE32: # %bb.0:
+; RV64ZVE32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT: vredmaxu.vs v8, v8, v8
+; RV64ZVE32-NEXT: vredmaxu.vs v9, v10, v10
+; RV64ZVE...
[truncated]
Force-pushed eea9382 to 5a4ba8d
…x, v, 1) (Co-authored-by: Craig Topper <[email protected]>)
Force-pushed 5a4ba8d to 2a33925
    // This saves us an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
    if (N->getOperand(0).isUndef() &&
        sd_match(N->getOperand(2),
                 m_AnyOf(m_ExtractElt(m_Value(SrcVec), m_Zero()),
Is it possible to extract from a SrcVec whose element size is smaller? IIRC extractelt also extends the result if needed.
Can we get an extending extractelt from LLVM IR via an extractelement + zext?
Does extractelt have an implicit zext/sext like loads do? If it's extracting from, say, an i32 vector to an i64 scalar, I think an additional zext SDNode will be applied to that i64 before any use.
> say, an i32 vector to an i64 scalar, I think an additional zext SDNode will be applied to that i64 before any use.

Well, I was only half-right about that: for an extractelt generated from IR or in any other normal way, that's the case. But for an extractelt generated during legalization, no additional zext/sext is added, because it will be lowered into vmv.x.s, which sign-extends its result.
I therefore added a check to make sure the element type of SrcVec is the same as that of the vslide1up.
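A minimal sketch of the kind of guard being described, reusing the SrcVec and VT names from the combine above (the exact form and placement in the final revision may differ):

    // Hypothetical sketch: bail out of the combine unless the element types
    // match, so the implicit sign extension performed by vmv.x.s (or an
    // extending extract) cannot be silently dropped.
    if (SrcVec.getSimpleValueType().getVectorElementType() !=
        VT.getVectorElementType())
      break;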
LGTM
LGTM
For a slide1up, if the scalar value we're sliding in was extracted from the first element of a vector, we can use a normal vslideup of 1 instead, with that vector as the passthru. This eliminates an extract_element instruction (i.e. vfmv.f.s, vmv.x.s).
Stacked on top of #154450 (mostly reusing its tests).
We might be able to do a similar thing for vslide1down / vslidedown -- for constant VL, at least. In that case the new vslidedown would have a VL one less than the original, and the mask would also need to be constant. But I haven't seen cases like that in the wild.
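For intuition about why the fold is sound, here is a standalone C++ model of the slide semantics involved. It is illustrative only: slide1up, slideup1, and the sample values are made up for this sketch, and tail/mask policies are ignored.

    // Standalone model of the RVV slide semantics this fold relies on.
    // Not LLVM code; helper names and values are hypothetical.
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // vslide1up: vd[0] = s, vd[i] = vs[i - 1] for i >= 1.
    std::vector<int> slide1up(const std::vector<int> &vs, int s) {
      std::vector<int> vd(vs.size());
      vd[0] = s;
      for (std::size_t i = 1; i < vd.size(); ++i)
        vd[i] = vs[i - 1];
      return vd;
    }

    // vslideup by 1 with an explicit passthru: element 0 is taken from the
    // passthru, vd[i] = vs[i - 1] for i >= 1.
    std::vector<int> slideup1(const std::vector<int> &passthru,
                              const std::vector<int> &vs) {
      std::vector<int> vd(passthru);
      for (std::size_t i = 1; i < vd.size(); ++i)
        vd[i] = vs[i - 1];
      return vd;
    }

    int main() {
      std::vector<int> x = {42, 7, 8, 9}; // vector the scalar is extracted from
      std::vector<int> v = {1, 2, 3, 4};  // vector being slid
      // (vslide1up undef, v, (extract_elt x, 0)) == (vslideup x, v, 1)
      assert(slide1up(v, x[0]) == slideup1(x, v));
      return 0;
    }

The key point is that a vslideup of 1 leaves element 0 of the destination (i.e. the passthru) untouched, which is exactly where the extracted scalar already lives.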