[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic #158637
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-backend-risc-v

Author: Sander de Smalen (sdesmalen-arm)

Changes: The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.

Patch is 254.77 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/158637.diff

33 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d61ea07830123..61c8415873092 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20562,7 +20562,7 @@ Note that it has the following implications:
- If ``%cnt`` is non-zero, the return value is non-zero as well.
- If ``%cnt`` is less than or equal to ``%max_lanes``, the return value is equal to ``%cnt``.
-'``llvm.experimental.vector.partial.reduce.add.*``' Intrinsic
+'``llvm.vector.partial.reduce.add.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -20571,15 +20571,15 @@ This is an overloaded intrinsic.
::
- declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
- declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
- declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
- declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
Overview:
"""""""""
-The '``llvm.vector.experimental.partial.reduce.add.*``' intrinsics reduce the
+The '``llvm.vector.partial.reduce.add.*``' intrinsics reduce the
concatenation of the two vector arguments down to the number of elements of the
result vector type.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index a6f4e51e258ab..31d6cd676f674 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1324,7 +1324,7 @@ class TargetTransformInfo {
/// \return The cost of a partial reduction, which is a reduction from a
/// vector to another vector with fewer elements of larger size. They are
- /// represented by the llvm.experimental.partial.reduce.add intrinsic, which
+ /// represented by the llvm.vector.partial.reduce.add intrinsic, which
/// takes an accumulator of type \p AccumType and a second vector operand to
/// be accumulated, whose element count is specified by \p VF. The type of
/// reduction is specified by \p Opcode. The second operand passed to the
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 2ba8b29e775e0..46be271320fdd 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -480,7 +480,7 @@ class LLVM_ABI TargetLoweringBase {
return true;
}
- /// Return true if the @llvm.experimental.vector.partial.reduce.* intrinsic
+ /// Return true if the @llvm.vector.partial.reduce.* intrinsic
/// should be expanded using generic code in SelectionDAGBuilder.
virtual bool
shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const {
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index fb9ea10ac9127..585371a6a4423 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2797,9 +2797,9 @@ foreach n = 2...8 in {
//===-------------- Intrinsics to perform partial reduction ---------------===//
-def int_experimental_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
- [llvm_anyvector_ty, llvm_anyvector_ty],
- [IntrNoMem]>;
+def int_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
+ [llvm_anyvector_ty, llvm_anyvector_ty],
+ [IntrNoMem]>;
//===----------------- Pointer Authentication Intrinsics ------------------===//
//
diff --git a/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp b/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
index 7d355e6e365d3..6c2a5a7da84d3 100644
--- a/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
+++ b/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
@@ -1022,8 +1022,7 @@ ComplexDeinterleavingGraph::identifyDotProduct(Value *V) {
CompositeNode *ANode = nullptr;
- const Intrinsic::ID PartialReduceInt =
- Intrinsic::experimental_vector_partial_reduce_add;
+ const Intrinsic::ID PartialReduceInt = Intrinsic::vector_partial_reduce_add;
Value *AReal = nullptr;
Value *AImag = nullptr;
@@ -1139,8 +1138,7 @@ ComplexDeinterleavingGraph::identifyPartialReduction(Value *R, Value *I) {
return nullptr;
auto *IInst = dyn_cast<IntrinsicInst>(*CommonUser);
- if (!IInst || IInst->getIntrinsicID() !=
- Intrinsic::experimental_vector_partial_reduce_add)
+ if (!IInst || IInst->getIntrinsicID() != Intrinsic::vector_partial_reduce_add)
return nullptr;
if (CompositeNode *CN = identifyDotProduct(IInst))
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 430e47451fd49..775e1b95e7def 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8107,7 +8107,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
setValue(&I, Trunc);
return;
}
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
if (!TLI.shouldExpandPartialReductionIntrinsic(cast<IntrinsicInst>(&I))) {
visitTargetIntrinsic(I, Intrinsic);
return;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 8d8120ac9ed90..0eb3bf961b67a 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1259,6 +1259,7 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
.StartsWith("reverse.", Intrinsic::vector_reverse)
.StartsWith("interleave2.", Intrinsic::vector_interleave2)
.StartsWith("deinterleave2.", Intrinsic::vector_deinterleave2)
+ .StartsWith("partial.reduce.add", Intrinsic::vector_partial_reduce_add)
.Default(Intrinsic::not_intrinsic);
if (ID != Intrinsic::not_intrinsic) {
const auto *FT = F->getFunctionType();
@@ -1269,8 +1270,9 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
Tys.push_back(FT->getReturnType());
if (ID != Intrinsic::vector_interleave2)
Tys.push_back(FT->getParamType(0));
- if (ID == Intrinsic::vector_insert)
- // Inserting overloads the inserted type.
+ if (ID == Intrinsic::vector_insert ||
+ ID == Intrinsic::vector_partial_reduce_add)
+ // Inserting overloads the inserted type.
Tys.push_back(FT->getParamType(1));
rename(F);
NewFn = Intrinsic::getOrInsertDeclaration(F->getParent(), ID, Tys);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index c06b60fd2d9a9..e9ee130dd5e91 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6530,7 +6530,7 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
}
break;
}
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
VectorType *AccTy = cast<VectorType>(Call.getArgOperand(0)->getType());
VectorType *VecTy = cast<VectorType>(Call.getArgOperand(1)->getType());
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d7c90bcb9723d..ac810fd2d04bf 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2184,8 +2184,7 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
- assert(I->getIntrinsicID() ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ assert(I->getIntrinsicID() == Intrinsic::vector_partial_reduce_add &&
"Unexpected intrinsic!");
return true;
}
@@ -17380,8 +17379,7 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))))
return true;
if (match(SingleUser,
- m_Intrinsic<
- Intrinsic::experimental_vector_partial_reduce_add>(
+ m_Intrinsic<Intrinsic::vector_partial_reduce_add>(
m_Value(), m_Specific(I))))
return true;
return false;
@@ -22416,8 +22414,7 @@ SDValue tryLowerPartialReductionToDot(SDNode *N,
SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
- getIntrinsicID(N) ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ getIntrinsicID(N) == Intrinsic::vector_partial_reduce_add &&
"Expected a partial reduction node");
bool Scalable = N->getValueType(0).isScalableVector();
@@ -22511,8 +22508,7 @@ SDValue tryLowerPartialReductionToWideAdd(SDNode *N,
SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
- getIntrinsicID(N) ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ getIntrinsicID(N) == Intrinsic::vector_partial_reduce_add &&
"Expected a partial reduction node");
if (!Subtarget->hasSVE2() && !Subtarget->isStreamingSVEAvailable())
@@ -22577,7 +22573,7 @@ static SDValue performIntrinsicCombine(SDNode *N,
switch (IID) {
default:
break;
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
if (SDValue Dot = tryLowerPartialReductionToDot(N, Subtarget, DAG))
return Dot;
if (SDValue WideAdd = tryLowerPartialReductionToWideAdd(N, Subtarget, DAG))
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index aea27ba32d37e..64b9dc31f75b7 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -418,7 +418,7 @@ MVT WebAssemblyTargetLowering::getPointerMemTy(const DataLayout &DL,
bool WebAssemblyTargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
- if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
+ if (I->getIntrinsicID() != Intrinsic::vector_partial_reduce_add)
return true;
EVT VT = EVT::getEVT(I->getType());
@@ -2117,8 +2117,7 @@ SDValue WebAssemblyTargetLowering::LowerVASTART(SDValue Op,
// extmul and adds.
SDValue performLowerPartialReduction(SDNode *N, SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN);
- if (N->getConstantOperandVal(0) !=
- Intrinsic::experimental_vector_partial_reduce_add)
+ if (N->getConstantOperandVal(0) != Intrinsic::vector_partial_reduce_add)
return SDValue();
assert(N->getValueType(0) == MVT::v4i32 && "can only support v4i32");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 723363fba5724..7df84456c9b17 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -375,9 +375,9 @@ void VPPartialReductionRecipe::execute(VPTransformState &State) {
Type *RetTy = PhiVal->getType();
- CallInst *V = Builder.CreateIntrinsic(
- RetTy, Intrinsic::experimental_vector_partial_reduce_add,
- {PhiVal, BinOpVal}, nullptr, "partial.reduce");
+ CallInst *V =
+ Builder.CreateIntrinsic(RetTy, Intrinsic::vector_partial_reduce_add,
+ {PhiVal, BinOpVal}, nullptr, "partial.reduce");
State.set(this, V);
}
diff --git a/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll
new file mode 100644
index 0000000000000..1277d5d933d7b
--- /dev/null
+++ b/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll
@@ -0,0 +1,25 @@
+; RUN: opt -S < %s | FileCheck %s
+; RUN: llvm-as %s -o - | llvm-dis | FileCheck %s
+
+define <4 x i32> @partial_reduce_add_fixed(<16 x i32> %a) {
+; CHECK-LABEL: @partial_reduce_add_fixed
+; CHECK: %res = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> zeroinitializer, <16 x i32> %a)
+
+ %res = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> zeroinitializer, <16 x i32> %a)
+ ret <4 x i32> %res
+}
+
+
+define <vscale x 4 x i32> @partial_reduce_add_scalable(<vscale x 16 x i32> %a) {
+; CHECK-LABEL: @partial_reduce_add_scalable
+; CHECK: %res = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i32> %a)
+
+ %res = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i32> %a)
+ ret <vscale x 4 x i32> %res
+}
+
+declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)
+; CHECK-DAG: declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)
+
+declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
+; CHECK-DAG: declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
index ae08cd9d5bfef..3e4a856aed2ec 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
@@ -15,7 +15,7 @@ body: |
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(<4 x s32>) = COPY $q3
; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(<4 x s32>) = COPY $q4
; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s32>) = G_CONCAT_VECTORS [[COPY1]](<4 x s32>), [[COPY2]](<4 x s32>), [[COPY3]](<4 x s32>), [[COPY4]](<4 x s32>)
- ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.experimental.vector.partial.reduce.add), [[COPY]](<4 x s32>), [[CONCAT_VECTORS]](<16 x s32>)
+ ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.vector.partial.reduce.add), [[COPY]](<4 x s32>), [[CONCAT_VECTORS]](<16 x s32>)
; CHECK-NEXT: [[VECREDUCE_ADD:%[0-9]+]]:_(s32) = G_VECREDUCE_ADD [[INT]](<4 x s32>)
; CHECK-NEXT: $w0 = COPY [[VECREDUCE_ADD]](s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
@@ -25,7 +25,7 @@ body: |
%4:_(<4 x s32>) = COPY $q3
%5:_(<4 x s32>) = COPY $q4
%1:_(<16 x s32>) = G_CONCAT_VECTORS %2:_(<4 x s32>), %3:_(<4 x s32>), %4:_(<4 x s32>), %5:_(<4 x s32>)
- %6:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.experimental.vector.partial.reduce.add), %0:_(<4 x s32>), %1:_(<16 x s32>)
+ %6:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.vector.partial.reduce.add), %0:_(<4 x s32>), %1:_(<16 x s32>)
%7:_(s32) = G_VECREDUCE_ADD %6:_(<4 x s32>)
$w0 = COPY %7:_(s32)
RET_ReallyLR implicit $w0
diff --git a/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll b/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
index 11cf4c31936d8..ebb2da9a3edd2 100644
--- a/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
+++ b/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
@@ -45,10 +45,10 @@ define i32 @cdotp_i8_rot0(<vscale x 32 x i8> %a, <vscale x 32 x i8> %b) {
; CHECK-SVE-NEXT: [[B_REAL_EXT:%.*]] = sext <vscale x 16 x i8> [[B_REAL]] to <vscale x 16 x i32>
; CHECK-SVE-NEXT: [[B_IMAG_EXT:%.*]] = sext <vscale x 16 x i8> [[B_IMAG]] to <vscale x 16 x i32>
; CHECK-SVE-NEXT: [[REAL_MUL:%.*]] = mul <vscale x 16 x i32> [[B_REAL_EXT]], [[A_REAL_EXT]]
-; CHECK-SVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
+; CHECK-SVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
; CHECK-SVE-NEXT: [[IMAG_MUL:%.*]] = mul <vscale x 16 x i32> [[B_IMAG_EXT]], [[A_IMAG_EXT]]
; CHECK-SVE-NEXT: [[IMAG_MUL_NEG:%.*]] = sub <vscale x 16 x i32> zeroinitializer, [[IMAG_MUL]]
-; CHECK-SVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
+; CHECK-SVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
; CHECK-SVE-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]]
; CHECK-SVE: [[MIDDLE_BLOCK]]:
; CHECK-SVE-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[PARTIAL_REDUCE_SUB]])
@@ -71,10 +71,10 @@ define i32 @cdotp_i8_rot0(<vscale x 32 x i8> %a, <vscale x 32 x i8> %b) {
; CHECK-NOSVE-NEXT: [[B_REAL_EXT:%.*]] = sext <vscale x 16 x i8> [[B_REAL]] to <vscale x 16 x i32>
; CHECK-NOSVE-NEXT: [[B_IMAG_EXT:%.*]] = sext <vscale x 16 x i8> [[B_IMAG]] to <vscale x 16 x i32>
; CHECK-NOSVE-NEXT: [[REAL_MUL:%.*]] = mul <vscale x 16 x i32> [[B_REAL_EXT]], [[A_REAL_EXT]]
-; CHECK-NOSVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
+; CHECK-NOSVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
; CHECK-NOSVE-NEXT: [[IMAG_MUL:%.*]] = mul <vscale x 16 x i32> [[B_IMAG_EXT]], [[A_IMAG_EXT]]
; CHECK-NOSVE-NEXT: [[IMAG_MUL_NEG:%.*]] = sub <vscale x 16 x i32> zeroinitializer, [[IMAG_MUL]]
-; CHECK-NOSVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
+; CHECK-NOSVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
; CHECK-NOSVE-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]]
; CHECK-NOSVE: [[MIDDLE_BLOCK]]:
; CHECK-NOSVE-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[PARTIAL_REDUCE_SUB]])
@@ -96,10 +96,10 @@ vector.body: ; preds = %vector.body, %entry
%b.real.ext = sext <vscale x 16 x i8> %b.real to <vscale x 16 x i32>
%b.imag.ext = sext <vscale x 16 x i8> %b.imag to <vscale x 16 x i32>
%real.mul = mul <vscale x 16 x i32> %b.real.ext, %a.real.ext
- %real.mul.reduced = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %vec.phi, <vscale x 16 x i32> %real.mul)
+ %real.mul.reduced = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %vec.phi, <vscale x 16 x i32> %real.mul)
%imag.mul = mul <vscale x 16 x i32> %b.imag.ext, %a.imag.ext
%imag.mul.neg = sub <vscale x 16 x i32> zeroinitializer, %imag.mul
- %partial.reduce.sub = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %real.mul.reduced, <vscale x 16 x i32> %imag.mul.neg)
+ %partial.reduce.sub = call <vscale x 4 x i32> @ll...
[truncated]
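As a usage illustration (not part of the patch), here is a minimal IR sketch based on the new Bitcode test and the LangRef wording above; the function name @sum_into_acc is hypothetical:

; Hypothetical example: the call reduces the concatenation of %acc and %v
; down to the four i32 lanes of the result type, per the LangRef text above.
define <4 x i32> @sum_into_acc(<4 x i32> %acc, <16 x i32> %v) {
  %r = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> %acc, <16 x i32> %v)
  ret <4 x i32> %r
}

; Declarations that still use the old experimental name are renamed by the
; AutoUpgrade.cpp change to the form below, so existing bitcode keeps working.
declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)

The CHECK-DAG lines in upgrade-vector-partial-reduce-add-intrinsic.ll above verify exactly this renaming for both the fixed-width and the scalable declarations.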
|
@llvm/pr-subscribers-llvm-analysis Author: Sander de Smalen (sdesmalen-arm) ChangesThe partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change. Patch is 254.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/158637.diff 33 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d61ea07830123..61c8415873092 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20562,7 +20562,7 @@ Note that it has the following implications:
- If ``%cnt`` is non-zero, the return value is non-zero as well.
- If ``%cnt`` is less than or equal to ``%max_lanes``, the return value is equal to ``%cnt``.
-'``llvm.experimental.vector.partial.reduce.add.*``' Intrinsic
+'``llvm.vector.partial.reduce.add.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -20571,15 +20571,15 @@ This is an overloaded intrinsic.
::
- declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
- declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
- declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
- declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
Overview:
"""""""""
-The '``llvm.vector.experimental.partial.reduce.add.*``' intrinsics reduce the
+The '``llvm.vector.partial.reduce.add.*``' intrinsics reduce the
concatenation of the two vector arguments down to the number of elements of the
result vector type.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index a6f4e51e258ab..31d6cd676f674 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1324,7 +1324,7 @@ class TargetTransformInfo {
/// \return The cost of a partial reduction, which is a reduction from a
/// vector to another vector with fewer elements of larger size. They are
- /// represented by the llvm.experimental.partial.reduce.add intrinsic, which
+ /// represented by the llvm.vector.partial.reduce.add intrinsic, which
/// takes an accumulator of type \p AccumType and a second vector operand to
/// be accumulated, whose element count is specified by \p VF. The type of
/// reduction is specified by \p Opcode. The second operand passed to the
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 2ba8b29e775e0..46be271320fdd 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -480,7 +480,7 @@ class LLVM_ABI TargetLoweringBase {
return true;
}
- /// Return true if the @llvm.experimental.vector.partial.reduce.* intrinsic
+ /// Return true if the @llvm.vector.partial.reduce.* intrinsic
/// should be expanded using generic code in SelectionDAGBuilder.
virtual bool
shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const {
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index fb9ea10ac9127..585371a6a4423 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2797,9 +2797,9 @@ foreach n = 2...8 in {
//===-------------- Intrinsics to perform partial reduction ---------------===//
-def int_experimental_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
- [llvm_anyvector_ty, llvm_anyvector_ty],
- [IntrNoMem]>;
+def int_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
+ [llvm_anyvector_ty, llvm_anyvector_ty],
+ [IntrNoMem]>;
//===----------------- Pointer Authentication Intrinsics ------------------===//
//
diff --git a/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp b/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
index 7d355e6e365d3..6c2a5a7da84d3 100644
--- a/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
+++ b/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
@@ -1022,8 +1022,7 @@ ComplexDeinterleavingGraph::identifyDotProduct(Value *V) {
CompositeNode *ANode = nullptr;
- const Intrinsic::ID PartialReduceInt =
- Intrinsic::experimental_vector_partial_reduce_add;
+ const Intrinsic::ID PartialReduceInt = Intrinsic::vector_partial_reduce_add;
Value *AReal = nullptr;
Value *AImag = nullptr;
@@ -1139,8 +1138,7 @@ ComplexDeinterleavingGraph::identifyPartialReduction(Value *R, Value *I) {
return nullptr;
auto *IInst = dyn_cast<IntrinsicInst>(*CommonUser);
- if (!IInst || IInst->getIntrinsicID() !=
- Intrinsic::experimental_vector_partial_reduce_add)
+ if (!IInst || IInst->getIntrinsicID() != Intrinsic::vector_partial_reduce_add)
return nullptr;
if (CompositeNode *CN = identifyDotProduct(IInst))
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 430e47451fd49..775e1b95e7def 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8107,7 +8107,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
setValue(&I, Trunc);
return;
}
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
if (!TLI.shouldExpandPartialReductionIntrinsic(cast<IntrinsicInst>(&I))) {
visitTargetIntrinsic(I, Intrinsic);
return;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 8d8120ac9ed90..0eb3bf961b67a 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1259,6 +1259,7 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
.StartsWith("reverse.", Intrinsic::vector_reverse)
.StartsWith("interleave2.", Intrinsic::vector_interleave2)
.StartsWith("deinterleave2.", Intrinsic::vector_deinterleave2)
+ .StartsWith("partial.reduce.add", Intrinsic::vector_partial_reduce_add)
.Default(Intrinsic::not_intrinsic);
if (ID != Intrinsic::not_intrinsic) {
const auto *FT = F->getFunctionType();
@@ -1269,8 +1270,9 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
Tys.push_back(FT->getReturnType());
if (ID != Intrinsic::vector_interleave2)
Tys.push_back(FT->getParamType(0));
- if (ID == Intrinsic::vector_insert)
- // Inserting overloads the inserted type.
+ if (ID == Intrinsic::vector_insert ||
+ ID == Intrinsic::vector_partial_reduce_add)
+ // Inserting overloads the inserted type.
Tys.push_back(FT->getParamType(1));
rename(F);
NewFn = Intrinsic::getOrInsertDeclaration(F->getParent(), ID, Tys);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index c06b60fd2d9a9..e9ee130dd5e91 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6530,7 +6530,7 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
}
break;
}
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
VectorType *AccTy = cast<VectorType>(Call.getArgOperand(0)->getType());
VectorType *VecTy = cast<VectorType>(Call.getArgOperand(1)->getType());
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d7c90bcb9723d..ac810fd2d04bf 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2184,8 +2184,7 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
- assert(I->getIntrinsicID() ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ assert(I->getIntrinsicID() == Intrinsic::vector_partial_reduce_add &&
"Unexpected intrinsic!");
return true;
}
@@ -17380,8 +17379,7 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))))
return true;
if (match(SingleUser,
- m_Intrinsic<
- Intrinsic::experimental_vector_partial_reduce_add>(
+ m_Intrinsic<Intrinsic::vector_partial_reduce_add>(
m_Value(), m_Specific(I))))
return true;
return false;
@@ -22416,8 +22414,7 @@ SDValue tryLowerPartialReductionToDot(SDNode *N,
SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
- getIntrinsicID(N) ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ getIntrinsicID(N) == Intrinsic::vector_partial_reduce_add &&
"Expected a partial reduction node");
bool Scalable = N->getValueType(0).isScalableVector();
@@ -22511,8 +22508,7 @@ SDValue tryLowerPartialReductionToWideAdd(SDNode *N,
SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
- getIntrinsicID(N) ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ getIntrinsicID(N) == Intrinsic::vector_partial_reduce_add &&
"Expected a partial reduction node");
if (!Subtarget->hasSVE2() && !Subtarget->isStreamingSVEAvailable())
@@ -22577,7 +22573,7 @@ static SDValue performIntrinsicCombine(SDNode *N,
switch (IID) {
default:
break;
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
if (SDValue Dot = tryLowerPartialReductionToDot(N, Subtarget, DAG))
return Dot;
if (SDValue WideAdd = tryLowerPartialReductionToWideAdd(N, Subtarget, DAG))
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index aea27ba32d37e..64b9dc31f75b7 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -418,7 +418,7 @@ MVT WebAssemblyTargetLowering::getPointerMemTy(const DataLayout &DL,
bool WebAssemblyTargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
- if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
+ if (I->getIntrinsicID() != Intrinsic::vector_partial_reduce_add)
return true;
EVT VT = EVT::getEVT(I->getType());
@@ -2117,8 +2117,7 @@ SDValue WebAssemblyTargetLowering::LowerVASTART(SDValue Op,
// extmul and adds.
SDValue performLowerPartialReduction(SDNode *N, SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN);
- if (N->getConstantOperandVal(0) !=
- Intrinsic::experimental_vector_partial_reduce_add)
+ if (N->getConstantOperandVal(0) != Intrinsic::vector_partial_reduce_add)
return SDValue();
assert(N->getValueType(0) == MVT::v4i32 && "can only support v4i32");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 723363fba5724..7df84456c9b17 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -375,9 +375,9 @@ void VPPartialReductionRecipe::execute(VPTransformState &State) {
Type *RetTy = PhiVal->getType();
- CallInst *V = Builder.CreateIntrinsic(
- RetTy, Intrinsic::experimental_vector_partial_reduce_add,
- {PhiVal, BinOpVal}, nullptr, "partial.reduce");
+ CallInst *V =
+ Builder.CreateIntrinsic(RetTy, Intrinsic::vector_partial_reduce_add,
+ {PhiVal, BinOpVal}, nullptr, "partial.reduce");
State.set(this, V);
}
diff --git a/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll
new file mode 100644
index 0000000000000..1277d5d933d7b
--- /dev/null
+++ b/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll
@@ -0,0 +1,25 @@
+; RUN: opt -S < %s | FileCheck %s
+; RUN: llvm-as %s -o - | llvm-dis | FileCheck %s
+
+define <4 x i32> @partial_reduce_add_fixed(<16 x i32> %a) {
+; CHECK-LABEL: @partial_reduce_add_fixed
+; CHECK: %res = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> zeroinitializer, <16 x i32> %a)
+
+ %res = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> zeroinitializer, <16 x i32> %a)
+ ret <4 x i32> %res
+}
+
+
+define <vscale x 4 x i32> @partial_reduce_add_scalable(<vscale x 16 x i32> %a) {
+; CHECK-LABEL: @partial_reduce_add_scalable
+; CHECK: %res = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i32> %a)
+
+ %res = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i32> %a)
+ ret <vscale x 4 x i32> %res
+}
+
+declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)
+; CHECK-DAG: declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)
+
+declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
+; CHECK-DAG: declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
index ae08cd9d5bfef..3e4a856aed2ec 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
@@ -15,7 +15,7 @@ body: |
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(<4 x s32>) = COPY $q3
; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(<4 x s32>) = COPY $q4
; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s32>) = G_CONCAT_VECTORS [[COPY1]](<4 x s32>), [[COPY2]](<4 x s32>), [[COPY3]](<4 x s32>), [[COPY4]](<4 x s32>)
- ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.experimental.vector.partial.reduce.add), [[COPY]](<4 x s32>), [[CONCAT_VECTORS]](<16 x s32>)
+ ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.vector.partial.reduce.add), [[COPY]](<4 x s32>), [[CONCAT_VECTORS]](<16 x s32>)
; CHECK-NEXT: [[VECREDUCE_ADD:%[0-9]+]]:_(s32) = G_VECREDUCE_ADD [[INT]](<4 x s32>)
; CHECK-NEXT: $w0 = COPY [[VECREDUCE_ADD]](s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
@@ -25,7 +25,7 @@ body: |
%4:_(<4 x s32>) = COPY $q3
%5:_(<4 x s32>) = COPY $q4
%1:_(<16 x s32>) = G_CONCAT_VECTORS %2:_(<4 x s32>), %3:_(<4 x s32>), %4:_(<4 x s32>), %5:_(<4 x s32>)
- %6:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.experimental.vector.partial.reduce.add), %0:_(<4 x s32>), %1:_(<16 x s32>)
+ %6:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.vector.partial.reduce.add), %0:_(<4 x s32>), %1:_(<16 x s32>)
%7:_(s32) = G_VECREDUCE_ADD %6:_(<4 x s32>)
$w0 = COPY %7:_(s32)
RET_ReallyLR implicit $w0
diff --git a/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll b/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
index 11cf4c31936d8..ebb2da9a3edd2 100644
--- a/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
+++ b/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
@@ -45,10 +45,10 @@ define i32 @cdotp_i8_rot0(<vscale x 32 x i8> %a, <vscale x 32 x i8> %b) {
; CHECK-SVE-NEXT: [[B_REAL_EXT:%.*]] = sext <vscale x 16 x i8> [[B_REAL]] to <vscale x 16 x i32>
; CHECK-SVE-NEXT: [[B_IMAG_EXT:%.*]] = sext <vscale x 16 x i8> [[B_IMAG]] to <vscale x 16 x i32>
; CHECK-SVE-NEXT: [[REAL_MUL:%.*]] = mul <vscale x 16 x i32> [[B_REAL_EXT]], [[A_REAL_EXT]]
-; CHECK-SVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
+; CHECK-SVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
; CHECK-SVE-NEXT: [[IMAG_MUL:%.*]] = mul <vscale x 16 x i32> [[B_IMAG_EXT]], [[A_IMAG_EXT]]
; CHECK-SVE-NEXT: [[IMAG_MUL_NEG:%.*]] = sub <vscale x 16 x i32> zeroinitializer, [[IMAG_MUL]]
-; CHECK-SVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
+; CHECK-SVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
; CHECK-SVE-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]]
; CHECK-SVE: [[MIDDLE_BLOCK]]:
; CHECK-SVE-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[PARTIAL_REDUCE_SUB]])
@@ -71,10 +71,10 @@ define i32 @cdotp_i8_rot0(<vscale x 32 x i8> %a, <vscale x 32 x i8> %b) {
; CHECK-NOSVE-NEXT: [[B_REAL_EXT:%.*]] = sext <vscale x 16 x i8> [[B_REAL]] to <vscale x 16 x i32>
; CHECK-NOSVE-NEXT: [[B_IMAG_EXT:%.*]] = sext <vscale x 16 x i8> [[B_IMAG]] to <vscale x 16 x i32>
; CHECK-NOSVE-NEXT: [[REAL_MUL:%.*]] = mul <vscale x 16 x i32> [[B_REAL_EXT]], [[A_REAL_EXT]]
-; CHECK-NOSVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
+; CHECK-NOSVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
; CHECK-NOSVE-NEXT: [[IMAG_MUL:%.*]] = mul <vscale x 16 x i32> [[B_IMAG_EXT]], [[A_IMAG_EXT]]
; CHECK-NOSVE-NEXT: [[IMAG_MUL_NEG:%.*]] = sub <vscale x 16 x i32> zeroinitializer, [[IMAG_MUL]]
-; CHECK-NOSVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
+; CHECK-NOSVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
; CHECK-NOSVE-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]]
; CHECK-NOSVE: [[MIDDLE_BLOCK]]:
; CHECK-NOSVE-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[PARTIAL_REDUCE_SUB]])
@@ -96,10 +96,10 @@ vector.body: ; preds = %vector.body, %entry
%b.real.ext = sext <vscale x 16 x i8> %b.real to <vscale x 16 x i32>
%b.imag.ext = sext <vscale x 16 x i8> %b.imag to <vscale x 16 x i32>
%real.mul = mul <vscale x 16 x i32> %b.real.ext, %a.real.ext
- %real.mul.reduced = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %vec.phi, <vscale x 16 x i32> %real.mul)
+ %real.mul.reduced = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %vec.phi, <vscale x 16 x i32> %real.mul)
%imag.mul = mul <vscale x 16 x i32> %b.imag.ext, %a.imag.ext
%imag.mul.neg = sub <vscale x 16 x i32> zeroinitializer, %imag.mul
- %partial.reduce.sub = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %real.mul.reduced, <vscale x 16 x i32> %imag.mul.neg)
+ %partial.reduce.sub = call <vscale x 4 x i32> @ll...
[truncated]
|
@llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) ChangesThe partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change. Patch is 254.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/158637.diff 33 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d61ea07830123..61c8415873092 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20562,7 +20562,7 @@ Note that it has the following implications:
- If ``%cnt`` is non-zero, the return value is non-zero as well.
- If ``%cnt`` is less than or equal to ``%max_lanes``, the return value is equal to ``%cnt``.
-'``llvm.experimental.vector.partial.reduce.add.*``' Intrinsic
+'``llvm.vector.partial.reduce.add.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -20571,15 +20571,15 @@ This is an overloaded intrinsic.
::
- declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
- declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
- declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
- declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
Overview:
"""""""""
-The '``llvm.vector.experimental.partial.reduce.add.*``' intrinsics reduce the
+The '``llvm.vector.partial.reduce.add.*``' intrinsics reduce the
concatenation of the two vector arguments down to the number of elements of the
result vector type.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index a6f4e51e258ab..31d6cd676f674 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1324,7 +1324,7 @@ class TargetTransformInfo {
/// \return The cost of a partial reduction, which is a reduction from a
/// vector to another vector with fewer elements of larger size. They are
- /// represented by the llvm.experimental.partial.reduce.add intrinsic, which
+ /// represented by the llvm.vector.partial.reduce.add intrinsic, which
/// takes an accumulator of type \p AccumType and a second vector operand to
/// be accumulated, whose element count is specified by \p VF. The type of
/// reduction is specified by \p Opcode. The second operand passed to the
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 2ba8b29e775e0..46be271320fdd 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -480,7 +480,7 @@ class LLVM_ABI TargetLoweringBase {
return true;
}
- /// Return true if the @llvm.experimental.vector.partial.reduce.* intrinsic
+ /// Return true if the @llvm.vector.partial.reduce.* intrinsic
/// should be expanded using generic code in SelectionDAGBuilder.
virtual bool
shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const {
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index fb9ea10ac9127..585371a6a4423 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2797,9 +2797,9 @@ foreach n = 2...8 in {
//===-------------- Intrinsics to perform partial reduction ---------------===//
-def int_experimental_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
- [llvm_anyvector_ty, llvm_anyvector_ty],
- [IntrNoMem]>;
+def int_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
+ [llvm_anyvector_ty, llvm_anyvector_ty],
+ [IntrNoMem]>;
//===----------------- Pointer Authentication Intrinsics ------------------===//
//
diff --git a/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp b/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
index 7d355e6e365d3..6c2a5a7da84d3 100644
--- a/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
+++ b/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
@@ -1022,8 +1022,7 @@ ComplexDeinterleavingGraph::identifyDotProduct(Value *V) {
CompositeNode *ANode = nullptr;
- const Intrinsic::ID PartialReduceInt =
- Intrinsic::experimental_vector_partial_reduce_add;
+ const Intrinsic::ID PartialReduceInt = Intrinsic::vector_partial_reduce_add;
Value *AReal = nullptr;
Value *AImag = nullptr;
@@ -1139,8 +1138,7 @@ ComplexDeinterleavingGraph::identifyPartialReduction(Value *R, Value *I) {
return nullptr;
auto *IInst = dyn_cast<IntrinsicInst>(*CommonUser);
- if (!IInst || IInst->getIntrinsicID() !=
- Intrinsic::experimental_vector_partial_reduce_add)
+ if (!IInst || IInst->getIntrinsicID() != Intrinsic::vector_partial_reduce_add)
return nullptr;
if (CompositeNode *CN = identifyDotProduct(IInst))
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 430e47451fd49..775e1b95e7def 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8107,7 +8107,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
setValue(&I, Trunc);
return;
}
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
if (!TLI.shouldExpandPartialReductionIntrinsic(cast<IntrinsicInst>(&I))) {
visitTargetIntrinsic(I, Intrinsic);
return;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 8d8120ac9ed90..0eb3bf961b67a 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1259,6 +1259,7 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
.StartsWith("reverse.", Intrinsic::vector_reverse)
.StartsWith("interleave2.", Intrinsic::vector_interleave2)
.StartsWith("deinterleave2.", Intrinsic::vector_deinterleave2)
+ .StartsWith("partial.reduce.add", Intrinsic::vector_partial_reduce_add)
.Default(Intrinsic::not_intrinsic);
if (ID != Intrinsic::not_intrinsic) {
const auto *FT = F->getFunctionType();
@@ -1269,8 +1270,9 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
Tys.push_back(FT->getReturnType());
if (ID != Intrinsic::vector_interleave2)
Tys.push_back(FT->getParamType(0));
- if (ID == Intrinsic::vector_insert)
- // Inserting overloads the inserted type.
+ if (ID == Intrinsic::vector_insert ||
+ ID == Intrinsic::vector_partial_reduce_add)
+ // Inserting overloads the inserted type.
Tys.push_back(FT->getParamType(1));
rename(F);
NewFn = Intrinsic::getOrInsertDeclaration(F->getParent(), ID, Tys);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index c06b60fd2d9a9..e9ee130dd5e91 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6530,7 +6530,7 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
}
break;
}
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
VectorType *AccTy = cast<VectorType>(Call.getArgOperand(0)->getType());
VectorType *VecTy = cast<VectorType>(Call.getArgOperand(1)->getType());
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d7c90bcb9723d..ac810fd2d04bf 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2184,8 +2184,7 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
- assert(I->getIntrinsicID() ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ assert(I->getIntrinsicID() == Intrinsic::vector_partial_reduce_add &&
"Unexpected intrinsic!");
return true;
}
@@ -17380,8 +17379,7 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))))
return true;
if (match(SingleUser,
- m_Intrinsic<
- Intrinsic::experimental_vector_partial_reduce_add>(
+ m_Intrinsic<Intrinsic::vector_partial_reduce_add>(
m_Value(), m_Specific(I))))
return true;
return false;
@@ -22416,8 +22414,7 @@ SDValue tryLowerPartialReductionToDot(SDNode *N,
SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
- getIntrinsicID(N) ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ getIntrinsicID(N) == Intrinsic::vector_partial_reduce_add &&
"Expected a partial reduction node");
bool Scalable = N->getValueType(0).isScalableVector();
@@ -22511,8 +22508,7 @@ SDValue tryLowerPartialReductionToWideAdd(SDNode *N,
SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
- getIntrinsicID(N) ==
- Intrinsic::experimental_vector_partial_reduce_add &&
+ getIntrinsicID(N) == Intrinsic::vector_partial_reduce_add &&
"Expected a partial reduction node");
if (!Subtarget->hasSVE2() && !Subtarget->isStreamingSVEAvailable())
@@ -22577,7 +22573,7 @@ static SDValue performIntrinsicCombine(SDNode *N,
switch (IID) {
default:
break;
- case Intrinsic::experimental_vector_partial_reduce_add: {
+ case Intrinsic::vector_partial_reduce_add: {
if (SDValue Dot = tryLowerPartialReductionToDot(N, Subtarget, DAG))
return Dot;
if (SDValue WideAdd = tryLowerPartialReductionToWideAdd(N, Subtarget, DAG))
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index aea27ba32d37e..64b9dc31f75b7 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -418,7 +418,7 @@ MVT WebAssemblyTargetLowering::getPointerMemTy(const DataLayout &DL,
bool WebAssemblyTargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
- if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
+ if (I->getIntrinsicID() != Intrinsic::vector_partial_reduce_add)
return true;
EVT VT = EVT::getEVT(I->getType());
@@ -2117,8 +2117,7 @@ SDValue WebAssemblyTargetLowering::LowerVASTART(SDValue Op,
// extmul and adds.
SDValue performLowerPartialReduction(SDNode *N, SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::INTRINSIC_WO_CHAIN);
- if (N->getConstantOperandVal(0) !=
- Intrinsic::experimental_vector_partial_reduce_add)
+ if (N->getConstantOperandVal(0) != Intrinsic::vector_partial_reduce_add)
return SDValue();
assert(N->getValueType(0) == MVT::v4i32 && "can only support v4i32");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 723363fba5724..7df84456c9b17 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -375,9 +375,9 @@ void VPPartialReductionRecipe::execute(VPTransformState &State) {
Type *RetTy = PhiVal->getType();
- CallInst *V = Builder.CreateIntrinsic(
- RetTy, Intrinsic::experimental_vector_partial_reduce_add,
- {PhiVal, BinOpVal}, nullptr, "partial.reduce");
+ CallInst *V =
+ Builder.CreateIntrinsic(RetTy, Intrinsic::vector_partial_reduce_add,
+ {PhiVal, BinOpVal}, nullptr, "partial.reduce");
State.set(this, V);
}
diff --git a/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll
new file mode 100644
index 0000000000000..1277d5d933d7b
--- /dev/null
+++ b/llvm/test/Bitcode/upgrade-vector-partial-reduce-add-intrinsic.ll
@@ -0,0 +1,25 @@
+; RUN: opt -S < %s | FileCheck %s
+; RUN: llvm-as %s -o - | llvm-dis | FileCheck %s
+
+define <4 x i32> @partial_reduce_add_fixed(<16 x i32> %a) {
+; CHECK-LABEL: @partial_reduce_add_fixed
+; CHECK: %res = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> zeroinitializer, <16 x i32> %a)
+
+ %res = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> zeroinitializer, <16 x i32> %a)
+ ret <4 x i32> %res
+}
+
+
+define <vscale x 4 x i32> @partial_reduce_add_scalable(<vscale x 16 x i32> %a) {
+; CHECK-LABEL: @partial_reduce_add_scalable
+; CHECK: %res = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i32> %a)
+
+ %res = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i32> %a)
+ ret <vscale x 4 x i32> %res
+}
+
+declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)
+; CHECK-DAG: declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)
+
+declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
+; CHECK-DAG: declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
index ae08cd9d5bfef..3e4a856aed2ec 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-addv.mir
@@ -15,7 +15,7 @@ body: |
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(<4 x s32>) = COPY $q3
; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(<4 x s32>) = COPY $q4
; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s32>) = G_CONCAT_VECTORS [[COPY1]](<4 x s32>), [[COPY2]](<4 x s32>), [[COPY3]](<4 x s32>), [[COPY4]](<4 x s32>)
- ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.experimental.vector.partial.reduce.add), [[COPY]](<4 x s32>), [[CONCAT_VECTORS]](<16 x s32>)
+ ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.vector.partial.reduce.add), [[COPY]](<4 x s32>), [[CONCAT_VECTORS]](<16 x s32>)
; CHECK-NEXT: [[VECREDUCE_ADD:%[0-9]+]]:_(s32) = G_VECREDUCE_ADD [[INT]](<4 x s32>)
; CHECK-NEXT: $w0 = COPY [[VECREDUCE_ADD]](s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
@@ -25,7 +25,7 @@ body: |
%4:_(<4 x s32>) = COPY $q3
%5:_(<4 x s32>) = COPY $q4
%1:_(<16 x s32>) = G_CONCAT_VECTORS %2:_(<4 x s32>), %3:_(<4 x s32>), %4:_(<4 x s32>), %5:_(<4 x s32>)
- %6:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.experimental.vector.partial.reduce.add), %0:_(<4 x s32>), %1:_(<16 x s32>)
+ %6:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.vector.partial.reduce.add), %0:_(<4 x s32>), %1:_(<16 x s32>)
%7:_(s32) = G_VECREDUCE_ADD %6:_(<4 x s32>)
$w0 = COPY %7:_(s32)
RET_ReallyLR implicit $w0
diff --git a/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll b/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
index 11cf4c31936d8..ebb2da9a3edd2 100644
--- a/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
+++ b/llvm/test/CodeGen/AArch64/complex-deinterleaving-cdot.ll
@@ -45,10 +45,10 @@ define i32 @cdotp_i8_rot0(<vscale x 32 x i8> %a, <vscale x 32 x i8> %b) {
; CHECK-SVE-NEXT: [[B_REAL_EXT:%.*]] = sext <vscale x 16 x i8> [[B_REAL]] to <vscale x 16 x i32>
; CHECK-SVE-NEXT: [[B_IMAG_EXT:%.*]] = sext <vscale x 16 x i8> [[B_IMAG]] to <vscale x 16 x i32>
; CHECK-SVE-NEXT: [[REAL_MUL:%.*]] = mul <vscale x 16 x i32> [[B_REAL_EXT]], [[A_REAL_EXT]]
-; CHECK-SVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
+; CHECK-SVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
; CHECK-SVE-NEXT: [[IMAG_MUL:%.*]] = mul <vscale x 16 x i32> [[B_IMAG_EXT]], [[A_IMAG_EXT]]
; CHECK-SVE-NEXT: [[IMAG_MUL_NEG:%.*]] = sub <vscale x 16 x i32> zeroinitializer, [[IMAG_MUL]]
-; CHECK-SVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
+; CHECK-SVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
; CHECK-SVE-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]]
; CHECK-SVE: [[MIDDLE_BLOCK]]:
; CHECK-SVE-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[PARTIAL_REDUCE_SUB]])
@@ -71,10 +71,10 @@ define i32 @cdotp_i8_rot0(<vscale x 32 x i8> %a, <vscale x 32 x i8> %b) {
; CHECK-NOSVE-NEXT: [[B_REAL_EXT:%.*]] = sext <vscale x 16 x i8> [[B_REAL]] to <vscale x 16 x i32>
; CHECK-NOSVE-NEXT: [[B_IMAG_EXT:%.*]] = sext <vscale x 16 x i8> [[B_IMAG]] to <vscale x 16 x i32>
; CHECK-NOSVE-NEXT: [[REAL_MUL:%.*]] = mul <vscale x 16 x i32> [[B_REAL_EXT]], [[A_REAL_EXT]]
-; CHECK-NOSVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
+; CHECK-NOSVE-NEXT: [[REAL_MUL_REDUCED:%.*]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 16 x i32> [[REAL_MUL]])
; CHECK-NOSVE-NEXT: [[IMAG_MUL:%.*]] = mul <vscale x 16 x i32> [[B_IMAG_EXT]], [[A_IMAG_EXT]]
; CHECK-NOSVE-NEXT: [[IMAG_MUL_NEG:%.*]] = sub <vscale x 16 x i32> zeroinitializer, [[IMAG_MUL]]
-; CHECK-NOSVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
+; CHECK-NOSVE-NEXT: [[PARTIAL_REDUCE_SUB]] = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> [[REAL_MUL_REDUCED]], <vscale x 16 x i32> [[IMAG_MUL_NEG]])
; CHECK-NOSVE-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]]
; CHECK-NOSVE: [[MIDDLE_BLOCK]]:
; CHECK-NOSVE-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[PARTIAL_REDUCE_SUB]])
@@ -96,10 +96,10 @@ vector.body: ; preds = %vector.body, %entry
%b.real.ext = sext <vscale x 16 x i8> %b.real to <vscale x 16 x i32>
%b.imag.ext = sext <vscale x 16 x i8> %b.imag to <vscale x 16 x i32>
%real.mul = mul <vscale x 16 x i32> %b.real.ext, %a.real.ext
- %real.mul.reduced = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %vec.phi, <vscale x 16 x i32> %real.mul)
+ %real.mul.reduced = call <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %vec.phi, <vscale x 16 x i32> %real.mul)
%imag.mul = mul <vscale x 16 x i32> %b.imag.ext, %a.imag.ext
%imag.mul.neg = sub <vscale x 16 x i32> zeroinitializer, %imag.mul
- %partial.reduce.sub = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %real.mul.reduced, <vscale x 16 x i32> %imag.mul.neg)
+ %partial.reduce.sub = call <vscale x 4 x i32> @ll...
[truncated]
@llvm/pr-subscribers-backend-webassembly
✅ With the latest revision this PR passed the C/C++ code formatter.
LGTM
LGTM, thanks
The partial reduction intrinsics are not experimental, because they've been used in production for a while now and are unlikely to change.
1eecca8 to b236fba
The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.
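For readers updating out-of-tree IR by hand, here is a minimal sketch of a call using the renamed intrinsic; the function name and the concrete vector types are illustrative and not taken from the patch. IR and bitcode that still use the old llvm.experimental.vector.partial.reduce.add.* spelling are rewritten to the new name by the auto-upgrade path this patch adds, so existing inputs keep working.

; Illustrative sketch only, not part of the patch.
declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)

define <4 x i32> @partial_reduce_sketch(<4 x i32> %acc, <16 x i32> %v) {
  ; Accumulate the sixteen i32 lanes of %v into the four lanes of %acc.
  %r = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> %acc, <16 x i32> %v)
  ret <4 x i32> %r
}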