[RISCV][TTI] Implement instruction cost for vp_merge. #112327

ElvisWang123 · 2024-10-15T07:43:28Z

This patch implement the instruction for vp_merge, which will generate similar instruction sequence to the select instruction.

`vp_merge` will generate `vmerge.vvm` in RISCV.

llvmbot · 2024-10-15T07:44:02Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Elvis Wang (ElvisWang123)

Changes

This patch implement the instruction for vp_merge, which will generate similar instruction sequence to the select instruction.

Patch is 39.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112327.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+5-1)
(added) llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll (+240)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 8d18fd63e4a2e1..6147b977e857c9 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1132,6 +1132,10 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     return getCmpSelInstrCost(*FOp, ICA.getReturnType(), ICA.getArgTypes()[0],
                               CmpInst::BAD_ICMP_PREDICATE, CostKind);
   }
+  case Intrinsic::vp_merge:
+    return getCmpSelInstrCost(Instruction::Select, ICA.getReturnType(),
+                              ICA.getArgTypes()[0], CmpInst::BAD_ICMP_PREDICATE,
+                              CostKind);
   }
 
   if (ST->hasVInstructions() && RetTy->isVectorTy()) {
@@ -2431,4 +2435,4 @@ bool RISCVTTIImpl::isProfitableToSinkOperands(
     Ops.push_back(&OpIdx.value());
   }
   return true;
-}
\ No newline at end of file
+}
diff --git a/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll b/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
new file mode 100644
index 00000000000000..57a81cc87fb6a3
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
@@ -0,0 +1,240 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfh -riscv-v-fixed-length-vector-lmul-max=1 < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfh -riscv-v-fixed-length-vector-lmul-max=1 \
+; RUN: --type-based-intrinsic-cost=true < %s | FileCheck %s --check-prefixes=TYPEBASED
+; Check that we don't crash querying costs when vectors are not enabled.
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=riscv64
+
+define void @vp_merge() {
+; CHECK-LABEL: 'vp_merge'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = call <1 x i8> @llvm.vp.merge.v1i8(<1 x i1> undef, <1 x i8> undef, <1 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %2 = call <2 x i8> @llvm.vp.merge.v2i8(<2 x i1> undef, <2 x i8> undef, <2 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %3 = call <4 x i8> @llvm.vp.merge.v4i8(<4 x i1> undef, <4 x i8> undef, <4 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = call <8 x i8> @llvm.vp.merge.v8i8(<8 x i1> undef, <8 x i8> undef, <8 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %5 = call <16 x i8> @llvm.vp.merge.v16i8(<16 x i1> undef, <16 x i8> undef, <16 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %6 = call <32 x i8> @llvm.vp.merge.v32i8(<32 x i1> undef, <32 x i8> undef, <32 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %7 = call <vscale x 1 x i8> @llvm.vp.merge.nxv1i8(<vscale x 1 x i1> undef, <vscale x 1 x i8> undef, <vscale x 1 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %8 = call <vscale x 2 x i8> @llvm.vp.merge.nxv2i8(<vscale x 2 x i1> undef, <vscale x 2 x i8> undef, <vscale x 2 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %9 = call <vscale x 4 x i8> @llvm.vp.merge.nxv4i8(<vscale x 4 x i1> undef, <vscale x 4 x i8> undef, <vscale x 4 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %10 = call <vscale x 8 x i8> @llvm.vp.merge.nxv8i8(<vscale x 8 x i1> undef, <vscale x 8 x i8> undef, <vscale x 8 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 16 x i8> @llvm.vp.merge.nxv16i8(<vscale x 16 x i1> undef, <vscale x 16 x i8> undef, <vscale x 16 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %12 = call <vscale x 32 x i8> @llvm.vp.merge.nxv32i8(<vscale x 32 x i1> undef, <vscale x 32 x i8> undef, <vscale x 32 x i8> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = call <1 x i16> @llvm.vp.merge.v1i16(<1 x i1> undef, <1 x i16> undef, <1 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = call <2 x i16> @llvm.vp.merge.v2i16(<2 x i1> undef, <2 x i16> undef, <2 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = call <4 x i16> @llvm.vp.merge.v4i16(<4 x i1> undef, <4 x i16> undef, <4 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %16 = call <8 x i16> @llvm.vp.merge.v8i16(<8 x i1> undef, <8 x i16> undef, <8 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %17 = call <16 x i16> @llvm.vp.merge.v16i16(<16 x i1> undef, <16 x i16> undef, <16 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %18 = call <32 x i16> @llvm.vp.merge.v32i16(<32 x i1> undef, <32 x i16> undef, <32 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %19 = call <vscale x 1 x i16> @llvm.vp.merge.nxv1i16(<vscale x 1 x i1> undef, <vscale x 1 x i16> undef, <vscale x 1 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %20 = call <vscale x 2 x i16> @llvm.vp.merge.nxv2i16(<vscale x 2 x i1> undef, <vscale x 2 x i16> undef, <vscale x 2 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %21 = call <vscale x 4 x i16> @llvm.vp.merge.nxv4i16(<vscale x 4 x i1> undef, <vscale x 4 x i16> undef, <vscale x 4 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 8 x i16> @llvm.vp.merge.nxv8i16(<vscale x 8 x i1> undef, <vscale x 8 x i16> undef, <vscale x 8 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %23 = call <vscale x 16 x i16> @llvm.vp.merge.nxv16i16(<vscale x 16 x i1> undef, <vscale x 16 x i16> undef, <vscale x 16 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %24 = call <vscale x 32 x i16> @llvm.vp.merge.nxv32i16(<vscale x 32 x i1> undef, <vscale x 32 x i16> undef, <vscale x 32 x i16> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %25 = call <1 x i32> @llvm.vp.merge.v1i32(<1 x i1> undef, <1 x i32> undef, <1 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %26 = call <2 x i32> @llvm.vp.merge.v2i32(<2 x i1> undef, <2 x i32> undef, <2 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %27 = call <4 x i32> @llvm.vp.merge.v4i32(<4 x i1> undef, <4 x i32> undef, <4 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %28 = call <8 x i32> @llvm.vp.merge.v8i32(<8 x i1> undef, <8 x i32> undef, <8 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %29 = call <16 x i32> @llvm.vp.merge.v16i32(<16 x i1> undef, <16 x i32> undef, <16 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %30 = call <32 x i32> @llvm.vp.merge.v32i32(<32 x i1> undef, <32 x i32> undef, <32 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %31 = call <vscale x 1 x i32> @llvm.vp.merge.nxv1i32(<vscale x 1 x i1> undef, <vscale x 1 x i32> undef, <vscale x 1 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %32 = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> undef, <vscale x 2 x i32> undef, <vscale x 2 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %33 = call <vscale x 4 x i32> @llvm.vp.merge.nxv4i32(<vscale x 4 x i1> undef, <vscale x 4 x i32> undef, <vscale x 4 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %34 = call <vscale x 8 x i32> @llvm.vp.merge.nxv8i32(<vscale x 8 x i1> undef, <vscale x 8 x i32> undef, <vscale x 8 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %35 = call <vscale x 16 x i32> @llvm.vp.merge.nxv16i32(<vscale x 16 x i1> undef, <vscale x 16 x i32> undef, <vscale x 16 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %36 = call <vscale x 32 x i32> @llvm.vp.merge.nxv32i32(<vscale x 32 x i1> undef, <vscale x 32 x i32> undef, <vscale x 32 x i32> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %37 = call <1 x i64> @llvm.vp.merge.v1i64(<1 x i1> undef, <1 x i64> undef, <1 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %38 = call <2 x i64> @llvm.vp.merge.v2i64(<2 x i1> undef, <2 x i64> undef, <2 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %39 = call <4 x i64> @llvm.vp.merge.v4i64(<4 x i1> undef, <4 x i64> undef, <4 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %40 = call <8 x i64> @llvm.vp.merge.v8i64(<8 x i1> undef, <8 x i64> undef, <8 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %41 = call <16 x i64> @llvm.vp.merge.v16i64(<16 x i1> undef, <16 x i64> undef, <16 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %42 = call <32 x i64> @llvm.vp.merge.v32i64(<32 x i1> undef, <32 x i64> undef, <32 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %43 = call <vscale x 1 x i64> @llvm.vp.merge.nxv1i64(<vscale x 1 x i1> undef, <vscale x 1 x i64> undef, <vscale x 1 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> undef, <vscale x 2 x i64> undef, <vscale x 2 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %45 = call <vscale x 4 x i64> @llvm.vp.merge.nxv4i64(<vscale x 4 x i1> undef, <vscale x 4 x i64> undef, <vscale x 4 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %46 = call <vscale x 8 x i64> @llvm.vp.merge.nxv8i64(<vscale x 8 x i1> undef, <vscale x 8 x i64> undef, <vscale x 8 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %47 = call <vscale x 16 x i64> @llvm.vp.merge.nxv16i64(<vscale x 16 x i1> undef, <vscale x 16 x i64> undef, <vscale x 16 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %48 = call <vscale x 32 x i64> @llvm.vp.merge.nxv32i64(<vscale x 32 x i1> undef, <vscale x 32 x i64> undef, <vscale x 32 x i64> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %49 = call <1 x float> @llvm.vp.merge.v1f32(<1 x i1> undef, <1 x float> undef, <1 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %50 = call <2 x float> @llvm.vp.merge.v2f32(<2 x i1> undef, <2 x float> undef, <2 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %51 = call <4 x float> @llvm.vp.merge.v4f32(<4 x i1> undef, <4 x float> undef, <4 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %52 = call <8 x float> @llvm.vp.merge.v8f32(<8 x i1> undef, <8 x float> undef, <8 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %53 = call <16 x float> @llvm.vp.merge.v16f32(<16 x i1> undef, <16 x float> undef, <16 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %54 = call <32 x float> @llvm.vp.merge.v32f32(<32 x i1> undef, <32 x float> undef, <32 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %55 = call <vscale x 1 x float> @llvm.vp.merge.nxv1f32(<vscale x 1 x i1> undef, <vscale x 1 x float> undef, <vscale x 1 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %56 = call <vscale x 2 x float> @llvm.vp.merge.nxv2f32(<vscale x 2 x i1> undef, <vscale x 2 x float> undef, <vscale x 2 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %57 = call <vscale x 4 x float> @llvm.vp.merge.nxv4f32(<vscale x 4 x i1> undef, <vscale x 4 x float> undef, <vscale x 4 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %58 = call <vscale x 8 x float> @llvm.vp.merge.nxv8f32(<vscale x 8 x i1> undef, <vscale x 8 x float> undef, <vscale x 8 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %59 = call <vscale x 16 x float> @llvm.vp.merge.nxv16f32(<vscale x 16 x i1> undef, <vscale x 16 x float> undef, <vscale x 16 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %60 = call <vscale x 32 x float> @llvm.vp.merge.nxv32f32(<vscale x 32 x i1> undef, <vscale x 32 x float> undef, <vscale x 32 x float> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %61 = call <1 x double> @llvm.vp.merge.v1f64(<1 x i1> undef, <1 x double> undef, <1 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %62 = call <2 x double> @llvm.vp.merge.v2f64(<2 x i1> undef, <2 x double> undef, <2 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %63 = call <4 x double> @llvm.vp.merge.v4f64(<4 x i1> undef, <4 x double> undef, <4 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %64 = call <8 x double> @llvm.vp.merge.v8f64(<8 x i1> undef, <8 x double> undef, <8 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %65 = call <16 x double> @llvm.vp.merge.v16f64(<16 x i1> undef, <16 x double> undef, <16 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %66 = call <32 x double> @llvm.vp.merge.v32f64(<32 x i1> undef, <32 x double> undef, <32 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %67 = call <vscale x 1 x double> @llvm.vp.merge.nxv1f64(<vscale x 1 x i1> undef, <vscale x 1 x double> undef, <vscale x 1 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %68 = call <vscale x 2 x double> @llvm.vp.merge.nxv2f64(<vscale x 2 x i1> undef, <vscale x 2 x double> undef, <vscale x 2 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %69 = call <vscale x 4 x double> @llvm.vp.merge.nxv4f64(<vscale x 4 x i1> undef, <vscale x 4 x double> undef, <vscale x 4 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %70 = call <vscale x 8 x double> @llvm.vp.merge.nxv8f64(<vscale x 8 x i1> undef, <vscale x 8 x double> undef, <vscale x 8 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %71 = call <vscale x 16 x double> @llvm.vp.merge.nxv16f64(<vscale x 16 x i1> undef, <vscale x 16 x double> undef, <vscale x 16 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %72 = call <vscale x 32 x double> @llvm.vp.merge.nxv32f64(<vscale x 32 x i1> undef, <vscale x 32 x double> undef, <vscale x 32 x double> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; TYPEBASED-LABEL: 'vp_merge'
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = call <1 x i8> @llvm.vp.merge.v1i8(<1 x i1> undef, <1 x i8> undef, <1 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %2 = call <2 x i8> @llvm.vp.merge.v2i8(<2 x i1> undef, <2 x i8> undef, <2 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %3 = call <4 x i8> @llvm.vp.merge.v4i8(<4 x i1> undef, <4 x i8> undef, <4 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = call <8 x i8> @llvm.vp.merge.v8i8(<8 x i1> undef, <8 x i8> undef, <8 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %5 = call <16 x i8> @llvm.vp.merge.v16i8(<16 x i1> undef, <16 x i8> undef, <16 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %6 = call <32 x i8> @llvm.vp.merge.v32i8(<32 x i1> undef, <32 x i8> undef, <32 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %7 = call <vscale x 1 x i8> @llvm.vp.merge.nxv1i8(<vscale x 1 x i1> undef, <vscale x 1 x i8> undef, <vscale x 1 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %8 = call <vscale x 2 x i8> @llvm.vp.merge.nxv2i8(<vscale x 2 x i1> undef, <vscale x 2 x i8> undef, <vscale x 2 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %9 = call <vscale x 4 x i8> @llvm.vp.merge.nxv4i8(<vscale x 4 x i1> undef, <vscale x 4 x i8> undef, <vscale x 4 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %10 = call <vscale x 8 x i8> @llvm.vp.merge.nxv8i8(<vscale x 8 x i1> undef, <vscale x 8 x i8> undef, <vscale x 8 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 16 x i8> @llvm.vp.merge.nxv16i8(<vscale x 16 x i1> undef, <vscale x 16 x i8> undef, <vscale x 16 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %12 = call <vscale x 32 x i8> @llvm.vp.merge.nxv32i8(<vscale x 32 x i1> undef, <vscale x 32 x i8> undef, <vscale x 32 x i8> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = call <1 x i16> @llvm.vp.merge.v1i16(<1 x i1> undef, <1 x i16> undef, <1 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = call <2 x i16> @llvm.vp.merge.v2i16(<2 x i1> undef, <2 x i16> undef, <2 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = call <4 x i16> @llvm.vp.merge.v4i16(<4 x i1> undef, <4 x i16> undef, <4 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %16 = call <8 x i16> @llvm.vp.merge.v8i16(<8 x i1> undef, <8 x i16> undef, <8 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %17 = call <16 x i16> @llvm.vp.merge.v16i16(<16 x i1> undef, <16 x i16> undef, <16 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %18 = call <32 x i16> @llvm.vp.merge.v32i16(<32 x i1> undef, <32 x i16> undef, <32 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %19 = call <vscale x 1 x i16> @llvm.vp.merge.nxv1i16(<vscale x 1 x i1> undef, <vscale x 1 x i16> undef, <vscale x 1 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %20 = call <vscale x 2 x i16> @llvm.vp.merge.nxv2i16(<vscale x 2 x i1> undef, <vscale x 2 x i16> undef, <vscale x 2 x i16> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %21 = call <vscale x 4 x i16> @llvm.vp.merge.nxv4i16(<vscal...
[truncated]

llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll

Also add i1 testcases.

ElvisWang123 added 3 commits October 14, 2024 20:11

Pre-commit testcase

bc06f64

[RISCV][TTI] Implment instruction cost for vp_merge.

f834ca4

`vp_merge` will generate `vmerge.vvm` in RISCV.

Fixup, reuse same function as select instruction

358f934

ElvisWang123 requested review from Mel-Chen, arcbbb, lukel97 and topperc October 15, 2024 07:43

llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding labels Oct 15, 2024

lukel97 approved these changes Oct 15, 2024

View reviewed changes

llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll Outdated Show resolved Hide resolved

llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll Outdated Show resolved Hide resolved

Fixup! merge tests into rvv-select and fix args in vp intrinsics.

9adc9ef

Also add i1 testcases.

ElvisWang123 merged commit 566012a into llvm:main Oct 16, 2024
8 checks passed

ElvisWang123 deleted the add-vp-merge-cost branch October 16, 2024 23:47

ElvisWang123 mentioned this pull request Oct 16, 2024

[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. #101641

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RISCV][TTI] Implement instruction cost for vp_merge. #112327

[RISCV][TTI] Implement instruction cost for vp_merge. #112327

Uh oh!

ElvisWang123 commented Oct 15, 2024

Uh oh!

llvmbot commented Oct 15, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[RISCV][TTI] Implement instruction cost for vp_merge. #112327

[RISCV][TTI] Implement instruction cost for vp_merge. #112327

Uh oh!

Conversation

ElvisWang123 commented Oct 15, 2024

Uh oh!

llvmbot commented Oct 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

llvmbot commented Oct 15, 2024 •

edited

Loading