X86: Improve cost model of fp16 conversion #113195
Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer transforms the patterns:
- Set the operation action (`setOperationAction`) of v4f16, v8f16 and v16f16 stores to Custom so `TargetTransformInfo::getStoreMinimumVF` reports them as acceptable.
- Add missing cost entries to `X86TTIImpl::getCastInstrCost` for conversions from/to fp16. Note that conversion from f64 to f16 is not supported by an X86 instruction.
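As a rough illustration (not part of the patch, with made-up function and parameter names), this is the kind of scalar pattern the improved cost model lets the SLPVectorizer turn into a <4 x half> trunc-store, i.e. one vcvtps2ph on F16C targets:

// Hypothetical example, not from the patch: four independent
// float -> fp16 truncations feeding adjacent stores. With this change
// the SLP vectorizer can merge them into a <4 x half> store, which the
// backend lowers to a single vcvtps2ph when F16C is available.
void trunc_store_4xf16(const float *src, _Float16 *dst) {
  dst[0] = (_Float16)src[0];
  dst[1] = (_Float16)src[1];
  dst[2] = (_Float16)src[2];
  dst[3] = (_Float16)src[3];
}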
@llvm/pr-subscribers-llvm-transforms Author: Matthias Braun (MatzeB). Changes: Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer transforms the patterns (full description above).
Patch is 35.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113195.diff 3 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index bcb84add65d83e..da88a1a0a5a3b8 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -1714,6 +1714,9 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
setOperationPromotedToType(Opc, MVT::v8f16, MVT::v8f32);
setOperationPromotedToType(Opc, MVT::v16f16, MVT::v16f32);
}
+ // trunc+store via vcvtps2ph
+ setOperationAction(ISD::STORE, MVT::v4f16, Custom);
+ setOperationAction(ISD::STORE, MVT::v8f16, Custom);
}
// This block controls legalization of the mask vector sizes that are
@@ -1784,6 +1787,9 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
for (auto VT : { MVT::v1i1, MVT::v2i1, MVT::v4i1, MVT::v8i1 })
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
+
+ // trunc+store via vcvtps2ph
+ setOperationAction(ISD::STORE, MVT::v16f16, Custom);
}
if (Subtarget.hasDQI() && Subtarget.hasVLX()) {
for (MVT VT : {MVT::v4f32, MVT::v8f32, MVT::v2f64, MVT::v4f64}) {
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 413ef0136d5c06..2d2c804ed46e54 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -2296,7 +2296,10 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, { 1, 1, 1, 1 } },
{ ISD::FP_EXTEND, MVT::v8f64, MVT::v16f32, { 3, 1, 1, 1 } },
{ ISD::FP_EXTEND, MVT::v16f64, MVT::v16f32, { 4, 1, 1, 1 } }, // 2*vcvtps2pd+vextractf64x4
+ { ISD::FP_EXTEND, MVT::v16f32, MVT::v16f16, { 1, 1, 1, 1 } }, // vcvtph2ps
+ { ISD::FP_EXTEND, MVT::v8f64, MVT::v8f16, { 2, 1, 1, 1 } }, // vcvtph2ps+vcvtps2pd
{ ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, { 1, 1, 1, 1 } },
+ { ISD::FP_ROUND, MVT::v16f16, MVT::v16f32, { 1, 1, 1, 1 } }, // vcvtps2ph
{ ISD::TRUNCATE, MVT::v2i1, MVT::v2i8, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
{ ISD::TRUNCATE, MVT::v4i1, MVT::v4i8, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
@@ -2973,6 +2976,14 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ ISD::TRUNCATE, MVT::v4i32, MVT::v2i64, { 1, 1, 1, 1 } }, // PSHUFD
};
+ static const TypeConversionCostKindTblEntry F16ConversionTbl[] = {
+ { ISD::FP_ROUND, MVT::v8f16, MVT::v8f32, { 1, 1, 1, 1 } }, // vcvtps2ph
+ { ISD::FP_ROUND, MVT::v4f16, MVT::v4f32, { 1, 1, 1, 1 } }, // vcvtps2ph
+ { ISD::FP_EXTEND, MVT::v8f32, MVT::v8f16, { 1, 1, 1, 1 } }, // vcvtph2ps
+ { ISD::FP_EXTEND, MVT::v4f32, MVT::v4f16, { 1, 1, 1, 1 } }, // vcvtph2ps
+ { ISD::FP_EXTEND, MVT::v4f64, MVT::v4f16, { 2, 1, 1, 1 } }, // vcvtph2ps+vcvtps2pd
+ };
+
// Attempt to map directly to (simple) MVT types to let us match custom entries.
EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);
@@ -3034,6 +3045,13 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
return *KindCost;
}
+ if (ST->hasF16C()) {
+ if (const auto *Entry = ConvertCostTableLookup(F16ConversionTbl, ISD,
+ SimpleDstTy, SimpleSrcTy))
+ if (auto KindCost = Entry->Cost[CostKind])
+ return *KindCost;
+ }
+
if (ST->hasSSE41()) {
if (const auto *Entry = ConvertCostTableLookup(SSE41ConversionTbl, ISD,
SimpleDstTy, SimpleSrcTy))
@@ -3107,6 +3125,13 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
if (auto KindCost = Entry->Cost[CostKind])
return std::max(LTSrc.first, LTDest.first) * *KindCost;
+ if (ST->hasF16C()) {
+ if (const auto *Entry = ConvertCostTableLookup(F16ConversionTbl, ISD,
+ LTDest.second, LTSrc.second))
+ if (auto KindCost = Entry->Cost[CostKind])
+ return std::max(LTSrc.first, LTDest.first) * *KindCost;
+ }
+
if (ST->hasSSE41())
if (const auto *Entry = ConvertCostTableLookup(SSE41ConversionTbl, ISD,
LTDest.second, LTSrc.second))
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll b/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll
new file mode 100644
index 00000000000000..1d5dee6cb8121c
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll
@@ -0,0 +1,601 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -mtriple=x86_64-- -passes=slp-vectorizer -S -mattr=+avx2 | FileCheck %s --check-prefix=CHECK
+; RUN: opt < %s -mtriple=x86_64-- -passes=slp-vectorizer -S -mattr=+avx2 -mattr=+f16c | FileCheck %s --check-prefix=CHECK-F16C
+; RUN: opt < %s -mtriple=x86_64-- -passes=slp-vectorizer -S -mattr=+avx512f | FileCheck %s --check-prefix=CHECK-AVX512
+
+define void @fpext_v4xf16_v4xf32(ptr %s0, ptr %d0) {
+; CHECK-LABEL: define void @fpext_v4xf16_v4xf32(
+; CHECK-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[S1:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 1
+; CHECK-NEXT: [[S2:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 2
+; CHECK-NEXT: [[S3:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 3
+; CHECK-NEXT: [[L0:%.*]] = load half, ptr [[S0]], align 2
+; CHECK-NEXT: [[L1:%.*]] = load half, ptr [[S1]], align 2
+; CHECK-NEXT: [[L2:%.*]] = load half, ptr [[S2]], align 2
+; CHECK-NEXT: [[L3:%.*]] = load half, ptr [[S3]], align 2
+; CHECK-NEXT: [[E0:%.*]] = fpext half [[L0]] to float
+; CHECK-NEXT: [[E1:%.*]] = fpext half [[L1]] to float
+; CHECK-NEXT: [[E2:%.*]] = fpext half [[L2]] to float
+; CHECK-NEXT: [[E3:%.*]] = fpext half [[L3]] to float
+; CHECK-NEXT: [[D1:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 1
+; CHECK-NEXT: [[D2:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 2
+; CHECK-NEXT: [[D3:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 3
+; CHECK-NEXT: store float [[E0]], ptr [[D0]], align 8
+; CHECK-NEXT: store float [[E1]], ptr [[D1]], align 8
+; CHECK-NEXT: store float [[E2]], ptr [[D2]], align 8
+; CHECK-NEXT: store float [[E3]], ptr [[D3]], align 8
+; CHECK-NEXT: ret void
+;
+; CHECK-F16C-LABEL: define void @fpext_v4xf16_v4xf32(
+; CHECK-F16C-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-F16C-NEXT: [[TMP1:%.*]] = load <4 x half>, ptr [[S0]], align 2
+; CHECK-F16C-NEXT: [[TMP2:%.*]] = fpext <4 x half> [[TMP1]] to <4 x float>
+; CHECK-F16C-NEXT: store <4 x float> [[TMP2]], ptr [[D0]], align 8
+; CHECK-F16C-NEXT: ret void
+;
+; CHECK-AVX512-LABEL: define void @fpext_v4xf16_v4xf32(
+; CHECK-AVX512-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-AVX512-NEXT: [[TMP1:%.*]] = load <4 x half>, ptr [[S0]], align 2
+; CHECK-AVX512-NEXT: [[TMP2:%.*]] = fpext <4 x half> [[TMP1]] to <4 x float>
+; CHECK-AVX512-NEXT: store <4 x float> [[TMP2]], ptr [[D0]], align 8
+; CHECK-AVX512-NEXT: ret void
+;
+ %s1 = getelementptr inbounds half, ptr %s0, i64 1
+ %s2 = getelementptr inbounds half, ptr %s0, i64 2
+ %s3 = getelementptr inbounds half, ptr %s0, i64 3
+ %l0 = load half, ptr %s0, align 2
+ %l1 = load half, ptr %s1, align 2
+ %l2 = load half, ptr %s2, align 2
+ %l3 = load half, ptr %s3, align 2
+
+ %e0 = fpext half %l0 to float
+ %e1 = fpext half %l1 to float
+ %e2 = fpext half %l2 to float
+ %e3 = fpext half %l3 to float
+
+ %d1 = getelementptr inbounds float, ptr %d0, i64 1
+ %d2 = getelementptr inbounds float, ptr %d0, i64 2
+ %d3 = getelementptr inbounds float, ptr %d0, i64 3
+ store float %e0, ptr %d0, align 8
+ store float %e1, ptr %d1, align 8
+ store float %e2, ptr %d2, align 8
+ store float %e3, ptr %d3, align 8
+ ret void
+}
+
+define void @fpext_v4xf16_v4xf64(ptr %s0, ptr %d0) {
+; CHECK-LABEL: define void @fpext_v4xf16_v4xf64(
+; CHECK-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[S1:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 1
+; CHECK-NEXT: [[S2:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 2
+; CHECK-NEXT: [[S3:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 3
+; CHECK-NEXT: [[L0:%.*]] = load half, ptr [[S0]], align 2
+; CHECK-NEXT: [[L1:%.*]] = load half, ptr [[S1]], align 2
+; CHECK-NEXT: [[L2:%.*]] = load half, ptr [[S2]], align 2
+; CHECK-NEXT: [[L3:%.*]] = load half, ptr [[S3]], align 2
+; CHECK-NEXT: [[E0:%.*]] = fpext half [[L0]] to double
+; CHECK-NEXT: [[E1:%.*]] = fpext half [[L1]] to double
+; CHECK-NEXT: [[E2:%.*]] = fpext half [[L2]] to double
+; CHECK-NEXT: [[E3:%.*]] = fpext half [[L3]] to double
+; CHECK-NEXT: [[D1:%.*]] = getelementptr inbounds double, ptr [[D0]], i64 1
+; CHECK-NEXT: [[D2:%.*]] = getelementptr inbounds double, ptr [[D0]], i64 2
+; CHECK-NEXT: [[D3:%.*]] = getelementptr inbounds double, ptr [[D0]], i64 3
+; CHECK-NEXT: store double [[E0]], ptr [[D0]], align 8
+; CHECK-NEXT: store double [[E1]], ptr [[D1]], align 8
+; CHECK-NEXT: store double [[E2]], ptr [[D2]], align 8
+; CHECK-NEXT: store double [[E3]], ptr [[D3]], align 8
+; CHECK-NEXT: ret void
+;
+; CHECK-F16C-LABEL: define void @fpext_v4xf16_v4xf64(
+; CHECK-F16C-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
+; CHECK-F16C-NEXT: [[TMP1:%.*]] = load <4 x half>, ptr [[S0]], align 2
+; CHECK-F16C-NEXT: [[TMP2:%.*]] = fpext <4 x half> [[TMP1]] to <4 x double>
+; CHECK-F16C-NEXT: store <4 x double> [[TMP2]], ptr [[D0]], align 8
+; CHECK-F16C-NEXT: ret void
+;
+; CHECK-AVX512-LABEL: define void @fpext_v4xf16_v4xf64(
+; CHECK-AVX512-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
+; CHECK-AVX512-NEXT: [[TMP1:%.*]] = load <4 x half>, ptr [[S0]], align 2
+; CHECK-AVX512-NEXT: [[TMP2:%.*]] = fpext <4 x half> [[TMP1]] to <4 x double>
+; CHECK-AVX512-NEXT: store <4 x double> [[TMP2]], ptr [[D0]], align 8
+; CHECK-AVX512-NEXT: ret void
+;
+ %s1 = getelementptr inbounds half, ptr %s0, i64 1
+ %s2 = getelementptr inbounds half, ptr %s0, i64 2
+ %s3 = getelementptr inbounds half, ptr %s0, i64 3
+ %l0 = load half, ptr %s0, align 2
+ %l1 = load half, ptr %s1, align 2
+ %l2 = load half, ptr %s2, align 2
+ %l3 = load half, ptr %s3, align 2
+
+ %e0 = fpext half %l0 to double
+ %e1 = fpext half %l1 to double
+ %e2 = fpext half %l2 to double
+ %e3 = fpext half %l3 to double
+
+ %d1 = getelementptr inbounds double, ptr %d0, i64 1
+ %d2 = getelementptr inbounds double, ptr %d0, i64 2
+ %d3 = getelementptr inbounds double, ptr %d0, i64 3
+ store double %e0, ptr %d0, align 8
+ store double %e1, ptr %d1, align 8
+ store double %e2, ptr %d2, align 8
+ store double %e3, ptr %d3, align 8
+ ret void
+}
+
+define void @fpext_v16xf15_v16xf32(ptr %s0, ptr %d0) {
+; CHECK-LABEL: define void @fpext_v16xf15_v16xf32(
+; CHECK-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[S1:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 1
+; CHECK-NEXT: [[S2:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 2
+; CHECK-NEXT: [[S3:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 3
+; CHECK-NEXT: [[S4:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 4
+; CHECK-NEXT: [[S5:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 5
+; CHECK-NEXT: [[S6:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 6
+; CHECK-NEXT: [[S7:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 7
+; CHECK-NEXT: [[S8:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 8
+; CHECK-NEXT: [[S9:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 9
+; CHECK-NEXT: [[S10:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 10
+; CHECK-NEXT: [[S11:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 11
+; CHECK-NEXT: [[S12:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 12
+; CHECK-NEXT: [[S13:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 13
+; CHECK-NEXT: [[S14:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 14
+; CHECK-NEXT: [[S15:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 15
+; CHECK-NEXT: [[L0:%.*]] = load half, ptr [[S0]], align 2
+; CHECK-NEXT: [[L1:%.*]] = load half, ptr [[S1]], align 2
+; CHECK-NEXT: [[L2:%.*]] = load half, ptr [[S2]], align 2
+; CHECK-NEXT: [[L3:%.*]] = load half, ptr [[S3]], align 2
+; CHECK-NEXT: [[L4:%.*]] = load half, ptr [[S4]], align 2
+; CHECK-NEXT: [[L5:%.*]] = load half, ptr [[S5]], align 2
+; CHECK-NEXT: [[L6:%.*]] = load half, ptr [[S6]], align 2
+; CHECK-NEXT: [[L7:%.*]] = load half, ptr [[S7]], align 2
+; CHECK-NEXT: [[L8:%.*]] = load half, ptr [[S8]], align 2
+; CHECK-NEXT: [[L9:%.*]] = load half, ptr [[S9]], align 2
+; CHECK-NEXT: [[L10:%.*]] = load half, ptr [[S10]], align 2
+; CHECK-NEXT: [[L11:%.*]] = load half, ptr [[S11]], align 2
+; CHECK-NEXT: [[L12:%.*]] = load half, ptr [[S12]], align 2
+; CHECK-NEXT: [[L13:%.*]] = load half, ptr [[S13]], align 2
+; CHECK-NEXT: [[L14:%.*]] = load half, ptr [[S14]], align 2
+; CHECK-NEXT: [[L15:%.*]] = load half, ptr [[S15]], align 2
+; CHECK-NEXT: [[E0:%.*]] = fpext half [[L0]] to float
+; CHECK-NEXT: [[E1:%.*]] = fpext half [[L1]] to float
+; CHECK-NEXT: [[E2:%.*]] = fpext half [[L2]] to float
+; CHECK-NEXT: [[E3:%.*]] = fpext half [[L3]] to float
+; CHECK-NEXT: [[E4:%.*]] = fpext half [[L4]] to float
+; CHECK-NEXT: [[E5:%.*]] = fpext half [[L5]] to float
+; CHECK-NEXT: [[E6:%.*]] = fpext half [[L6]] to float
+; CHECK-NEXT: [[E7:%.*]] = fpext half [[L7]] to float
+; CHECK-NEXT: [[E8:%.*]] = fpext half [[L8]] to float
+; CHECK-NEXT: [[E9:%.*]] = fpext half [[L9]] to float
+; CHECK-NEXT: [[E10:%.*]] = fpext half [[L10]] to float
+; CHECK-NEXT: [[E11:%.*]] = fpext half [[L11]] to float
+; CHECK-NEXT: [[E12:%.*]] = fpext half [[L12]] to float
+; CHECK-NEXT: [[E13:%.*]] = fpext half [[L13]] to float
+; CHECK-NEXT: [[E14:%.*]] = fpext half [[L14]] to float
+; CHECK-NEXT: [[E15:%.*]] = fpext half [[L15]] to float
+; CHECK-NEXT: [[D1:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 1
+; CHECK-NEXT: [[D2:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 2
+; CHECK-NEXT: [[D15:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 3
+; CHECK-NEXT: [[D4:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 4
+; CHECK-NEXT: [[D5:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 5
+; CHECK-NEXT: [[D6:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 6
+; CHECK-NEXT: [[D7:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 7
+; CHECK-NEXT: [[D8:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 8
+; CHECK-NEXT: [[D9:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 9
+; CHECK-NEXT: [[D10:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 10
+; CHECK-NEXT: [[D11:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 11
+; CHECK-NEXT: [[D12:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 12
+; CHECK-NEXT: [[D13:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 13
+; CHECK-NEXT: [[D14:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 14
+; CHECK-NEXT: [[D16:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 15
+; CHECK-NEXT: store float [[E0]], ptr [[D0]], align 8
+; CHECK-NEXT: store float [[E1]], ptr [[D1]], align 8
+; CHECK-NEXT: store float [[E2]], ptr [[D2]], align 8
+; CHECK-NEXT: store float [[E3]], ptr [[D15]], align 8
+; CHECK-NEXT: store float [[E4]], ptr [[D4]], align 8
+; CHECK-NEXT: store float [[E5]], ptr [[D5]], align 8
+; CHECK-NEXT: store float [[E6]], ptr [[D6]], align 8
+; CHECK-NEXT: store float [[E7]], ptr [[D7]], align 8
+; CHECK-NEXT: store float [[E8]], ptr [[D8]], align 8
+; CHECK-NEXT: store float [[E9]], ptr [[D9]], align 8
+; CHECK-NEXT: store float [[E10]], ptr [[D10]], align 8
+; CHECK-NEXT: store float [[E11]], ptr [[D11]], align 8
+; CHECK-NEXT: store float [[E12]], ptr [[D12]], align 8
+; CHECK-NEXT: store float [[E13]], ptr [[D13]], align 8
+; CHECK-NEXT: store float [[E14]], ptr [[D14]], align 8
+; CHECK-NEXT: store float [[E15]], ptr [[D16]], align 8
+; CHECK-NEXT: ret void
+;
+; CHECK-F16C-LABEL: define void @fpext_v16xf15_v16xf32(
+; CHECK-F16C-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
+; CHECK-F16C-NEXT: [[S8:%.*]] = getelementptr inbounds half, ptr [[S0]], i64 8
+; CHECK-F16C-NEXT: [[D8:%.*]] = getelementptr inbounds float, ptr [[D0]], i64 8
+; CHECK-F16C-NEXT: [[TMP1:%.*]] = load <8 x half>, ptr [[S0]], align 2
+; CHECK-F16C-NEXT: [[TMP2:%.*]] = fpext <8 x half> [[TMP1]] to <8 x float>
+; CHECK-F16C-NEXT: [[TMP3:%.*]] = load <8 x half>, ptr [[S8]], align 2
+; CHECK-F16C-NEXT: [[TMP4:%.*]] = fpext <8 x half> [[TMP3]] to <8 x float>
+; CHECK-F16C-NEXT: store <8 x float> [[TMP2]], ptr [[D0]], align 8
+; CHECK-F16C-NEXT: store <8 x float> [[TMP4]], ptr [[D8]], align 8
+; CHECK-F16C-NEXT: ret void
+;
+; CHECK-AVX512-LABEL: define void @fpext_v16xf15_v16xf32(
+; CHECK-AVX512-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
+; CHECK-AVX512-NEXT: [[TMP1:%.*]] = load <16 x half>, ptr [[S0]], align 2
+; CHECK-AVX512-NEXT: [[TMP2:%.*]] = fpext <16 x half> [[TMP1]] to <16 x float>
+; CHECK-AVX512-NEXT: store <16 x float> [[TMP2]], ptr [[D0]], align 8
+; CHECK-AVX512-NEXT: ret void
+;
+ %s1 = getelementptr inbounds half, ptr %s0, i64 1
+ %s2 = getelementptr inbounds half, ptr %s0, i64 2
+ %s3 = getelementptr inbounds half, ptr %s0, i64 3
+ %s4 = getelementptr inbounds half, ptr %s0, i64 4
+ %s5 = getelementptr inbounds half, ptr %s0, i64 5
+ %s6 = getelementptr inbounds half, ptr %s0, i64 6
+ %s7 = getelementptr inbounds half, ptr %s0, i64 7
+ %s8 = getelementptr inbounds half, ptr %s0, i64 8
+ %s9 = getelementptr inbounds half, ptr %s0, i64 9
+ %s10 = getelementptr inbounds half, ptr %s0, i64 10
+ %s11 = getelementptr inbounds half, ptr %s0, i64 11
+ %s12 = getelementptr inbounds half, ptr %s0, i64 12
+ %s13 = getelementptr inbounds half, ptr %s0, i64 13
+ %s14 = getelementptr inbounds half, ptr %s0, i64 14
+ %s15 = getelementptr inbounds half, ptr %s0, i64 15
+ %l0 = load half, ptr %s0, align 2
+ %l1 = load half, ptr %s1, align 2
+ %l2 = load half, ptr %s2, align 2
+ %l3 = load half, ptr %s3, align 2
+ %l4 = load half, ptr %s4, align 2
+ %l5 = load half, ptr %s5, align 2
+ %l6 = load half, ptr %s6, align 2
+ %l7 = load half, ptr %s7, align 2
+ %l8 = load half, ptr %s8, align 2
+ %l9 = load half, ptr %s9, align 2
+ %l10 = load half, ptr %s10, align 2
+ %l11 = load half, ptr %s11, align 2
+ %l12 = load half, ptr %s12, align 2
+ %l13 = load half, ptr %s13, align 2
+ %l14 = load half, ptr %s14, align 2
+ %l15 = load half, ptr %s15, align 2
+
+ %e0 = fpext half %l0 to float
+ %e1 = fpext half %l1 to float
+ %e2 = fpext half %l2 to float
+ %e3 = fpext half %l3 to float
+ %e4 = fpext half %l4 to float
+ %e5 = fpext half %l5 to float
+ %e6 = fpext half %l6 to float
+ %e7 = fpext half %l7 to float
+ %e8 = fpext half %l8 to float
+ %e9 = fpext half %l9 to float
+ %e10 = fpext half %l10 to float
+ %e11 = fpext half %l11 to float
+ %e12 = fpext half %l12 to float
+ %e13 = fpext half %l13 to float
+ %e14 = fpext half %l14 to float
+ %e15 = fpext half %l15 to float
+
+ %d1 = getelementptr inbounds float, ptr %d0, i64 1
+ %d2 = getelementptr inbounds float, ptr %d0, i64 2
+ %d3 = getelementptr inbounds float, ptr %d0, i64 3
+ %d4 = getelementptr inbounds float, ptr %d0, i64 4
+ %d5 = getelementptr inbounds float, ptr %d0, i64 5
+ %d6 = getelementptr inbounds float, ptr %d0, i64 6
+ %d7 = getelementptr inbounds float, ptr %d0, i64 7
+ %d8 = getelementptr inbounds float, ptr %d0, i64 8
+ %d9 = getelementptr inbounds float, ptr %d0, i64 9
+ %d10 = getelementptr inbounds float, ptr %d0, i64 10
+ %d11 = getelementptr inbounds float, ptr %d0, i64 11
+ %d12 = getelementptr inbounds float, ptr %d0, i64 12
+ %d13 = getelementptr inbounds float, ptr %d0, i64 13
+ %d14 = getelementptr inbounds float, ptr %d0, i64 14
+ %d15 = getelementptr inbounds float, ptr %d0, i64 15
+ store float %e0, ptr %d0, align 8
+ store float %e1, ptr %d1, ali...
[truncated]
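The { 1, 1, 1, 1 } quadruples in the tables above hold one cost per TargetCostKind (by assumption here: reciprocal throughput, latency, code size, size-and-latency, in that order). Below is a simplified, self-contained sketch of the lookup pattern the hunks follow; ConvEntry and lookupCost are invented for illustration and stand in for LLVM's TypeConversionCostKindTblEntry and ConvertCostTableLookup.

// Simplified sketch of the cost-table lookup pattern used above:
// find a matching (opcode, dst, src) row, return its cost for the
// requested cost kind, and let the caller fall through to the next
// feature table on a miss.
#include <array>
#include <cstddef>
#include <optional>

struct ConvEntry {
  int Opcode;               // e.g. FP_ROUND
  int DstVT, SrcVT;         // simple value types
  std::array<int, 4> Cost;  // one cost per cost kind (assumed order)
};

std::optional<int> lookupCost(const ConvEntry *Tbl, size_t N, int Opcode,
                              int DstVT, int SrcVT, int CostKind) {
  for (size_t I = 0; I != N; ++I)
    if (Tbl[I].Opcode == Opcode && Tbl[I].DstVT == DstVT &&
        Tbl[I].SrcVT == SrcVT)
      return Tbl[I].Cost[CostKind];
  return std::nullopt; // caller tries the next feature table
}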
@llvm/pr-subscribers-backend-x86 Author: Matthias Braun (MatzeB). Changes: same patch summary and truncated diff as in the llvm-transforms subscriber comment above.
You can test this locally with the following command:

git-clang-format --diff 8ae39c8e34de2d24c46827b324c76bac845c18b0 84a22ef1a5ced6fc2383c38b99fb040df17db76e --extensions cpp,h -- llvm/lib/Target/X86/X86TargetTransformInfo.cpp llvm/lib/Target/X86/X86TargetTransformInfo.h

View the diff from clang-format here.

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index bae223243b..9abc2051bf 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -2293,143 +2293,221 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
// 256-bit wide vectors.
static const TypeConversionCostKindTblEntry AVX512FConversionTbl[] = {
- { ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, { 1, 1, 1, 1 } },
- { ISD::FP_EXTEND, MVT::v8f64, MVT::v16f32, { 3, 1, 1, 1 } },
- { ISD::FP_EXTEND, MVT::v16f64, MVT::v16f32, { 4, 1, 1, 1 } }, // 2*vcvtps2pd+vextractf64x4
- { ISD::FP_EXTEND, MVT::v16f32, MVT::v16f16, { 1, 1, 1, 1 } }, // vcvtph2ps
- { ISD::FP_EXTEND, MVT::v8f64, MVT::v8f16, { 2, 1, 1, 1 } }, // vcvtph2ps+vcvtps2pd
- { ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, { 1, 1, 1, 1 } },
- { ISD::FP_ROUND, MVT::v16f16, MVT::v16f32, { 1, 1, 1, 1 } }, // vcvtps2ph
-
- { ISD::TRUNCATE, MVT::v2i1, MVT::v2i8, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v4i1, MVT::v4i8, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v8i1, MVT::v8i8, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v16i1, MVT::v16i8, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v2i1, MVT::v2i16, { 3, 1, 1, 1 } }, // sext+vpsllq+vptestmq
- { ISD::TRUNCATE, MVT::v4i1, MVT::v4i16, { 3, 1, 1, 1 } }, // sext+vpsllq+vptestmq
- { ISD::TRUNCATE, MVT::v8i1, MVT::v8i16, { 3, 1, 1, 1 } }, // sext+vpsllq+vptestmq
- { ISD::TRUNCATE, MVT::v16i1, MVT::v16i16, { 3, 1, 1, 1 } }, // sext+vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v2i1, MVT::v2i32, { 2, 1, 1, 1 } }, // zmm vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v4i1, MVT::v4i32, { 2, 1, 1, 1 } }, // zmm vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v8i1, MVT::v8i32, { 2, 1, 1, 1 } }, // zmm vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v16i1, MVT::v16i32, { 2, 1, 1, 1 } }, // vpslld+vptestmd
- { ISD::TRUNCATE, MVT::v2i1, MVT::v2i64, { 2, 1, 1, 1 } }, // zmm vpsllq+vptestmq
- { ISD::TRUNCATE, MVT::v4i1, MVT::v4i64, { 2, 1, 1, 1 } }, // zmm vpsllq+vptestmq
- { ISD::TRUNCATE, MVT::v8i1, MVT::v8i64, { 2, 1, 1, 1 } }, // vpsllq+vptestmq
- { ISD::TRUNCATE, MVT::v2i8, MVT::v2i32, { 2, 1, 1, 1 } }, // vpmovdb
- { ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, { 2, 1, 1, 1 } }, // vpmovdb
- { ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, { 2, 1, 1, 1 } }, // vpmovdb
- { ISD::TRUNCATE, MVT::v32i8, MVT::v16i32, { 2, 1, 1, 1 } }, // vpmovdb
- { ISD::TRUNCATE, MVT::v64i8, MVT::v16i32, { 2, 1, 1, 1 } }, // vpmovdb
- { ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, { 2, 1, 1, 1 } }, // vpmovdw
- { ISD::TRUNCATE, MVT::v32i16, MVT::v16i32, { 2, 1, 1, 1 } }, // vpmovdw
- { ISD::TRUNCATE, MVT::v2i8, MVT::v2i64, { 2, 1, 1, 1 } }, // vpmovqb
- { ISD::TRUNCATE, MVT::v2i16, MVT::v2i64, { 1, 1, 1, 1 } }, // vpshufb
- { ISD::TRUNCATE, MVT::v8i8, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqb
- { ISD::TRUNCATE, MVT::v16i8, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqb
- { ISD::TRUNCATE, MVT::v32i8, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqb
- { ISD::TRUNCATE, MVT::v64i8, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqb
- { ISD::TRUNCATE, MVT::v8i16, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqw
- { ISD::TRUNCATE, MVT::v16i16, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqw
- { ISD::TRUNCATE, MVT::v32i16, MVT::v8i64, { 2, 1, 1, 1 } }, // vpmovqw
- { ISD::TRUNCATE, MVT::v8i32, MVT::v8i64, { 1, 1, 1, 1 } }, // vpmovqd
- { ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, { 1, 1, 1, 1 } }, // zmm vpmovqd
- { ISD::TRUNCATE, MVT::v16i8, MVT::v16i64, { 5, 1, 1, 1 } },// 2*vpmovqd+concat+vpmovdb
-
- { ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, { 3, 1, 1, 1 } }, // extend to v16i32
- { ISD::TRUNCATE, MVT::v32i8, MVT::v32i16, { 8, 1, 1, 1 } },
- { ISD::TRUNCATE, MVT::v64i8, MVT::v32i16, { 8, 1, 1, 1 } },
-
- // Sign extend is zmm vpternlogd+vptruncdb.
- // Zero extend is zmm broadcast load+vptruncdw.
- { ISD::SIGN_EXTEND, MVT::v2i8, MVT::v2i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v2i8, MVT::v2i1, { 4, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v4i8, MVT::v4i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v4i8, MVT::v4i1, { 4, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v8i8, MVT::v8i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v8i8, MVT::v8i1, { 4, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v16i8, MVT::v16i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v16i8, MVT::v16i1, { 4, 1, 1, 1 } },
-
- // Sign extend is zmm vpternlogd+vptruncdw.
- // Zero extend is zmm vpternlogd+vptruncdw+vpsrlw.
- { ISD::SIGN_EXTEND, MVT::v2i16, MVT::v2i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v2i16, MVT::v2i1, { 4, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v4i16, MVT::v4i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v4i16, MVT::v4i1, { 4, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i1, { 4, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i1, { 3, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i1, { 4, 1, 1, 1 } },
-
- { ISD::SIGN_EXTEND, MVT::v2i32, MVT::v2i1, { 1, 1, 1, 1 } }, // zmm vpternlogd
- { ISD::ZERO_EXTEND, MVT::v2i32, MVT::v2i1, { 2, 1, 1, 1 } }, // zmm vpternlogd+psrld
- { ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i1, { 1, 1, 1, 1 } }, // zmm vpternlogd
- { ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i1, { 2, 1, 1, 1 } }, // zmm vpternlogd+psrld
- { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i1, { 1, 1, 1, 1 } }, // zmm vpternlogd
- { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i1, { 2, 1, 1, 1 } }, // zmm vpternlogd+psrld
- { ISD::SIGN_EXTEND, MVT::v2i64, MVT::v2i1, { 1, 1, 1, 1 } }, // zmm vpternlogq
- { ISD::ZERO_EXTEND, MVT::v2i64, MVT::v2i1, { 2, 1, 1, 1 } }, // zmm vpternlogq+psrlq
- { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i1, { 1, 1, 1, 1 } }, // zmm vpternlogq
- { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i1, { 2, 1, 1, 1 } }, // zmm vpternlogq+psrlq
-
- { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i1, { 1, 1, 1, 1 } }, // vpternlogd
- { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i1, { 2, 1, 1, 1 } }, // vpternlogd+psrld
- { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i1, { 1, 1, 1, 1 } }, // vpternlogq
- { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i1, { 2, 1, 1, 1 } }, // vpternlogq+psrlq
-
- { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, { 1, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, { 1, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, { 1, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, { 1, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i8, { 1, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i8, { 1, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i16, { 1, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i16, { 1, 1, 1, 1 } },
- { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i32, { 1, 1, 1, 1 } },
- { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, { 1, 1, 1, 1 } },
-
- { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v32i8, { 3, 1, 1, 1 } }, // FIXME: May not be right
- { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v32i8, { 3, 1, 1, 1 } }, // FIXME: May not be right
-
- { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, { 4, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, { 3, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v8f64, MVT::v16i8, { 2, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, { 1, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, { 2, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, { 1, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, { 1, 1, 1, 1 } },
- { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, { 1, 1, 1, 1 } },
-
- { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, { 4, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, { 3, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v8f64, MVT::v16i8, { 2, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, { 1, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, { 2, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, { 1, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, { 1, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, { 1, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, {26, 1, 1, 1 } },
- { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, { 5, 1, 1, 1 } },
-
- { ISD::FP_TO_SINT, MVT::v16i8, MVT::v16f32, { 2, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v16i8, MVT::v16f64, { 7, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v32i8, MVT::v32f64, {15, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v64i8, MVT::v64f32, {11, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v64i8, MVT::v64f64, {31, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v8i16, MVT::v8f64, { 3, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v16i16, MVT::v16f64, { 7, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v32i16, MVT::v32f32, { 5, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v32i16, MVT::v32f64, {15, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v8i32, MVT::v8f64, { 1, 1, 1, 1 } },
- { ISD::FP_TO_SINT, MVT::v16i32, MVT::v16f64, { 3, 1, 1, 1 } },
-
- { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f64, { 1, 1, 1, 1 } },
- { ISD::FP_TO_UINT, MVT::v8i16, MVT::v8f64, { 3, 1, 1, 1 } },
- { ISD::FP_TO_UINT, MVT::v8i8, MVT::v8f64, { 3, 1, 1, 1 } },
- { ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, { 1, 1, 1, 1 } },
- { ISD::FP_TO_UINT, MVT::v16i16, MVT::v16f32, { 3, 1, 1, 1 } },
- { ISD::FP_TO_UINT, MVT::v16i8, MVT::v16f32, { 3, 1, 1, 1 } },
+ {ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, {1, 1, 1, 1}},
+ {ISD::FP_EXTEND, MVT::v8f64, MVT::v16f32, {3, 1, 1, 1}},
+ {ISD::FP_EXTEND,
+ MVT::v16f64,
+ MVT::v16f32,
+ {4, 1, 1, 1}}, // 2*vcvtps2pd+vextractf64x4
+ {ISD::FP_EXTEND, MVT::v16f32, MVT::v16f16, {1, 1, 1, 1}}, // vcvtph2ps
+ {ISD::FP_EXTEND,
+ MVT::v8f64,
+ MVT::v8f16,
+ {2, 1, 1, 1}}, // vcvtph2ps+vcvtps2pd
+ {ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, {1, 1, 1, 1}},
+ {ISD::FP_ROUND, MVT::v16f16, MVT::v16f32, {1, 1, 1, 1}}, // vcvtps2ph
+
+ {ISD::TRUNCATE,
+ MVT::v2i1,
+ MVT::v2i8,
+ {3, 1, 1, 1}}, // sext+vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v4i1,
+ MVT::v4i8,
+ {3, 1, 1, 1}}, // sext+vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v8i1,
+ MVT::v8i8,
+ {3, 1, 1, 1}}, // sext+vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v16i1,
+ MVT::v16i8,
+ {3, 1, 1, 1}}, // sext+vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v2i1,
+ MVT::v2i16,
+ {3, 1, 1, 1}}, // sext+vpsllq+vptestmq
+ {ISD::TRUNCATE,
+ MVT::v4i1,
+ MVT::v4i16,
+ {3, 1, 1, 1}}, // sext+vpsllq+vptestmq
+ {ISD::TRUNCATE,
+ MVT::v8i1,
+ MVT::v8i16,
+ {3, 1, 1, 1}}, // sext+vpsllq+vptestmq
+ {ISD::TRUNCATE,
+ MVT::v16i1,
+ MVT::v16i16,
+ {3, 1, 1, 1}}, // sext+vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v2i1,
+ MVT::v2i32,
+ {2, 1, 1, 1}}, // zmm vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v4i1,
+ MVT::v4i32,
+ {2, 1, 1, 1}}, // zmm vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v8i1,
+ MVT::v8i32,
+ {2, 1, 1, 1}}, // zmm vpslld+vptestmd
+ {ISD::TRUNCATE, MVT::v16i1, MVT::v16i32, {2, 1, 1, 1}}, // vpslld+vptestmd
+ {ISD::TRUNCATE,
+ MVT::v2i1,
+ MVT::v2i64,
+ {2, 1, 1, 1}}, // zmm vpsllq+vptestmq
+ {ISD::TRUNCATE,
+ MVT::v4i1,
+ MVT::v4i64,
+ {2, 1, 1, 1}}, // zmm vpsllq+vptestmq
+ {ISD::TRUNCATE, MVT::v8i1, MVT::v8i64, {2, 1, 1, 1}}, // vpsllq+vptestmq
+ {ISD::TRUNCATE, MVT::v2i8, MVT::v2i32, {2, 1, 1, 1}}, // vpmovdb
+ {ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, {2, 1, 1, 1}}, // vpmovdb
+ {ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, {2, 1, 1, 1}}, // vpmovdb
+ {ISD::TRUNCATE, MVT::v32i8, MVT::v16i32, {2, 1, 1, 1}}, // vpmovdb
+ {ISD::TRUNCATE, MVT::v64i8, MVT::v16i32, {2, 1, 1, 1}}, // vpmovdb
+ {ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, {2, 1, 1, 1}}, // vpmovdw
+ {ISD::TRUNCATE, MVT::v32i16, MVT::v16i32, {2, 1, 1, 1}}, // vpmovdw
+ {ISD::TRUNCATE, MVT::v2i8, MVT::v2i64, {2, 1, 1, 1}}, // vpmovqb
+ {ISD::TRUNCATE, MVT::v2i16, MVT::v2i64, {1, 1, 1, 1}}, // vpshufb
+ {ISD::TRUNCATE, MVT::v8i8, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqb
+ {ISD::TRUNCATE, MVT::v16i8, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqb
+ {ISD::TRUNCATE, MVT::v32i8, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqb
+ {ISD::TRUNCATE, MVT::v64i8, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqb
+ {ISD::TRUNCATE, MVT::v8i16, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqw
+ {ISD::TRUNCATE, MVT::v16i16, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqw
+ {ISD::TRUNCATE, MVT::v32i16, MVT::v8i64, {2, 1, 1, 1}}, // vpmovqw
+ {ISD::TRUNCATE, MVT::v8i32, MVT::v8i64, {1, 1, 1, 1}}, // vpmovqd
+ {ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, {1, 1, 1, 1}}, // zmm vpmovqd
+ {ISD::TRUNCATE,
+ MVT::v16i8,
+ MVT::v16i64,
+ {5, 1, 1, 1}}, // 2*vpmovqd+concat+vpmovdb
+
+ {ISD::TRUNCATE,
+ MVT::v16i8,
+ MVT::v16i16,
+ {3, 1, 1, 1}}, // extend to v16i32
+ {ISD::TRUNCATE, MVT::v32i8, MVT::v32i16, {8, 1, 1, 1}},
+ {ISD::TRUNCATE, MVT::v64i8, MVT::v32i16, {8, 1, 1, 1}},
+
+ // Sign extend is zmm vpternlogd+vptruncdb.
+ // Zero extend is zmm broadcast load+vptruncdw.
+ {ISD::SIGN_EXTEND, MVT::v2i8, MVT::v2i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v2i8, MVT::v2i1, {4, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v4i8, MVT::v4i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v4i8, MVT::v4i1, {4, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v8i8, MVT::v8i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v8i8, MVT::v8i1, {4, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v16i8, MVT::v16i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v16i8, MVT::v16i1, {4, 1, 1, 1}},
+
+ // Sign extend is zmm vpternlogd+vptruncdw.
+ // Zero extend is zmm vpternlogd+vptruncdw+vpsrlw.
+ {ISD::SIGN_EXTEND, MVT::v2i16, MVT::v2i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v2i16, MVT::v2i1, {4, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v4i16, MVT::v4i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v4i16, MVT::v4i1, {4, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i1, {4, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i1, {3, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i1, {4, 1, 1, 1}},
+
+ {ISD::SIGN_EXTEND, MVT::v2i32, MVT::v2i1, {1, 1, 1, 1}}, // zmm vpternlogd
+ {ISD::ZERO_EXTEND,
+ MVT::v2i32,
+ MVT::v2i1,
+ {2, 1, 1, 1}}, // zmm vpternlogd+psrld
+ {ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i1, {1, 1, 1, 1}}, // zmm vpternlogd
+ {ISD::ZERO_EXTEND,
+ MVT::v4i32,
+ MVT::v4i1,
+ {2, 1, 1, 1}}, // zmm vpternlogd+psrld
+ {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i1, {1, 1, 1, 1}}, // zmm vpternlogd
+ {ISD::ZERO_EXTEND,
+ MVT::v8i32,
+ MVT::v8i1,
+ {2, 1, 1, 1}}, // zmm vpternlogd+psrld
+ {ISD::SIGN_EXTEND, MVT::v2i64, MVT::v2i1, {1, 1, 1, 1}}, // zmm vpternlogq
+ {ISD::ZERO_EXTEND,
+ MVT::v2i64,
+ MVT::v2i1,
+ {2, 1, 1, 1}}, // zmm vpternlogq+psrlq
+ {ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i1, {1, 1, 1, 1}}, // zmm vpternlogq
+ {ISD::ZERO_EXTEND,
+ MVT::v4i64,
+ MVT::v4i1,
+ {2, 1, 1, 1}}, // zmm vpternlogq+psrlq
+
+ {ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i1, {1, 1, 1, 1}}, // vpternlogd
+ {ISD::ZERO_EXTEND,
+ MVT::v16i32,
+ MVT::v16i1,
+ {2, 1, 1, 1}}, // vpternlogd+psrld
+ {ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i1, {1, 1, 1, 1}}, // vpternlogq
+ {ISD::ZERO_EXTEND,
+ MVT::v8i64,
+ MVT::v8i1,
+ {2, 1, 1, 1}}, // vpternlogq+psrlq
+
+ {ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, {1, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, {1, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, {1, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, {1, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i8, {1, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i8, {1, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i16, {1, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i16, {1, 1, 1, 1}},
+ {ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i32, {1, 1, 1, 1}},
+ {ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, {1, 1, 1, 1}},
+
+ {ISD::SIGN_EXTEND,
+ MVT::v32i16,
+ MVT::v32i8,
+ {3, 1, 1, 1}}, // FIXME: May not be right
+ {ISD::ZERO_EXTEND,
+ MVT::v32i16,
+ MVT::v32i8,
+ {3, 1, 1, 1}}, // FIXME: May not be right
+
+ {ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, {4, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, {3, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v8f64, MVT::v16i8, {2, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, {1, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, {2, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, {1, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, {1, 1, 1, 1}},
+ {ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, {1, 1, 1, 1}},
+
+ {ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, {4, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, {3, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v8f64, MVT::v16i8, {2, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, {1, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, {2, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, {1, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, {1, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, {1, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, {26, 1, 1, 1}},
+ {ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, {5, 1, 1, 1}},
+
+ {ISD::FP_TO_SINT, MVT::v16i8, MVT::v16f32, {2, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v16i8, MVT::v16f64, {7, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v32i8, MVT::v32f64, {15, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v64i8, MVT::v64f32, {11, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v64i8, MVT::v64f64, {31, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v8i16, MVT::v8f64, {3, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v16i16, MVT::v16f64, {7, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v32i16, MVT::v32f32, {5, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v32i16, MVT::v32f64, {15, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v8i32, MVT::v8f64, {1, 1, 1, 1}},
+ {ISD::FP_TO_SINT, MVT::v16i32, MVT::v16f64, {3, 1, 1, 1}},
+
+ {ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f64, {1, 1, 1, 1}},
+ {ISD::FP_TO_UINT, MVT::v8i16, MVT::v8f64, {3, 1, 1, 1}},
+ {ISD::FP_TO_UINT, MVT::v8i8, MVT::v8f64, {3, 1, 1, 1}},
+ {ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, {1, 1, 1, 1}},
+ {ISD::FP_TO_UINT, MVT::v16i16, MVT::v16f32, {3, 1, 1, 1}},
+ {ISD::FP_TO_UINT, MVT::v16i8, MVT::v16f32, {3, 1, 1, 1}},
};
static const TypeConversionCostKindTblEntry AVX512BWVLConversionTbl[] {
@@ -2977,14 +3055,17 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
};
static const TypeConversionCostKindTblEntry F16ConversionTbl[] = {
- { ISD::FP_ROUND, MVT::f16, MVT::f32, { 1, 1, 1, 1 } },
- { ISD::FP_ROUND, MVT::v8f16, MVT::v8f32, { 1, 1, 1, 1 } },
- { ISD::FP_ROUND, MVT::v4f16, MVT::v4f32, { 1, 1, 1, 1 } },
- { ISD::FP_EXTEND, MVT::f32, MVT::f16, { 1, 1, 1, 1 } },
- { ISD::FP_EXTEND, MVT::f64, MVT::f16, { 2, 1, 1, 1 } }, // vcvtph2ps+vcvtps2pd
- { ISD::FP_EXTEND, MVT::v8f32, MVT::v8f16, { 1, 1, 1, 1 } },
- { ISD::FP_EXTEND, MVT::v4f32, MVT::v4f16, { 1, 1, 1, 1 } },
- { ISD::FP_EXTEND, MVT::v4f64, MVT::v4f16, { 2, 1, 1, 1 } }, // vcvtph2ps+vcvtps2pd
+ {ISD::FP_ROUND, MVT::f16, MVT::f32, {1, 1, 1, 1}},
+ {ISD::FP_ROUND, MVT::v8f16, MVT::v8f32, {1, 1, 1, 1}},
+ {ISD::FP_ROUND, MVT::v4f16, MVT::v4f32, {1, 1, 1, 1}},
+ {ISD::FP_EXTEND, MVT::f32, MVT::f16, {1, 1, 1, 1}},
+ {ISD::FP_EXTEND, MVT::f64, MVT::f16, {2, 1, 1, 1}}, // vcvtph2ps+vcvtps2pd
+ {ISD::FP_EXTEND, MVT::v8f32, MVT::v8f16, {1, 1, 1, 1}},
+ {ISD::FP_EXTEND, MVT::v4f32, MVT::v4f16, {1, 1, 1, 1}},
+ {ISD::FP_EXTEND, MVT::v4f64, MVT::v4f16, {2, 1, 1, 1}}, // vcvtph2ps+vcvtps2pd
};
// Attempt to map directly to (simple) MVT types to let us match custom entries.
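For context, `getCastInstrCost` consults a table like this by looking up the ISD opcode together with the legalized destination and source MVTs. A hedged sketch of that lookup pattern follows; the `LTDest`/`LTSrc` pair names are assumptions borrowed from the surrounding function, not a verbatim excerpt:

if (const auto *Entry = ConvertCostTableLookup(F16ConversionTbl, ISD,
                                               LTDest.second, LTSrc.second))
  if (auto KindCost = Entry->Cost[CostKind]) // cost for the requested kind
    return *KindCost;                        // e.g. one vcvtps2ph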
Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer transforms the patterns:
- Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (an SSE register can hold 4xfloat converted/stored to 4xf16). This is necessary because fp16 stores are neither modeled as trunc-stores, nor can direct Xxfp16 stores be marked legal, since fp16 operations are generally expanded.
- Add missing cost entries to `X86TTIImpl::getCastInstrCost` for conversions from/to fp16. Note that conversion from f64 to f16 is not supported by an X86 instruction.
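A minimal sketch of the override described in the first bullet, assuming an F16C feature gate and the VF of 4 mentioned above (the committed code may differ in detail):

// Report a minimum store VF of 4 for half stores when F16C is available,
// so the SLPVectorizer considers <4 x half> stores lowered via vcvtps2ph.
unsigned X86TTIImpl::getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
                                       Type *ScalarValTy) const {
  if (ST->hasF16C() && ScalarMemTy->isHalfTy())
    return 4;
  return BaseT::getStoreMinimumVF(VF, ScalarMemTy, ScalarValTy);
}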
Revert 054c23d "X86: Improve cost model of fp16 conversion (llvm#113195)": breaks hipRuntime build. Change-Id: I2dbd30b82c6b355ff83368c9fd5c8b2f83ce5db1
if (ISD == ISD::FP_ROUND && LTDest.second.getScalarType() == MVT::f16) {
  // Conversion requires a libcall.
  return InstructionCost::getInvalid();
This is breaking https://github.com/google/jax/blob/main/tests/lax_test.py#L3630 LazyConstantTest.testConvertElementTypeAvoidsCopies21 (dtype_in=<class 'numpy.float64'>, dtype_out=<class 'numpy.float16'>).
With this change, we hit:
F1029 08:45:30.640847 4013 logging.cc:62] assert.h assertion failed at third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4569 in VectorizationFactor llvm::LoopVectorizationPlanner::selectVectorizationFactor(): ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop"
*** Check failure stack trace: ***
@ 0x7ef66f09cf59 absl::log_internal::LogMessage::SendToLog()
@ 0x7ef66f09c4fe absl::log_internal::LogMessage::Flush()
@ 0x7ef66f09d519 absl::log_internal::LogMessageFatal::~LogMessageFatal()
@ 0x7ef67ade7314 __assert_fail
@ 0x7efa86da3f10 llvm::LoopVectorizationPlanner::selectVectorizationFactor()
@ 0x7efa86db95df llvm::LoopVectorizationPlanner::computeBestVF()
@ 0x7efa86dcbdfd llvm::LoopVectorizePass::processLoop()
@ 0x7efa86dd2c3d llvm::LoopVectorizePass::runImpl()
@ 0x7efa86dd3875 llvm::LoopVectorizePass::run()
@ 0x7efa8ceb7332 llvm::detail::PassModel<>::run()
@ 0x7ef9520b9050 llvm::PassManager<>::run()
@ 0x7efaff179412 llvm::detail::PassModel<>::run()
@ 0x7ef9520be28a llvm::ModuleToFunctionPassAdaptor::run()
@ 0x7efaff179192 llvm::detail::PassModel<>::run()
@ 0x7ef9520b7d7c llvm::PassManager<>::run()
@ 0x7efaa12ec861 xla::cpu::CompilerFunctor::operator()()
@ 0x7efa913b0271 llvm::orc::ThreadSafeModule::withModuleDo<>()
@ 0x7efa913b000b llvm::orc::IRCompileLayer::emit()
@ 0x7efa913e6d45 llvm::orc::BasicIRLayerMaterializationUnit::materialize()
@ 0x7efa91454337 llvm::orc::InPlaceTaskDispatcher::dispatch()
@ 0x7efa91349466 llvm::orc::ExecutionSession::dispatchOutstandingMUs()
@ 0x7efa9134e9e6 llvm::orc::ExecutionSession::OL_completeLookup()
@ 0x7efa91369a89 llvm::orc::InProgressFullLookupState::complete()
@ 0x7efa9133a0f0 llvm::orc::ExecutionSession::OL_applyQueryPhase1()
@ 0x7efa91337234 llvm::orc::ExecutionSession::lookup()
@ 0x7efa9134991e llvm::orc::ExecutionSession::lookup()
@ 0x7efa91349de8 llvm::orc::ExecutionSession::lookup()
@ 0x7efa9134a30e llvm::orc::ExecutionSession::lookup()
@ 0x7efa9134a459 llvm::orc::ExecutionSession::lookup()
@ 0x7efaa1719abf xla::cpu::SimpleOrcJIT::FindCompiledSymbol()
@ 0x7efaddc247c0 absl::internal_any_invocable::RemoteInvoker<>()
@ 0x7efaddc0fb68 std::__u::__function::__policy_invoker<>::__call_impl<>()
@ 0x7ef89847e1b6 tsl::thread::EigenEnvironment::ExecuteTask()
@ 0x7ef89847dd10 Eigen::ThreadPoolTempl<>::WorkerLoop()
@ 0x7ef89847d940 std::__u::invoke<>()
@ 0x7ef6a5f9e25e Thread::ThreadBody()
@ 0x7efafb6827db start_thread
@ 0x7efabc18e05f clone
I dumped the LLVM IR before the pass runs:
; Function Attrs: nofree norecurse nosync nounwind memory(readwrite, inaccessiblemem: none) uwtable
define noalias noundef ptr @convert.2(ptr nocapture readonly %0) local_unnamed_addr #0 {
%args_gep = getelementptr inbounds nuw i8, ptr %0, i64 24
%args = load ptr, ptr %args_gep, align 8
%arg0 = load ptr, ptr %args, align 8, !invariant.load !0, !dereferenceable !1, !align !2
%arg1_gep = getelementptr i8, ptr %args, i64 16
%arg1 = load ptr, ptr %arg1_gep, align 8, !invariant.load !0, !dereferenceable !3, !align !2
br label %convert.2.loop_body.dim.0
convert.2.loop_body.dim.0: ; preds = %1, %convert.2.loop_body.dim.0
%convert.2.invar_address.dim.0.03 = phi i64 [ 0, %1 ], [ %invar.inc, %convert.2.loop_body.dim.0 ]
%2 = getelementptr inbounds [5 x double], ptr %arg0, i64 0, i64 %convert.2.invar_address.dim.0.03
%3 = load double, ptr %2, align 8, !invariant.load !0, !noalias !4
%4 = fptrunc double %3 to half
%5 = getelementptr inbounds [5 x half], ptr %arg1, i64 0, i64 %convert.2.invar_address.dim.0.03
store half %4, ptr %5, align 2, !alias.scope !4
%invar.inc = add nuw nsw i64 %convert.2.invar_address.dim.0.03, 1
%exitcond = icmp eq i64 %invar.inc, 5
br i1 %exitcond, label %return, label %convert.2.loop_body.dim.0
return: ; preds = %convert.2.loop_body.dim.0
ret ptr null
}
Could we fix this?
I am not able to reproduce this so far. Unfortunately the dump does not contain some of the referenced metadata, so I have to make guesses for that. With those guesses, running opt -S -o - -passes=loop-vectorize /tmp/x.ll works just fine, and I suspect I need some target setup (I played with -mtriple=x86_64 -mattr=+avx512f,+f16c, but that doesn't repro either).
That said, could you try whether replacing the InstructionCost::getInvalid(); with InstructionCost::getMax() helps, or, if that doesn't work, with a big number like 128?
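For concreteness, the suggested experiment would look roughly like this sketch against the hunk quoted above (not a committed change; 128 is an arbitrary stand-in for "very expensive"):

if (ISD == ISD::FP_ROUND && LTDest.second.getScalarType() == MVT::f16) {
  // Conversion requires a libcall; model it as very expensive instead of
  // invalid so the vectorizer's scalar-cost assertion cannot fire.
  return InstructionCost(128); // or try InstructionCost::getMax()
}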
I hope #114128 fixes this.
Yes it looks like it does. Thanks! (And apologies about the bad dump)
Returning invalid instruction costs when converting from/to fp16 in `X86TTIImpl::getCastInstrCost` when there is no hardware support available was triggering asserts. This changes the code to return a large (arbitrary) number to model the fact that libcalls are used to implement the conversion. It also simplifies the code by only reporting costs for the scalar fp16 conversion; vectorized costs are left to the fallback, which assumes scalarization. This is a follow-up to the assertion issues reported for the changes in #113195.
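The shape of that follow-up is roughly the following sketch; the guard variable `SimpleDstTy` and the constant are assumptions for illustration, not the verbatim commit:

// Only the scalar f16 FP_ROUND gets an explicit large libcall cost; vector
// types fall through to the generic fallback, which assumes scalarization.
if (ISD == ISD::FP_ROUND && SimpleDstTy == MVT::f16)
  return 18; // large arbitrary cost standing in for the libcall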
upstream commit: 255e441