X86: Do not return invalid cost for fp16 conversion #114128

MatzeB · 2024-10-29T20:32:34Z

When no hardware support is available, X86TTIImpl::getCastInstrCost returned an invalid instruction cost for conversions from/to fp16, which was triggering asserts. This changes the code to return a large (arbitrary) number instead, to model the fact that libcalls are used to implement the conversion.
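To illustrate why an invalid cost is awkward for callers, here is a minimal sketch. `SketchCost`, `libcallCostOld` and `libcallCostNew` are hypothetical stand-ins written for this example; they are not LLVM's `InstructionCost` API and only mimic the idea that an invalid state propagates through arithmetic and that some consumers assume validity. The value 64 is the one used in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a cost that can be "invalid".
// This is NOT LLVM's InstructionCost; it only models the idea.
struct SketchCost {
  std::optional<int64_t> Value; // std::nullopt models an invalid cost

  static SketchCost invalid() { return SketchCost{std::nullopt}; }
  bool isValid() const { return Value.has_value(); }

  // A consumer that assumes validity trips this assert on invalid input.
  int64_t getValue() const {
    assert(isValid() && "cost must be valid");
    return *Value;
  }
};

// Invalid is sticky: a sum involving an invalid cost is itself invalid.
inline SketchCost operator+(SketchCost A, SketchCost B) {
  if (!A.isValid() || !B.isValid())
    return SketchCost::invalid();
  return SketchCost{*A.Value + *B.Value};
}

// Old behaviour (sketched): the conversion is reported as not costable.
inline SketchCost libcallCostOld() { return SketchCost::invalid(); }

// New behaviour (sketched): a large but finite cost for the libcall.
inline SketchCost libcallCostNew() { return SketchCost{64}; }
```

With the finite value, a caller can keep summing and comparing costs; the conversion is merely expensive rather than impossible to cost.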

This also simplifies the code by reporting costs only for the scalar fp16 conversion; vector costs are left to the fallback, which assumes scalarization.
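As a rough illustration of what a scalarization-based estimate looks like, the sketch below multiplies a scalar cost by the number of lanes. `scalarizedCastCost` and its `PerLaneOverhead` parameter are assumptions made for this example; the actual fallback in LLVM may compute the overhead differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalarization estimate: every lane is extracted, converted
// with the scalar operation, and inserted back into the result vector.
// PerLaneOverhead is an assumed placeholder for the extract/insert cost,
// not a number taken from LLVM.
inline int64_t scalarizedCastCost(int64_t ScalarCost, unsigned NumLanes,
                                  int64_t PerLaneOverhead) {
  assert(NumLanes > 0 && "vector must have at least one lane");
  return static_cast<int64_t>(NumLanes) * (ScalarCost + PerLaneOverhead);
}
```

Under this model the vector cost grows linearly with the lane count, so a large scalar libcall cost already makes the vector form expensive without a dedicated vector rule.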

This is a follow-up to the assertion issues reported for the changes in #113195.

llvmbot · 2024-10-29T20:33:07Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-x86

Author: Matthias Braun (MatzeB)

Changes

Returning invalid instruction costs when converting from/to fp16 in X86TTIImpl::getCastInstrCost when there is no hardware support available was triggering asserts. This changes the code to return a large number instead.

Full diff: https://github.com/llvm/llvm-project/pull/114128.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+7-5)
(modified) llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll (+3-8)

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index bae223243b3dc9..ef16636a2ea544 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -3068,6 +3068,13 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
         if (auto KindCost = Entry->Cost[CostKind])
           return *KindCost;
     }
+
+    if ((ISD == ISD::FP_ROUND && SimpleDstTy == MVT::f16) ||
+        (ISD == ISD::FP_EXTEND && SimpleSrcTy == MVT::f16)) {
+      // fp16 conversions not covered yet require a libcall, return a
+      // large (arbitrary) number.
+      return InstructionCost(64);
+    }
   }
 
   // Fall back to legalized types.
@@ -3174,11 +3181,6 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
                             TTI::CastContextHint::None, CostKind);
   }
 
-  if (ISD == ISD::FP_ROUND && LTDest.second.getScalarType() == MVT::f16) {
-    // Conversion requires a libcall.
-    return InstructionCost::getInvalid();
-  }
-
   // TODO: Allow non-throughput costs that aren't binary.
   auto AdjustCost = [&CostKind](InstructionCost Cost,
                                 InstructionCost N = 1) -> InstructionCost {
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll b/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll
index bcea147d724f53..f23043f0c47f4a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/conversion-fp16.ll
@@ -453,14 +453,9 @@ define void @fpround_v16xf32_v16xf16(ptr %s0, ptr %d0) {
 ;
 ; CHECK-F16C-LABEL: define void @fpround_v16xf32_v16xf16(
 ; CHECK-F16C-SAME: ptr [[S0:%.*]], ptr [[D0:%.*]]) #[[ATTR0]] {
-; CHECK-F16C-NEXT:    [[S8:%.*]] = getelementptr inbounds float, ptr [[S0]], i64 8
-; CHECK-F16C-NEXT:    [[D8:%.*]] = getelementptr inbounds half, ptr [[D0]], i64 8
-; CHECK-F16C-NEXT:    [[TMP1:%.*]] = load <8 x float>, ptr [[S0]], align 4
-; CHECK-F16C-NEXT:    [[TMP2:%.*]] = fptrunc <8 x float> [[TMP1]] to <8 x half>
-; CHECK-F16C-NEXT:    [[TMP3:%.*]] = load <8 x float>, ptr [[S8]], align 4
-; CHECK-F16C-NEXT:    [[TMP4:%.*]] = fptrunc <8 x float> [[TMP3]] to <8 x half>
-; CHECK-F16C-NEXT:    store <8 x half> [[TMP2]], ptr [[D0]], align 2
-; CHECK-F16C-NEXT:    store <8 x half> [[TMP4]], ptr [[D8]], align 2
+; CHECK-F16C-NEXT:    [[TMP1:%.*]] = load <16 x float>, ptr [[S0]], align 4
+; CHECK-F16C-NEXT:    [[TMP2:%.*]] = fptrunc <16 x float> [[TMP1]] to <16 x half>
+; CHECK-F16C-NEXT:    store <16 x half> [[TMP2]], ptr [[D0]], align 2
 ; CHECK-F16C-NEXT:    ret void
 ;
 ; CHECK-AVX512-LABEL: define void @fpround_v16xf32_v16xf16(

JoelWee

Thanks for the quick turnaround on this Matthias! This looks like it fixes the failure in jax/.../lax_test :)

JoelWee · 2024-10-29T21:43:41Z

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

+    if ((ISD == ISD::FP_ROUND && SimpleDstTy == MVT::f16) ||
+        (ISD == ISD::FP_EXTEND && SimpleSrcTy == MVT::f16)) {
+      // fp16 conversions not covered by any table entries require a libcall,
+      // return a large (arbitrary) number.


nit: "return a large (arbitrary) number to model this."

RKSimon

LGTM

llvm-ci · 2024-10-30T00:52:13Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 11 "Add check check-libc-amdgcn-amd-amdhsa".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7729

Here is the relevant piece of the build log for reference:

Step 11 (Add check check-libc-amdgcn-amd-amdhsa) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-libc-amdgcn-amd-amdhsa'], attempting to kill
...
[2561/2681] Linking CXX executable libc/test/src/string/libc.test.src.string.strcpy_test.__hermetic__.__build__
[2562/2681] Linking CXX executable libc/test/src/locale/libc.test.src.locale.localeconv_test.__hermetic__.__build__
[2563/2681] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoumax_test.__hermetic__.__build__
[2564/2681] Linking CXX executable libc/test/src/stdlib/libc.test.src.stdlib.strtof_test.__hermetic__.__build__
[2565/2681] Linking CXX executable libc/test/src/string/libc.test.src.string.strncat_test.__hermetic__.__build__
[2566/2681] Linking CXX executable libc/test/src/string/libc.test.src.string.memmove_test.__hermetic__.__build__
[2567/2681] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoimax_test.__hermetic__.__build__
[2568/2681] Linking CXX executable libc/test/src/time/libc.test.src.time.clock_gettime_test.__hermetic__.__build__
[2569/2681] Linking CXX executable libc/test/src/stdio/libc.test.src.stdio.asprintf_test.__hermetic__.__build__
[2570/2681] Linking CXX executable libc/test/src/stdio/libc.test.src.stdio.vasprintf_test.__hermetic__.__build__
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-libc-amdgcn-amd-amdhsa'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1470.255292

llvmbot added backend:X86 llvm:transforms labels Oct 29, 2024

MatzeB requested review from JoelWee, LebedevRI, RKSimon, phoebewang and topperc October 29, 2024 21:00

MatzeB mentioned this pull request Oct 29, 2024

X86: Improve cost model of fp16 conversion #113195

Merged

MatzeB force-pushed the no_invalid_f16_conversion_cost branch from 9ae0843 to eb9f427 Compare October 29, 2024 21:08

JoelWee approved these changes Oct 29, 2024

X86: Do not return invalid cost for fp16 conversion

2e43458

MatzeB force-pushed the no_invalid_f16_conversion_cost branch from eb9f427 to 2e43458 Compare October 29, 2024 21:47

RKSimon approved these changes Oct 29, 2024

MatzeB merged commit 255e441 into llvm:main Oct 30, 2024
8 checks passed
