BasicTTI: Cleanup multiple result intrinsic handling #165970

arsenm · 2025-11-01T05:37:00Z

Avoid the awkward lambda that returns a function pointer, and sink the
libcall logic to where the operation is handled. This allows chaining the
libcall logic to try sincos_stret and fall back to sincos. The resulting
cost seems too low.
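
The chaining described above can be sketched in isolation. This is only a toy model of the selection order: the `Libcall` enum, the `Selection` struct, and `selectSinCosLibcall` are hypothetical names invented for illustration and are not LLVM's RTLIB API.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins for libcall identifiers; not the real RTLIB enum.
enum class Libcall { Unknown, SinCosStret, SinCos };

// Which libcall was picked, and whether its results are written through
// pointer out-arguments (true) or returned by value (false).
struct Selection {
  Libcall LC;
  bool UsesMemoryOutArgument;
};

// Try the struct-returning variant first, then fall back to the variant
// that takes output pointers. Supported models the set of libcalls the
// target provides.
Selection selectSinCosLibcall(const std::set<Libcall> &Supported) {
  if (Supported.count(Libcall::SinCosStret))
    return {Libcall::SinCosStret, false};
  if (Supported.count(Libcall::SinCos))
    return {Libcall::SinCos, true};
  return {Libcall::Unknown, true};
}

// Small usage check: sincos_stret wins when both variants are available,
// and plain sincos is used when it is the only one.
void checkSelection() {
  assert(selectSinCosLibcall({Libcall::SinCosStret, Libcall::SinCos}).LC ==
         Libcall::SinCosStret);
  assert(selectSinCosLibcall({Libcall::SinCos}).UsesMemoryOutArgument);
}
```

The point of carrying `UsesMemoryOutArgument` alongside the choice is that the caller can skip the reload cost for the by-value variant, which is what the first revision of this patch does.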

llvmbot · 2025-11-01T05:37:53Z

@llvm/pr-subscribers-llvm-analysis

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/165970.diff

2 Files Affected:

(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+40-20)
(modified) llvm/test/Analysis/CostModel/AArch64/sincos.ll (+17-4)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index e8dbc964a943e..bbce59b71edae 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -302,7 +302,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
   /// (e.g. scalarization).
   std::optional<InstructionCost> getMultipleResultIntrinsicVectorLibCallCost(
       const IntrinsicCostAttributes &ICA, TTI::TargetCostKind CostKind,
-      RTLIB::Libcall LC,
       std::optional<unsigned> CallRetElementIndex = {}) const {
     Type *RetTy = ICA.getReturnType();
     // Vector variants of the intrinsic can be mapped to a vector library call.
@@ -311,11 +310,43 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
         !isVectorizedStructTy(cast<StructType>(RetTy)))
       return std::nullopt;
 
+    Type *Ty = getContainedTypes(RetTy).front();
+    EVT VT = getTLI()->getValueType(DL, Ty);
+
+    EVT ScalarVT = VT.getScalarType();
+    RTLIB::Libcall LC = RTLIB::UNKNOWN_LIBCALL;
+
+    bool UsesMemoryOutArgument = true;
+
+    switch (ICA.getID()) {
+    case Intrinsic::modf:
+      LC = RTLIB::getMODF(ScalarVT);
+      break;
+    case Intrinsic::sincospi:
+      LC = RTLIB::getSINCOSPI(ScalarVT);
+      break;
+    case Intrinsic::sincos:
+      LC = RTLIB::getSINCOS_STRET(ScalarVT);
+      UsesMemoryOutArgument = false;
+
+      if (getTLI()->getLibcallImpl(LC) == RTLIB::Unsupported) {
+        LC = RTLIB::getSINCOS(ScalarVT);
+        UsesMemoryOutArgument = true;
+      }
+
+      break;
+    default:
+      return std::nullopt;
+    }
+
     // Find associated libcall.
-    const char *LCName = getTLI()->getLibcallName(LC);
-    if (!LCName)
+    RTLIB::LibcallImpl LibcallImpl = getTLI()->getLibcallImpl(LC);
+    if (LibcallImpl == RTLIB::Unsupported)
       return std::nullopt;
 
+    StringRef LCName =
+        RTLIB::RuntimeLibcallsInfo::getLibcallImplName(LibcallImpl);
+
     // Search for a corresponding vector variant.
     LLVMContext &Ctx = RetTy->getContext();
     ElementCount VF = getVectorizedTypeVF(RetTy);
@@ -336,6 +367,11 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
                                       VecTy, {}, CostKind, 0, nullptr, {});
     }
 
+    // Technically this depends on the ABI, but assume sincos_stret passes in
+    // registers.
+    if (!UsesMemoryOutArgument)
+      return Cost;
+
     // Lowering to a library call (with output pointers) may require us to emit
     // reloads for the results.
     for (auto [Idx, VectorTy] : enumerate(getContainedTypes(RetTy))) {
@@ -2137,22 +2173,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     case Intrinsic::modf:
     case Intrinsic::sincos:
     case Intrinsic::sincospi: {
-      Type *Ty = getContainedTypes(RetTy).front();
-      EVT VT = getTLI()->getValueType(DL, Ty);
-
-      RTLIB::Libcall LC = [&] {
-        switch (ICA.getID()) {
-        case Intrinsic::modf:
-          return RTLIB::getMODF;
-        case Intrinsic::sincos:
-          return RTLIB::getSINCOS;
-        case Intrinsic::sincospi:
-          return RTLIB::getSINCOSPI;
-        default:
-          llvm_unreachable("unexpected intrinsic");
-        }
-      }()(VT.getScalarType());
-
       std::optional<unsigned> CallRetElementIndex;
       // The first element of the modf result is returned by value in the
       // libcall.
@@ -2160,7 +2180,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
         CallRetElementIndex = 0;
 
       if (auto Cost = getMultipleResultIntrinsicVectorLibCallCost(
-              ICA, CostKind, LC, CallRetElementIndex))
+              ICA, CostKind, CallRetElementIndex))
         return *Cost;
       // Otherwise, fallback to default scalarization cost.
       break;
diff --git a/llvm/test/Analysis/CostModel/AArch64/sincos.ll b/llvm/test/Analysis/CostModel/AArch64/sincos.ll
index 32408acb582d0..72c8f2bbbf8cf 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sincos.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sincos.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --filter "sincos"
 ; RUN: opt < %s -mtriple=aarch64-gnu-linux -mattr=+neon,+sve -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s
 ; RUN: opt < %s -mtriple=aarch64-gnu-linux -mattr=+neon,+sve -vector-library=ArmPL -passes="print<cost-model>" -intrinsic-cost-strategy=intrinsic-cost -cost-kind=throughput 2>&1 -disable-output | FileCheck %s -check-prefix=CHECK-VECLIB
+; RUN: opt < %s -mtriple=arm64-apple-macos10.9 -mattr=+neon -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck -check-prefix=SINCOS_STRET %s
 
 define void @sincos() {
 ; CHECK-LABEL: 'sincos'
@@ -8,13 +9,11 @@ define void @sincos() {
 ; CHECK:  Cost Model: Found an estimated cost of 10 for instruction: %f32 = call { float, float } @llvm.sincos.f32(float poison)
 ; CHECK:  Cost Model: Found an estimated cost of 10 for instruction: %f64 = call { double, double } @llvm.sincos.f64(double poison)
 ; CHECK:  Cost Model: Found an estimated cost of 10 for instruction: %f128 = call { fp128, fp128 } @llvm.sincos.f128(fp128 poison)
-;
 ; CHECK:  Cost Model: Found an estimated cost of 36 for instruction: %v8f16 = call { <8 x half>, <8 x half> } @llvm.sincos.v8f16(<8 x half> poison)
 ; CHECK:  Cost Model: Found an estimated cost of 52 for instruction: %v4f32 = call { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> poison)
 ; CHECK:  Cost Model: Found an estimated cost of 24 for instruction: %v2f64 = call { <2 x double>, <2 x double> } @llvm.sincos.v2f64(<2 x double> poison)
 ; CHECK:  Cost Model: Found an estimated cost of 10 for instruction: %v1f128 = call { <1 x fp128>, <1 x fp128> } @llvm.sincos.v1f128(<1 x fp128> poison)
 ; CHECK:  Cost Model: Found an estimated cost of 104 for instruction: %v8f32 = call { <8 x float>, <8 x float> } @llvm.sincos.v8f32(<8 x float> poison)
-;
 ; CHECK:  Cost Model: Invalid cost for instruction: %nxv8f16 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.sincos.nxv8f16(<vscale x 8 x half> poison)
 ; CHECK:  Cost Model: Invalid cost for instruction: %nxv4f32 = call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.sincos.nxv4f32(<vscale x 4 x float> poison)
 ; CHECK:  Cost Model: Invalid cost for instruction: %nxv2f64 = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.sincos.nxv2f64(<vscale x 2 x double> poison)
@@ -26,18 +25,32 @@ define void @sincos() {
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 10 for instruction: %f32 = call { float, float } @llvm.sincos.f32(float poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 10 for instruction: %f64 = call { double, double } @llvm.sincos.f64(double poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 10 for instruction: %f128 = call { fp128, fp128 } @llvm.sincos.f128(fp128 poison)
-;
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 36 for instruction: %v8f16 = call { <8 x half>, <8 x half> } @llvm.sincos.v8f16(<8 x half> poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 12 for instruction: %v4f32 = call { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 12 for instruction: %v2f64 = call { <2 x double>, <2 x double> } @llvm.sincos.v2f64(<2 x double> poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 10 for instruction: %v1f128 = call { <1 x fp128>, <1 x fp128> } @llvm.sincos.v1f128(<1 x fp128> poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 104 for instruction: %v8f32 = call { <8 x float>, <8 x float> } @llvm.sincos.v8f32(<8 x float> poison)
-;
 ; CHECK-VECLIB:  Cost Model: Invalid cost for instruction: %nxv8f16 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.sincos.nxv8f16(<vscale x 8 x half> poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 13 for instruction: %nxv4f32 = call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.sincos.nxv4f32(<vscale x 4 x float> poison)
 ; CHECK-VECLIB:  Cost Model: Found an estimated cost of 13 for instruction: %nxv2f64 = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.sincos.nxv2f64(<vscale x 2 x double> poison)
 ; CHECK-VECLIB:  Cost Model: Invalid cost for instruction: %nxv1f128 = call { <vscale x 1 x fp128>, <vscale x 1 x fp128> } @llvm.sincos.nxv1f128(<vscale x 1 x fp128> poison)
 ; CHECK-VECLIB:  Cost Model: Invalid cost for instruction: %nxv8f32 = call { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.sincos.nxv8f32(<vscale x 8 x float> poison)
+;
+; SINCOS_STRET-LABEL: 'sincos'
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 1 for instruction: %f16 = call { half, half } @llvm.sincos.f16(half poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 2 for instruction: %f32 = call { float, float } @llvm.sincos.f32(float poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 2 for instruction: %f64 = call { double, double } @llvm.sincos.f64(double poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 10 for instruction: %f128 = call { fp128, fp128 } @llvm.sincos.f128(fp128 poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 36 for instruction: %v8f16 = call { <8 x half>, <8 x half> } @llvm.sincos.v8f16(<8 x half> poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 20 for instruction: %v4f32 = call { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 8 for instruction: %v2f64 = call { <2 x double>, <2 x double> } @llvm.sincos.v2f64(<2 x double> poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 10 for instruction: %v1f128 = call { <1 x fp128>, <1 x fp128> } @llvm.sincos.v1f128(<1 x fp128> poison)
+; SINCOS_STRET:  Cost Model: Found an estimated cost of 40 for instruction: %v8f32 = call { <8 x float>, <8 x float> } @llvm.sincos.v8f32(<8 x float> poison)
+; SINCOS_STRET:  Cost Model: Invalid cost for instruction: %nxv8f16 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.sincos.nxv8f16(<vscale x 8 x half> poison)
+; SINCOS_STRET:  Cost Model: Invalid cost for instruction: %nxv4f32 = call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.sincos.nxv4f32(<vscale x 4 x float> poison)
+; SINCOS_STRET:  Cost Model: Invalid cost for instruction: %nxv2f64 = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.sincos.nxv2f64(<vscale x 2 x double> poison)
+; SINCOS_STRET:  Cost Model: Invalid cost for instruction: %nxv1f128 = call { <vscale x 1 x fp128>, <vscale x 1 x fp128> } @llvm.sincos.nxv1f128(<vscale x 1 x fp128> poison)
+; SINCOS_STRET:  Cost Model: Invalid cost for instruction: %nxv8f32 = call { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.sincos.nxv8f32(<vscale x 8 x float> poison)
 ;
   %f16 = call { half, half } @llvm.sincos.f16(half poison)
   %f32 = call { float, float } @llvm.sincos.f32(float poison)

s-barannikov

LGTM

MacDue · 2025-11-01T12:13:01Z

llvm/include/llvm/CodeGen/BasicTTIImpl.h

+      LC = RTLIB::getSINCOS_STRET(ScalarVT);
+      UsesMemoryOutArgument = false;
+
+      if (getTLI()->getLibcallImpl(LC) == RTLIB::Unsupported) {
+        LC = RTLIB::getSINCOS(ScalarVT);
+        UsesMemoryOutArgument = true;
+      }


I think this will prevent the vector variants of sincos from being found if SINCOS_STRET is available when costing a vector sincos intrinsic, as the vector mappings only exist for the standard library call (so the getVectorMappingInfo will fail). That'll result in a more costly scalarization.

Note that this is getMultipleResultIntrinsicVectorLibCallCost. I think the if (!UsesMemoryOutArgument) exit is probably dead code/untested (as it's going to bail out at not finding the VecDesc).
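
A minimal sketch of the bail-out described in the two paragraphs above, assuming vector variants are looked up by the scalar libcall's name. The table type, the function `vectorLibCallCost`, and the vector variant name are all hypothetical and do not reflect the real TargetLibraryInfo interface or any real ArmPL mapping.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy table from scalar libcall name to vector variant name. The entries
// used with it are illustrative placeholders only.
using VectorMappings = std::map<std::string, std::string>;

// Returns a cost only when a vector variant is registered for ScalarName.
// Otherwise it returns nullopt and the caller is expected to fall back to
// the scalarization cost.
std::optional<int> vectorLibCallCost(const VectorMappings &Mappings,
                                     const std::string &ScalarName,
                                     int CallCost) {
  auto It = Mappings.find(ScalarName);
  if (It == Mappings.end())
    return std::nullopt; // No vector variant: bail out before any cost logic.
  return CallCost;
}

// Usage check: a mapping keyed on the standard name is found, while a
// lookup under the stret name finds nothing.
void checkLookup() {
  VectorMappings M{{"sincosf", "hypothetical_vector_sincosf"}};
  assert(vectorLibCallCost(M, "sincosf", 12).has_value());
  assert(!vectorLibCallCost(M, "__sincosf_stret", 12).has_value());
}
```

In this model, any logic placed after the lookup (such as an early return for the no-out-argument case) is never reached when the query uses the stret name.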

This system really should be built on top of actual libcall entries for the vector case, instead of trying to reverse-engineer the not-great legalizer behavior (which also doesn't actually use the vector libcalls).

So this can't hurt the sincos intrinsic costs, because codegen doesn't use TargetLibraryInfo at all. It is not wired up to use vector library functions of any form.

MacDue · 2025-11-01T16:02:12Z

llvm/include/llvm/CodeGen/BasicTTIImpl.h

+      // TODO: Account for sincos_stret not always using a memory operation for
+      // the out argument
+      LC = RTLIB::getSINCOS_STRET(ScalarVT);
+


I think it's wrong to use SINCOS_STRET here at all (I think you'd see the issue if you tested with -vector-library=ArmPL). There are currently no targets that have vector mappings for the stret version, so it does not make sense to factor that into trying to determine the vector cost.

I think the costs you're seeing for the stret versions are not generated by this function: it bails out, because it only returns a cost if a vector mapping exists.

MacDue · 2025-11-01T16:05:25Z

llvm/include/llvm/CodeGen/BasicTTIImpl.h

+    // FIXME: CodeGen use RuntimeLibcallsInfo, not TargetLibraryInfo and has no
+    // path to using the vector libcalls. So this guess at how legalization will
+    // work is just wrong.


This is how -vector-library= registers vector variants of functions (which can be done in IR, not just pre-baked into LLVM). Are you proposing changing that?

Yes, this doesn't work. The intrinsic will not codegen to the -vector-library functions. These are only emitted directly from the vectorizer. Eventually I want to merge TLI and RuntimeLibcalls.

The intrinsics are lowered to the vector library calls. There are tests that show this.

Part of the point of these intrinsics was to allow these calls to be vectorized without LAA needing to handle anything more than basic loads/stores.

Hmm, I found it. This is basically a special-case system, and no other intrinsic is handled this way. We should have entries in RuntimeLibcalls like we do for all the scalar types, rather than having a side mechanism (e.g., as it is now, the vector libcalls will not be retained in full LTO, which will later be necessary).

arsenm added the llvm:codegen label Nov 1, 2025 — with Graphite App

arsenm requested review from MacDue, RKSimon, SamTebbs33, greened and s-barannikov November 1, 2025 05:37

arsenm marked this pull request as ready for review November 1, 2025 05:37

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Nov 1, 2025

s-barannikov approved these changes Nov 1, 2025

MacDue reviewed Nov 1, 2025

arsenm added 5 commits November 3, 2025 18:51

BasicTTI: Cleanup multiple result intrinsic handling

0c0a0d7

Avoid weird lambda returning function pointer and sink the libcall logic to where the operation is handled. This allows chaining the libcall logic to try sincos_stret and fallback to sincos. The resulting cost seems too low.

Drop UsesMemoryOutArgument part

eb77874

Add fixme

400e651

drop sincos_stret part since this is for vectors

600551a

Reword comment

f7f08b2

arsenm force-pushed the users/arsenm/tti/multi-result-libcall-cleanup branch from 6a11a87 to f7f08b2 Compare November 4, 2025 04:16

MacDue approved these changes Nov 4, 2025

arsenm merged commit fe106b6 into main Nov 4, 2025
10 checks passed

arsenm deleted the users/arsenm/tti/multi-result-libcall-cleanup branch November 4, 2025 18:19

kerbowa mentioned this pull request Nov 10, 2025

[AMDGPU] Verify dominance when rewriting spills to registers #167347

Open
