DAG: Use sincos vector libcalls through RuntimeLibcalls #166984

arsenm · 2025-11-07T18:30:04Z

Copy new process from sincospi.

llvmbot · 2025-11-07T18:33:31Z

@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)

Changes

Copy new process from sincospi.

Full diff: https://github.com/llvm/llvm-project/pull/166984.diff

3 Files Affected:

(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+6-1)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (+4-10)
(modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+18)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 1c167af4b0478..a52ad41d0f1b3 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -334,7 +334,12 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
 
       break;
     case Intrinsic::sincos:
-      LC = RTLIB::getSINCOS(ScalarVT);
+      LC = RTLIB::getSINCOS(VT);
+      if (LC == RTLIB::UNKNOWN_LIBCALL)
+        LC = RTLIB::getSINCOS(ScalarVT);
+      else if (VT.isVector())
+        IsVectorCall = true;
+
       break;
     default:
       return std::nullopt;
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index f5a54497c8a98..78d8ea0676dd7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -1268,10 +1268,12 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
       return;
 
     break;
-
+  case ISD::FSINCOS:
   case ISD::FSINCOSPI: {
     EVT VT = Node->getValueType(0);
-    RTLIB::Libcall LC = RTLIB::getSINCOSPI(VT);
+    RTLIB::Libcall LC = Node->getOpcode() == ISD::FSINCOS
+                            ? RTLIB::getSINCOS(VT)
+                            : RTLIB::getSINCOSPI(VT);
     if (LC != RTLIB::UNKNOWN_LIBCALL &&
         DAG.expandMultipleResultFPLibCall(LC, Node, Results, VT))
       return;
@@ -1280,14 +1282,6 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
     // scalarizing.
     break;
   }
-  case ISD::FSINCOS: {
-    // FIXME: Try to directly match vector case like fsincospi
-    EVT VT = Node->getValueType(0).getVectorElementType();
-    RTLIB::Libcall LC = RTLIB::getSINCOS(VT);
-    if (DAG.expandMultipleResultFPLibCall(LC, Node, Results, VT))
-      return;
-    break;
-  }
   case ISD::FMODF: {
     EVT VT = Node->getValueType(0).getVectorElementType();
     RTLIB::Libcall LC = RTLIB::getMODF(VT);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 814b4b57a0b9b..b4eb6c357e10e 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -425,6 +425,24 @@ RTLIB::Libcall RTLIB::getCOS(EVT RetVT) {
 }
 
 RTLIB::Libcall RTLIB::getSINCOS(EVT RetVT) {
+  // TODO: Tablegen should generate this function
+  if (RetVT.isVector()) {
+    if (!RetVT.isSimple())
+      return RTLIB::UNKNOWN_LIBCALL;
+    switch (RetVT.getSimpleVT().SimpleTy) {
+    case MVT::v4f32:
+      return RTLIB::SINCOS_V4F32;
+    case MVT::v2f64:
+      return RTLIB::SINCOS_V2F64;
+    case MVT::nxv4f32:
+      return RTLIB::SINCOS_NXV4F32;
+    case MVT::nxv2f64:
+      return RTLIB::SINCOS_NXV2F64;
+    default:
+      return RTLIB::UNKNOWN_LIBCALL;
+    }
+  }
+
   return getFPLibCall(RetVT, SINCOS_F32, SINCOS_F64, SINCOS_F80, SINCOS_F128,
                       SINCOS_PPCF128);
 }
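
The lookup order in the `BasicTTIImpl.h` hunk (try the full vector type first, fall back to the scalar element type) can be sketched in isolation. This is a simplified model, not LLVM code: the `VT` and `Libcall` enums, `scalarOf`, and `pickSinCos` are hypothetical stand-ins, and only `f32`/`f64` scalars are modelled.

```cpp
#include <cassert>

// Hypothetical stand-ins for MVT and RTLIB::Libcall; names echo the diff
// but these are NOT the LLVM definitions.
enum class VT { f32, f64, v4f32, v2f64, v8f64, nxv4f32, nxv2f64 };
enum class Libcall {
  Unknown, SinCosF32, SinCosF64,
  SinCosV4F32, SinCosV2F64, SinCosNXV4F32, SinCosNXV2F64
};

bool isVector(VT T) { return T != VT::f32 && T != VT::f64; }

// Element type of a (possibly vector) type in this toy model.
VT scalarOf(VT T) {
  switch (T) {
  case VT::f32: case VT::v4f32: case VT::nxv4f32: return VT::f32;
  default: return VT::f64;
  }
}

// Models the shape of RTLIB::getSINCOS after the patch: only the listed
// vector types map to a vector libcall; anything else is Unknown.
Libcall getSinCos(VT T) {
  switch (T) {
  case VT::f32:     return Libcall::SinCosF32;
  case VT::f64:     return Libcall::SinCosF64;
  case VT::v4f32:   return Libcall::SinCosV4F32;
  case VT::v2f64:   return Libcall::SinCosV2F64;
  case VT::nxv4f32: return Libcall::SinCosNXV4F32;
  case VT::nxv2f64: return Libcall::SinCosNXV2F64;
  default:          return Libcall::Unknown; // e.g. v8f64 has no entry
  }
}

struct Choice { Libcall LC; bool IsVectorCall; };

// Models the TTI logic: prefer a vector libcall, else use the scalar one.
Choice pickSinCos(VT T) {
  Libcall LC = getSinCos(T);
  bool IsVectorCall = false;
  if (LC == Libcall::Unknown)
    LC = getSinCos(scalarOf(T));
  else if (isVector(T))
    IsVectorCall = true;
  assert(LC != Libcall::Unknown && "scalar lookup always succeeds in this model");
  return {LC, IsVectorCall};
}
```

In this model, `pickSinCos(VT::v4f32)` selects the vector entry, while `pickSinCos(VT::v8f64)` falls back to the scalar `f64` entry because no `v8f64` case is listed.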

Andarwinux · 2025-11-08T17:37:12Z

llvm/lib/CodeGen/TargetLoweringBase.cpp

+      return RTLIB::SINCOS_NXV4F32;
+    case MVT::nxv2f64:
+      return RTLIB::SINCOS_NXV2F64;
+    default:


Can this handle SINCOS_V8F64? AMDLIBM has this.

It could, but I'm only handling cases that are tested.

@Andarwinux please raise an issue if you can find AMDLIBM methods llvm doesn't currently handle

No problems. It looks like AMDLIBM is still handled by TLI for now; I thought it would also switch to RuntimeLibcalls soon.

But veclib does indeed have issues on x86, see #164642.

These are the tested set of libcalls used for codegen of llvm.sincos and are needed to get the legalization to follow standard procedure.

RKSimon

LGTM

Andarwinux · 2025-11-13T14:23:40Z

This PR seems to have broken the ability to vectorize sincos with -fveclib: it no longer vectorizes, no matter which veclib is used.

see #167871

MacDue · 2025-11-13T15:49:17Z

llvm/lib/IR/RuntimeLibcalls.cpp

  case VectorLibrary::SLEEFGNUABI:
    for (RTLIB::LibcallImpl Impl :
-         {RTLIB::impl__ZGVnN4vl4l4_sincospif, RTLIB::impl__ZGVnN2vl8l8_sincospi,
+         {RTLIB::impl__ZGVnN2vl8l8_sincos, RTLIB::impl__ZGVnN4vl4l4_sincosf,


It seems this has somehow broken sincos vectorization from the Clang side. You now have to pass -fveclib=ArmPL -mllvm --vector-library=ArmPL; with just -O3 -fno-math-errno -fveclib=ArmPL you get the default sincos expansion rather than a vector call.

Sadly, this means all the LLVM IR tests pass, but it is still broken in Clang.

See: https://godbolt.org/z/63PKzsG48
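
For context, a loop of roughly the following shape is the kind of source that would exercise this path when built with the flags quoted above (`-O3 -fno-math-errno -fveclib=ArmPL`). This is an illustrative assumption, not the code behind the godbolt link; the function name `sincos_loop` is hypothetical, and whether the compiler actually merges the two calls into a single vector sincos call depends on the target and compiler version.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example: computes sin and cos of every input element.
// With a vector math library selected, an optimizing compiler may turn the
// paired sin/cos calls into one vector sincos call per group of lanes.
void sincos_loop(const float *in, float *sin_out, float *cos_out,
                 std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    sin_out[i] = std::sin(in[i]);
    cos_out[i] = std::cos(in[i]);
  }
}
```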

This is why TargetOptions should go away

Fixed by #167996

AMDLIBM sincos still not working

If you want that to work, can you add tests for all of the functions in it? We only have any test coverage for sleef and armpl. If there's no test, it's not going to work.

AMDLIBM did have some tests

llvm-project/llvm/test/Transforms/LoopVectorize/X86/amdlibm-calls.ll

Lines 1760 to 1819 in 4d42a0c

define void @sincos_f64(ptr noalias %a, ptr noalias %b, ptr noalias %c) {
; CHECK-LABEL: define void @sincos_f64
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]])
; CHECK-VF2-NOT: call void @amd_vrd2_sincos(<2 x double> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF4-NOT: call void @amd_vrd4_sincos(<4 x double> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF8-NOT: call void @amd_vrd8_sincos(<8 x double> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK: ret void
;
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %gepa = getelementptr double, ptr %a, i64 %indvars.iv
  %num = load double, ptr %gepa, align 8
  %gepb = getelementptr double, ptr %b, i64 %indvars.iv
  %gepc = getelementptr double, ptr %c, i64 %indvars.iv
  call void @sincos(double %num, ptr %gepb, ptr %gepc)
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.cond.cleanup, label %for.body

for.cond.cleanup:
  ret void
}

define void @sincos_f32(ptr noalias %a, ptr noalias %b, ptr noalias %c) {
; CHECK-LABEL: define void @sincos_f32
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]])
; CHECK-VF4-NOT: call void @amd_vrs4_sincosf(<4 x float> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF8-NOT: call void @amd_vrs8_sincosf(<8 x float> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF16-NOT: call void @amd_vrs16_sincosf(<16 x float> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK: ret void
;
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %gepa = getelementptr float, ptr %a, i64 %indvars.iv
  %num = load float, ptr %gepa, align 8
  %gepb = getelementptr float, ptr %b, i64 %indvars.iv
  %gepc = getelementptr float, ptr %c, i64 %indvars.iv
  call void @sincosf(float %num, ptr %gepb, ptr %gepc)
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.cond.cleanup, label %for.body

for.cond.cleanup:
  ret void
}

attributes #0 = { nounwind readnone }

declare double @exp10(double) #0
declare float @exp10f(float) #0
declare double @llvm.exp10.f64(double) #0
declare float @llvm.exp10.f32(float) #0
declare void @sincos(double, ptr, ptr)
declare void @sincosf(float, ptr, ptr)

That's not the standard codegen usage that's broken. The fact that LoopVectorize directly emits these calls is actually really weird (#162239 (comment)).

What's missing is codegen tests for using these calls for legalization of the intrinsics. e.g., the important bit of SLEEF and ArmPL coverage is here:

https://github.com/llvm/llvm-project/blob/0fa6a67a4200ea1516f56e298df4a671af8a0642/llvm/test/CodeGen/AArch64/veclib-llvm.sincos.ll
https://github.com/llvm/llvm-project/blob/0fa6a67a4200ea1516f56e298df4a671af8a0642/llvm/test/CodeGen/AArch64/veclib-llvm.sincospi.ll
https://github.com/llvm/llvm-project/blob/0fa6a67a4200ea1516f56e298df4a671af8a0642/llvm/test/CodeGen/AArch64/veclib-llvm.modf.ll
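
To make the distinction concrete: the LoopVectorize test quoted earlier calls `@sincos(double, ptr, ptr)`, which writes its results through output pointers, whereas the `llvm.sincos` intrinsic returns both results as values. Legalizing the intrinsic to a libcall therefore has to bridge the two shapes, which is presumably the job of `expandMultipleResultFPLibCall` in the diff. A minimal sketch of that bridging follows; `sincos_outptr` and `sincos_multi` are hypothetical names, and the out-pointer routine is emulated with `std::sin`/`std::cos` so the example stays portable.

```cpp
#include <cmath>
#include <utility>

// Hypothetical stand-in for a libm-style routine that returns its two
// results through output pointers (the shape of the library call).
void sincos_outptr(double x, double *s, double *c) {
  *s = std::sin(x);
  *c = std::cos(x);
}

// Multiple-result form (the shape of the intrinsic): call the out-pointer
// routine with two temporaries, then hand both values back by value.
std::pair<double, double> sincos_multi(double x) {
  double s = 0.0, c = 0.0; // temporaries standing in for stack slots
  sincos_outptr(x, &s, &c);
  return {s, c};
}
```

A codegen test of the kind requested would check that this bridging picks the intended vector library symbol, rather than checking calls emitted directly by the vectorizer.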

This was referenced Nov 7, 2025

TargetLibraryInfo: Split off VectorLibrary enum and flag #166980

Merged

RuntimeLibcalls: Add entries for vector sincospi functions #166981

Merged

RuntimeLibcalls: Remove incorrect sincospi from most targets #166982

Merged

arsenm requested review from MacDue, RKSimon, aemerson, dpaoliello, efriedma-quic, fhahn, ilovepi, inbelic, mysterymath, nikic, paulwalker-arm and topperc November 7, 2025 18:31

arsenm added the llvm:SelectionDAG SelectionDAGISel as well label Nov 7, 2025 — with Graphite App

arsenm marked this pull request as ready for review November 7, 2025 18:33

llvmbot added the llvm:codegen label Nov 7, 2025

arsenm mentioned this pull request Nov 7, 2025

RuntimeLibcalls: Add definitions for vector math functions #167026

Open

arsenm force-pushed the users/arsenm/runtime-libcalls/add-sincos-entries branch from 96ea8dc to 414d93b Compare November 7, 2025 22:39

arsenm force-pushed the users/arsenm/dag/use-sincos-runtime-libcalls branch from 94e35d8 to 17bfcbe Compare November 7, 2025 22:39

This was referenced Nov 7, 2025

RuntimeLibcalls: Add a few libm entries from TargetLibraryInfo #167049

Open

RuntimeLibcalls: Add __memcpy_chk, __memmove_chk, __memset_chk #167053

Open

Andarwinux reviewed Nov 8, 2025

arsenm force-pushed the users/arsenm/runtime-libcalls/add-sincos-entries branch from 414d93b to 32550a9 Compare November 10, 2025 18:18

arsenm force-pushed the users/arsenm/dag/use-sincos-runtime-libcalls branch from 17bfcbe to 404f6f0 Compare November 10, 2025 18:18

arsenm added 2 commits November 10, 2025 11:18

RuntimeLibcalls: Add call entries for sincos sleef and armpl libcalls

768d8f2

These are the tested set of libcalls used for codegen of llvm.sincos and are needed to get the legalization to follow standard procedure.

DAG: Use sincos vector libcalls through RuntimeLibcalls

c5ecd24

Copy new process from sincospi.

arsenm force-pushed the users/arsenm/dag/use-sincos-runtime-libcalls branch from 404f6f0 to c5ecd24 Compare November 10, 2025 19:22

arsenm force-pushed the users/arsenm/runtime-libcalls/add-sincos-entries branch from 32550a9 to 768d8f2 Compare November 10, 2025 19:22

Base automatically changed from users/arsenm/runtime-libcalls/add-sincos-entries to main November 10, 2025 20:00

RKSimon approved these changes Nov 11, 2025

ilovepi approved these changes Nov 11, 2025

arsenm merged commit de68181 into main Nov 11, 2025
14 of 17 checks passed

arsenm deleted the users/arsenm/dag/use-sincos-runtime-libcalls branch November 11, 2025 18:51

MacDue reviewed Nov 13, 2025
