Skip to content

Conversation

@arsenm
Copy link
Contributor

@arsenm arsenm commented Nov 7, 2025

Copy new process from sincospi.

Copy link
Contributor Author

arsenm commented Nov 7, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)

Changes

Copy new process from sincospi.


Full diff: https://github.com/llvm/llvm-project/pull/166984.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+6-1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (+4-10)
  • (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+18)
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 1c167af4b0478..a52ad41d0f1b3 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -334,7 +334,12 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
 
       break;
     case Intrinsic::sincos:
-      LC = RTLIB::getSINCOS(ScalarVT);
+      LC = RTLIB::getSINCOS(VT);
+      if (LC == RTLIB::UNKNOWN_LIBCALL)
+        LC = RTLIB::getSINCOS(ScalarVT);
+      else if (VT.isVector())
+        IsVectorCall = true;
+
       break;
     default:
       return std::nullopt;
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index f5a54497c8a98..78d8ea0676dd7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -1268,10 +1268,12 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
       return;
 
     break;
-
+  case ISD::FSINCOS:
   case ISD::FSINCOSPI: {
     EVT VT = Node->getValueType(0);
-    RTLIB::Libcall LC = RTLIB::getSINCOSPI(VT);
+    RTLIB::Libcall LC = Node->getOpcode() == ISD::FSINCOS
+                            ? RTLIB::getSINCOS(VT)
+                            : RTLIB::getSINCOSPI(VT);
     if (LC != RTLIB::UNKNOWN_LIBCALL &&
         DAG.expandMultipleResultFPLibCall(LC, Node, Results, VT))
       return;
@@ -1280,14 +1282,6 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
     // scalarizing.
     break;
   }
-  case ISD::FSINCOS: {
-    // FIXME: Try to directly match vector case like fsincospi
-    EVT VT = Node->getValueType(0).getVectorElementType();
-    RTLIB::Libcall LC = RTLIB::getSINCOS(VT);
-    if (DAG.expandMultipleResultFPLibCall(LC, Node, Results, VT))
-      return;
-    break;
-  }
   case ISD::FMODF: {
     EVT VT = Node->getValueType(0).getVectorElementType();
     RTLIB::Libcall LC = RTLIB::getMODF(VT);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 814b4b57a0b9b..b4eb6c357e10e 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -425,6 +425,24 @@ RTLIB::Libcall RTLIB::getCOS(EVT RetVT) {
 }
 
 RTLIB::Libcall RTLIB::getSINCOS(EVT RetVT) {
+  // TODO: Tablegen should generate this function
+  if (RetVT.isVector()) {
+    if (!RetVT.isSimple())
+      return RTLIB::UNKNOWN_LIBCALL;
+    switch (RetVT.getSimpleVT().SimpleTy) {
+    case MVT::v4f32:
+      return RTLIB::SINCOS_V4F32;
+    case MVT::v2f64:
+      return RTLIB::SINCOS_V2F64;
+    case MVT::nxv4f32:
+      return RTLIB::SINCOS_NXV4F32;
+    case MVT::nxv2f64:
+      return RTLIB::SINCOS_NXV2F64;
+    default:
+      return RTLIB::UNKNOWN_LIBCALL;
+    }
+  }
+
   return getFPLibCall(RetVT, SINCOS_F32, SINCOS_F64, SINCOS_F80, SINCOS_F128,
                       SINCOS_PPCF128);
 }

return RTLIB::SINCOS_NXV4F32;
case MVT::nxv2f64:
return RTLIB::SINCOS_NXV2F64;
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this handle SINCOS_V8F64? AMDLIBM has this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could but I'm only handling cases that are tested

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Andarwinux please raise an issue if you can find AMDLIBM methods llvm doesn't currently handle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Andarwinux please raise an issue if you can find AMDLIBM methods llvm doesn't currently handle

No problems. It looks like AMDLIBM is still handling by TLI for now. I thought it would also switch to RuntimeLibcalls soon.

But veclib does indeed have issues on x86, see #164642.

@arsenm arsenm force-pushed the users/arsenm/runtime-libcalls/add-sincos-entries branch from 414d93b to 32550a9 Compare November 10, 2025 18:18
@arsenm arsenm force-pushed the users/arsenm/dag/use-sincos-runtime-libcalls branch from 17bfcbe to 404f6f0 Compare November 10, 2025 18:18
These are the tested set of libcalls used for codegen of llvm.sincos
and are needed to get the legalization to follow standard procedure.
@arsenm arsenm force-pushed the users/arsenm/dag/use-sincos-runtime-libcalls branch from 404f6f0 to c5ecd24 Compare November 10, 2025 19:22
@arsenm arsenm force-pushed the users/arsenm/runtime-libcalls/add-sincos-entries branch from 32550a9 to 768d8f2 Compare November 10, 2025 19:22
Base automatically changed from users/arsenm/runtime-libcalls/add-sincos-entries to main November 10, 2025 20:00
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arsenm arsenm merged commit de68181 into main Nov 11, 2025
14 of 17 checks passed
@arsenm arsenm deleted the users/arsenm/dag/use-sincos-runtime-libcalls branch November 11, 2025 18:51
@Andarwinux
Copy link
Contributor

This PR seems have broken the ability to vectorize sincos with -fveclib, now it no longer vectorizes no matter which veclib is used.

see #167871

case VectorLibrary::SLEEFGNUABI:
for (RTLIB::LibcallImpl Impl :
{RTLIB::impl__ZGVnN4vl4l4_sincospif, RTLIB::impl__ZGVnN2vl8l8_sincospi,
{RTLIB::impl__ZGVnN2vl8l8_sincos, RTLIB::impl__ZGVnN4vl4l4_sincosf,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this has somehow broken sincos vectorization from the Clang side. Now you have to do: -fveclib=ArmPL -mllvm --vector-library=ArmPL, just -O3 -fno-math-errno -fveclib=ArmPL results in the default sincos expansion rather than a vector call.

Sadly, this means all the LLVM IR tests pass, but it still is broken in Clang.

See: https://godbolt.org/z/63PKzsG48

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why TargetOptions should go away

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by #167996

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMDLIBM sincos still not working

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want that to work, can you add tests for all of the functions it it? We only have any test coverage for sleef and armpl. If there's no test, it's not going to work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want that to work, can you add tests for all of the functions it it? We only have any test coverage for sleef and armpl. If there's no test, it's not going to work

AMDLIBM did have some tests

define void @sincos_f64(ptr noalias %a, ptr noalias %b, ptr noalias %c) {
; CHECK-LABEL: define void @sincos_f64
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]])
; CHECK-VF2-NOT: call void @amd_vrd2_sincos(<2 x double> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF4-NOT: call void @amd_vrd4_sincos(<4 x double> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF8-NOT: call void @amd_vrd8_sincos(<8 x double> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK: ret void
;
entry:
br label %for.body
for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%gepa = getelementptr double, ptr %a, i64 %indvars.iv
%num = load double, ptr %gepa, align 8
%gepb = getelementptr double, ptr %b, i64 %indvars.iv
%gepc = getelementptr double, ptr %c, i64 %indvars.iv
call void @sincos(double %num, ptr %gepb, ptr %gepc)
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.cond.cleanup, label %for.body
for.cond.cleanup:
ret void
}
define void @sincos_f32(ptr noalias %a, ptr noalias %b, ptr noalias %c) {
; CHECK-LABEL: define void @sincos_f32
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]])
; CHECK-VF4-NOT: call void @amd_vrs4_sincosf(<4 x float> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF8-NOT: call void @amd_vrs8_sincosf(<8 x float> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK-VF16-NOT: call void @amd_vrs16_sincosf(<16 x float> [[WIDE_LOAD:%.*]], ptr [[TMP5:%.*]], ptr [[TMP6:%.*]])
; CHECK: ret void
;
entry:
br label %for.body
for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%gepa = getelementptr float, ptr %a, i64 %indvars.iv
%num = load float, ptr %gepa, align 8
%gepb = getelementptr float, ptr %b, i64 %indvars.iv
%gepc = getelementptr float, ptr %c, i64 %indvars.iv
call void @sincosf(float %num, ptr %gepb, ptr %gepc)
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.cond.cleanup, label %for.body
for.cond.cleanup:
ret void
}
attributes #0 = { nounwind readnone }
declare double @exp10(double) #0
declare float @exp10f(float) #0
declare double @llvm.exp10.f64(double) #0
declare float @llvm.exp10.f32(float) #0
declare void @sincos(double, ptr, ptr)
declare void @sincosf(float, ptr, ptr)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the standard codegen usage that's broken. The fact that LoopVectorize directly emits these calls is actually really weird (#162239 (comment)).

What's missing is codegen tests for using these calls for legalization of the intrinsics. e.g., the important bit of SLEEF and ArmPL coverage is here:

https://github.com/llvm/llvm-project/blob/0fa6a67a4200ea1516f56e298df4a671af8a0642/llvm/test/CodeGen/AArch64/veclib-llvm.sincos.ll
https://github.com/llvm/llvm-project/blob/0fa6a67a4200ea1516f56e298df4a671af8a0642/llvm/test/CodeGen/AArch64/veclib-llvm.sincospi.ll
https://github.com/llvm/llvm-project/blob/0fa6a67a4200ea1516f56e298df4a671af8a0642/llvm/test/CodeGen/AArch64/veclib-llvm.modf.ll

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:codegen llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants