[AArch64] Lower aarch64.neon.fcvtzs.i16.f16 to FP_TO_SINT_SAT #154344

mrkajetanp · 2025-08-19T14:16:47Z

FP_TO_SINT_SAT is capable of correctly handling a f16 -> s16 conversion, including correct overflow behaviour. The semantics of the operation match those of the vcvth_s16_f16 NEON intrinsic. Enable correct lowering of aarch64.neon.fcvtzs.i16.f16 by making use of it.

Part of a solution to #154343.

llvmbot · 2025-08-19T14:17:18Z

@llvm/pr-subscribers-backend-aarch64

Author: Kajetan Puchalski (mrkajetanp)

Changes

FP_TO_SINT_SAT is capable of correctly handling a f16 -> s16 conversion, including correct overflow behaviour. The semantics of the operation match those of the vcvth_s16_f16 NEON intrinsic. Enable correct lowering of aarch64.neon.fcvtzs.i16.f16 by making use of it.

Part of a solution to #154343.

Full diff: https://github.com/llvm/llvm-project/pull/154344.diff

3 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+13)
(modified) llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp (+17)
(added) llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll (+20)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index aefbbe2534be2..9342fbbc0e011 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1289,6 +1289,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::UINT_TO_FP, MVT::v4i16, Custom);
       setOperationAction(ISD::SINT_TO_FP, MVT::v8i16, Custom);
       setOperationAction(ISD::UINT_TO_FP, MVT::v8i16, Custom);
+
+      // f16 -> i16 conversion intrinsics need custom lowering
+      setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);
     } else {
       // when AArch64 doesn't have fullfp16 support, promote the input
       // to i32 first.
@@ -28238,6 +28241,16 @@ void AArch64TargetLowering::ReplaceNodeResults(
       Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, V));
       return;
     }
+    case Intrinsic::aarch64_neon_fcvtzs: {
+      if (VT.getScalarType() != MVT::i16)
+        return;
+
+      SDLoc DL(N);
+      auto CVT = DAG.getNode(ISD::FP_TO_SINT_SAT, DL, VT,
+          N->getOperand(1), DAG.getValueType(MVT::i16));
+      Results.push_back(CVT);
+      return;
+    }
     }
   }
   case ISD::READ_REGISTER: {
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index ee34a85a5b507..16fe6ec176e53 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2278,6 +2278,23 @@ bool AArch64InstructionSelector::preISelLower(MachineInstr &I) {
     }
     return false;
   }
+  case TargetOpcode::G_INTRINSIC: {
+    unsigned IntrinID = cast<GIntrinsic>(I).getIntrinsicID();
+    switch (IntrinID) {
+      default:
+        break;
+      case Intrinsic::aarch64_neon_fcvtzs: {
+        const LLT DstTy = MRI.getType(I.getOperand(0).getReg());
+        if (DstTy != LLT::scalar(16))
+          return false;
+        // Remove the no longer needed intrinsic ID operand
+        I.removeOperand(1);
+        I.setDesc(TII.get(TargetOpcode::G_FPTOSI_SAT));
+        return true;
+      }
+    }
+    return false;
+  }
   default:
     return false;
   }
diff --git a/llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll b/llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll
new file mode 100644
index 0000000000000..955ee2e4b319f
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll
@@ -0,0 +1,20 @@
+; Test fp16 -> s16 conversion intrinsics which require special handling to ensure correct behaviour.
+; RUN: llc < %s -mtriple=aarch64 -global-isel=0 -mattr=+v8.2a,+fullfp16  | FileCheck %s --check-prefixes=CHECK-SD
+
+declare i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half)
+
+define i16 @fcvtzs_intrinsic_i16(half %a) {
+; CHECK-SD-LABEL: fcvtzs_intrinsic_i16:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    fcvtzs w8, h0
+; CHECK-SD-NEXT:    mov w9, #32767
+; CHECK-SD-NEXT:    cmp w8, w9
+; CHECK-SD-NEXT:    csel w8, w8, w9, lt
+; CHECK-SD-NEXT:    mov w9, #-32768
+; CHECK-SD-NEXT:    cmn w8, #8, lsl #12
+; CHECK-SD-NEXT:    csel
+; CHECK-SD-NEXT:    ret
+entry:
+  %fcvt = tail call i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half %a)
+  ret i16 %fcvt
+}

FP_TO_SINT_SAT is capable of correctly handling a f16 -> s16 conversion, including correct overflow behaviour. The semantics of the operation match those of the vcvth_s16_f16 NEON intrinsic. Enable correct lowering of aarch64.neon.fcvtzs.i16.f16 by making use of it. Signed-off-by: Kajetan Puchalski <[email protected]>

mrkajetanp · 2025-08-19T14:33:00Z

This is a conversation starter more than the intended solution, as it would only work as a fix for 1 of the 5 broken intrinsics.
I'm looking for some opinions on the direction for how to model what's needed.
Currently, the 5 intrinsics in question are just generated from tablegen using one-instruction OPs.

@llvm.aarch64.neon.fcvtzs.i32.f16
@llvm.aarch64.neon.fcvtas.i32.f16
@llvm.aarch64.neon.fcvtms.i32.f16
@llvm.aarch64.neon.fcvtns.i32.f16
@llvm.aarch64.neon.fcvtps.i32.f16

For the .i16.f16 formats, correct behaviour requires generating at least two instructions, e.g.:

fcvtzs  h31, h31
smov    w0, v31.h[0]

FP_TO_SINT_SAT gives the correct result when used for the first intrinsic, but it does so by using 7 instructions instead of 2.

What would the best way to go about this be? Adding a pseudo-op in AArch64InstrInfo.td to handle these conversions then expanding that pseudo into the 2 instructions, or something else?

efriedma-quic · 2025-08-19T16:12:18Z

Adding a pseudo-op in AArch64InstrInfo.td to handle these conversions then expanding that pseudo into the 2 instructions

That's one way.

To make what's happening a bit more transparent to other optimizations, probably better to make a SelectionDAG node AArch64ISD::FCVTZS_HALF that returns an f32 containing the integer bit-pattern, BITCAST that to i32, the truncate to i16.

mrkajetanp · 2025-08-28T23:55:39Z

Superseded by #155851

mrkajetanp requested review from davemgreen, sdesmalen-arm and tblah August 19, 2025 14:16

llvmbot added the backend:AArch64 label Aug 19, 2025

mrkajetanp marked this pull request as draft August 19, 2025 14:17

mrkajetanp force-pushed the vcvth-f16-overflow branch from 0c9e5f8 to ac2318a Compare August 19, 2025 14:18

efriedma-quic mentioned this pull request Aug 26, 2025

[AArch64][SDAG] Lower f16->s16 FP_TO_INT_SAT to *v1f16 #154822

Merged

mrkajetanp closed this Aug 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] Lower aarch64.neon.fcvtzs.i16.f16 to FP_TO_SINT_SAT #154344

[AArch64] Lower aarch64.neon.fcvtzs.i16.f16 to FP_TO_SINT_SAT #154344

Uh oh!

mrkajetanp commented Aug 19, 2025

Uh oh!

llvmbot commented Aug 19, 2025

Uh oh!

mrkajetanp commented Aug 19, 2025

Uh oh!

efriedma-quic commented Aug 19, 2025

Uh oh!

mrkajetanp commented Aug 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[AArch64] Lower aarch64.neon.fcvtzs.i16.f16 to FP_TO_SINT_SAT #154344

[AArch64] Lower aarch64.neon.fcvtzs.i16.f16 to FP_TO_SINT_SAT #154344

Uh oh!

Conversation

mrkajetanp commented Aug 19, 2025

Uh oh!

llvmbot commented Aug 19, 2025

Uh oh!

mrkajetanp commented Aug 19, 2025

Uh oh!

efriedma-quic commented Aug 19, 2025

Uh oh!

mrkajetanp commented Aug 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants