Skip to content

Conversation

@mrkajetanp
Copy link
Contributor

FP_TO_SINT_SAT is capable of correctly handling a f16 -> s16 conversion, including correct overflow behaviour. The semantics of the operation match those of the vcvth_s16_f16 NEON intrinsic. Enable correct lowering of aarch64.neon.fcvtzs.i16.f16 by making use of it.

Part of a solution to #154343.

@mrkajetanp mrkajetanp marked this pull request as draft August 19, 2025 14:17
@llvmbot
Copy link
Member

llvmbot commented Aug 19, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Kajetan Puchalski (mrkajetanp)

Changes

FP_TO_SINT_SAT is capable of correctly handling a f16 -> s16 conversion, including correct overflow behaviour. The semantics of the operation match those of the vcvth_s16_f16 NEON intrinsic. Enable correct lowering of aarch64.neon.fcvtzs.i16.f16 by making use of it.

Part of a solution to #154343.


Full diff: https://github.com/llvm/llvm-project/pull/154344.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+13)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp (+17)
  • (added) llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll (+20)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index aefbbe2534be2..9342fbbc0e011 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1289,6 +1289,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::UINT_TO_FP, MVT::v4i16, Custom);
       setOperationAction(ISD::SINT_TO_FP, MVT::v8i16, Custom);
       setOperationAction(ISD::UINT_TO_FP, MVT::v8i16, Custom);
+
+      // f16 -> i16 conversion intrinsics need custom lowering
+      setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);
     } else {
       // when AArch64 doesn't have fullfp16 support, promote the input
       // to i32 first.
@@ -28238,6 +28241,16 @@ void AArch64TargetLowering::ReplaceNodeResults(
       Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, V));
       return;
     }
+    case Intrinsic::aarch64_neon_fcvtzs: {
+      if (VT.getScalarType() != MVT::i16)
+        return;
+
+      SDLoc DL(N);
+      auto CVT = DAG.getNode(ISD::FP_TO_SINT_SAT, DL, VT,
+          N->getOperand(1), DAG.getValueType(MVT::i16));
+      Results.push_back(CVT);
+      return;
+    }
     }
   }
   case ISD::READ_REGISTER: {
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index ee34a85a5b507..16fe6ec176e53 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2278,6 +2278,23 @@ bool AArch64InstructionSelector::preISelLower(MachineInstr &I) {
     }
     return false;
   }
+  case TargetOpcode::G_INTRINSIC: {
+    unsigned IntrinID = cast<GIntrinsic>(I).getIntrinsicID();
+    switch (IntrinID) {
+      default:
+        break;
+      case Intrinsic::aarch64_neon_fcvtzs: {
+        const LLT DstTy = MRI.getType(I.getOperand(0).getReg());
+        if (DstTy != LLT::scalar(16))
+          return false;
+        // Remove the no longer needed intrinsic ID operand
+        I.removeOperand(1);
+        I.setDesc(TII.get(TargetOpcode::G_FPTOSI_SAT));
+        return true;
+      }
+    }
+    return false;
+  }
   default:
     return false;
   }
diff --git a/llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll b/llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll
new file mode 100644
index 0000000000000..955ee2e4b319f
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fp16_s16_intrinsic_scalar.ll
@@ -0,0 +1,20 @@
+; Test fp16 -> s16 conversion intrinsics which require special handling to ensure correct behaviour.
+; RUN: llc < %s -mtriple=aarch64 -global-isel=0 -mattr=+v8.2a,+fullfp16  | FileCheck %s --check-prefixes=CHECK-SD
+
+declare i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half)
+
+define i16 @fcvtzs_intrinsic_i16(half %a) {
+; CHECK-SD-LABEL: fcvtzs_intrinsic_i16:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    fcvtzs w8, h0
+; CHECK-SD-NEXT:    mov w9, #32767
+; CHECK-SD-NEXT:    cmp w8, w9
+; CHECK-SD-NEXT:    csel w8, w8, w9, lt
+; CHECK-SD-NEXT:    mov w9, #-32768
+; CHECK-SD-NEXT:    cmn w8, #8, lsl #12
+; CHECK-SD-NEXT:    csel
+; CHECK-SD-NEXT:    ret
+entry:
+  %fcvt = tail call i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half %a)
+  ret i16 %fcvt
+}

FP_TO_SINT_SAT is capable of correctly handling a f16 -> s16 conversion,
including correct overflow behaviour. The semantics of the operation
match those of the vcvth_s16_f16 NEON intrinsic. Enable correct lowering
of aarch64.neon.fcvtzs.i16.f16 by making use of it.

Signed-off-by: Kajetan Puchalski <[email protected]>
@mrkajetanp
Copy link
Contributor Author

This is a conversation starter more than the intended solution, as it would only work as a fix for 1 of the 5 broken intrinsics.
I'm looking for some opinions on the direction for how to model what's needed.
Currently, the 5 intrinsics in question are just generated from tablegen using one-instruction OPs.

@llvm.aarch64.neon.fcvtzs.i32.f16
@llvm.aarch64.neon.fcvtas.i32.f16
@llvm.aarch64.neon.fcvtms.i32.f16
@llvm.aarch64.neon.fcvtns.i32.f16
@llvm.aarch64.neon.fcvtps.i32.f16

For the .i16.f16 formats, correct behaviour requires generating at least two instructions, e.g.:

fcvtzs  h31, h31
smov    w0, v31.h[0]

FP_TO_SINT_SAT gives the correct result when used for the first intrinsic, but it does so by using 7 instructions instead of 2.

What would the best way to go about this be? Adding a pseudo-op in AArch64InstrInfo.td to handle these conversions then expanding that pseudo into the 2 instructions, or something else?

@efriedma-quic
Copy link
Collaborator

Adding a pseudo-op in AArch64InstrInfo.td to handle these conversions then expanding that pseudo into the 2 instructions

That's one way.

To make what's happening a bit more transparent to other optimizations, probably better to make a SelectionDAG node AArch64ISD::FCVTZS_HALF that returns an f32 containing the integer bit-pattern, BITCAST that to i32, the truncate to i16.

@mrkajetanp
Copy link
Contributor Author

Superseded by #155851

@mrkajetanp mrkajetanp closed this Aug 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants