Conversation

@Lukacma (Contributor) commented Oct 3, 2025

This is a follow-up patch to #157680 . In this patch, we add explicit bitcasts to floating-point types when lowering the saturating add/sub and shift NEON scalar intrinsics in SelectionDAG, so they can be picked up by the patterns added in the first part of this series. To do that, we create new ISD nodes for these intrinsics, which operate on floating-point types, and wrap them in bitcast nodes.
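
Concretely, for the scalar i64 case the lowering now produces (i64 (bitcast (AArch64ISD::SQADD (f64 (bitcast %a)), (f64 (bitcast %b))))), which is then matched by patterns of roughly this shape (a simplified sketch of the f64/f32 patterns this patch adds in SIMDThreeScalarBHSD):

// Simplified sketch: the FP-typed node produced during lowering, together
// with the surrounding bitcasts, selects directly to the FPR-form instruction.
def : Pat<(f64 (AArch64sqadd FPR64:$Rn, FPR64:$Rm)),
          (SQADDv1i64 FPR64:$Rn, FPR64:$Rm)>;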

@Lukacma (Author) commented Oct 3, 2025

This is my first attempt at implementing this part, and I would appreciate your feedback on this approach, as there are a couple of things I am uncertain about.

Firstly, since GlobalISel works fine with the original intrinsics and doesn't require custom lowering for them, I decided to leave it be and instead duplicate the patterns, as you can see: one variant matches the new node for SelectionDAG, and the other matches the original intrinsic. I am not sure if this is the correct approach, or if I should lower to the new node in GlobalISel as well.

Secondly, I am not sure whether creating new nodes is necessary at all. I was thinking about abusing the intrinsics a bit and adding floating-point variants of them. For example, the sqadd intrinsic is defined like this:

DefaultAttrsIntrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, LLVMMatchType<0>],
                [IntrNoMem]>;

I was thinking about generalizing it to allow it to be used with floating-point types, in which case I could lower to the floating-point variant and wouldn't need to create new ISD nodes. This should also resolve the issue of pattern duplication due to the different lowerings, if I am not missing something. I am not sure, though, whether such "abuse" is fine (I don't think it is, but it would make my life simpler).
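
The generalization I had in mind would look roughly like this (purely a hypothetical sketch, not something this patch does; llvm_any_ty admits both integer and floating-point types):

// Hypothetical: llvm_any_ty instead of llvm_anyint_ty, so the same intrinsic
// could also be instantiated at f32/f64 and lowered without new ISD nodes.
DefaultAttrsIntrinsic<[llvm_any_ty], [LLVMMatchType<0>, LLVMMatchType<0>],
                      [IntrNoMem]>;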

@Lukacma (Author) commented Oct 13, 2025

Ping.

@davemgreen (Collaborator) left a comment

Sorry for the late response; I have been away. This sounds like it is probably fine given all the options: nothing will be perfect, and the alternatives seem worse. This should allow the bitcasts to be optimized away if needed, and new node names are fairly trivial to add at this point.

Is the list of commented-out instructions all the ones that are needed? (And there are none with i8/i16 types?)

@Lukacma (Author) commented Oct 27, 2025

Thanks, David, for taking the time to look at this. I am a bit confused by your comment, though: are you fine with generalizing the intrinsics to allow floating-point types, or with having duplicate patterns, one for GlobalISel and one for SelectionDAG?

As for your question, yes, this should be all of the intrinsics. I think there are no i8/i16-specific ones, as those types behave the same as i32/i64 and so don't need their own special intrinsics.

@paulwalker-arm (Collaborator) commented Nov 6, 2025

> are you fine with generalizing the intrinsics to allow for floating point types

The IR must remain type-clean, so we should not allow floating-point types for intrinsically integer operations.

@Lukacma marked this pull request as ready for review November 6, 2025 14:08
@llvmbot added the backend:AArch64 and llvm:SelectionDAG labels Nov 6, 2025
@llvmbot (Member) commented Nov 6, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-selectiondag

Author: None (Lukacma)

Changes

This is a follow-up patch to #157680 . In this patch, we add explicit bitcasts to floating-point types when lowering the saturating add/sub and shift NEON scalar intrinsics in SelectionDAG, so they can be picked up by the patterns added in the first part of this series. To do that, we create new ISD nodes for these intrinsics, which operate on floating-point types, and wrap them in bitcast nodes.


Patch is 39.03 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161840.diff

7 Files Affected:

  • (modified) llvm/include/llvm/Target/TargetSelectionDAG.td (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+41-4)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+57-10)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+33-15)
  • (added) llvm/test/CodeGen/AArch64/arm64-int-neon.ll (+238)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vmul.ll (+100-84)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vshift.ll (+12-22)
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 07a858fd682fc..72edff4e92217 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -346,6 +346,9 @@ def SDTAtomicStore : SDTypeProfile<0, 2, [
 def SDTAtomicLoad : SDTypeProfile<1, 1, [
   SDTCisPtrTy<1>
 ]>;
+def SDTFPMulOp : SDTypeProfile<1, 2, [
+  SDTCisSameAs<1, 2>, SDTCisFP<0>, SDTCisFP<1>
+]>;
 
 class SDCallSeqStart<list<SDTypeConstraint> constraints> :
         SDTypeProfile<0, 2, constraints>;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d08f9b94227a2..affff377cd92c 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -4486,6 +4486,23 @@ static SDValue lowerADDSUBO_CARRY(SDValue Op, SelectionDAG &DAG,
   return DAG.getMergeValues({Sum, OutFlag}, DL);
 }
 
+static SDValue lowerIntNeonIntrinsic(SDValue Op, unsigned Opcode,
+                                     SelectionDAG &DAG) {
+  SDLoc DL(Op);
+  auto getFloatVT = [](EVT VT) { return VT == MVT::i32 ? MVT::f32 : MVT::f64; };
+  auto bitcastToFloat = [&](SDValue Val) {
+    return DAG.getBitcast(getFloatVT(Val.getValueType()), Val);
+  };
+  SmallVector<SDValue, 2> NewOps;
+  NewOps.reserve(Op.getNumOperands() - 1);
+
+  for (unsigned I = 1, E = Op.getNumOperands(); I < E; ++I)
+    NewOps.push_back(bitcastToFloat(Op.getOperand(I)));
+  EVT OrigVT = Op.getValueType();
+  SDValue OpNode = DAG.getNode(Opcode, DL, getFloatVT(OrigVT), NewOps);
+  return DAG.getBitcast(OrigVT, OpNode);
+}
+
 static SDValue LowerXALUO(SDValue Op, SelectionDAG &DAG) {
   // Let legalize expand this if it isn't a legal type yet.
   if (!DAG.getTargetLoweringInfo().isTypeLegal(Op.getValueType()))
@@ -6328,26 +6345,46 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
                                      Op.getOperand(1).getValueType(),
                                      Op.getOperand(1), Op.getOperand(2)));
     return SDValue();
+  case Intrinsic::aarch64_neon_sqrshl:
+    if (Op.getValueType().isVector())
+      return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::SQRSHL, DAG);
+  case Intrinsic::aarch64_neon_sqshl:
+    if (Op.getValueType().isVector())
+      return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::SQSHL, DAG);
+  case Intrinsic::aarch64_neon_uqrshl:
+    if (Op.getValueType().isVector())
+      return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::UQRSHL, DAG);
+  case Intrinsic::aarch64_neon_uqshl:
+    if (Op.getValueType().isVector())
+      return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::UQSHL, DAG);
   case Intrinsic::aarch64_neon_sqadd:
     if (Op.getValueType().isVector())
       return DAG.getNode(ISD::SADDSAT, DL, Op.getValueType(), Op.getOperand(1),
                          Op.getOperand(2));
-    return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::SQADD, DAG);
+
   case Intrinsic::aarch64_neon_sqsub:
     if (Op.getValueType().isVector())
       return DAG.getNode(ISD::SSUBSAT, DL, Op.getValueType(), Op.getOperand(1),
                          Op.getOperand(2));
-    return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::SQSUB, DAG);
+
   case Intrinsic::aarch64_neon_uqadd:
     if (Op.getValueType().isVector())
       return DAG.getNode(ISD::UADDSAT, DL, Op.getValueType(), Op.getOperand(1),
                          Op.getOperand(2));
-    return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::UQADD, DAG);
   case Intrinsic::aarch64_neon_uqsub:
     if (Op.getValueType().isVector())
       return DAG.getNode(ISD::USUBSAT, DL, Op.getValueType(), Op.getOperand(1),
                          Op.getOperand(2));
-    return SDValue();
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::UQSUB, DAG);
+  case Intrinsic::aarch64_neon_sqdmulls_scalar:
+    return lowerIntNeonIntrinsic(Op, AArch64ISD::SQDMULL, DAG);
   case Intrinsic::aarch64_sve_whilelt:
     return optimizeIncrementingWhile(Op.getNode(), DAG, /*IsSigned=*/true,
                                      /*IsEqual=*/false);
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index 58a53af76e1b5..01d8775bcd189 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -7715,16 +7715,21 @@ multiclass SIMDThreeScalarD<bit U, bits<5> opc, string asm,
 }
 
 multiclass SIMDThreeScalarBHSD<bit U, bits<5> opc, string asm,
-                               SDPatternOperator OpNode, SDPatternOperator SatOp> {
+                               SDPatternOperator OpNode, SDPatternOperator G_OpNode, SDPatternOperator SatOp> {
   def v1i64  : BaseSIMDThreeScalar<U, 0b111, opc, FPR64, asm,
     [(set (v1i64 FPR64:$Rd), (SatOp (v1i64 FPR64:$Rn), (v1i64 FPR64:$Rm)))]>;
   def v1i32  : BaseSIMDThreeScalar<U, 0b101, opc, FPR32, asm, []>;
   def v1i16  : BaseSIMDThreeScalar<U, 0b011, opc, FPR16, asm, []>;
   def v1i8   : BaseSIMDThreeScalar<U, 0b001, opc, FPR8 , asm, []>;
 
-  def : Pat<(i64 (OpNode (i64 FPR64:$Rn), (i64 FPR64:$Rm))),
+  def : Pat<(i64 (G_OpNode (i64 FPR64:$Rn), (i64 FPR64:$Rm))),
             (!cast<Instruction>(NAME#"v1i64") FPR64:$Rn, FPR64:$Rm)>;
-  def : Pat<(i32 (OpNode (i32 FPR32:$Rn), (i32 FPR32:$Rm))),
+  def : Pat<(i32 (G_OpNode (i32 FPR32:$Rn), (i32 FPR32:$Rm))),
+            (!cast<Instruction>(NAME#"v1i32") FPR32:$Rn, FPR32:$Rm)>;
+
+  def : Pat<(f64 (OpNode FPR64:$Rn, FPR64:$Rm)),
+            (!cast<Instruction>(NAME#"v1i64") FPR64:$Rn, FPR64:$Rm)>;
+  def : Pat<(f32 (OpNode FPR32:$Rn, FPR32:$Rm)),
             (!cast<Instruction>(NAME#"v1i32") FPR32:$Rn, FPR32:$Rm)>;
 }
 
@@ -7799,10 +7804,14 @@ class BaseSIMDThreeScalarMixed<bit U, bits<2> size, bits<5> opcode,
   let Inst{10}    = 0;
   let Inst{9-5}   = Rn;
   let Inst{4-0}   = Rd;
+  
+  let mayLoad = 0;
+  let mayStore = 0;
+  let hasSideEffects = 0;
 }
 
-let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
 multiclass SIMDThreeScalarMixedHS<bit U, bits<5> opc, string asm,
+                                  SDPatternOperator G_OpNode = null_frag,
                                   SDPatternOperator OpNode = null_frag> {
   def i16  : BaseSIMDThreeScalarMixed<U, 0b01, opc,
                                       (outs FPR32:$Rd),
@@ -7810,10 +7819,12 @@ multiclass SIMDThreeScalarMixedHS<bit U, bits<5> opc, string asm,
   def i32  : BaseSIMDThreeScalarMixed<U, 0b10, opc,
                                       (outs FPR64:$Rd),
                                       (ins FPR32:$Rn, FPR32:$Rm), asm, "",
-            [(set (i64 FPR64:$Rd), (OpNode (i32 FPR32:$Rn), (i32 FPR32:$Rm)))]>;
+            [(set (i64 FPR64:$Rd), (G_OpNode (i32 FPR32:$Rn), (i32 FPR32:$Rm)))]>;
+
+  def: Pat<(f64 (OpNode  FPR32:$Rn, FPR32:$Rm)),
+            (!cast<Instruction>(NAME#"i32") FPR32:$Rn, FPR32:$Rm)>;
 }
 
-let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
 multiclass SIMDThreeScalarMixedTiedHS<bit U, bits<5> opc, string asm,
                                   SDPatternOperator OpNode = null_frag> {
   def i16  : BaseSIMDThreeScalarMixed<U, 0b01, opc,
@@ -9815,7 +9826,8 @@ multiclass SIMDIndexedLongSD<bit U, bits<4> opc, string asm,
 
 multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                                        SDPatternOperator VecAcc,
-                                       SDPatternOperator ScalAcc> {
+                                       SDPatternOperator ScalAcc,
+                                       SDPatternOperator G_ScalAcc> {
   def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,
                                       V128, V64,
                                       V128_lo, VectorIndexH,
@@ -9884,7 +9896,7 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
     let Inst{20} = idx{0};
   }
 
-  def : Pat<(i32 (ScalAcc (i32 FPR32Op:$Rd),
+  def : Pat<(i32 (G_ScalAcc (i32 FPR32Op:$Rd),
                           (i32 (vector_extract
                                     (v4i32 (int_aarch64_neon_sqdmull
                                                 (v4i16 V64:$Rn),
@@ -9896,7 +9908,19 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                         (INSERT_SUBREG (IMPLICIT_DEF), V64:$Rm, dsub),
                         (i64 0))>;
 
-  def : Pat<(i32 (ScalAcc (i32 FPR32Op:$Rd),
+  def : Pat<(f32 (ScalAcc FPR32Op:$Rd,
+                    (bitconvert (i32 (vector_extract
+                                      (v4i32 (int_aarch64_neon_sqdmull
+                                                (v4i16 V64:$Rn),
+                                                (v4i16 V64:$Rm))),
+                                      (i64 0)))))),
+            (!cast<Instruction>(NAME # v1i32_indexed)
+                        FPR32Op:$Rd,
+                        (f16 (EXTRACT_SUBREG V64:$Rn, hsub)),
+                        (INSERT_SUBREG (IMPLICIT_DEF), V64:$Rm, dsub),
+                        (i64 0))>;
+
+  def : Pat<(i32 (G_ScalAcc (i32 FPR32Op:$Rd),
                           (i32 (vector_extract
                                     (v4i32 (int_aarch64_neon_sqdmull
                                                 (v4i16 V64:$Rn),
@@ -9909,11 +9933,24 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                         V128_lo:$Rm,
                         VectorIndexH:$idx)>;
 
+  def : Pat<(f32 (ScalAcc FPR32Op:$Rd,
+                          (bitconvert (i32 (vector_extract
+                                    (v4i32 (int_aarch64_neon_sqdmull
+                                                (v4i16 V64:$Rn),
+                                                (dup_v8i16 (v8i16 V128_lo:$Rm),
+                                                            VectorIndexH:$idx))),
+                                    (i64 0)))))),
+            (!cast<Instruction>(NAME # v1i32_indexed)
+                        FPR32Op:$Rd,
+                        (f16 (EXTRACT_SUBREG V64:$Rn, hsub)),
+                        V128_lo:$Rm,
+                        VectorIndexH:$idx)>;
+
   def v1i64_indexed : BaseSIMDIndexedTied<1, U, 1, 0b10, opc,
                                       FPR64Op, FPR32Op, V128, VectorIndexS,
                                       asm, ".s", "", "", ".s",
     [(set (i64 FPR64Op:$dst),
-          (ScalAcc (i64 FPR64Op:$Rd),
+          (G_ScalAcc (i64 FPR64Op:$Rd),
                    (i64 (int_aarch64_neon_sqdmulls_scalar
                             (i32 FPR32Op:$Rn),
                             (i32 (vector_extract (v4i32 V128:$Rm),
@@ -9923,6 +9960,16 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
     let Inst{11} = idx{1};
     let Inst{21} = idx{0};
   }
+
+  def : Pat<(f64 (ScalAcc FPR64Op:$Rd,
+                          (AArch64sqdmull FPR32Op:$Rn,
+                            (bitconvert (i32 (vector_extract (v4i32 V128:$Rm),
+                                                VectorIndexS:$idx)))))),
+            (!cast<Instruction>(NAME # v1i64_indexed)
+                        FPR64Op:$Rd,
+                        FPR32Op:$Rn,
+                        V128:$Rm,
+                        VectorIndexS:$idx)>;
 }
 
 multiclass SIMDVectorIndexedLongSD<bit U, bits<4> opc, string asm,
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 2871a20e28b65..16f181e6a9bdf 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1012,6 +1012,16 @@ def AArch64fcvtnu_half : SDNode<"AArch64ISD::FCVTNU_HALF", SDTFPExtendOp>;
 def AArch64fcvtps_half : SDNode<"AArch64ISD::FCVTPS_HALF", SDTFPExtendOp>;
 def AArch64fcvtpu_half : SDNode<"AArch64ISD::FCVTPU_HALF", SDTFPExtendOp>;
 
+def AArch64sqadd:   SDNode<"AArch64ISD::SQADD",   SDTFPBinOp>;
+def AArch64sqrshl:  SDNode<"AArch64ISD::SQRSHL",  SDTFPBinOp>;
+def AArch64sqshl:   SDNode<"AArch64ISD::SQSHL",   SDTFPBinOp>;
+def AArch64sqsub:   SDNode<"AArch64ISD::SQSUB",   SDTFPBinOp>;
+def AArch64uqadd:   SDNode<"AArch64ISD::UQADD",   SDTFPBinOp>;
+def AArch64uqrshl:  SDNode<"AArch64ISD::UQRSHL",  SDTFPBinOp>;
+def AArch64uqshl:   SDNode<"AArch64ISD::UQSHL",   SDTFPBinOp>;
+def AArch64uqsub:   SDNode<"AArch64ISD::UQSUB",   SDTFPBinOp>;
+def AArch64sqdmull: SDNode<"AArch64ISD::SQDMULL", SDTFPMulOp>;
+
 //def Aarch64softf32tobf16v8: SDNode<"AArch64ISD::", SDTFPRoundOp>;
 
 // Vector immediate ops
@@ -6414,19 +6424,19 @@ defm FCMGT    : SIMDThreeScalarFPCmp<1, 1, 0b100, "fcmgt", AArch64fcmgt>;
 defm FMULX    : SIMDFPThreeScalar<0, 0, 0b011, "fmulx", int_aarch64_neon_fmulx, HasNEONandIsStreamingSafe>;
 defm FRECPS   : SIMDFPThreeScalar<0, 0, 0b111, "frecps", int_aarch64_neon_frecps, HasNEONandIsStreamingSafe>;
 defm FRSQRTS  : SIMDFPThreeScalar<0, 1, 0b111, "frsqrts", int_aarch64_neon_frsqrts, HasNEONandIsStreamingSafe>;
-defm SQADD    : SIMDThreeScalarBHSD<0, 0b00001, "sqadd", int_aarch64_neon_sqadd, saddsat>;
+defm SQADD    : SIMDThreeScalarBHSD<0, 0b00001, "sqadd", AArch64sqadd, int_aarch64_neon_sqadd, saddsat>;
 defm SQDMULH  : SIMDThreeScalarHS<  0, 0b10110, "sqdmulh", int_aarch64_neon_sqdmulh>;
 defm SQRDMULH : SIMDThreeScalarHS<  1, 0b10110, "sqrdmulh", int_aarch64_neon_sqrdmulh>;
-defm SQRSHL   : SIMDThreeScalarBHSD<0, 0b01011, "sqrshl", int_aarch64_neon_sqrshl, int_aarch64_neon_sqrshl>;
-defm SQSHL    : SIMDThreeScalarBHSD<0, 0b01001, "sqshl", int_aarch64_neon_sqshl, int_aarch64_neon_sqshl>;
-defm SQSUB    : SIMDThreeScalarBHSD<0, 0b00101, "sqsub", int_aarch64_neon_sqsub, ssubsat>;
+defm SQRSHL   : SIMDThreeScalarBHSD<0, 0b01011, "sqrshl", AArch64sqrshl, int_aarch64_neon_sqrshl, int_aarch64_neon_sqrshl>;
+defm SQSHL    : SIMDThreeScalarBHSD<0, 0b01001, "sqshl", AArch64sqshl, int_aarch64_neon_sqshl, int_aarch64_neon_sqshl>;
+defm SQSUB    : SIMDThreeScalarBHSD<0, 0b00101, "sqsub", AArch64sqsub, int_aarch64_neon_sqsub, ssubsat>;
 defm SRSHL    : SIMDThreeScalarD<   0, 0b01010, "srshl", int_aarch64_neon_srshl>;
 defm SSHL     : SIMDThreeScalarD<   0, 0b01000, "sshl", int_aarch64_neon_sshl>;
 defm SUB      : SIMDThreeScalarD<   1, 0b10000, "sub", sub>;
-defm UQADD    : SIMDThreeScalarBHSD<1, 0b00001, "uqadd", int_aarch64_neon_uqadd, uaddsat>;
-defm UQRSHL   : SIMDThreeScalarBHSD<1, 0b01011, "uqrshl", int_aarch64_neon_uqrshl, int_aarch64_neon_uqrshl>;
-defm UQSHL    : SIMDThreeScalarBHSD<1, 0b01001, "uqshl", int_aarch64_neon_uqshl, int_aarch64_neon_uqshl>;
-defm UQSUB    : SIMDThreeScalarBHSD<1, 0b00101, "uqsub", int_aarch64_neon_uqsub, usubsat>;
+defm UQADD    : SIMDThreeScalarBHSD<1, 0b00001, "uqadd", AArch64uqadd, int_aarch64_neon_uqadd, uaddsat>;
+defm UQRSHL   : SIMDThreeScalarBHSD<1, 0b01011, "uqrshl", AArch64uqrshl, int_aarch64_neon_uqrshl, int_aarch64_neon_uqrshl>;
+defm UQSHL    : SIMDThreeScalarBHSD<1, 0b01001, "uqshl", AArch64uqshl, int_aarch64_neon_uqshl, int_aarch64_neon_uqshl>;
+defm UQSUB    : SIMDThreeScalarBHSD<1, 0b00101, "uqsub", AArch64uqsub, int_aarch64_neon_uqsub, usubsat>;
 defm URSHL    : SIMDThreeScalarD<   1, 0b01010, "urshl", int_aarch64_neon_urshl>;
 defm USHL     : SIMDThreeScalarD<   1, 0b01000, "ushl", int_aarch64_neon_ushl>;
 let Predicates = [HasRDM] in {
@@ -6477,17 +6487,25 @@ def : InstAlias<"faclt $dst, $src1, $src2",
 // Advanced SIMD three scalar instructions (mixed operands).
 //===----------------------------------------------------------------------===//
 defm SQDMULL  : SIMDThreeScalarMixedHS<0, 0b11010, "sqdmull",
-                                       int_aarch64_neon_sqdmulls_scalar>;
+                                       int_aarch64_neon_sqdmulls_scalar,
+                                       AArch64sqdmull>;
 defm SQDMLAL  : SIMDThreeScalarMixedTiedHS<0, 0b10010, "sqdmlal">;
 defm SQDMLSL  : SIMDThreeScalarMixedTiedHS<0, 0b10110, "sqdmlsl">;
 
 def : Pat<(i64 (int_aarch64_neon_sqadd (i64 FPR64:$Rd),
-                   (i64 (int_aarch64_neon_sqdmulls_scalar (i32 FPR32:$Rn),
-                                                        (i32 FPR32:$Rm))))),
+                   (int_aarch64_neon_sqdmulls_scalar FPR32:$Rn, FPR32:$Rm))),
+          (SQDMLALi32 FPR64:$Rd, FPR32:$Rn, FPR32:$Rm)>;
+
+def : Pat<(f64 (AArch64sqadd FPR64:$Rd,
+                (AArch64sqdmull FPR32:$Rn, FPR32:$Rm))),
           (SQDMLALi32 FPR64:$Rd, FPR32:$Rn, FPR32:$Rm)>;
+
 def : Pat<(i64 (int_aarch64_neon_sqsub (i64 FPR64:$Rd),
-                   (i64 (int_aarch64_neon_sqdmulls_scalar (i32 FPR32:$Rn),
-                                                        (i32 FPR32:$Rm))))),
+                   (int_aarch64_neon_sqdmulls_scalar FPR32:$Rn, FPR32:$Rm))),
+          (SQDMLSLi32 FPR64:$Rd, FPR32:$Rn, FPR32:$Rm)>;
+
+def : Pat<(f64 (AArch64sqsub FPR64:$Rd,
+               (AArch64sqdmull FPR32:$Rn, FPR32:$Rm))),
           (SQDMLSLi32 FPR64:$Rd, FPR32:$Rn, FPR32:$Rm)>;
 
 //===----------------------------------------------------------------------===//
@@ -8700,9 +8718,9 @@ defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",
     TriOpFrag<(sub node:$LHS, (AArch64smull node:$MHS, node:$RHS))>>;
 defm SMULL : SIMDVectorIndexedLongSD<0, 0b1010, "smull", AArch64smull>;
 defm SQDMLAL : SIMDIndexedLongSQDMLXSDTied<0, 0b0011, "sqdmlal", saddsat,
-                                           int_aarch64_neon_sqadd>;
+                                           AArch64sqadd, int_aarch64_neon_sqadd>;
 defm SQDMLSL : SIMDIndexedLongSQDMLXSDTied<0, 0b0111, "sqdmlsl", ssubsat,
-                                           int_aarch64_neon_sqsub>;
+                                           AArch64sqsub, int_aarch64_neon_sqsub>;
 defm SQRDMLAH : SIMDIndexedSQRDMLxHSDTied<1, 0b1101, "sqrdmlah",
                                           int_aarch64_neon_sqrdmlah>;
 defm SQRDMLSH : SIMDIndexedSQRDMLxHSDTied<1, 0b1111, "sqrdmlsh",
diff --git a/llvm/test/CodeGen/AArch64/arm64-int-neon.ll b/llvm/test/CodeGen/AArch64/arm64-int-neon.ll
new file mode 100644
index 0000000000000..2c7a850f280bc
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/arm64-int-neon.ll
@@ -0,0 +1,238 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple aarch64-unknown-unknown -mattr=+fprcvt,+fullfp16 | FileCheck %s --check-prefixes=CHECK
+; RUN: llc < %s -mtriple aarch64-unknown-unknown -global-isel -global-isel-abort=2 -mattr=+fprcvt,+fullfp16 | FileCheck %s --check-prefixes=CHECK
+
+
+; CHECK-GI:       warning: Instruction selection used fallback path for test_sqrshl_s32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_sqrshl_s64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_sqshl_s32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_sqshl_s64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqrshl_s32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqrshl_s64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqshl_s32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqshl_s64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqadd_s32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqadd_s64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqsub_s32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_uqsub_s64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for test_sqdmulls_scalar
+
+define i32 @test_sqrshl_s32(float noundef %a){
+; CHECK-LABEL: test_sqrshl_s32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fcvtzs s0, s0
+; CHECK-NEXT:    sqrshl s0, s0, s0
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
+entry:
+  %cvt = tail call i32 @llvm.aarch64.neon.fcvtzs.i32.f32(float %a)
+  %res = tail call i32 @llvm.aarch64.neon.sqrshl.i32(i32 %cvt, i32 %cvt)
+  ret i32 %res
+}
+
+define i64 @test_sqrshl_s64(float noundef %a){
+; CHECK-LABEL: test_sqrshl_s64:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fcvtzs d0, s0
+; CHECK-NEXT:    sqrshl d0, d0, d0
+; CHECK-NEXT:    fmov x0, d0
+; CHECK-NEXT:    ret
+entry:
+  %cvt = tail call i64 @llvm.aarch64.neon.fcvtzs.i64.f32(float %a)
+  %res = tai...
[truncated]

@Lukacma (Author) commented Nov 6, 2025

Okay, I have cleaned up the patch and it is ready for review. After David's and Paul's comments, I went with creating new nodes just for SelectionDAG and duplicating patterns where necessary for GlobalISel.

@Lukacma (Author) commented Nov 14, 2025

Ping

Comment on lines 349 to 351
def SDTFPMulOp : SDTypeProfile<1, 2, [
  SDTCisSameAs<1, 2>, SDTCisFP<0>, SDTCisFP<1>
]>;

Contributor:

Don't need this? This is named too specifically, and we already have SDTFPBinOp, which looks the same?

Contributor Author:

It's not the same, unfortunately. SDTFPBinOp requires all operands (1 result, 2 inputs) to have the same FP type, which is not the case for sqdmull, as the result has double the width of the inputs.
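
For reference, SDTFPBinOp is defined in TargetSelectionDAG.td roughly as follows (quoted from upstream, so subject to drift):

def SDTFPBinOp : SDTypeProfile<1, 2, [    // fadd, fmul, etc.
  SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
]>;

It ties the result and both inputs to one FP type, whereas sqdmull needs an f64 result from f32 inputs.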

Contributor:

The name needs to be more targeted.

Collaborator:

This can be in AArch64InstrInfo.td if it is AArch64-specific. Can it reuse SDT_AArch64mull?

Contributor Author:

Unfortunately, no; that one is for integers. I have reformatted it, though, and inlined this profile into the node definition.

@@ -0,0 +1,238 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple aarch64-unknown-unknown -mattr=+fprcvt,+fullfp16 | FileCheck %s --check-prefixes=CHECK
; RUN: llc < %s -mtriple aarch64-unknown-unknown -global-isel -global-isel-abort=2 -mattr=+fprcvt,+fullfp16 | FileCheck %s --check-prefixes=CHECK

Collaborator:

Does this need 2>&1 and --check-prefixes=CHECK,CHECK-GI?

Contributor Author:

Done

@github-actions bot commented Nov 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen (Collaborator) left a comment

Thanks, I think this LGTM.

@JoshdRod was looking at some of the GISel intrinsics recently. Hopefully that will reduce the number of fallbacks.

@Lukacma merged commit 00c8e61 into llvm:main Dec 3, 2025
10 checks passed