Skip to content

Conversation

@davemgreen
Copy link
Collaborator

@davemgreen davemgreen commented Nov 22, 2025

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set exception flags in the same ways as scalar fadd/fsub/fmul, but will not honor flush-to-zero (for f32 they always flush, for f16 they follows the fpsrc flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32 intrinsics to llvm.constrained.fadd/fsub/fmul and then vadd/vsub/vmul without changing the expected behaviour under strict-fp. This patch introduces a set in intrinsics that we can use instead, going from vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

If this is acceptable, the other intrinsics I can add will be:
fma
fptoi, itofp
trunc/round/ceil/etc
fcmp
minnum/maxnum
The current implementations assumes that the standard variant of a strictfp alternative will be a IRBuilder, this can be changed to take a IRBuilder or IRInt. There are also Neon intrinsics that AFAIU behave the same way.

@llvmbot llvmbot added backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:ir labels Nov 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-llvm-ir

Author: David Green (davemgreen)

Changes

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set exception flags in the same ways as scalar fadd/fsub/fmul, but will not honor flush-to-zero (for f32 they always flush, for f16 they follows the fpsrc flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32 intrinsics to llvm.constrained.fadd/fsub/fmul, and the fadd/fsub/fmul without changing the expected behaviour under strict-fp. This patch introduces a set in intrinsics that we can use instead, going from vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

If this is acceptable, the other intrinsics I can add will be:
fma
fptoi, itofp
trunc/round/ceil/etc
fcmp
minnum/maxnum
The current implementations assumes that the standard variant of a strictfp alternative will be a IRBuilder, this can be changed to take a IRBuilder or IRInt. There are also Neon intrinsics that AFAIU behave the same way.


Patch is 93.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169156.diff

10 Files Affected:

  • (modified) clang/include/clang/Basic/arm_mve_defs.td (+18-3)
  • (modified) clang/test/CodeGen/arm-mve-intrinsics/vaddq.c (+146-68)
  • (modified) clang/test/CodeGen/arm-mve-intrinsics/vmulq.c (+266-124)
  • (modified) clang/test/CodeGen/arm-mve-intrinsics/vsubq.c (+146-68)
  • (modified) clang/utils/TableGen/MveEmitter.cpp (+36-3)
  • (modified) llvm/include/llvm/IR/IntrinsicsARM.td (+12)
  • (modified) llvm/lib/Target/ARM/ARMInstrMVE.td (+22-11)
  • (added) llvm/test/CodeGen/Thumb2/mve-intrinsics/strict-intrinsics.ll (+143)
  • (modified) llvm/test/CodeGen/Thumb2/mve-intrinsics/vabdq.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-ext.ll (+6-6)
diff --git a/clang/include/clang/Basic/arm_mve_defs.td b/clang/include/clang/Basic/arm_mve_defs.td
index c1562a0c1f04c..eeca9153dd742 100644
--- a/clang/include/clang/Basic/arm_mve_defs.td
+++ b/clang/include/clang/Basic/arm_mve_defs.td
@@ -74,9 +74,9 @@ def immshr: CGHelperFn<"MVEImmediateShr"> {
   let special_params = [IRBuilderIntParam<1, "unsigned">,
                         IRBuilderIntParam<2, "bool">];
 }
-def fadd: IRBuilder<"CreateFAdd">;
-def fmul: IRBuilder<"CreateFMul">;
-def fsub: IRBuilder<"CreateFSub">;
+def fadd_node: IRBuilder<"CreateFAdd">;
+def fmul_node: IRBuilder<"CreateFMul">;
+def fsub_node: IRBuilder<"CreateFSub">;
 def load: IRBuilder<"CreateLoad"> {
   let special_params = [IRBuilderAddrParam<0>];
 }
@@ -212,6 +212,13 @@ def unsignedflag;
 // constant giving its size in bits.
 def bitsize;
 
+// strictFPAlt allows a node to have different code generation under strict-fp.
+// TODO: The standard node can be IRBuilderBase or IRIntBase.
+class strictFPAlt<IRBuilderBase standard_, IRIntBase strictfp_> {
+  IRBuilderBase standard = standard_;
+  IRIntBase strictfp = strictfp_;
+}
+
 // If you put CustomCodegen<"foo"> in an intrinsic's codegen field, it
 // indicates that the IR generation for that intrinsic is done by handwritten
 // C++ and not autogenerated at all. The effect in the MVE builtin codegen
@@ -573,6 +580,14 @@ multiclass IntrinsicMXNameOverride<Type rettype, dag arguments, dag cg,
   }
 }
 
+// StrictFP nodes that choose between standard fadd and llvm.arm.mve.fadd nodes
+// depending on whether we are using strict-fp.
+def fadd: strictFPAlt<fadd_node,
+                      IRInt<"vadd", [Vector]>>;
+def fsub: strictFPAlt<fsub_node,
+                      IRInt<"vsub", [Vector]>>;
+def fmul: strictFPAlt<fmul_node,
+                      IRInt<"vmul", [Vector]>>;
 
 // -----------------------------------------------------------------------------
 // Convenience lists of parameter types. 'T' is just a container record, so you
diff --git a/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c b/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
index 238cb4056d4f1..d24834951b385 100644
--- a/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
+++ b/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
@@ -1,6 +1,8 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s
-// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -DPOLYMORPHIC -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-NONSTRICT
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -DPOLYMORPHIC -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-NONSTRICT
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -frounding-math -fexperimental-strict-floating-point -disable-O0-optnone -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-STRICT
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -frounding-math -fexperimental-strict-floating-point -disable-O0-optnone -DPOLYMORPHIC -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-STRICT
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
@@ -20,10 +22,15 @@ uint32x4_t test_vaddq_u32(uint32x4_t a, uint32x4_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[B:%.*]]
-// CHECK-NEXT:    ret <8 x half> [[TMP0]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_f16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[B:%.*]]
+// CHECK-NONSTRICT-NEXT:    ret <8 x half> [[TMP0]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_f16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vadd.v8f16(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]]) #[[ATTR2:[0-9]+]]
+// CHECK-STRICT-NEXT:    ret <8 x half> [[TMP0]]
 //
 float16x8_t test_vaddq_f16(float16x8_t a, float16x8_t b)
 {
@@ -34,12 +41,19 @@ float16x8_t test_vaddq_f16(float16x8_t a, float16x8_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_s8(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <16 x i8> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_s8(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <16 x i8> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_s8(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <16 x i8> [[TMP2]]
 //
 int8x16_t test_vaddq_m_s8(int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
@@ -50,12 +64,19 @@ int8x16_t test_vaddq_m_s8(int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_f32(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <4 x float> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_f32(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <4 x float> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_f32(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <4 x float> [[TMP2]]
 //
 float32x4_t test_vaddq_m_f32(float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
@@ -66,12 +87,19 @@ float32x4_t test_vaddq_m_f32(float32x4_t inactive, float32x4_t a, float32x4_t b,
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_u16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <8 x i1> [[TMP1]], <8 x i16> undef)
-// CHECK-NEXT:    ret <8 x i16> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_x_u16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <8 x i1> [[TMP1]], <8 x i16> undef)
+// CHECK-NONSTRICT-NEXT:    ret <8 x i16> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_x_u16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <8 x i1> [[TMP1]], <8 x i16> undef) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x i16> [[TMP2]]
 //
 uint16x8_t test_vaddq_x_u16(uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
@@ -82,12 +110,19 @@ uint16x8_t test_vaddq_x_u16(uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]], <8 x i1> [[TMP1]], <8 x half> undef)
-// CHECK-NEXT:    ret <8 x half> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_x_f16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]], <8 x i1> [[TMP1]], <8 x half> undef)
+// CHECK-NONSTRICT-NEXT:    ret <8 x half> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_x_f16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]], <8 x i1> [[TMP1]], <8 x half> undef) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x half> [[TMP2]]
 //
 float16x8_t test_vaddq_x_f16(float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
@@ -114,12 +149,19 @@ uint32x4_t test_vaddq_n_u32(uint32x4_t a, uint32_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_n_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[DOTSPLAT]]
-// CHECK-NEXT:    ret <8 x half> [[TMP0]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_n_f16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[DOTSPLAT]]
+// CHECK-NONSTRICT-NEXT:    ret <8 x half> [[TMP0]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_n_f16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vadd.v8f16(<8 x half> [[A:%.*]], <8 x half> [[DOTSPLAT]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x half> [[TMP0]]
 //
 float16x8_t test_vaddq_n_f16(float16x8_t a, float16_t b)
 {
@@ -130,14 +172,23 @@ float16x8_t test_vaddq_n_f16(float16x8_t a, float16_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_n_s8(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[DOTSPLAT]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <16 x i8> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_n_s8(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[DOTSPLAT]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <16 x i8> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_n_s8(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[DOTSPLAT]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <16 x i8> [[TMP2]]
 //
 int8x16_t test_vaddq_m_n_s8(int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
@@ -148,14 +199,23 @@ int8x16_t test_vaddq_m_n_s8(int8x16_t inactive, int8x16_t a, int8_t b, mve_pred1
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_n_f32(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[DOTSPLAT]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <4 x float> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_n_f32(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[DOTSPLAT]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <4 x float> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_n_f32(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[DOTSPLAT]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <4 x float> [[TMP2]]
 //
 float32x4_t test_vaddq_m_n_f32(float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
@@ -166,14 +226,23 @@ float32x4_t test_vaddq_m_n_f32(float32x4_t inactive, float32x4_t a, float32_t b,
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_n_u16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> poison, i16 [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[DOTSPLAT]], <8 x i1> [[TMP1]], <8 x i16> undef)
-// CHECK-NEXT:    ret <8 x i16> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_x_n_u16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> poison, i16 [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[DOTSPLAT]], <8 x i1> [[TMP1]], <8 x i16> undef)
+// CHECK-NONSTRICT-NEXT:    ret <8 x i16> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_x_n_u16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> poison, i16 [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[DOTSPLAT]], <8 x i1> [[TMP1]], <8 x i16> undef) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x i16> [[TMP2]]
 //
 uint16x8_t test_vaddq_x_n_u16(uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
@@ -184,14 +253,23 @@ uint16x8_t test_vaddq_x_n_u16(uint16x8_t a, uint16_t b, mve_pred16_t p)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_n_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[DOTSPLAT]], <...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-backend-arm

Author: David Green (davemgreen)

Changes

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set exception flags in the same ways as scalar fadd/fsub/fmul, but will not honor flush-to-zero (for f32 they always flush, for f16 they follows the fpsrc flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32 intrinsics to llvm.constrained.fadd/fsub/fmul, and the fadd/fsub/fmul without changing the expected behaviour under strict-fp. This patch introduces a set in intrinsics that we can use instead, going from vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

If this is acceptable, the other intrinsics I can add will be:
fma
fptoi, itofp
trunc/round/ceil/etc
fcmp
minnum/maxnum
The current implementations assumes that the standard variant of a strictfp alternative will be a IRBuilder, this can be changed to take a IRBuilder or IRInt. There are also Neon intrinsics that AFAIU behave the same way.


Patch is 93.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169156.diff

10 Files Affected:

  • (modified) clang/include/clang/Basic/arm_mve_defs.td (+18-3)
  • (modified) clang/test/CodeGen/arm-mve-intrinsics/vaddq.c (+146-68)
  • (modified) clang/test/CodeGen/arm-mve-intrinsics/vmulq.c (+266-124)
  • (modified) clang/test/CodeGen/arm-mve-intrinsics/vsubq.c (+146-68)
  • (modified) clang/utils/TableGen/MveEmitter.cpp (+36-3)
  • (modified) llvm/include/llvm/IR/IntrinsicsARM.td (+12)
  • (modified) llvm/lib/Target/ARM/ARMInstrMVE.td (+22-11)
  • (added) llvm/test/CodeGen/Thumb2/mve-intrinsics/strict-intrinsics.ll (+143)
  • (modified) llvm/test/CodeGen/Thumb2/mve-intrinsics/vabdq.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-ext.ll (+6-6)
diff --git a/clang/include/clang/Basic/arm_mve_defs.td b/clang/include/clang/Basic/arm_mve_defs.td
index c1562a0c1f04c..eeca9153dd742 100644
--- a/clang/include/clang/Basic/arm_mve_defs.td
+++ b/clang/include/clang/Basic/arm_mve_defs.td
@@ -74,9 +74,9 @@ def immshr: CGHelperFn<"MVEImmediateShr"> {
   let special_params = [IRBuilderIntParam<1, "unsigned">,
                         IRBuilderIntParam<2, "bool">];
 }
-def fadd: IRBuilder<"CreateFAdd">;
-def fmul: IRBuilder<"CreateFMul">;
-def fsub: IRBuilder<"CreateFSub">;
+def fadd_node: IRBuilder<"CreateFAdd">;
+def fmul_node: IRBuilder<"CreateFMul">;
+def fsub_node: IRBuilder<"CreateFSub">;
 def load: IRBuilder<"CreateLoad"> {
   let special_params = [IRBuilderAddrParam<0>];
 }
@@ -212,6 +212,13 @@ def unsignedflag;
 // constant giving its size in bits.
 def bitsize;
 
+// strictFPAlt allows a node to have different code generation under strict-fp.
+// TODO: The standard node can be IRBuilderBase or IRIntBase.
+class strictFPAlt<IRBuilderBase standard_, IRIntBase strictfp_> {
+  IRBuilderBase standard = standard_;
+  IRIntBase strictfp = strictfp_;
+}
+
 // If you put CustomCodegen<"foo"> in an intrinsic's codegen field, it
 // indicates that the IR generation for that intrinsic is done by handwritten
 // C++ and not autogenerated at all. The effect in the MVE builtin codegen
@@ -573,6 +580,14 @@ multiclass IntrinsicMXNameOverride<Type rettype, dag arguments, dag cg,
   }
 }
 
+// StrictFP nodes that choose between standard fadd and llvm.arm.mve.fadd nodes
+// depending on whether we are using strict-fp.
+def fadd: strictFPAlt<fadd_node,
+                      IRInt<"vadd", [Vector]>>;
+def fsub: strictFPAlt<fsub_node,
+                      IRInt<"vsub", [Vector]>>;
+def fmul: strictFPAlt<fmul_node,
+                      IRInt<"vmul", [Vector]>>;
 
 // -----------------------------------------------------------------------------
 // Convenience lists of parameter types. 'T' is just a container record, so you
diff --git a/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c b/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
index 238cb4056d4f1..d24834951b385 100644
--- a/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
+++ b/clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
@@ -1,6 +1,8 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s
-// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -DPOLYMORPHIC -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-NONSTRICT
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -DPOLYMORPHIC -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-NONSTRICT
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -frounding-math -fexperimental-strict-floating-point -disable-O0-optnone -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-STRICT
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -frounding-math -fexperimental-strict-floating-point -disable-O0-optnone -DPOLYMORPHIC -emit-llvm -o - %s | opt -S -passes=sroa | FileCheck %s --check-prefixes=CHECK,CHECK-STRICT
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
@@ -20,10 +22,15 @@ uint32x4_t test_vaddq_u32(uint32x4_t a, uint32x4_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[B:%.*]]
-// CHECK-NEXT:    ret <8 x half> [[TMP0]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_f16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[B:%.*]]
+// CHECK-NONSTRICT-NEXT:    ret <8 x half> [[TMP0]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_f16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vadd.v8f16(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]]) #[[ATTR2:[0-9]+]]
+// CHECK-STRICT-NEXT:    ret <8 x half> [[TMP0]]
 //
 float16x8_t test_vaddq_f16(float16x8_t a, float16x8_t b)
 {
@@ -34,12 +41,19 @@ float16x8_t test_vaddq_f16(float16x8_t a, float16x8_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_s8(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <16 x i8> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_s8(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <16 x i8> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_s8(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <16 x i8> [[TMP2]]
 //
 int8x16_t test_vaddq_m_s8(int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
@@ -50,12 +64,19 @@ int8x16_t test_vaddq_m_s8(int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_f32(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <4 x float> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_f32(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <4 x float> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_f32(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <4 x float> [[TMP2]]
 //
 float32x4_t test_vaddq_m_f32(float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
@@ -66,12 +87,19 @@ float32x4_t test_vaddq_m_f32(float32x4_t inactive, float32x4_t a, float32x4_t b,
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_u16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <8 x i1> [[TMP1]], <8 x i16> undef)
-// CHECK-NEXT:    ret <8 x i16> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_x_u16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <8 x i1> [[TMP1]], <8 x i16> undef)
+// CHECK-NONSTRICT-NEXT:    ret <8 x i16> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_x_u16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <8 x i1> [[TMP1]], <8 x i16> undef) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x i16> [[TMP2]]
 //
 uint16x8_t test_vaddq_x_u16(uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
@@ -82,12 +110,19 @@ uint16x8_t test_vaddq_x_u16(uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]], <8 x i1> [[TMP1]], <8 x half> undef)
-// CHECK-NEXT:    ret <8 x half> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_x_f16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]], <8 x i1> [[TMP1]], <8 x half> undef)
+// CHECK-NONSTRICT-NEXT:    ret <8 x half> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_x_f16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[B:%.*]], <8 x i1> [[TMP1]], <8 x half> undef) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x half> [[TMP2]]
 //
 float16x8_t test_vaddq_x_f16(float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
@@ -114,12 +149,19 @@ uint32x4_t test_vaddq_n_u32(uint32x4_t a, uint32_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_n_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[DOTSPLAT]]
-// CHECK-NEXT:    ret <8 x half> [[TMP0]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_n_f16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = fadd <8 x half> [[A:%.*]], [[DOTSPLAT]]
+// CHECK-NONSTRICT-NEXT:    ret <8 x half> [[TMP0]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_n_f16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vadd.v8f16(<8 x half> [[A:%.*]], <8 x half> [[DOTSPLAT]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x half> [[TMP0]]
 //
 float16x8_t test_vaddq_n_f16(float16x8_t a, float16_t b)
 {
@@ -130,14 +172,23 @@ float16x8_t test_vaddq_n_f16(float16x8_t a, float16_t b)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_n_s8(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[DOTSPLAT]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <16 x i8> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_n_s8(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[DOTSPLAT]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <16 x i8> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_n_s8(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], <16 x i8> [[DOTSPLAT]], <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <16 x i8> [[TMP2]]
 //
 int8x16_t test_vaddq_m_n_s8(int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
@@ -148,14 +199,23 @@ int8x16_t test_vaddq_m_n_s8(int8x16_t inactive, int8x16_t a, int8_t b, mve_pred1
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_m_n_f32(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[DOTSPLAT]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
-// CHECK-NEXT:    ret <4 x float> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_m_n_f32(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[DOTSPLAT]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]])
+// CHECK-NONSTRICT-NEXT:    ret <4 x float> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_m_n_f32(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> [[A:%.*]], <4 x float> [[DOTSPLAT]], <4 x i1> [[TMP1]], <4 x float> [[INACTIVE:%.*]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <4 x float> [[TMP2]]
 //
 float32x4_t test_vaddq_m_n_f32(float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
@@ -166,14 +226,23 @@ float32x4_t test_vaddq_m_n_f32(float32x4_t inactive, float32x4_t a, float32_t b,
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_n_u16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> poison, i16 [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[DOTSPLAT]], <8 x i1> [[TMP1]], <8 x i16> undef)
-// CHECK-NEXT:    ret <8 x i16> [[TMP2]]
+// CHECK-NONSTRICT-LABEL: @test_vaddq_x_n_u16(
+// CHECK-NONSTRICT-NEXT:  entry:
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> poison, i16 [[B:%.*]], i64 0
+// CHECK-NONSTRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
+// CHECK-NONSTRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-NONSTRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
+// CHECK-NONSTRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[DOTSPLAT]], <8 x i1> [[TMP1]], <8 x i16> undef)
+// CHECK-NONSTRICT-NEXT:    ret <8 x i16> [[TMP2]]
+//
+// CHECK-STRICT-LABEL: @test_vaddq_x_n_u16(
+// CHECK-STRICT-NEXT:  entry:
+// CHECK-STRICT-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> poison, i16 [[B:%.*]], i64 0
+// CHECK-STRICT-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], <8 x i16> [[DOTSPLAT]], <8 x i1> [[TMP1]], <8 x i16> undef) #[[ATTR2]]
+// CHECK-STRICT-NEXT:    ret <8 x i16> [[TMP2]]
 //
 uint16x8_t test_vaddq_x_n_u16(uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
@@ -184,14 +253,23 @@ uint16x8_t test_vaddq_x_n_u16(uint16x8_t a, uint16_t b, mve_pred16_t p)
 #endif /* POLYMORPHIC */
 }
 
-// CHECK-LABEL: @test_vaddq_x_n_f16(
-// CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> poison, half [[B:%.*]], i64 0
-// CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> poison, <8 x i32> zeroinitializer
-// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
-// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
-// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> [[A:%.*]], <8 x half> [[DOTSPLAT]], <...
[truncated]

@github-actions
Copy link

github-actions bot commented Nov 22, 2025

✅ With the latest revision this PR passed the undef deprecator.

@github-actions
Copy link

github-actions bot commented Nov 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen davemgreen force-pushed the gh-mve-strictaddsubmul branch from 4d8d1a2 to 6846381 Compare November 22, 2025 10:09
As far as I understand, the MVE fp vadd/vsub/vmul instructions will set
exception flags in the same ways as scalar fadd/fsub/fmul, but will not honor
flush-to-zero (for f32 they always flush, for f16 they follows the fpsrc flags)
and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32 intrinsics to
llvm.constrained.fadd/fsub/fmul, and the fadd/fsub/fmul without changing the
expected behaviour under strict-fp. This patch introduces a set in intrinsics
that we can use instead, going from vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

If this is acceptable, the other intrinsics I can add will be:
  fma
  fptoi, itofp
  trunc/round/ceil/etc
  fcmp
  minnum/maxnum
The current implementations assumes that the standard variant of a strictfp
alternative will be a IRBuilder, this can be changed to take a IRBuilder or
IRInt. There are also Neon intrinsics that AFAIU behave the same way.
@davemgreen davemgreen merged commit 1d64fd5 into llvm:main Nov 25, 2025
10 checks passed
@davemgreen davemgreen deleted the gh-mve-strictaddsubmul branch November 25, 2025 07:29
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 25, 2025

LLVM Buildbot has detected a new failure on builder arc-builder running on arc-worker while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/25348

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/X86/sse2-intrinsics-fast-isel.ll' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/buildbot/worker/arc-folder/build/bin/llc < /buildbot/worker/arc-folder/llvm-project/llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll -show-mc-encoding -fast-isel -mtriple=i386-unknown-unknown -mattr=+sse2 | /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll --check-prefixes=CHECK,X86,SSE,X86-SSE
# executed command: /buildbot/worker/arc-folder/build/bin/llc -show-mc-encoding -fast-isel -mtriple=i386-unknown-unknown -mattr=+sse2
# .---command stderr------------
# | LLVM ERROR: Cannot select: intrinsic %llvm.x86.sse2.clflush
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /buildbot/worker/arc-folder/build/bin/llc -show-mc-encoding -fast-isel -mtriple=i386-unknown-unknown -mattr=+sse2
# | 1.	Running pass 'Function Pass Manager' on module '<stdin>'.
# | 2.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@test_mm_clflush'
# |  #0 0x00000000023971d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/buildbot/worker/arc-folder/build/bin/llc+0x23971d8)
# |  #1 0x00000000023940e5 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# |  #2 0x00007ff48b702630 __restore_rt sigaction.c:0:0
# |  #3 0x00007ff48a4523d7 raise (/usr/lib64/libc.so.6+0x363d7)
# |  #4 0x00007ff48a453ac8 abort (/usr/lib64/libc.so.6+0x37ac8)
# |  #5 0x000000000072bfd9 llvm::json::operator==(llvm::json::Value const&, llvm::json::Value const&) (.cold) JSON.cpp:0:0
# |  #6 0x00000000021166b9 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/buildbot/worker/arc-folder/build/bin/llc+0x21166b9)
# |  #7 0x000000000211b289 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/buildbot/worker/arc-folder/build/bin/llc+0x211b289)
# |  #8 0x000000000096e5e7 (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*) X86ISelDAGToDAG.cpp:0:0
# |  #9 0x0000000002111e9f llvm::SelectionDAGISel::DoInstructionSelection() (/buildbot/worker/arc-folder/build/bin/llc+0x2111e9f)
# | #10 0x00000000021225d8 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/buildbot/worker/arc-folder/build/bin/llc+0x21225d8)
# | #11 0x00000000021268c3 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/buildbot/worker/arc-folder/build/bin/llc+0x21268c3)
# | #12 0x00000000021273bc llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/buildbot/worker/arc-folder/build/bin/llc+0x21273bc)
# | #13 0x00000000021116af llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/buildbot/worker/arc-folder/build/bin/llc+0x21116af)
# | #14 0x00000000012249b7 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
# | #15 0x000000000189ce8b llvm::FPPassManager::runOnFunction(llvm::Function&) (/buildbot/worker/arc-folder/build/bin/llc+0x189ce8b)
# | #16 0x000000000189d231 llvm::FPPassManager::runOnModule(llvm::Module&) (/buildbot/worker/arc-folder/build/bin/llc+0x189d231)
# | #17 0x000000000189de45 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/buildbot/worker/arc-folder/build/bin/llc+0x189de45)
# | #18 0x0000000000810440 compileModule(char**, llvm::LLVMContext&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&) llc.cpp:0:0
# | #19 0x00000000007344e9 main (/buildbot/worker/arc-folder/build/bin/llc+0x7344e9)
# | #20 0x00007ff48a43e555 __libc_start_main (/usr/lib64/libc.so.6+0x22555)
# | #21 0x0000000000805516 _start (/buildbot/worker/arc-folder/build/bin/llc+0x805516)
# `-----------------------------
# error: command failed with exit status: -6
# executed command: /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll --check-prefixes=CHECK,X86,SSE,X86-SSE
# .---command stderr------------
# | /buildbot/worker/arc-folder/llvm-project/llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll:399:14: error: SSE-LABEL: expected string not found in input
# | ; SSE-LABEL: test_mm_bsrli_si128:
# |              ^
# | <stdin>:170:21: note: scanning from here
# | test_mm_bslli_si128: # @test_mm_bslli_si128
# |                     ^
# | <stdin>:178:9: note: possible intended match here
# |  .globl test_mm_bsrli_si128 # 
# |         ^
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 25, 2025

LLVM Buildbot has detected a new failure on builder clang-with-thin-lto-ubuntu running on as-worker-92 while building clang,llvm at step 7 "test-stage1-compiler".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/127/builds/5510

Here is the relevant piece of the build log for the reference
Step 7 (test-stage1-compiler) failure: build (failure)
...
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/utils/lit/lit/llvm/config.py:564: note: using ld64.lld: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/ld64.lld
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/utils/lit/lit/llvm/config.py:564: note: using wasm-ld: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/wasm-ld
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/utils/lit/tests/lit.cfg:111: warning: Setting a timeout per test not supported. Requires the Python psutil module but it could not be found. Try installing it via pip or via your operating system's package manager.
 Some tests will be skipped and the --timeout command line argument will not work.
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/utils/lit/lit/llvm/config.py:564: note: using ld.lld: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/ld.lld
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/utils/lit/lit/llvm/config.py:564: note: using lld-link: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/lld-link
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/utils/lit/lit/llvm/config.py:564: note: using ld64.lld: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/ld64.lld
llvm-lit: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/utils/lit/lit/llvm/config.py:564: note: using wasm-ld: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/wasm-ld
-- Testing: 89145 tests, 72 workers --
Testing: 
FAIL: LLVM :: CodeGen/RISCV/rvv/expandload.ll (1 of 89145)
******************** TEST 'LLVM :: CodeGen/RISCV/rvv/expandload.ll' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc -verify-machineinstrs -mtriple=riscv32 -mattr=+v,+d,+m,+zbb /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/test/CodeGen/RISCV/rvv/expandload.ll -o -    | /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/FileCheck /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/test/CodeGen/RISCV/rvv/expandload.ll --check-prefixes=CHECK,CHECK-RV32
# executed command: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc -verify-machineinstrs -mtriple=riscv32 -mattr=+v,+d,+m,+zbb /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/test/CodeGen/RISCV/rvv/expandload.ll -o -
# .---command stderr------------
# | LLVM ERROR: Cannot select: intrinsic %llvm.riscv.viota
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc -verify-machineinstrs -mtriple=riscv32 -mattr=+v,+d,+m,+zbb /home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/test/CodeGen/RISCV/rvv/expandload.ll -o -
# | 1.	Running pass 'Function Pass Manager' on module '/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/llvm-project/llvm/test/CodeGen/RISCV/rvv/expandload.ll'.
# | 2.	Running pass 'RISC-V DAG->DAG Pattern Instruction Selection' on function '@test_expandload_v1i8'
# |  #0 0x000055a9fcbffe20 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x3acae20)
# |  #1 0x000055a9fcbfd01d SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# |  #2 0x00007fb989272520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
# |  #3 0x00007fb9892c69fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
# |  #4 0x00007fb989272476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
# |  #5 0x00007fb9892587f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
# |  #6 0x000055a9f97a0fe4 llvm::json::operator==(llvm::json::Value const&, llvm::json::Value const&) (.cold) JSON.cpp:0:0
# |  #7 0x000055a9fc991075 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x385c075)
# |  #8 0x000055a9fc99524d llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x386024d)
# |  #9 0x000055a9fabd0bcf llvm::RISCVDAGToDAGISel::Select(llvm::SDNode*) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x1a9bbcf)
# | #10 0x000055a9fc98dc92 llvm::SelectionDAGISel::DoInstructionSelection() (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x3858c92)
# | #11 0x000055a9fc99a676 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x3865676)
# | #12 0x000055a9fc99d0a2 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x38680a2)
# | #13 0x000055a9fc99ed5c llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x3869d5c)
# | #14 0x000055a9fc98d81d llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x385881d)
# | #15 0x000055a9fbc2a40d llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x2af540d)
# | #16 0x000055a9fc1b2ec2 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x307dec2)
# | #17 0x000055a9fc1b3084 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x307e084)
# | #18 0x000055a9fc1b3ab2 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x307eab2)
# | #19 0x000055a9f98ee4be compileModule(char**, llvm::LLVMContext&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&) llc.cpp:0:0
# | #20 0x000055a9f97b3a1a main (/home/buildbot/as-worker-92/clang-with-thin-lto-ubuntu/build/stage1/bin/llc+0x67ea1a)
# | #21 0x00007fb989259d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
# | #22 0x00007fb989259e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)

aadeshps-mcw pushed a commit to aadeshps-mcw/llvm-project that referenced this pull request Nov 26, 2025
…#169156)

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set
exception flags in the same ways as scalar fadd/fsub/fmul, but will not
honor flush-to-zero (for f32 they always flush, for f16 they follows the
fpsrc flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32
intrinsics to llvm.constrained.fadd/fsub/fmul and then vadd/vsub/vmul
without changing the expected behaviour under strict-fp. This patch
introduces a set in intrinsics that we can use instead, going from
vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

The current implementations assumes that the standard variant of a
strictfp alternative will be a IRBuilder, this can be changed to take a
IRBuilder or IRInt.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 27, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and IRInt.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 27, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and IRInt.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 27, 2025
Similar to llvm#169156 again, this is mostly for denormal handling as there is no
rounding step in a minnum/maxnum.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 27, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp vrnd nodes to make
sure they end up as the original instruction.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 27, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp compare nodes to
make sure they end up as the original instruction.
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Nov 30, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and
IRInt.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 30, 2025
Similar to llvm#169156 again, this is mostly for denormal handling as there is no
rounding step in a minnum/maxnum.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 30, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp vrnd nodes to make
sure they end up as the original instruction.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 30, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp compare nodes to
make sure they end up as the original instruction.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 30, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp vrnd nodes to make
sure they end up as the original instruction.
aahrun pushed a commit to aahrun/llvm-project that referenced this pull request Dec 1, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and
IRInt.
davemgreen added a commit that referenced this pull request Dec 2, 2025
)

Similar to #169156 again, this is mostly for denormal handling as there
is no rounding step in a minnum/maxnum.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Dec 2, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp vrnd nodes to make
sure they end up as the original instruction.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Dec 2, 2025
Similar to llvm#169156 again, this adds intrinsics for strict-fp compare nodes to
make sure they end up as the original instruction.
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Dec 3, 2025
…#169156)

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set
exception flags in the same ways as scalar fadd/fsub/fmul, but will not
honor flush-to-zero (for f32 they always flush, for f16 they follows the
fpsrc flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32
intrinsics to llvm.constrained.fadd/fsub/fmul and then vadd/vsub/vmul
without changing the expected behaviour under strict-fp. This patch
introduces a set in intrinsics that we can use instead, going from
vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

The current implementations assumes that the standard variant of a
strictfp alternative will be a IRBuilder, this can be changed to take a
IRBuilder or IRInt.
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Dec 3, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and
IRInt.
kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
…#169156)

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set
exception flags in the same ways as scalar fadd/fsub/fmul, but will not
honor flush-to-zero (for f32 they always flush, for f16 they follows the
fpsrc flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f23/vsub_f32/vmul_f32
intrinsics to llvm.constrained.fadd/fsub/fmul and then vadd/vsub/vmul
without changing the expected behaviour under strict-fp. This patch
introduces a set in intrinsics that we can use instead, going from
vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

The current implementations assumes that the standard variant of a
strictfp alternative will be a IRBuilder, this can be changed to take a
IRBuilder or IRInt.
kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and
IRInt.
kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
…#169795)

Similar to llvm#169156 again, this is mostly for denormal handling as there
is no rounding step in a minnum/maxnum.
honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
Similar to llvm#169156, this adds an @arm.mve.fma intrinsic for strict-fp. A
Builder class is added to act as the common subclass of IRBuilder and
IRInt.
honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
…#169795)

Similar to llvm#169156 again, this is mostly for denormal handling as there
is no rounding step in a minnum/maxnum.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:ir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants