Conversation

@kmclaughlin-arm
Contributor

When FEAT_LSFE is enabled, the ST{B}FADD, ST{B}FMAX and ST{B}FMIN
atomic instructions are available. This patch adds patterns to match an
atomicrmw fadd, fmin or fmax to these instructions when the result is unused.

@llvmbot
Member

llvmbot commented Mar 13, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Kerry McLaughlin (kmclaughlin-arm)

Changes

When FEAT_LSFE is enabled, the ST{B}FADD, ST{B}FMAX and ST{B}FMIN
atomic instructions are available. This patch adds patterns to match an
atomicrmw fadd, fmin or fmax to these instructions when the result is unused.


Patch is 34.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131174.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrAtomics.td (+6-6)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+26-6)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll (+168)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomicrmw-lsfe.ll (+168)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/generate-tests.py (+23-3)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
index 2d7a9d6f00bd0..aaf8e1bfc8f0f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
@@ -548,13 +548,13 @@ defm atomic_load_fmin  : binary_atomic_op_fp<atomic_load_fmin>;
 defm atomic_load_fmax  : binary_atomic_op_fp<atomic_load_fmax>;
 
 let Predicates = [HasLSFE] in {
-  defm : LDFPOPregister_patterns<"LDFADD",   "atomic_load_fadd">;
-  defm : LDFPOPregister_patterns<"LDFMAXNM", "atomic_load_fmax">;
-  defm : LDFPOPregister_patterns<"LDFMINNM", "atomic_load_fmin">;
+  defm : LDFPOPregister_patterns<"FADD",   "atomic_load_fadd">;
+  defm : LDFPOPregister_patterns<"FMAXNM", "atomic_load_fmax">;
+  defm : LDFPOPregister_patterns<"FMINNM", "atomic_load_fmin">;
 
-  defm : LDBFPOPregister_patterns<"LDBFADD",   "atomic_load_fadd">;
-  defm : LDBFPOPregister_patterns<"LDBFMAXNM", "atomic_load_fmax">;
-  defm : LDBFPOPregister_patterns<"LDBFMINNM", "atomic_load_fmin">;
+  defm : LDBFPOPregister_patterns<"BFADD",   "atomic_load_fadd">;
+  defm : LDBFPOPregister_patterns<"BFMAXNM", "atomic_load_fmax">;
+  defm : LDBFPOPregister_patterns<"BFMINNM", "atomic_load_fmin">;
 }
 
 // v8.9a/v9.4a FEAT_LRCPC patterns
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index 255cd0ec5840c..99bd85a448d83 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -12493,19 +12493,39 @@ multiclass LDOPregister_patterns_mod<string inst, string op, string mod> {
                         (i32 (!cast<Instruction>(mod#Wrr) WZR, GPR32:$Rm))>;
 }
 
+class AtomicStoreMonotonicFrag<SDNode atomic_op> : PatFrag<
+  (ops node:$ptr, node:$value),
+  (atomic_op node:$ptr, node:$value),
+  [{ return SDValue(N, 0).use_empty() &&
+     cast<AtomicSDNode>(N)->getMergedOrdering() == AtomicOrdering::Monotonic; }]>;
+
+class AtomicStoreReleaseFrag<SDNode atomic_op> : PatFrag<
+  (ops node:$ptr, node:$value),
+  (atomic_op node:$ptr, node:$value),
+  [{ return SDValue(N, 0).use_empty() &&
+     cast<AtomicSDNode>(N)->getMergedOrdering() == AtomicOrdering::Release; }]>;
+
 let Predicates = [HasLSFE] in
 multiclass LDFPOPregister_patterns_ord_dag<string inst, string suffix, string op,
-                                         ValueType vt, dag data> {
+                                           ValueType vt, dag data> {
   def : Pat<(!cast<PatFrag>(op#"_"#vt#"_monotonic") FPR64:$Rn, data),
-            (!cast<Instruction>(inst # suffix) data, FPR64:$Rn)>;
+            (!cast<Instruction>("LD" # inst # suffix) data, FPR64:$Rn)>;
   def : Pat<(!cast<PatFrag>(op#"_"#vt#"_acquire") FPR64:$Rn, data),
-            (!cast<Instruction>(inst # "A" # suffix) data, FPR64:$Rn)>;
+            (!cast<Instruction>("LD" # inst # "A" # suffix) data, FPR64:$Rn)>;
   def : Pat<(!cast<PatFrag>(op#"_"#vt#"_release") FPR64:$Rn, data),
-            (!cast<Instruction>(inst # "L" # suffix) data, FPR64:$Rn)>;
+            (!cast<Instruction>("LD" # inst # "L" # suffix) data, FPR64:$Rn)>;
   def : Pat<(!cast<PatFrag>(op#"_"#vt#"_acq_rel") FPR64:$Rn, data),
-            (!cast<Instruction>(inst # "AL" # suffix) data, FPR64:$Rn)>;
+            (!cast<Instruction>("LD" # inst # "AL" # suffix) data, FPR64:$Rn)>;
   def : Pat<(!cast<PatFrag>(op#"_"#vt#"_seq_cst") FPR64:$Rn, data),
-            (!cast<Instruction>(inst # "AL" # suffix) data, FPR64:$Rn)>;
+            (!cast<Instruction>("LD" # inst # "AL" # suffix) data, FPR64:$Rn)>;
+
+  let AddedComplexity = 5 in {
+    def : Pat<(AtomicStoreMonotonicFrag<!cast<SDNode>(op)> FPR64:$Rn, data),
+              (!cast<Instruction>("ST" # inst # suffix) data, FPR64:$Rn)>;
+
+    def : Pat<(AtomicStoreReleaseFrag<!cast<SDNode>(op)> FPR64:$Rn, data),
+              (!cast<Instruction>("ST" # inst # "L" # suffix) data, FPR64:$Rn)>;
+  }
 }
 
 multiclass LDFPOPregister_patterns_ord<string inst, string suffix, string op,
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll
index fc9a126f79a83..bd9b814ac533e 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll
@@ -10,6 +10,13 @@ define dso_local half @atomicrmw_fadd_half_aligned_monotonic(ptr %ptr, half %val
     ret half %r
 }
 
+define dso_local void @atomicrmw_fadd_half_aligned_monotonic_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fadd_half_aligned_monotonic_to_store:
+; CHECK:    stfadd h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, half %value monotonic, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fadd_half_aligned_acquire(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fadd_half_aligned_acquire:
 ; CHECK:    ldfadda h0, h0, [x0]
@@ -24,6 +31,13 @@ define dso_local half @atomicrmw_fadd_half_aligned_release(ptr %ptr, half %value
     ret half %r
 }
 
+define dso_local void @atomicrmw_fadd_half_aligned_release_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fadd_half_aligned_release_to_store:
+; CHECK:    stfaddl h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, half %value release, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fadd_half_aligned_acq_rel(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fadd_half_aligned_acq_rel:
 ; CHECK:    ldfaddal h0, h0, [x0]
@@ -45,6 +59,13 @@ define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_monotonic(ptr %ptr, bfloa
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fadd_bfloat_aligned_monotonic_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fadd_bfloat_aligned_monotonic_to_store:
+; CHECK:    stbfadd h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, bfloat %value monotonic, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_acquire(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fadd_bfloat_aligned_acquire:
 ; CHECK:    ldbfadda h0, h0, [x0]
@@ -59,6 +80,13 @@ define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_release(ptr %ptr, bfloat
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fadd_bfloat_aligned_release_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fadd_bfloat_aligned_release_to_store:
+; CHECK:    stbfaddl h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, bfloat %value release, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_acq_rel(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fadd_bfloat_aligned_acq_rel:
 ; CHECK:    ldbfaddal h0, h0, [x0]
@@ -80,6 +108,13 @@ define dso_local float @atomicrmw_fadd_float_aligned_monotonic(ptr %ptr, float %
     ret float %r
 }
 
+define dso_local void @atomicrmw_fadd_float_aligned_monotonic_to_store(ptr %ptr, float %value) {
+; CHECK-LABEL: atomicrmw_fadd_float_aligned_monotonic_to_store:
+; CHECK:    stfadd s0, [x0]
+    %r = atomicrmw fadd ptr %ptr, float %value monotonic, align 4
+    ret void
+}
+
 define dso_local float @atomicrmw_fadd_float_aligned_acquire(ptr %ptr, float %value) {
 ; CHECK-LABEL: atomicrmw_fadd_float_aligned_acquire:
 ; CHECK:    ldfadda s0, s0, [x0]
@@ -94,6 +129,13 @@ define dso_local float @atomicrmw_fadd_float_aligned_release(ptr %ptr, float %va
     ret float %r
 }
 
+define dso_local void @atomicrmw_fadd_float_aligned_release_to_store(ptr %ptr, float %value) {
+; CHECK-LABEL: atomicrmw_fadd_float_aligned_release_to_store:
+; CHECK:    stfaddl s0, [x0]
+    %r = atomicrmw fadd ptr %ptr, float %value release, align 4
+    ret void
+}
+
 define dso_local float @atomicrmw_fadd_float_aligned_acq_rel(ptr %ptr, float %value) {
 ; CHECK-LABEL: atomicrmw_fadd_float_aligned_acq_rel:
 ; CHECK:    ldfaddal s0, s0, [x0]
@@ -115,6 +157,13 @@ define dso_local double @atomicrmw_fadd_double_aligned_monotonic(ptr %ptr, doubl
     ret double %r
 }
 
+define dso_local void @atomicrmw_fadd_double_aligned_monotonic_to_store(ptr %ptr, double %value) {
+; CHECK-LABEL: atomicrmw_fadd_double_aligned_monotonic_to_store:
+; CHECK:    stfadd d0, [x0]
+    %r = atomicrmw fadd ptr %ptr, double %value monotonic, align 8
+    ret void
+}
+
 define dso_local double @atomicrmw_fadd_double_aligned_acquire(ptr %ptr, double %value) {
 ; CHECK-LABEL: atomicrmw_fadd_double_aligned_acquire:
 ; CHECK:    ldfadda d0, d0, [x0]
@@ -129,6 +178,13 @@ define dso_local double @atomicrmw_fadd_double_aligned_release(ptr %ptr, double
     ret double %r
 }
 
+define dso_local void @atomicrmw_fadd_double_aligned_release_to_store(ptr %ptr, double %value) {
+; CHECK-LABEL: atomicrmw_fadd_double_aligned_release_to_store:
+; CHECK:    stfaddl d0, [x0]
+    %r = atomicrmw fadd ptr %ptr, double %value release, align 8
+    ret void
+}
+
 define dso_local double @atomicrmw_fadd_double_aligned_acq_rel(ptr %ptr, double %value) {
 ; CHECK-LABEL: atomicrmw_fadd_double_aligned_acq_rel:
 ; CHECK:    ldfaddal d0, d0, [x0]
@@ -805,6 +861,13 @@ define dso_local half @atomicrmw_fmax_half_aligned_monotonic(ptr %ptr, half %val
     ret half %r
 }
 
+define dso_local void @atomicrmw_fmax_half_aligned_monotonic_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fmax_half_aligned_monotonic_to_store:
+; CHECK:    stfmaxnm h0, [x0]
+    %r = atomicrmw fmax ptr %ptr, half %value monotonic, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fmax_half_aligned_acquire(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fmax_half_aligned_acquire:
 ; CHECK:    ldfmaxnma h0, h0, [x0]
@@ -819,6 +882,13 @@ define dso_local half @atomicrmw_fmax_half_aligned_release(ptr %ptr, half %value
     ret half %r
 }
 
+define dso_local void @atomicrmw_fmax_half_aligned_release_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fmax_half_aligned_release_to_store:
+; CHECK:    stfmaxnml h0, [x0]
+    %r = atomicrmw fmax ptr %ptr, half %value release, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fmax_half_aligned_acq_rel(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fmax_half_aligned_acq_rel:
 ; CHECK:    ldfmaxnmal h0, h0, [x0]
@@ -840,6 +910,13 @@ define dso_local bfloat @atomicrmw_fmax_bfloat_aligned_monotonic(ptr %ptr, bfloa
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fmax_bfloat_aligned_monotonic_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fmax_bfloat_aligned_monotonic_to_store:
+; CHECK:    stbfmaxnm h0, [x0]
+    %r = atomicrmw fmax ptr %ptr, bfloat %value monotonic, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fmax_bfloat_aligned_acquire(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fmax_bfloat_aligned_acquire:
 ; CHECK:    ldbfmaxnma h0, h0, [x0]
@@ -854,6 +931,13 @@ define dso_local bfloat @atomicrmw_fmax_bfloat_aligned_release(ptr %ptr, bfloat
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fmax_bfloat_aligned_release_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fmax_bfloat_aligned_release_to_store:
+; CHECK:    stbfmaxnml h0, [x0]
+    %r = atomicrmw fmax ptr %ptr, bfloat %value release, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fmax_bfloat_aligned_acq_rel(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fmax_bfloat_aligned_acq_rel:
 ; CHECK:    ldbfmaxnmal h0, h0, [x0]
@@ -875,6 +959,13 @@ define dso_local float @atomicrmw_fmax_float_aligned_monotonic(ptr %ptr, float %
     ret float %r
 }
 
+define dso_local void @atomicrmw_fmax_float_aligned_monotonic_to_store(ptr %ptr, float %value) {
+; CHECK-LABEL: atomicrmw_fmax_float_aligned_monotonic_to_store:
+; CHECK:    stfmaxnm s0, [x0]
+    %r = atomicrmw fmax ptr %ptr, float %value monotonic, align 4
+    ret void
+}
+
 define dso_local float @atomicrmw_fmax_float_aligned_acquire(ptr %ptr, float %value) {
 ; CHECK-LABEL: atomicrmw_fmax_float_aligned_acquire:
 ; CHECK:    ldfmaxnma s0, s0, [x0]
@@ -889,6 +980,13 @@ define dso_local float @atomicrmw_fmax_float_aligned_release(ptr %ptr, float %va
     ret float %r
 }
 
+define dso_local void @atomicrmw_fmax_float_aligned_release_to_store(ptr %ptr, float %value) {
+; CHECK-LABEL: atomicrmw_fmax_float_aligned_release_to_store:
+; CHECK:    stfmaxnml s0, [x0]
+    %r = atomicrmw fmax ptr %ptr, float %value release, align 4
+    ret void
+}
+
 define dso_local float @atomicrmw_fmax_float_aligned_acq_rel(ptr %ptr, float %value) {
 ; CHECK-LABEL: atomicrmw_fmax_float_aligned_acq_rel:
 ; CHECK:    ldfmaxnmal s0, s0, [x0]
@@ -910,6 +1008,13 @@ define dso_local double @atomicrmw_fmax_double_aligned_monotonic(ptr %ptr, doubl
     ret double %r
 }
 
+define dso_local void @atomicrmw_fmax_double_aligned_monotonic_to_store(ptr %ptr, double %value) {
+; CHECK-LABEL: atomicrmw_fmax_double_aligned_monotonic_to_store:
+; CHECK:    stfmaxnm d0, [x0]
+    %r = atomicrmw fmax ptr %ptr, double %value monotonic, align 8
+    ret void
+}
+
 define dso_local double @atomicrmw_fmax_double_aligned_acquire(ptr %ptr, double %value) {
 ; CHECK-LABEL: atomicrmw_fmax_double_aligned_acquire:
 ; CHECK:    ldfmaxnma d0, d0, [x0]
@@ -924,6 +1029,13 @@ define dso_local double @atomicrmw_fmax_double_aligned_release(ptr %ptr, double
     ret double %r
 }
 
+define dso_local void @atomicrmw_fmax_double_aligned_release_to_store(ptr %ptr, double %value) {
+; CHECK-LABEL: atomicrmw_fmax_double_aligned_release_to_store:
+; CHECK:    stfmaxnml d0, [x0]
+    %r = atomicrmw fmax ptr %ptr, double %value release, align 8
+    ret void
+}
+
 define dso_local double @atomicrmw_fmax_double_aligned_acq_rel(ptr %ptr, double %value) {
 ; CHECK-LABEL: atomicrmw_fmax_double_aligned_acq_rel:
 ; CHECK:    ldfmaxnmal d0, d0, [x0]
@@ -1120,6 +1232,13 @@ define dso_local half @atomicrmw_fmin_half_aligned_monotonic(ptr %ptr, half %val
     ret half %r
 }
 
+define dso_local void @atomicrmw_fmin_half_aligned_monotonic_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fmin_half_aligned_monotonic_to_store:
+; CHECK:    stfminnm h0, [x0]
+    %r = atomicrmw fmin ptr %ptr, half %value monotonic, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fmin_half_aligned_acquire(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fmin_half_aligned_acquire:
 ; CHECK:    ldfminnma h0, h0, [x0]
@@ -1134,6 +1253,13 @@ define dso_local half @atomicrmw_fmin_half_aligned_release(ptr %ptr, half %value
     ret half %r
 }
 
+define dso_local void @atomicrmw_fmin_half_aligned_release_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fmin_half_aligned_release_to_store:
+; CHECK:    stfminnml h0, [x0]
+    %r = atomicrmw fmin ptr %ptr, half %value release, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fmin_half_aligned_acq_rel(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fmin_half_aligned_acq_rel:
 ; CHECK:    ldfminnmal h0, h0, [x0]
@@ -1155,6 +1281,13 @@ define dso_local bfloat @atomicrmw_fmin_bfloat_aligned_monotonic(ptr %ptr, bfloa
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fmin_bfloat_aligned_monotonic_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fmin_bfloat_aligned_monotonic_to_store:
+; CHECK:    stbfminnm h0, [x0]
+    %r = atomicrmw fmin ptr %ptr, bfloat %value monotonic, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fmin_bfloat_aligned_acquire(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fmin_bfloat_aligned_acquire:
 ; CHECK:    ldbfminnma h0, h0, [x0]
@@ -1169,6 +1302,13 @@ define dso_local bfloat @atomicrmw_fmin_bfloat_aligned_release(ptr %ptr, bfloat
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fmin_bfloat_aligned_release_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fmin_bfloat_aligned_release_to_store:
+; CHECK:    stbfminnml h0, [x0]
+    %r = atomicrmw fmin ptr %ptr, bfloat %value release, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fmin_bfloat_aligned_acq_rel(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fmin_bfloat_aligned_acq_rel:
 ; CHECK:    ldbfminnmal h0, h0, [x0]
@@ -1190,6 +1330,13 @@ define dso_local float @atomicrmw_fmin_float_aligned_monotonic(ptr %ptr, float %
     ret float %r
 }
 
+define dso_local void @atomicrmw_fmin_float_aligned_monotonic_to_store(ptr %ptr, float %value) {
+; CHECK-LABEL: atomicrmw_fmin_float_aligned_monotonic_to_store:
+; CHECK:    stfminnm s0, [x0]
+    %r = atomicrmw fmin ptr %ptr, float %value monotonic, align 4
+    ret void
+}
+
 define dso_local float @atomicrmw_fmin_float_aligned_acquire(ptr %ptr, float %value) {
 ; CHECK-LABEL: atomicrmw_fmin_float_aligned_acquire:
 ; CHECK:    ldfminnma s0, s0, [x0]
@@ -1204,6 +1351,13 @@ define dso_local float @atomicrmw_fmin_float_aligned_release(ptr %ptr, float %va
     ret float %r
 }
 
+define dso_local void @atomicrmw_fmin_float_aligned_release_to_store(ptr %ptr, float %value) {
+; CHECK-LABEL: atomicrmw_fmin_float_aligned_release_to_store:
+; CHECK:    stfminnml s0, [x0]
+    %r = atomicrmw fmin ptr %ptr, float %value release, align 4
+    ret void
+}
+
 define dso_local float @atomicrmw_fmin_float_aligned_acq_rel(ptr %ptr, float %value) {
 ; CHECK-LABEL: atomicrmw_fmin_float_aligned_acq_rel:
 ; CHECK:    ldfminnmal s0, s0, [x0]
@@ -1225,6 +1379,13 @@ define dso_local double @atomicrmw_fmin_double_aligned_monotonic(ptr %ptr, doubl
     ret double %r
 }
 
+define dso_local void @atomicrmw_fmin_double_aligned_monotonic_to_store(ptr %ptr, double %value) {
+; CHECK-LABEL: atomicrmw_fmin_double_aligned_monotonic_to_store:
+; CHECK:    stfminnm d0, [x0]
+    %r = atomicrmw fmin ptr %ptr, double %value monotonic, align 8
+    ret void
+}
+
 define dso_local double @atomicrmw_fmin_double_aligned_acquire(ptr %ptr, double %value) {
 ; CHECK-LABEL: atomicrmw_fmin_double_aligned_acquire:
 ; CHECK:    ldfminnma d0, d0, [x0]
@@ -1239,6 +1400,13 @@ define dso_local double @atomicrmw_fmin_double_aligned_release(ptr %ptr, double
     ret double %r
 }
 
+define dso_local void @atomicrmw_fmin_double_aligned_release_to_store(ptr %ptr, double %value) {
+; CHECK-LABEL: atomicrmw_fmin_double_aligned_release_to_store:
+; CHECK:    stfminnml d0, [x0]
+    %r = atomicrmw fmin ptr %ptr, double %value release, align 8
+    ret void
+}
+
 define dso_local double @atomicrmw_fmin_double_aligned_acq_rel(ptr %ptr, double %value) {
 ; CHECK-LABEL: atomicrmw_fmin_double_aligned_acq_rel:
 ; CHECK:    ldfminnmal d0, d0, [x0]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomicrmw-lsfe.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomicrmw-lsfe.ll
index a22cc5806d86d..67a5565f31d94 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomicrmw-lsfe.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomicrmw-lsfe.ll
@@ -10,6 +10,13 @@ define dso_local half @atomicrmw_fadd_half_aligned_monotonic(ptr %ptr, half %val
     ret half %r
 }
 
+define dso_local void @atomicrmw_fadd_half_aligned_monotonic_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fadd_half_aligned_monotonic_to_store:
+; CHECK:    stfadd h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, half %value monotonic, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fadd_half_aligned_acquire(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fadd_half_aligned_acquire:
 ; CHECK:    ldfadda h0, h0, [x0]
@@ -24,6 +31,13 @@ define dso_local half @atomicrmw_fadd_half_aligned_release(ptr %ptr, half %value
     ret half %r
 }
 
+define dso_local void @atomicrmw_fadd_half_aligned_release_to_store(ptr %ptr, half %value) {
+; CHECK-LABEL: atomicrmw_fadd_half_aligned_release_to_store:
+; CHECK:    stfaddl h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, half %value release, align 2
+    ret void
+}
+
 define dso_local half @atomicrmw_fadd_half_aligned_acq_rel(ptr %ptr, half %value) {
 ; CHECK-LABEL: atomicrmw_fadd_half_aligned_acq_rel:
 ; CHECK:    ldfaddal h0, h0, [x0]
@@ -45,6 +59,13 @@ define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_monotonic(ptr %ptr, bfloa
     ret bfloat %r
 }
 
+define dso_local void @atomicrmw_fadd_bfloat_aligned_monotonic_to_store(ptr %ptr, bfloat %value) {
+; CHECK-LABEL: atomicrmw_fadd_bfloat_aligned_monotonic_to_store:
+; CHECK:    stbfadd h0, [x0]
+    %r = atomicrmw fadd ptr %ptr, bfloat %value monotonic, align 2
+    ret void
+}
+
 define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_acquire(ptr %ptr, bfloat %value) {
 ; CHECK-LABEL: atomicrmw_fadd_bfloat_aligned_acquire:
 ; CHECK:    ldbfadda h0, h0, [x0]
@@ -59,6 +80,13 @@ define dso_local bfloat @atomicrmw_fadd_bfloat_aligned_release(ptr %ptr, bfloat
     ret ...
[truncated]

@github-actions

github-actions bot commented Mar 13, 2025

✅ With the latest revision this PR passed the Python code formatter.

Contributor

@jthackray jthackray left a comment


LGTM, nice and compact code.

@davemgreen
Collaborator

Hi - I'm not an atomics expert by a long shot, but I don't believe we generate stadd (for example) from codegen, and I would expect these new intrinsics to act the same.

There was a thread on this in #72887, which links to a couple of threads on the GCC mailing lists. It looks like this implements relaxed and release semantics, but not acquire, which is what the last of those threads covers. The important part being:

However, the architecture says:

  | The ST<OP> instructions, and LD<OP> instructions where the destination
  | register is WZR or XZR, are not regarded as doing a read for the purpose
  | of a DMB LD barrier.

Should we be treating these intrinsics the same as the existing ldadd etc., or do you know if they can be treated differently from a memory-model perspective? Or can we do this for all intrinsics now? Thanks.

@kmclaughlin-arm
Contributor Author

Should we be treating these intrinsics the same as the existing ldadd etc., or do you know if they can be treated differently from a memory-model perspective? Or can we do this for all intrinsics now? Thanks.

Hi @davemgreen, thank you for pointing out the other patch & mailing list threads. I wasn't aware this had already been discussed and after reading through these I'm going to close this PR as I don't believe these intrinsics can be treated differently to the existing ones.

@tmatheson-arm
Contributor

For future reference, here is a proposal to add "atomic reduction operations" that would be able to make use of STADD/STFADD etc: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3111r1.html

I wonder whether we should add the tests anyway, to check that they don't lower to ST[F]ADD?

@davemgreen
Collaborator

I wonder whether we should add the tests anyway, to check that they don't lower to ST[F]ADD?

Yeah they might be useful to keep. The last few times this has come up there has been a suggestion to add a comment somewhere to explain why we don't generate them too.

kmclaughlin-arm added a commit to kmclaughlin-arm/llvm-project that referenced this pull request Mar 19, 2025
Following the discussion on llvm#131174, update the generate-tests.py script to
emit atomicrmw tests where the result is unused, and add a note to
explain why these do not use ST[F]ADD.
@kmclaughlin-arm
Contributor Author

@tmatheson-arm @davemgreen I've moved the tests & changes to the generate-tests script into #132022, adding a reference to this PR.

kmclaughlin-arm added a commit that referenced this pull request Mar 19, 2025
…ed. (#132022)

Following the discussion on #131174, update the generate-tests.py script to
emit atomicrmw tests where the result is unused, and add a note to
explain why these do not use ST[F]ADD.
@kmclaughlin-arm kmclaughlin-arm deleted the atomic-store-patterns branch May 27, 2025 10:22