@kinoshita-fj
Contributor

… LDR/STR

PR #127837 changed the lowering for unpredicated loads/stores to use LDR/STR instead of LD1/ST1. However, on some CPUs, such as A64FX, LD1/ST1 and LDR/STR do not perform the same, so the lowering introduced in #127837 can cause a performance regression on these targets. This patch adds a SubtargetFeature that disables this lowering and enables it in the A64FX tuning to prevent the regression.
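
For readers unfamiliar with how a TableGen `Predicate` gates a group of patterns, the effect of the new feature can be modelled roughly as below. This is only a sketch: `MockSubtarget`, `usesLdrStr`, and `loadForNxv2i64` are hypothetical names made up for illustration and are not LLVM APIs. The four conditions mirror the predicate list on the LDR/STR patterns in this patch, and the mnemonics are taken from the `nxv2i64` test in it.

```cpp
#include <string>

// Hypothetical stand-in for the subtarget queries used by the predicates.
// This is NOT LLVM's AArch64Subtarget; the field names are illustrative only.
struct MockSubtarget {
  bool HasSVEOrSME = true;
  bool IsLittleEndian = true;
  bool AllowMisalignedMemAccesses = true;
  bool DisableUnpredicatedLdStLower = false; // the feature bit added here
};

// Models the predicate list guarding the LDR/STR patterns:
//   [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses, AllowUnpredicatedLdStLower]
// All predicates must hold for the patterns to be available.
bool usesLdrStr(const MockSubtarget &ST) {
  // AllowUnpredicatedLdStLower is the negation of the feature bit.
  bool AllowUnpredicatedLdStLower = !ST.DisableUnpredicatedLdStLower;
  return ST.HasSVEOrSME && ST.IsLittleEndian &&
         ST.AllowMisalignedMemAccesses && AllowUnpredicatedLdStLower;
}

// Load mnemonic expected for an unpredicated <vscale x 2 x i64> load,
// following the CHECK lines in the test added by this patch.
std::string loadForNxv2i64(const MockSubtarget &ST) {
  return usesLdrStr(ST) ? "ldr" : "ld1d";
}
```

With the default-constructed `MockSubtarget` the model picks `ldr`; setting `DisableUnpredicatedLdStLower` (as the A64FX tuning does in this patch) makes it fall back to `ld1d`, which also needs a `ptrue` for the governing predicate.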

@llvmbot
Member

llvmbot commented Dec 2, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Kinoshita Kotaro (kinoshita-fj)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/170256.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+2-1)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+1-1)
  • (added) llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll (+29)
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 066724bea92c9..f1baaf82195f9 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -915,6 +915,10 @@ def FeatureUseWzrToVecMove : SubtargetFeature<"use-wzr-to-vec-move",
                                               "UseWzrToVecMove", "true",
                                               "Move from WZR to insert 0 into vector registers">;
 
+def FeatureDisableUnpredicatedLdStLower : SubtargetFeature<
+    "disable-unpredicated-ld-st-lower", "DisableUnpredicatedLdStLower",
+    "true", "Disable lowering unpredicated loads/stores as LDR/STR">;
+
 //===----------------------------------------------------------------------===//
 // Architectures.
 //
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index da93a2b13fc11..5490ee7201f3b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -443,6 +443,8 @@ def AllowMisalignedMemAccesses
 
 def UseWzrToVecMove : Predicate<"Subtarget->useWzrToVecMove()">;
 
+def AllowUnpredicatedLdStLower
+                    : Predicate<"!Subtarget->disableUnpredicatedLdStLower()">;
 
 //===----------------------------------------------------------------------===//
 // AArch64-specific DAG Nodes.
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 120415f91c9ae..72882ac078c55 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -306,7 +306,8 @@ def TuneA64FX : SubtargetFeature<"a64fx", "ARMProcFamily", "A64FX",
                                  FeatureAggressiveFMA,
                                  FeatureArithmeticBccFusion,
                                  FeatureStorePairSuppress,
-                                 FeaturePredictableSelectIsExpensive]>;
+                                 FeaturePredictableSelectIsExpensive,
+                                 FeatureDisableUnpredicatedLdStLower]>;
 
 def TuneMONAKA : SubtargetFeature<"fujitsu-monaka", "ARMProcFamily", "MONAKA",
                                  "Fujitsu FUJITSU-MONAKA processors", [
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index e99b3f8ff07e0..4d549c6c55d17 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3164,7 +3164,7 @@ let Predicates = [HasSVE_or_SME] in {
   }
 
   // Allow using LDR/STR to avoid the predicate dependence.
-  let Predicates = [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses] in
+  let Predicates = [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses, AllowUnpredicatedLdStLower] in
     foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in {
       let AddedComplexity = 2 in {
         def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))),
diff --git a/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll b/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll
new file mode 100644
index 0000000000000..dd654df2c2a5d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+disable-unpredicated-ld-st-lower < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix CHECK-DEFAULT %s
+; RUN: llc -mcpu=a64fx < %s | FileCheck --check-prefix CHECK-A64FX %s
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT:    st1d { z0.d }, p0, [x1]
+; CHECK-NEXT:    ret
+;
+; CHECK-DEFAULT-LABEL: nxv2i64:
+; CHECK-DEFAULT:       // %bb.0:
+; CHECK-DEFAULT-NEXT:    ldr z0, [x0]
+; CHECK-DEFAULT-NEXT:    str z0, [x1]
+; CHECK-DEFAULT-NEXT:    ret
+;
+; CHECK-A64FX-LABEL: nxv2i64:
+; CHECK-A64FX:       // %bb.0:
+; CHECK-A64FX-NEXT:    ptrue p0.d
+; CHECK-A64FX-NEXT:    ld1d { z0.d }, p0/z, [x0]
+; CHECK-A64FX-NEXT:    st1d { z0.d }, p0, [x1]
+; CHECK-A64FX-NEXT:    ret
+  %l3 = load <vscale x 2 x i64>, ptr %ldptr, align 8
+  store <vscale x 2 x i64> %l3, ptr %stptr, align 8
+  ret void
+}

@kinoshita-fj kinoshita-fj force-pushed the feature/disable-lowering-unpredicated-load-store-as-ldr-str branch from 3ef0399 to 6d234a0 Compare December 2, 2025 08:34
@kinoshita-fj kinoshita-fj changed the title Add SubtargetFeature to disable lowering unpredicated loads/stores as… [AArch64][SVE] Add SubtargetFeature to disable lowering unpredicated loads/stores as… Dec 2, 2025
@github-actions

github-actions bot commented Dec 2, 2025

🐧 Linux x64 Test Results

  • 186825 tests passed
  • 4910 tests skipped

✅ The build succeeded and all tests passed.

@kinoshita-fj kinoshita-fj force-pushed the feature/disable-lowering-unpredicated-load-store-as-ldr-str branch from 6d234a0 to 1550102 Compare December 2, 2025 11:00
Comment on lines +1 to +5
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+disable-unpredicated-ld-st-lower < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix CHECK-DEFAULT %s
; RUN: llc -mtriple=aarch64-linux-gnu -mcpu=a64fx < %s | FileCheck --check-prefix CHECK-A64FX %s

Collaborator

@paulwalker-arm paulwalker-arm Dec 2, 2025

To increase the test coverage, rather than adding a new file, please can you add the extra RUN lines to llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll and llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll instead?


def UseWzrToVecMove : Predicate<"Subtarget->useWzrToVecMove()">;

def AllowUnpredicatedLdStLower
Collaborator

@paulwalker-arm paulwalker-arm Dec 2, 2025

Up to you but perhaps "AggressiveUseOfSVEFillSpillInstructions" and "DisableAggressiveUseOfSVEFillSpillInstructions"?
