[AArch64][SVE] Add SubtargetFeature to disable lowering unpredicated loads/stores as LDR/STR #170256
@llvm/pr-subscribers-backend-aarch64

Author: Kinoshita Kotaro (kinoshita-fj)

Changes

PR #127837 changed the lowering for unpredicated loads/stores to use LDR/STR instead of LD1/ST1. However, on some CPUs, such as A64FX, there is a performance difference between LD1/ST1 and LDR/STR. As a result, the lowering introduced in #127837 can cause a performance regression on these targets. This patch adds a SubtargetFeature to disable this lowering and prevent the regression.

Full diff: https://github.com/llvm/llvm-project/pull/170256.diff

5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 066724bea92c9..f1baaf82195f9 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -915,6 +915,10 @@ def FeatureUseWzrToVecMove : SubtargetFeature<"use-wzr-to-vec-move",
"UseWzrToVecMove", "true",
"Move from WZR to insert 0 into vector registers">;
+def FeatureDisableUnpredicatedLdStLower : SubtargetFeature<
+ "disable-unpredicated-ld-st-lower", "DisableUnpredicatedLdStLower",
+ "true", "Disable lowering unpredicated loads/stores as LDR/STR">;
+
//===----------------------------------------------------------------------===//
// Architectures.
//
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index da93a2b13fc11..5490ee7201f3b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -443,6 +443,8 @@ def AllowMisalignedMemAccesses
def UseWzrToVecMove : Predicate<"Subtarget->useWzrToVecMove()">;
+def AllowUnpredicatedLdStLower
+ : Predicate<"!Subtarget->disableUnpredicatedLdStLower()">;
//===----------------------------------------------------------------------===//
// AArch64-specific DAG Nodes.
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 120415f91c9ae..72882ac078c55 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -306,7 +306,8 @@ def TuneA64FX : SubtargetFeature<"a64fx", "ARMProcFamily", "A64FX",
FeatureAggressiveFMA,
FeatureArithmeticBccFusion,
FeatureStorePairSuppress,
- FeaturePredictableSelectIsExpensive]>;
+ FeaturePredictableSelectIsExpensive,
+ FeatureDisableUnpredicatedLdStLower]>;
def TuneMONAKA : SubtargetFeature<"fujitsu-monaka", "ARMProcFamily", "MONAKA",
"Fujitsu FUJITSU-MONAKA processors", [
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index e99b3f8ff07e0..4d549c6c55d17 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3164,7 +3164,7 @@ let Predicates = [HasSVE_or_SME] in {
}
// Allow using LDR/STR to avoid the predicate dependence.
- let Predicates = [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses] in
+ let Predicates = [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses, AllowUnpredicatedLdStLower] in
foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in {
let AddedComplexity = 2 in {
def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))),
diff --git a/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll b/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll
new file mode 100644
index 0000000000000..dd654df2c2a5d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+disable-unpredicated-ld-st-lower < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix CHECK-DEFAULT %s
+; RUN: llc -mcpu=a64fx < %s | FileCheck --check-prefix CHECK-A64FX %s
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT: st1d { z0.d }, p0, [x1]
+; CHECK-NEXT: ret
+;
+; CHECK-DEFAULT-LABEL: nxv2i64:
+; CHECK-DEFAULT: // %bb.0:
+; CHECK-DEFAULT-NEXT: ldr z0, [x0]
+; CHECK-DEFAULT-NEXT: str z0, [x1]
+; CHECK-DEFAULT-NEXT: ret
+;
+; CHECK-A64FX-LABEL: nxv2i64:
+; CHECK-A64FX: // %bb.0:
+; CHECK-A64FX-NEXT: ptrue p0.d
+; CHECK-A64FX-NEXT: ld1d { z0.d }, p0/z, [x0]
+; CHECK-A64FX-NEXT: st1d { z0.d }, p0, [x1]
+; CHECK-A64FX-NEXT: ret
+ %l3 = load <vscale x 2 x i64>, ptr %ldptr, align 8
+ store <vscale x 2 x i64> %l3, ptr %stptr, align 8
+ ret void
+}
Force-pushed from 3ef0399 to 6d234a0.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
Force-pushed from 6d234a0 to 1550102.
Review comment on the new test file's RUN lines:

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+disable-unpredicated-ld-st-lower < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix CHECK-DEFAULT %s
; RUN: llc -mtriple=aarch64-linux-gnu -mcpu=a64fx < %s | FileCheck --check-prefix CHECK-A64FX %s

To increase the test coverage, rather than adding a new file, please can you add the extra RUN lines to llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll and llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll instead?
Review comment on the new predicate in AArch64InstrInfo.td:

def UseWzrToVecMove : Predicate<"Subtarget->useWzrToVecMove()">;

def AllowUnpredicatedLdStLower

Up to you but perhaps "AggressiveUseOfSVEFillSpillInstructions" and "DisableAggressiveUseOfSVEFillSpillInstructions"?