[AArch64][SVE] Add SubtargetFeature to disable lowering unpredicated loads/stores as LDR/STR #170256
@llvm/pr-subscribers-backend-aarch64

Author: Kinoshita Kotaro (kinoshita-fj)

Changes

PR #127837 changed the lowering for unpredicated loads/stores to use LDR/STR instead of LD1/ST1. However, on some CPUs, such as A64FX, there is a performance difference between LD1/ST1 and LDR/STR. As a result, the lowering introduced in #127837 can cause a performance regression on these targets. This patch adds a SubtargetFeature to disable this lowering and prevent the regression.

Full diff: https://github.com/llvm/llvm-project/pull/170256.diff

5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 066724bea92c9..f1baaf82195f9 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -915,6 +915,10 @@ def FeatureUseWzrToVecMove : SubtargetFeature<"use-wzr-to-vec-move",
"UseWzrToVecMove", "true",
"Move from WZR to insert 0 into vector registers">;
+def FeatureDisableUnpredicatedLdStLower : SubtargetFeature<
+ "disable-unpredicated-ld-st-lower", "DisableUnpredicatedLdStLower",
+ "true", "Disable lowering unpredicated loads/stores as LDR/STR">;
+
//===----------------------------------------------------------------------===//
// Architectures.
//
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index da93a2b13fc11..5490ee7201f3b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -443,6 +443,8 @@ def AllowMisalignedMemAccesses
def UseWzrToVecMove : Predicate<"Subtarget->useWzrToVecMove()">;
+def AllowUnpredicatedLdStLower
+ : Predicate<"!Subtarget->disableUnpredicatedLdStLower()">;
//===----------------------------------------------------------------------===//
// AArch64-specific DAG Nodes.
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 120415f91c9ae..72882ac078c55 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -306,7 +306,8 @@ def TuneA64FX : SubtargetFeature<"a64fx", "ARMProcFamily", "A64FX",
FeatureAggressiveFMA,
FeatureArithmeticBccFusion,
FeatureStorePairSuppress,
- FeaturePredictableSelectIsExpensive]>;
+ FeaturePredictableSelectIsExpensive,
+ FeatureDisableUnpredicatedLdStLower]>;
def TuneMONAKA : SubtargetFeature<"fujitsu-monaka", "ARMProcFamily", "MONAKA",
"Fujitsu FUJITSU-MONAKA processors", [
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index e99b3f8ff07e0..4d549c6c55d17 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3164,7 +3164,7 @@ let Predicates = [HasSVE_or_SME] in {
}
// Allow using LDR/STR to avoid the predicate dependence.
- let Predicates = [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses] in
+ let Predicates = [HasSVE_or_SME, IsLE, AllowMisalignedMemAccesses, AllowUnpredicatedLdStLower] in
foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in {
let AddedComplexity = 2 in {
def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))),
diff --git a/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll b/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll
new file mode 100644
index 0000000000000..dd654df2c2a5d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-disable-unpredicated-load-store-lower.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+disable-unpredicated-ld-st-lower < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix CHECK-DEFAULT %s
+; RUN: llc -mcpu=a64fx < %s | FileCheck --check-prefix CHECK-A64FX %s
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT: st1d { z0.d }, p0, [x1]
+; CHECK-NEXT: ret
+;
+; CHECK-DEFAULT-LABEL: nxv2i64:
+; CHECK-DEFAULT: // %bb.0:
+; CHECK-DEFAULT-NEXT: ldr z0, [x0]
+; CHECK-DEFAULT-NEXT: str z0, [x1]
+; CHECK-DEFAULT-NEXT: ret
+;
+; CHECK-A64FX-LABEL: nxv2i64:
+; CHECK-A64FX: // %bb.0:
+; CHECK-A64FX-NEXT: ptrue p0.d
+; CHECK-A64FX-NEXT: ld1d { z0.d }, p0/z, [x0]
+; CHECK-A64FX-NEXT: st1d { z0.d }, p0, [x1]
+; CHECK-A64FX-NEXT: ret
+ %l3 = load <vscale x 2 x i64>, ptr %ldptr, align 8
+ store <vscale x 2 x i64> %l3, ptr %stptr, align 8
+ ret void
+}
Force-pushed from 3ef0399 to 6d234a0.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
Force-pushed from 6d234a0 to 1550102.
Review comment on the new test file's RUN lines:

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+disable-unpredicated-ld-st-lower < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix CHECK-DEFAULT %s
; RUN: llc -mtriple=aarch64-linux-gnu -mcpu=a64fx < %s | FileCheck --check-prefix CHECK-A64FX %s

To increase the test coverage, rather than adding a new file, please can you add the extra RUN lines to llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll and llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll instead?
Review comment on the new predicate in AArch64InstrInfo.td:

def UseWzrToVecMove : Predicate<"Subtarget->useWzrToVecMove()">;

def AllowUnpredicatedLdStLower

Up to you but perhaps "AggressiveUseOfSVEFillSpillInstructions" and "DisableAggressiveUseOfSVEFillSpillInstructions"?