Skip to content

[AArch64] Remove wrong processor feature #151289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tomershafir
Copy link
Contributor

@tomershafir tomershafir commented Jul 30, 2025

fmov dX, dY is not a preferred instruction.

Previously introduced by: #144152

@tomershafir tomershafir force-pushed the aarch64-remove-wrong-processor-feature branch from f76a5c5 to 30f404e Compare July 30, 2025 07:38
@tomershafir tomershafir marked this pull request as ready for review July 30, 2025 08:43
@llvmbot
Copy link
Member

llvmbot commented Jul 30, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Tomer Shafir (tomershafir)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/151289.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (-10)
  • (removed) llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-fpr.ll (-103)
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index adc984ad795af..fa037df531028 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -318,7 +318,6 @@ def TuneAppleA7  : SubtargetFeature<"apple-a7", "ARMProcFamily", "AppleA7",
                                     FeatureFuseAES, FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing,
                                     FeatureZCZeroingFPWorkaround]>;
 
@@ -332,7 +331,6 @@ def TuneAppleA10 : SubtargetFeature<"apple-a10", "ARMProcFamily", "AppleA10",
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA11 : SubtargetFeature<"apple-a11", "ARMProcFamily", "AppleA11",
@@ -345,7 +343,6 @@ def TuneAppleA11 : SubtargetFeature<"apple-a11", "ARMProcFamily", "AppleA11",
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA12 : SubtargetFeature<"apple-a12", "ARMProcFamily", "AppleA12",
@@ -358,7 +355,6 @@ def TuneAppleA12 : SubtargetFeature<"apple-a12", "ARMProcFamily", "AppleA12",
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA13 : SubtargetFeature<"apple-a13", "ARMProcFamily", "AppleA13",
@@ -371,7 +367,6 @@ def TuneAppleA13 : SubtargetFeature<"apple-a13", "ARMProcFamily", "AppleA13",
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA14 : SubtargetFeature<"apple-a14", "ARMProcFamily", "AppleA14",
@@ -389,7 +384,6 @@ def TuneAppleA14 : SubtargetFeature<"apple-a14", "ARMProcFamily", "AppleA14",
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA15 : SubtargetFeature<"apple-a15", "ARMProcFamily", "AppleA15",
@@ -407,7 +401,6 @@ def TuneAppleA15 : SubtargetFeature<"apple-a15", "ARMProcFamily", "AppleA15",
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA16 : SubtargetFeature<"apple-a16", "ARMProcFamily", "AppleA16",
@@ -425,7 +418,6 @@ def TuneAppleA16 : SubtargetFeature<"apple-a16", "ARMProcFamily", "AppleA16",
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA17 : SubtargetFeature<"apple-a17", "ARMProcFamily", "AppleA17",
@@ -443,7 +435,6 @@ def TuneAppleA17 : SubtargetFeature<"apple-a17", "ARMProcFamily", "AppleA17",
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
-                                    FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
 def TuneAppleM4 : SubtargetFeature<"apple-m4", "ARMProcFamily", "AppleM4",
@@ -460,7 +451,6 @@ def TuneAppleM4 : SubtargetFeature<"apple-m4", "ARMProcFamily", "AppleM4",
                                      FeatureFuseCryptoEOR,
                                      FeatureFuseLiterals,
                                      FeatureZCRegMoveGPR64,
-                                     FeatureZCRegMoveFPR64,
                                      FeatureZCZeroing
                                      ]>;
 
diff --git a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-fpr.ll b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-fpr.ll
deleted file mode 100644
index f422f96f33495..0000000000000
--- a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-fpr.ll
+++ /dev/null
@@ -1,103 +0,0 @@
-; RUN: llc < %s -mtriple=arm64-linux-gnu | FileCheck %s -check-prefixes=NOTCPU-LINUX --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=generic | FileCheck %s -check-prefixes=NOTCPU-APPLE --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 | FileCheck %s -check-prefixes=CPU --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 -mattr=-zcm-fpr64 | FileCheck %s -check-prefixes=NOTATTR --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mattr=+zcm-fpr64 | FileCheck %s -check-prefixes=ATTR --match-full-lines
-
-define void @zero_cycle_regmov_FPR32(float %a, float %b, float %c, float %d) {
-entry:
-; CHECK-LABEL: t:
-; NOTCPU-LINUX: fmov s0, s2
-; NOTCPU-LINUX: fmov s1, s3
-; NOTCPU-LINUX: fmov [[REG2:s[0-9]+]], s3
-; NOTCPU-LINUX: fmov [[REG1:s[0-9]+]], s2
-; NOTCPU-LINUX-NEXT: bl {{_?foo_float}}
-; NOTCPU-LINUX: fmov s0, [[REG1]]
-; NOTCPU-LINUX: fmov s1, [[REG2]]
-
-; NOTCPU-APPLE: fmov s0, s2
-; NOTCPU-APPLE: fmov s1, s3
-; NOTCPU-APPLE: fmov [[REG2:s[0-9]+]], s3
-; NOTCPU-APPLE: fmov [[REG1:s[0-9]+]], s2
-; NOTCPU-APPLE-NEXT: bl {{_?foo_float}}
-; NOTCPU-APPLE: fmov s0, [[REG1]]
-; NOTCPU-APPLE: fmov s1, [[REG2]]
-
-; CPU: fmov [[REG2:d[0-9]+]], d3
-; CPU: fmov [[REG1:d[0-9]+]], d2
-; CPU: fmov d0, d2
-; CPU: fmov d1, d3
-; CPU-NEXT: bl {{_?foo_float}}
-; CPU: fmov d0, [[REG1]]
-; CPU: fmov d1, [[REG2]]
-
-; NOTATTR: fmov [[REG2:s[0-9]+]], s3
-; NOTATTR: fmov [[REG1:s[0-9]+]], s2
-; NOTATTR: fmov s0, s2
-; NOTATTR: fmov s1, s3
-; NOTATTR-NEXT: bl {{_?foo_float}}
-; NOTATTR: fmov s0, [[REG1]]
-; NOTATTR: fmov s1, [[REG2]]
-
-; ATTR: fmov d0, d2
-; ATTR: fmov d1, d3
-; ATTR: fmov [[REG2:d[0-9]+]], d3
-; ATTR: fmov [[REG1:d[0-9]+]], d2
-; ATTR-NEXT: bl {{_?foo_float}}
-; ATTR: fmov d0, [[REG1]]
-; ATTR: fmov d1, [[REG2]]
-  %call = call float @foo_float(float %c, float %d)
-  %call1 = call float @foo_float(float %c, float %d)
-  unreachable
-}
-
-declare float @foo_float(float, float)
-
-define void @zero_cycle_regmov_FPR16(half %a, half %b, half %c, half %d) {
-entry:
-; CHECK-LABEL: t:
-; NOTCPU-LINUX: fmov s0, s2
-; NOTCPU-LINUX: fmov s1, s3
-; NOTCPU-LINUX: fmov [[REG2:s[0-9]+]], s3
-; NOTCPU-LINUX: fmov [[REG1:s[0-9]+]], s2
-; NOTCPU-LINUX-NEXT: bl {{_?foo_half}}
-; NOTCPU-LINUX: fmov s0, [[REG1]]
-; NOTCPU-LINUX: fmov s1, [[REG2]]
-
-; NOTCPU-APPLE: fmov s0, s2
-; NOTCPU-APPLE: fmov s1, s3
-; NOTCPU-APPLE: fmov [[REG2:s[0-9]+]], s3
-; NOTCPU-APPLE: fmov [[REG1:s[0-9]+]], s2
-; NOTCPU-APPLE-NEXT: bl {{_?foo_half}}
-; NOTCPU-APPLE: fmov s0, [[REG1]]
-; NOTCPU-APPLE: fmov s1, [[REG2]]
-
-; CPU: fmov [[REG2:d[0-9]+]], d3
-; CPU: fmov [[REG1:d[0-9]+]], d2
-; CPU: fmov d0, d2
-; CPU: fmov d1, d3
-; CPU-NEXT: bl {{_?foo_half}}
-; CPU: fmov d0, [[REG1]]
-; CPU: fmov d1, [[REG2]]
-
-; NOTATTR: fmov [[REG2:s[0-9]+]], s3
-; NOTATTR: fmov [[REG1:s[0-9]+]], s2
-; NOTATTR: fmov s0, s2
-; NOTATTR: fmov s1, s3
-; NOTATTR-NEXT: bl {{_?foo_half}}
-; NOTATTR: fmov s0, [[REG1]]
-; NOTATTR: fmov s1, [[REG2]]
-
-; ATTR: fmov d0, d2
-; ATTR: fmov d1, d3
-; ATTR: fmov [[REG2:d[0-9]+]], d3
-; ATTR: fmov [[REG1:d[0-9]+]], d2
-; ATTR-NEXT: bl {{_?foo_half}}
-; ATTR: fmov d0, [[REG1]]
-; ATTR: fmov d1, [[REG2]]
-  %call = call half @foo_half(half %c, half %d)
-  %call1 = call half @foo_half(half %c, half %d)
-  unreachable
-}
-
-declare half @foo_half(half, half)

@tomershafir tomershafir requested a review from jroelofs July 30, 2025 08:46
@DavidSpickett
Copy link
Collaborator

No opinion on the change itself but your PR description does need more detail. A link to the change that added it would be good too.

@tomershafir tomershafir force-pushed the aarch64-remove-wrong-processor-feature branch from 30f404e to 31d6c6f Compare July 30, 2025 10:12
@tomershafir
Copy link
Contributor Author

@DavidSpickett Done

@DavidSpickett
Copy link
Collaborator

Ok, better, but why is the feature wrong?

If I do a downstream merge and see that this change made my benchmarks go one way or the other, that's what I'd be looking for in the commit message. Perhaps the intent was good you just want to implement it differently, perhaps you found it made zero difference and doesn't need to be there, whatever the reason is.

@tomershafir
Copy link
Contributor Author

Done

@jroelofs Ping

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds OK to me if it how the hardware works and someone from Apple agrees. It might be worth keeping the testing though.

@@ -1,103 +0,0 @@
; RUN: llc < %s -mtriple=arm64-linux-gnu | FileCheck %s -check-prefixes=NOTCPU-LINUX --match-full-lines
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this loses all testing for the feature. Can we keep just the testing for default and +zcm-fpr64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, gonna keep NOTCPU-APPLE, NOTCPU-LINUX, ATTR

`fmov dX, dY` is not a preferred instruction.

Previously introduced by: llvm#144152
@tomershafir tomershafir force-pushed the aarch64-remove-wrong-processor-feature branch from 31d6c6f to ae5084a Compare August 10, 2025 17:51
@tomershafir tomershafir requested a review from davemgreen August 10, 2025 18:30
@jroelofs
Copy link
Contributor

No opinion on the change itself but your PR description does need more detail. A link to the change that added it would be good too.

I agree with David here, even after the edits. The description in the commit message needs to say "why" we are making the change, as in: why is this feature wrong and not preferred? That's something that often cannot be inferred from the "what" of the rest of the patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants