-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[X86] Remove TuningPOPCNTFalseDeps from AlderLake #154004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] Remove TuningPOPCNTFalseDeps from AlderLake #154004
Conversation
This false dependency issue was fixed in CannonLake looking at the data from uops.info. This is confirmed not to be an issue based on benchmarking data in llvm#153983. Setting this can potentially lead to extra xor instructions whihc could consume extra frontend/renaming resources. None of the other CPUs that have had this fixed have the tuning flag. Fixes llvm#153983.
|
@llvm/pr-subscribers-backend-x86 Author: Aiden Grossman (boomanaiden154) ChangesThis false dependency issue was fixed in CannonLake looking at the data from uops.info. This is confirmed not to be an issue based on benchmarking data in #153983. Setting this can potentially lead to extra xor instructions whihc could consume extra frontend/renaming resources. None of the other CPUs that have had this fixed have the tuning flag. Fixes #153983. Full diff: https://github.com/llvm/llvm-project/pull/154004.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 990b381341f07..3d34ea3bed318 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -1291,7 +1291,9 @@ def ProcessorFeatures {
list<SubtargetFeature> ADLAdditionalTuning = [TuningPERMFalseDeps,
TuningPreferMovmskOverVTest,
TuningFastImmVectorShift];
- list<SubtargetFeature> ADLTuning = !listconcat(SKLTuning, ADLAdditionalTuning);
+ list<SubtargetFeature> ADLRemoveTuning = [TuningPOPCNTFalseDeps];
+ list<SubtargetFeature> ADLTuning =
+ !listremove(!listconcat(SKLTuning, ADLAdditionalTuning), ADLRemoveTuning);
list<SubtargetFeature> ADLFeatures =
!listconcat(TRMFeatures, ADLAdditionalFeatures);
diff --git a/llvm/test/CodeGen/X86/bitcnt-false-dep.ll b/llvm/test/CodeGen/X86/bitcnt-false-dep.ll
index 5f576c8586285..793cbb8f75bdc 100644
--- a/llvm/test/CodeGen/X86/bitcnt-false-dep.ll
+++ b/llvm/test/CodeGen/X86/bitcnt-false-dep.ll
@@ -1,6 +1,7 @@
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s --check-prefix=HSW
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake | FileCheck %s --check-prefix=SKL
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skx | FileCheck %s --check-prefix=SKL
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=alderlake | FileCheck %s --check-prefix=ADL
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=silvermont -mattr=+lzcnt,+bmi | FileCheck %s --check-prefix=SKL
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=goldmont -mattr=+lzcnt,+bmi | FileCheck %s --check-prefix=SKL
@@ -37,6 +38,10 @@ ret:
;SKL-LABEL:@loopdep_popcnt32
;SKL: xorl [[GPR0:%e[a-d]x]], [[GPR0]]
;SKL-NEXT: popcntl {{.*}}, [[GPR0]]
+
+;ADL-LABEL:@loopdep_popcnt32
+;ADL-NOT: xor
+;ADL: popcntl
}
define i64 @loopdep_popcnt64(ptr nocapture %x, ptr nocapture %y) nounwind {
@@ -63,6 +68,10 @@ ret:
;SKL-LABEL:@loopdep_popcnt64
;SKL: xorl %e[[GPR0:[a-d]x]], %e[[GPR0]]
;SKL-NEXT: popcntq {{.*}}, %r[[GPR0]]
+
+;ADL-LABEL:@loopdep_popcnt64
+;ADL-NOT: xor
+;ADL: popcntq
}
define i32 @loopdep_tzct32(ptr nocapture %x, ptr nocapture %y) nounwind {
|
|
Could you please take a look if that is also true for TuningLZCNTFalseDeps? I've applied the patch below locally which also implements [FeatureERMSB, FeatureFSRM] for Raptor Lake and it seems to perform fine, but that might be an idea for another patch. |
|
@ms178 afaik, ADLTuning doesn’t have TuningLZCNTFalseDeps in the first place, since SKLTuning doesn’t have that either: llvm-project/llvm/lib/Target/X86/X86.td Line 981 in 66a2d1b
And to answer your question about whether Alderlake has lzcnt/tzcnt false deps: it’s the same as with popcnt: only for r/m16 |
@sharkautarch Maybe I am reading it wrong, but doesn't Syklake inherit TuningLZCNTFalseDeps via Haswell and Broadwell tuning, hence it gets passed down to ADLTuning as well at the moment? At least TuningLZCNTFalseDeps is set in the Haswell section ( llvm-project/llvm/lib/Target/X86/X86.td Line 950 in 66a2d1b
|
@ms178 llvm-project/llvm/lib/Target/X86/X86.td Line 1294 in 66a2d1b
llvm-project/llvm/lib/Target/X86/X86.td Line 971 in 66a2d1b
|
phoebewang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
RKSimon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one minor
| ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s --check-prefix=HSW | ||
| ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake | FileCheck %s --check-prefix=SKL | ||
| ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skx | FileCheck %s --check-prefix=SKL | ||
| ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=alderlake | FileCheck %s --check-prefix=ADL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this file be generated with the update script and we share more prefixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there are specific features that you have in mind for update_llc_test_checks.py that I'm unfamiliar with, but I don't think it makes a ton of sense here.
We end up having to assert the codegen for the entire function rather than just the bit counting instructions/xors that we're interested in. The new ADL lines will also still be unique as well.
Happy to follow up postcommit if there is a simpler way to do this.
This false dependency issue was fixed in CannonLake looking at the data from uops.info. This is confirmed not to be an issue based on benchmarking data in #153983. Setting this can potentially lead to extra xor instructions whihc could consume extra frontend/renaming resources.
None of the other CPUs that have had this fixed have the tuning flag.
Fixes #153983.