

@boomanaiden154
Contributor

According to the data from uops.info, the popcnt false dependency modeled by TuningPOPCNTFalseDeps was fixed in CannonLake, and the benchmarking data in #153983 confirms that it is not an issue on Alder Lake for the 32- and 64-bit forms. Setting this flag can lead to extra xor instructions, which could consume additional frontend/renaming resources.

None of the other CPUs on which this has been fixed set the tuning flag.

Fixes #153983.

@llvmbot
Member

llvmbot commented Aug 17, 2025

@llvm/pr-subscribers-backend-x86

Author: Aiden Grossman (boomanaiden154)


Full diff: https://github.com/llvm/llvm-project/pull/154004.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86.td (+3-1)
  • (modified) llvm/test/CodeGen/X86/bitcnt-false-dep.ll (+9)
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 990b381341f07..3d34ea3bed318 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -1291,7 +1291,9 @@ def ProcessorFeatures {
   list<SubtargetFeature> ADLAdditionalTuning = [TuningPERMFalseDeps,
                                                 TuningPreferMovmskOverVTest,
                                                 TuningFastImmVectorShift];
-  list<SubtargetFeature> ADLTuning = !listconcat(SKLTuning, ADLAdditionalTuning);
+  list<SubtargetFeature> ADLRemoveTuning = [TuningPOPCNTFalseDeps];
+  list<SubtargetFeature> ADLTuning =
+      !listremove(!listconcat(SKLTuning, ADLAdditionalTuning), ADLRemoveTuning);
   list<SubtargetFeature> ADLFeatures =
     !listconcat(TRMFeatures, ADLAdditionalFeatures);
 
diff --git a/llvm/test/CodeGen/X86/bitcnt-false-dep.ll b/llvm/test/CodeGen/X86/bitcnt-false-dep.ll
index 5f576c8586285..793cbb8f75bdc 100644
--- a/llvm/test/CodeGen/X86/bitcnt-false-dep.ll
+++ b/llvm/test/CodeGen/X86/bitcnt-false-dep.ll
@@ -1,6 +1,7 @@
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s --check-prefix=HSW
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake | FileCheck %s --check-prefix=SKL
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skx | FileCheck %s --check-prefix=SKL
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=alderlake | FileCheck %s --check-prefix=ADL
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=silvermont -mattr=+lzcnt,+bmi | FileCheck %s --check-prefix=SKL
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=goldmont -mattr=+lzcnt,+bmi | FileCheck %s --check-prefix=SKL
 
@@ -37,6 +38,10 @@ ret:
 ;SKL-LABEL:@loopdep_popcnt32
 ;SKL: xorl [[GPR0:%e[a-d]x]], [[GPR0]]
 ;SKL-NEXT: popcntl {{.*}}, [[GPR0]]
+
+;ADL-LABEL:@loopdep_popcnt32
+;ADL-NOT: xor
+;ADL: popcntl
 }
 
 define i64 @loopdep_popcnt64(ptr nocapture %x, ptr nocapture %y) nounwind {
@@ -63,6 +68,10 @@ ret:
 ;SKL-LABEL:@loopdep_popcnt64
 ;SKL: xorl %e[[GPR0:[a-d]x]], %e[[GPR0]]
 ;SKL-NEXT: popcntq {{.*}}, %r[[GPR0]]
+
+;ADL-LABEL:@loopdep_popcnt64
+;ADL-NOT: xor
+;ADL: popcntq
 }
 
 define i32 @loopdep_tzct32(ptr nocapture %x, ptr nocapture %y) nounwind {
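The ADLTuning change in the X86.td hunk composes two TableGen bang operators. As a rough sketch (a Python model of the list semantics, not TableGen itself), !listconcat appends its list arguments in order and !listremove(a, b) keeps the elements of a that do not appear in b. The list contents below are abbreviated and illustrative: only TuningFastGather and TuningPOPCNTFalseDeps are taken from this thread as members of SKLTuning.

```python
# Minimal Python model of the two TableGen bang operators used in the X86.td diff.
# It only mirrors their list semantics; it is not how TableGen is implemented.

def listconcat(*lists):
    """Model of !listconcat: concatenate the argument lists in order."""
    result = []
    for lst in lists:
        result.extend(lst)
    return result


def listremove(a, b):
    """Model of !listremove(a, b): copy of `a` without elements that also occur in `b`."""
    return [x for x in a if x not in b]


# Abbreviated, illustrative contents: the real SKLTuning in X86.td has more entries.
SKLTuning = ["TuningFastGather", "TuningPOPCNTFalseDeps"]
ADLAdditionalTuning = [
    "TuningPERMFalseDeps",
    "TuningPreferMovmskOverVTest",
    "TuningFastImmVectorShift",
]
ADLRemoveTuning = ["TuningPOPCNTFalseDeps"]

# Same shape as the patched definition of ADLTuning.
ADLTuning = listremove(listconcat(SKLTuning, ADLAdditionalTuning), ADLRemoveTuning)

if __name__ == "__main__":
    print(ADLTuning)
```

Removing the flag after concatenation, rather than editing SKLTuning, leaves the Skylake tuning untouched and only affects the models built from ADLTuning.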

@ms178

ms178 commented Aug 17, 2025

Could you please check whether the same holds for TuningLZCNTFalseDeps? I've applied the patch below locally; it also enables [FeatureERMSB, FeatureFSRM] for Raptor Lake and seems to perform fine, but that might be better suited to a separate patch.

--- a/llvm/lib/Target/X86/X86.td	2025-08-17 08:57:51.604276317 +0200
+++ b/llvm/lib/Target/X86/X86.td	2025-08-17 09:07:22.804296154 +0200
@@ -1291,10 +1292,23 @@ def ProcessorFeatures {
   list<SubtargetFeature> ADLAdditionalTuning = [TuningPERMFalseDeps,
                                                 TuningPreferMovmskOverVTest,
                                                 TuningFastImmVectorShift];
-  list<SubtargetFeature> ADLTuning = !listconcat(SKLTuning, ADLAdditionalTuning);
+  // Remove inherited false-dependency tunings that do not apply on Alder/Raptor Lake
+  list<SubtargetFeature> ADLRemoveTuning = [TuningPOPCNTFalseDeps,
+                                            TuningLZCNTFalseDeps];
+  list<SubtargetFeature> ADLTuning =
+    !listremove(!listconcat(SKLTuning, ADLAdditionalTuning), ADLRemoveTuning);
   list<SubtargetFeature> ADLFeatures =
     !listconcat(TRMFeatures, ADLAdditionalFeatures);
 
+  // Raptor Lake (client, no AVX-512): enable fast string ops + prefer 128-bit compute
+  list<SubtargetFeature> RLAdditionalFeatures = [FeatureERMSB, FeatureFSRM];
+  list<SubtargetFeature> RLFeatures =
+    !listconcat(ADLFeatures, RLAdditionalFeatures);
+
+  list<SubtargetFeature> RLAdditionalTuning = [TuningPrefer128Bit];
+  list<SubtargetFeature> RLTuning =
+    !listconcat(ADLTuning, RLAdditionalTuning);
+
   // Gracemont
   list<SubtargetFeature> GRTTuning = [TuningMacroFusion,
                                       TuningSlow3OpsLEA,
@@ -1866,7 +1880,7 @@ foreach P = ["sierraforest", "grandridge
                 ProcessorFeatures.GRTTuning>;
 }
 def : ProcModel<"raptorlake", AlderlakePModel,
-                ProcessorFeatures.ADLFeatures, ProcessorFeatures.ADLTuning>;
+                ProcessorFeatures.RLFeatures, ProcessorFeatures.RLTuning>;
 def : ProcModel<"meteorlake", AlderlakePModel,
                 ProcessorFeatures.ADLFeatures, ProcessorFeatures.ADLTuning>;
 def : ProcModel<"arrowlake", AlderlakePModel,

@sharkautarch

@ms178 afaik, ADLTuning doesn’t have TuningLZCNTFalseDeps in the first place, since SKLTuning doesn’t have that either:

TuningPOPCNTFalseDeps,

And to answer your question about whether Alderlake has lzcnt/tzcnt false deps: it’s the same as with popcnt: only for r/m16

@ms178

ms178 commented Aug 17, 2025

TuningLZCNTFalseDeps

@sharkautarch Maybe I am reading it wrong, but doesn't Skylake inherit TuningLZCNTFalseDeps via the Haswell and Broadwell tuning, so that it currently gets passed down to ADLTuning as well? At least TuningLZCNTFalseDeps is set in the Haswell section:

TuningLZCNTFalseDeps,

and I cannot see it being unset in the later models.

@sharkautarch

sharkautarch commented Aug 17, 2025


@ms178
That's SKLFeatures, which inherits from BDWFeatures, which in turn inherits from HSWFeatures, and so on.
But ADLTuning does not include anything from SKLFeatures; it only inherits from SKLTuning, which is an entirely separate list.
And SKLTuning does not inherit tunings from any other field.

list<SubtargetFeature> ADLTuning = !listconcat(SKLTuning, ADLAdditionalTuning);

list<SubtargetFeature> SKLTuning = [TuningFastGather,
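The distinction between inherited feature lists and standalone tuning lists can be illustrated with a small Python model. It assumes, as stated in this thread, that TuningLZCNTFalseDeps appears in the Haswell section while SKLTuning is a standalone list that does not contain it. Under that assumption, also listing TuningLZCNTFalseDeps in ADLRemoveTuning, as in the local patch above, does not change ADLTuning. All list contents here are abbreviated and illustrative.

```python
# Illustrative model: SKLTuning is a plain standalone list, so a flag only reaches
# ADLTuning if it is literally in SKLTuning or ADLAdditionalTuning.

def listremove(a, b):
    """Model of TableGen !listremove(a, b): drop elements of `a` that occur in `b`."""
    return [x for x in a if x not in b]


# Abbreviated, illustrative contents.
HSWTuning = ["TuningPOPCNTFalseDeps", "TuningLZCNTFalseDeps"]  # Haswell section
SKLTuning = ["TuningFastGather", "TuningPOPCNTFalseDeps"]      # does not pull in HSWTuning
ADLAdditionalTuning = ["TuningPERMFalseDeps"]

base = SKLTuning + ADLAdditionalTuning

# Upstream removal list vs. the locally modified one from the thread.
upstream = listremove(base, ["TuningPOPCNTFalseDeps"])
local = listremove(base, ["TuningPOPCNTFalseDeps", "TuningLZCNTFalseDeps"])

if __name__ == "__main__":
    print("LZCNT flag in base list:", "TuningLZCNTFalseDeps" in base)
    print("Same result:", upstream == local)
```

Removing an element that is not present is simply a no-op, so the extra entry is harmless but has no effect unless the flag is actually added to one of the source lists.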

Contributor

@phoebewang phoebewang left a comment


LGTM.

Collaborator

@RKSimon RKSimon left a comment


LGTM with one minor comment.

; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s --check-prefix=HSW
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake | FileCheck %s --check-prefix=SKL
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skx | FileCheck %s --check-prefix=SKL
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=alderlake | FileCheck %s --check-prefix=ADL
Collaborator


Can this file be generated with the update script so that we can share more prefixes?

Contributor Author


Not sure if there are specific features of update_llc_test_checks.py that you have in mind and that I'm unfamiliar with, but I don't think it makes much sense here.

We would end up having to assert the codegen for the entire function rather than just the bit-counting instructions/xors that we're interested in. The new ADL lines would also still be unique.

Happy to follow up postcommit if there is a simpler way to do this.
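The hand-written checks keep the test focused on the instructions of interest: ADL-LABEL anchors on the function, ADL-NOT: xor forbids an xor between that anchor and the next positive match, and ADL: popcntl is that positive match. Below is a simplified Python model of this three-directive pattern. It is not FileCheck's implementation (real FileCheck also splits the whole input at every LABEL match and supports regex and variable syntax), and the assembly snippets are hypothetical, not actual llc output.

```python
# Simplified model of a LABEL / NOT / positive-check sequence in FileCheck,
# using literal substring matching only.

def check_label_not_then(asm: str, label: str, forbidden: str, expected: str) -> bool:
    """Return True if, after `label`, `expected` occurs and `forbidden`
    does not occur between the label match and that occurrence."""
    label_pos = asm.find(label)
    if label_pos < 0:
        return False  # the LABEL directive would fail
    start = label_pos + len(label)
    match_pos = asm.find(expected, start)
    if match_pos < 0:
        return False  # the positive check would fail
    # A NOT directive only covers the text between the previous match
    # (here, the label) and the next positive match.
    return forbidden not in asm[start:match_pos]


# Hypothetical assembly snippets for illustration only.
with_xor = """loopdep_popcnt32:  # @loopdep_popcnt32
    xorl %eax, %eax
    popcntl (%rdi), %eax
"""
without_xor = """loopdep_popcnt32:  # @loopdep_popcnt32
    popcntl (%rdi), %eax
"""

if __name__ == "__main__":
    for name, asm in [("with_xor", with_xor), ("without_xor", without_xor)]:
        ok = check_label_not_then(asm, "@loopdep_popcnt32", "xor", "popcntl")
        print(name, "passes ADL checks:", ok)
```

Because only the region up to the popcnt is constrained, unrelated changes elsewhere in the function do not break the test, which is the trade-off against fully generated checks.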

@boomanaiden154 boomanaiden154 merged commit 1650e4a into llvm:main Aug 18, 2025
11 checks passed
@boomanaiden154 boomanaiden154 deleted the x86-popcount-alderlake-false-dep-fixed branch August 18, 2025 13:31

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[X86][BreakFalseDeps] Some x86 cpus only have popcnt false dep for popcnt r16, r/m16

6 participants