[X86] Fix some values for Znver4 model #161405

NexusXe · 2025-09-30T17:07:28Z

This PR fixes a handful of latency and uop changes between Znver3 and Znver4 that were otherwise copied from Znver3.

Latency and uop values listed that matched Zen3 on uops.info were updated to those for Zen4.

Includes: BSF/BSR, DIV, TZCNT, CLMUL, PCMPISTRM, VALIGN, VPERM

github-actions · 2025-09-30T17:07:49Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-09-30T17:08:25Z

@llvm/pr-subscribers-backend-x86

Author: None (NexusXe)

Changes

This PR fixes a handful of latency and uop changes between Znver3 and Znver4 that were otherwise copied from Znver3.

Latency and uop values listed that matched Zen3 on uops.info were updated to those for Zen4.

Full diff: https://github.com/llvm/llvm-project/pull/161405.diff

1 Files Affected:

(modified) llvm/lib/Target/X86/X86ScheduleZnver4.td (+40-39)

diff --git a/llvm/lib/Target/X86/X86ScheduleZnver4.td b/llvm/lib/Target/X86/X86ScheduleZnver4.td
index cc300548a50e6..405c0a2e9fbd1 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver4.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver4.td
@@ -15,7 +15,7 @@
 //===----------------------------------------------------------------------===//
 
 def Znver4Model : SchedMachineModel {
-  // AMD SOG Zen4, 2.9.6 Dispatch
+  // AMD SOG Zen4, 2.9.8 Dispatch
   // The processor may dispatch up to 6 macro ops per cycle
   // into the execution engine.
   let IssueWidth = 6;
@@ -46,8 +46,9 @@ def Znver4Model : SchedMachineModel {
   int VecLoadLatency = 7;
   // Latency of a simple store operation.
   int StoreLatency = 1;
-  // FIXME:
-  let HighLatency = 25; // FIXME: any better choice?
+  // Mean and median value for all instructions with latencies >6
+  // Source: Zen4 Instruction Latencies spreadsheet (included with SOG)
+  let HighLatency = 13;
   // AMD SOG Zen4, 2.8 Optimizing Branching
   // The branch misprediction penalty is in the range from 11 to 18 cycles,
   // <...>. The common case penalty is 13 cycles.
@@ -585,10 +586,11 @@ def : InstRW<[Zn4WriteADC8mr_SBB8mr], (instrs ADC8mr, SBB8mr)>;
 defm : Zn4WriteResInt<WriteLEA, [Zn4AGU012], 1, [1], 1>;     // LEA instructions can't fold loads.
 
 // This write is used for slow LEA instructions.
+// values from uops.info
 def Zn4Write3OpsLEA : SchedWriteRes<[Zn4ALU0123]> {
-  let Latency = 2;
-  let ReleaseAtCycles = [1];
-  let NumMicroOps = 2;
+  let Latency = 3;
+  let ReleaseAtCycles = [1, 1, 1, 2];
+  let NumMicroOps = 4;
 }
 
 // On Znver4, a slow LEA is either a 3Ops LEA (base, index, offset),
@@ -612,9 +614,10 @@ def Zn4WriteLEA : SchedWriteVariant<[
 
 def : InstRW<[Zn4WriteLEA], (instrs LEA32r, LEA64r, LEA64_32r)>;
 
+// values from uops.info
 def Zn4SlowLEA16r : SchedWriteRes<[Zn4ALU0123]> {
-  let Latency = 2; // FIXME: not from llvm-exegesis
-  let ReleaseAtCycles = [4];
+  let Latency = 2;
+  let ReleaseAtCycles = [1, 1];
   let NumMicroOps = 2;
 }
 
@@ -660,14 +663,14 @@ def : InstRW<[Zn4WriteCMPXCHG8rm_LCMPXCHG8], (instrs CMPXCHG8rm, LCMPXCHG8)>;
 def Zn4WriteCMPXCHG8B : SchedWriteRes<[Zn4ALU0123]> {
   let Latency = 3; // FIXME: not from llvm-exegesis
   let ReleaseAtCycles = [24];
-  let NumMicroOps = 19;
+  let NumMicroOps = 15;
 }
 def : InstRW<[Zn4WriteCMPXCHG8B], (instrs CMPXCHG8B)>;
 
 def Zn4WriteCMPXCHG16B_LCMPXCHG16B : SchedWriteRes<[Zn4ALU0123]> {
   let Latency = 4; // FIXME: not from llvm-exegesis
   let ReleaseAtCycles = [59];
-  let NumMicroOps = 28;
+  let NumMicroOps = 26;
 }
 def : InstRW<[Zn4WriteCMPXCHG16B_LCMPXCHG16B], (instrs CMPXCHG16B, LCMPXCHG16B)>;
 
@@ -681,7 +684,7 @@ def : InstRW<[Zn4WriteWriteXCHGUnrenameable], (instrs XCHG8rr, XCHG16rr, XCHG16a
 def Zn4WriteXCHG8rm_XCHG16rm : SchedWriteRes<[Zn4AGU012, Zn4Load, Zn4ALU0123]> {
   let Latency = !add(Znver4Model.LoadLatency, 3); // FIXME: not from llvm-exegesis
   let ReleaseAtCycles = [1, 1, 2];
-  let NumMicroOps = 5;
+  let NumMicroOps = 2;
 }
 def : InstRW<[Zn4WriteXCHG8rm_XCHG16rm], (instrs XCHG8rm, XCHG16rm)>;
 
@@ -693,19 +696,17 @@ def Zn4WriteXCHG32rm_XCHG64rm : SchedWriteRes<[Zn4AGU012, Zn4Load, Zn4ALU0123]>
 def : InstRW<[Zn4WriteXCHG32rm_XCHG64rm], (instrs XCHG32rm, XCHG64rm)>;
 
 // Integer division.
-// FIXME: uops for 8-bit division measures as 2. for others it's a guess.
-// FIXME: latency for 8-bit division measures as 10. for others it's a guess.
-defm : Zn4WriteResIntPair<WriteDiv8, [Zn4Divider], 10, [10], 2>;
-defm : Zn4WriteResIntPair<WriteDiv16, [Zn4Divider], 11, [11], 2>;
-defm : Zn4WriteResIntPair<WriteDiv32, [Zn4Divider], 13, [13], 2>;
-defm : Zn4WriteResIntPair<WriteDiv64, [Zn4Divider], 17, [17], 2>;
-defm : Zn4WriteResIntPair<WriteIDiv8, [Zn4Divider], 10, [10], 2>;
-defm : Zn4WriteResIntPair<WriteIDiv16, [Zn4Divider], 11, [11], 2>;
-defm : Zn4WriteResIntPair<WriteIDiv32, [Zn4Divider], 13, [13], 2>;
-defm : Zn4WriteResIntPair<WriteIDiv64, [Zn4Divider], 17, [17], 2>;
-
-defm : Zn4WriteResIntPair<WriteBSF, [Zn4ALU1], 1, [1], 6, /*LoadUOps=*/1>; // Bit scan forward.
-defm : Zn4WriteResIntPair<WriteBSR, [Zn4ALU1], 1, [1], 6, /*LoadUOps=*/1>; // Bit scan reverse.
+defm : Zn4WriteResIntPair<WriteDiv8, [Zn4Divider], 9, [9], 2>;
+defm : Zn4WriteResIntPair<WriteDiv16, [Zn4Divider], 10, [10], 2>;
+defm : Zn4WriteResIntPair<WriteDiv32, [Zn4Divider], 12, [12], 2>;
+defm : Zn4WriteResIntPair<WriteDiv64, [Zn4Divider], 18, [18], 2>;
+defm : Zn4WriteResIntPair<WriteIDiv8, [Zn4Divider], 9, [9], 2>;
+defm : Zn4WriteResIntPair<WriteIDiv16, [Zn4Divider], 10, [10], 2>;
+defm : Zn4WriteResIntPair<WriteIDiv32, [Zn4Divider], 12, [12], 2>;
+defm : Zn4WriteResIntPair<WriteIDiv64, [Zn4Divider], 18, [18], 2>;
+
+defm : Zn4WriteResIntPair<WriteBSF, [Zn4ALU1], 1, [1], 1, /*LoadUOps=*/1>; // Bit scan forward.
+defm : Zn4WriteResIntPair<WriteBSR, [Zn4ALU1], 1, [1], 1, /*LoadUOps=*/1>; // Bit scan reverse.
 
 defm : Zn4WriteResIntPair<WritePOPCNT, [Zn4ALU0123], 1, [1], 1>; // Bit population count.
 
@@ -725,12 +726,12 @@ def Zn4WriteLZCNT16rr : SchedWriteRes<[Zn4ALU0123]> {
 }
 def : InstRW<[Zn4WriteLZCNT16rr], (instrs LZCNT16rr)>;
 
-defm : Zn4WriteResIntPair<WriteTZCNT, [Zn4ALU12], 2, [1], 2>; // Trailing zero count.
+defm : Zn4WriteResIntPair<WriteTZCNT, [Zn4ALU12], 1, [1], 1>; // Trailing zero count.
 
 def Zn4WriteTZCNT16rr : SchedWriteRes<[Zn4ALU0123]> {
-  let Latency = 2;
-  let ReleaseAtCycles = [4];
-  let NumMicroOps = 2;
+  let Latency = 1;
+  let ReleaseAtCycles = [1];
+  let NumMicroOps = 1;
 }
 def : InstRW<[Zn4WriteTZCNT16rr], (instrs TZCNT16rr)>;
 
@@ -1326,9 +1327,9 @@ def : InstRW<[Zn4WriteSHA256RNDS2rr], (instrs SHA256RNDS2rr)>;
 
 // Strings instructions.
 // Packed Compare Implicit Length Strings, Return Mask
-defm : Zn4WriteResXMMPair<WritePCmpIStrM, [Zn4FPVAdd0123], 6, [8], 3, /*LoadUOps=*/1>;
+defm : Zn4WriteResXMMPair<WritePCmpIStrM, [Zn4FPVAdd0123], 7, [8], 3, /*LoadUOps=*/1>;
 // Packed Compare Explicit Length Strings, Return Mask
-defm : Zn4WriteResXMMPair<WritePCmpEStrM, [Zn4FPVAdd0123], 6, [12], 7, /*LoadUOps=*/5>;
+defm : Zn4WriteResXMMPair<WritePCmpEStrM, [Zn4FPVAdd0123], 7, [12], 7, /*LoadUOps=*/5>;
 // Packed Compare Implicit Length Strings, Return Index
 defm : Zn4WriteResXMMPair<WritePCmpIStrI, [Zn4FPVAdd0123], 2, [8], 4>;
 // Packed Compare Explicit Length Strings, Return Index
@@ -1340,7 +1341,7 @@ defm : Zn4WriteResXMMPair<WriteAESIMC, [Zn4FPAES01], 4, [1], 1>; // InvMixColumn
 defm : Zn4WriteResXMMPair<WriteAESKeyGen, [Zn4FPAES01], 4, [1], 1>; // Key Generation.
 
 // Carry-less multiplication instructions.
-defm : Zn4WriteResXMMPair<WriteCLMul, [Zn4FPCLM01], 4, [4], 4>;
+defm : Zn4WriteResXMMPair<WriteCLMul, [Zn4FPCLM01], 4, [3], 4>;
 
 // EMMS/FEMMS
 defm : Zn4WriteResInt<WriteEMMS, [Zn4ALU0123], 2, [1], 1>; // FIXME: latency not from llvm-exegesis
@@ -1386,23 +1387,23 @@ def Zn4WriteVPERM2F128rm : SchedWriteRes<[Zn4AGU012, Zn4Load, Zn4FPVShuf]> {
 def : InstRW<[Zn4WriteVPERM2F128rm], (instrs VPERM2F128rmi)>;
 
 def Zn4WriteVPERMPSYrr : SchedWriteRes<[Zn4FPVShuf]> {
-  let Latency = 7;
+  let Latency = 4;
   let ReleaseAtCycles = [1];
-  let NumMicroOps = 2;
+  let NumMicroOps = 1;
 }
 def : InstRW<[Zn4WriteVPERMPSYrr], (instrs VPERMPSYrr)>;
 
 def Zn4WriteVPERMPSYrm : SchedWriteRes<[Zn4AGU012, Zn4Load, Zn4FPVShuf]> {
   let Latency = !add(Znver4Model.VecLoadLatency, Zn4WriteVPERMPSYrr.Latency);
-  let ReleaseAtCycles = [1, 1, 2];
-  let NumMicroOps = !add(Zn4WriteVPERMPSYrr.NumMicroOps, 1);
+  let ReleaseAtCycles = [1];
+  let NumMicroOps = 1;
 }
 def : InstRW<[Zn4WriteVPERMPSYrm], (instrs VPERMPSYrm)>;
 
 def Zn4WriteVPERMYri : SchedWriteRes<[Zn4FPVShuf]> {
-  let Latency = 6;
+  let Latency = 4;
   let ReleaseAtCycles = [1];
-  let NumMicroOps = 2;
+  let NumMicroOps = 1;
 }
 def : InstRW<[Zn4WriteVPERMYri], (instrs VPERMPDYri, VPERMQYri)>;
 
@@ -1414,9 +1415,9 @@ def Zn4WriteVPERMPDYmi : SchedWriteRes<[Zn4AGU012, Zn4Load, Zn4FPVShuf]> {
 def : InstRW<[Zn4WriteVPERMPDYmi], (instrs VPERMPDYmi)>;
 
 def Zn4WriteVPERMDYrr : SchedWriteRes<[Zn4FPVShuf]> {
-  let Latency = 5;
+  let Latency = 4;
   let ReleaseAtCycles = [1];
-  let NumMicroOps = 2;
+  let NumMicroOps = 1;
 }
 def : InstRW<[Zn4WriteVPERMDYrr], (instrs VPERMDYrr)>;

RKSimon

I expect theres some llvm-mca changes. Try ninja check-llvm-tools-llvm-mca-x86

RKSimon

missing llvm-mca test changes - please try llvm-project\llvm\utils\update_mca_test_checks.py

NexusXe · 2025-10-01T17:22:54Z

I'm going to go through the updated generated values and make sure they match up w/ public measured values, will update

NexusXe · 2025-10-01T19:33:25Z

I did not mean to request a review from everyone! It was just a rebase!

I apologize for any inconvenience this may have caused. I was trying to rebase this PR to make absolutely sure everything built and the llvm-mca test passed, and I'm fairly certain I did it wrong

RKSimon

LGTM

github-actions · 2025-10-22T12:10:07Z

@NexusXe Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

This PR fixes a handful of latency and uop changes between Znver3 and Znver4 that were otherwise copied from Znver3. Latency and uop values listed that matched Zen3 on uops.info were updated to those for Zen4. Includes: BSF/BSR, DIV, TZCNT, CLMUL, PCMPISTRM, VALIGN, VPERM

llvmbot added the backend:X86 label Sep 30, 2025

nikic requested a review from RKSimon September 30, 2025 18:32

RKSimon reviewed Sep 30, 2025

View reviewed changes

RKSimon requested changes Oct 1, 2025

View reviewed changes

NexusXe added 20 commits October 1, 2025 14:15

fix documentation reference

fe00085

better HighLatency value

092492a

LEA metrics

bc45c69

(CMP)XCHG uops

fd6f87f

(I)DIV values from docs/uops.info

a67c252

use zen4 BSF/BSR uops count

cc5fc0d

TZCNT in 1 uop

3555807

some higher latency string instructions

48d84d2

use Zen4 CLMUL/VPERM(S/D) values

8d345dd

replace FIXME for latency

d1bf965

revert multiop ReleaseAtCycles value

de77a33

fix RAC for Zn4WriteVPERMPSYrm

de0daad

fix RAC for Zn4Write3OpsLEA

a6621ca

regenerate resource files

603d9b8

revert LEA changes

de26eec

CMPXCHG8B better TP

ceec220

VPERMD uops

3f10cb4

fix CMPXCHG16B

02d554c

VALIGN has different latency depending on width

e4c7852

update resource files

d300f5e

NexusXe requested review from aaronmondal, keith and rupprecht as code owners October 1, 2025 19:32

NexusXe requested review from a team, JDevlieghere, andykaylor, ayermolo, bcardosolopes, ftynse, lanza, makslevental, nicolasvasilache, nikic, paschalis-mpeis, rafaelauler, rolfmorel, stellaraccident, xlauko, yota9 and yozhu as code owners October 1, 2025 19:32

NexusXe marked this pull request as draft October 1, 2025 19:37

NexusXe force-pushed the znver4-work branch from 3ee6f4e to d300f5e Compare October 1, 2025 19:49

NexusXe requested a review from RKSimon October 1, 2025 19:56

Merge branch 'main' into znver4-work

0136fd2

NexusXe marked this pull request as ready for review October 1, 2025 20:31

Merge branch 'main' into znver4-work

af4309e

RKSimon approved these changes Oct 22, 2025

View reviewed changes

RKSimon merged commit 37fcaf5 into llvm:main Oct 22, 2025
9 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[X86] Fix some values for Znver4 model #161405

[X86] Fix some values for Znver4 model #161405

Uh oh!

NexusXe commented Sep 30, 2025 •

edited by RKSimon

Loading

Uh oh!

github-actions bot commented Sep 30, 2025

Uh oh!

llvmbot commented Sep 30, 2025

Uh oh!

RKSimon left a comment

Uh oh!

RKSimon left a comment

Uh oh!

NexusXe commented Oct 1, 2025

Uh oh!

NexusXe commented Oct 1, 2025 •

edited

Loading

Uh oh!

RKSimon left a comment

Uh oh!

Uh oh!

github-actions bot commented Oct 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[X86] Fix some values for Znver4 model #161405

[X86] Fix some values for Znver4 model #161405

Uh oh!

Conversation

NexusXe commented Sep 30, 2025 • edited by RKSimon Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Sep 30, 2025

Uh oh!

llvmbot commented Sep 30, 2025

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

NexusXe commented Oct 1, 2025

Uh oh!

NexusXe commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Oct 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

NexusXe commented Sep 30, 2025 •

edited by RKSimon

Loading

NexusXe commented Oct 1, 2025 •

edited

Loading