Skip to content

Conversation

@boomanaiden154
Copy link
Contributor

This patch adds the appropriate hookups in X86PfmCounters.td for SapphireRapids. This is mostly to fix errors when some of my jobs that only really need dummy counters get scheduled on sapphire rapids machines, but figured I might as well do it properly while here. I do not have hardware access to test this currently, but this matches exactly with what is in the libpfm source code.

This patch adds the appropriate hookups in X86PfmCounters.td for
SapphireRapids. This is mostly to fix errors when some of my jobs that
only really need dummy counters get scheduled on sapphire rapids
machines, but figured I might as well do it properly while here. I do
not have hardware access to test this currently, but this matches
exactly with what is in the libpfm source code.
@llvmbot
Copy link
Member

llvmbot commented Oct 27, 2024

@llvm/pr-subscribers-backend-x86

Author: Aiden Grossman (boomanaiden154)

Changes

This patch adds the appropriate hookups in X86PfmCounters.td for SapphireRapids. This is mostly to fix errors when some of my jobs that only really need dummy counters get scheduled on sapphire rapids machines, but figured I might as well do it properly while here. I do not have hardware access to test this currently, but this matches exactly with what is in the libpfm source code.


Full diff: https://github.com/llvm/llvm-project/pull/113847.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86PfmCounters.td (+16)
  • (modified) llvm/lib/Target/X86/X86SchedSapphireRapids.td (+6)
diff --git a/llvm/lib/Target/X86/X86PfmCounters.td b/llvm/lib/Target/X86/X86PfmCounters.td
index c30e989cdc2af1..38d8d19091e0fd 100644
--- a/llvm/lib/Target/X86/X86PfmCounters.td
+++ b/llvm/lib/Target/X86/X86PfmCounters.td
@@ -220,6 +220,22 @@ def AlderLakePfmCounters : ProcPfmCounters {
 }
 def : PfmCountersBinding<"alderlake", AlderLakePfmCounters>;
 
+def SapphireRapidsPfmCounters : ProcPfmCounters {
+  let CycleCounter = UnhaltedCoreCyclesPfmCounter;
+  let UopsCounter = UopsIssuedPfmCounter;
+  let IssueCounters = [
+    PfmIssueCounter<"SPRPort00", "uops_dispatched_port:port_0">,
+    PfmIssueCounter<"SPRPort01", "uops_dispatched_port:port_1">,
+    PfmIssueCounter<"SPRPort02_03_10", "uops_dispatched_port:port_2_3_10">,
+    PfmIssueCounter<"SPRPort04_09", "uops_dispatched_port:port_4_9">,
+    PfmIssueCounter<"SPRPort05_11", "uops_dispatched_port:port_5_11">,
+    PfmIssueCounter<"SPRPort06", "uops_dispatched_port:port_6">,
+    PfmIssueCounter<"SPRPort07_08", "uops_dispatched_port:port_7_8">,
+  ];
+  let ValidationCounters = DefaultIntelPfmValidationCounters;
+}
+def : PfmCountersBinding<"sapphirerapids", SapphireRapidsPfmCounters>;
+
 // AMD X86 Counters.
 defvar DefaultAMDPfmValidationCounters = [
   PfmValidationCounter<InstructionRetired, "RETIRED_INSTRUCTIONS">,
diff --git a/llvm/lib/Target/X86/X86SchedSapphireRapids.td b/llvm/lib/Target/X86/X86SchedSapphireRapids.td
index 6e292da4e293db..b0ebe70c31fd44 100644
--- a/llvm/lib/Target/X86/X86SchedSapphireRapids.td
+++ b/llvm/lib/Target/X86/X86SchedSapphireRapids.td
@@ -59,6 +59,8 @@ def SPRPort01_05          : ProcResGroup<[SPRPort01, SPRPort05]>;
 def SPRPort01_05_10       : ProcResGroup<[SPRPort01, SPRPort05, SPRPort10]>;
 def SPRPort02_03          : ProcResGroup<[SPRPort02, SPRPort03]>;
 def SPRPort02_03_11       : ProcResGroup<[SPRPort02, SPRPort03, SPRPort11]>;
+def SPRPort02_03_10       : ProcResGroup<[SPRPort02, SPRPort03, SPRPort10]>;
+def SPRPort05_11          : ProcResGroup<[SPRPort05, SPRPort11]>;
 def SPRPort07_08          : ProcResGroup<[SPRPort07, SPRPort08]>;
 
 // EU has 112 reservation stations.
@@ -78,6 +80,10 @@ def SPRPort02_03_07_08_11 : ProcResGroup<[SPRPort02, SPRPort03, SPRPort07,
   let BufferSize = 72;
 }
 
+def SPRPortAny : ProcResGroup<[SPRPort00, SPRPort01, SPRPort02, SPRPort03,
+                               SPRPort04, SPRPort05, SPRPort06, SPRPort07,
+                               SPRPort08, SPRPort09, SPRPort10, SPRPort11]>;
+
 // Integer loads are 5 cycles, so ReadAfterLd registers needn't be available
 // until 5 cycles after the memory operand.
 def : ReadAdvance<ReadAfterLd, 5>;

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

def SPRPort02_03 : ProcResGroup<[SPRPort02, SPRPort03]>;
def SPRPort02_03_11 : ProcResGroup<[SPRPort02, SPRPort03, SPRPort11]>;
def SPRPort02_03_10 : ProcResGroup<[SPRPort02, SPRPort03, SPRPort10]>;
def SPRPort05_11 : ProcResGroup<[SPRPort05, SPRPort11]>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, 2,3,11 is a group according to intel optimization manual. https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
uops.info and some website and even some intel tool made mistake to swap them.
I fixed this in https://reviews.llvm.org/D130897. llvm/utils/schedtool/tools/add_uops_uopsinfo.py:29 print('Warning: port 10 and port 11 are reversed on uops.info.', "Let's swap them.", file=sys.stderr)

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ports should be based on optimization manual.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this isn't a typo in the optimization manual there? All of the other tools being different makes it seem like something might be up here.

I just took the names of the performance counters from libpfm, so it would probably need to be adjusted there too first.

@boomanaiden154
Copy link
Contributor Author

Going to merge this as I need this to do cycles measurements on Sapphire Rapids. Happy to continue discussing the optimization manual discrepancy more post-commit.

If it's actually an issue, the alder lake mappings need to be updated too, so probably better as a follow up PR assuming there's an issue.

@boomanaiden154 boomanaiden154 merged commit eb53d08 into llvm:main Oct 28, 2024
10 checks passed
@boomanaiden154 boomanaiden154 deleted the sapphire-rapids-pfm-counters branch October 28, 2024 16:07
@RKSimon
Copy link
Collaborator

RKSimon commented Oct 28, 2024

@boomanaiden154 Please can you raise an issue(s) for the port namings

@boomanaiden154
Copy link
Contributor Author

Good point. I've filed #113941.

NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This patch adds the appropriate hookups in X86PfmCounters.td for
SapphireRapids. This is mostly to fix errors when some of my jobs that
only really need dummy counters get scheduled on sapphire rapids
machines, but figured I might as well do it properly while here. I do
not have hardware access to test this currently, but this matches
exactly with what is in the libpfm source code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants