Commit 069f812

[X86] Add RCU for Skylake Models (#153832)
We cannot actually retire an infinite number of uops per cycle. This patch adds an RCU (retire control unit) to the Skylake scheduling models to fix this.

I'm purposefully using a loose upper bound here. We're unlikely to actually get four fused uops per cycle, but this is better than not setting anything. Most realistic code I've put through uiCA will retire up to ~6 uops per cycle. Information taken from https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client).

This requires modification of the two zero idiom tests because we do not currently model the CPU frontend, which would likely be the actual bottleneck in that case.

Related to #153747.
1 parent 115f816 commit 069f812
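
To make the effect of the retire limit concrete, here is a minimal sketch of the throughput bound that a per-cycle retire width imposes. It is a toy calculation, not llvm-mca's actual retire logic: the function name `retire_bound_cycles`, the use of `None` for "no limit", and the 16-uop loop body are all hypothetical and chosen only for illustration.

```python
from typing import Optional


def retire_bound_cycles(uops_per_iteration: int,
                        iterations: int,
                        max_retire_per_cycle: Optional[int]) -> int:
    """Lower bound on total cycles imposed by retirement alone.

    max_retire_per_cycle=None models a scheduling model with no retire
    limit, in which case retirement never constrains throughput.
    """
    if max_retire_per_cycle is None:
        return 0  # no retire limit: retirement contributes no bound
    total_uops = uops_per_iteration * iterations
    # Ceiling division: a partially filled final cycle still costs a cycle.
    return -(-total_uops // max_retire_per_cycle)


# Hypothetical loop body of 16 zero idioms that occupy no execution port.
unbounded = retire_bound_cycles(16, 100, None)  # no RCU in the model
bounded = retire_bound_cycles(16, 100, 8)       # retire width of 8, as in this patch
print(unbounded, bounded)
```

With no limit, nothing in this toy model stops such a loop from completing arbitrarily fast. With a retire width of 8, 100 iterations need at least 200 cycles, which is the kind of shift that forces the zero idiom test expectations to change.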

4 files changed, 365 insertions(+), 353 deletions(-)
llvm/lib/Target/X86/X86SchedSkylakeClient.td

Lines changed: 6 additions & 0 deletions
@@ -70,6 +70,12 @@ def SKLPortAny : ProcResGroup<[SKLPort0, SKLPort1, SKLPort2, SKLPort3, SKLPort4,
   let BufferSize=60;
 }

+// Skylake can retire up to four (potentially fused) uops per cycle. Set the
+// limit to twice that given we do not model fused uops as only taking up one
+// retirement slot. I could not find any documented sources on how many
+// in-flight micro-ops can be tracked.
+def SKRCU : RetireControlUnit<0, 8>;
+
 // Integer loads are 5 cycles, so ReadAfterLd registers needn't be available until 5
 // cycles after the memory operand.
 def : ReadAdvance<ReadAfterLd, 5>;

llvm/lib/Target/X86/X86SchedSkylakeServer.td

Lines changed: 6 additions & 0 deletions
@@ -70,6 +70,12 @@ def SKXPortAny : ProcResGroup<[SKXPort0, SKXPort1, SKXPort2, SKXPort3, SKXPort4,
   let BufferSize=60;
 }

+// Skylake can retire up to four (potentially fused) uops per cycle. Set the
+// limit to twice that given we do not model fused uops as only taking up one
+// retirement slot. I could not find any documented sources on how many
+// in-flight micro-ops can be tracked.
+def SKXRCU : RetireControlUnit<0, 8>;
+
 // Integer loads are 5 cycles, so ReadAfterLd registers needn't be available until 5
 // cycles after the memory operand.
 def : ReadAdvance<ReadAfterLd, 5>;
