[RISC-V] Update SpacemiT X60 vector scheduling model with measured latencies #144564
Signed-off-by: Mikhail R. Gadelha <[email protected]>
// Pattern of vmacc, vmadd, vmul, vmulh, etc.: e8/e16 = 4/4/5/8, e32 = 5,5,5,8,
// e64 = 7,8,16,32. We use the worst-case until we can split the SEW.
// TODO: change WriteVIMulV, etc to be defined with LMULSEWSchedWrites
Personally, I kind of agree that we can make multiplication's SchedWrite SEW-dependent.
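To make the cost of the worst-case choice concrete, here is a minimal Python sketch that takes the latencies quoted in the code comment and computes the per-LMUL maximum across SEW. It assumes the four values in each group correspond to LMUL = 1, 2, 4, 8 (the comment does not say so explicitly), and the names `MUL_LATENCY`, `worst_case_latency`, and `overestimate` are hypothetical helpers, not part of the patch.

```python
# Latencies quoted in the diff comment, one list per SEW.
# Assumption: the four columns are LMUL = 1, 2, 4, 8 (not stated in the comment).
LMULS = ["M1", "M2", "M4", "M8"]
MUL_LATENCY = {
    8: [4, 4, 5, 8],
    16: [4, 4, 5, 8],
    32: [5, 5, 5, 8],
    64: [7, 8, 16, 32],
}


def worst_case_latency():
    """Per-LMUL latency when a single SEW-independent value must cover every SEW."""
    return {
        lmul: max(row[i] for row in MUL_LATENCY.values())
        for i, lmul in enumerate(LMULS)
    }


def overestimate(sew):
    """Cycles by which the worst-case value exceeds the quoted latency for one SEW."""
    worst = worst_case_latency()
    return {lmul: worst[lmul] - MUL_LATENCY[sew][i] for i, lmul in enumerate(LMULS)}


if __name__ == "__main__":
    print("worst case:", worst_case_latency())
    for sew in MUL_LATENCY:
        print(f"e{sew} overestimate:", overestimate(sew))
```

Under that column assumption, the worst case is simply the e64 row, so the smaller element widths are modelled as slower than the quoted numbers say they are. That gap is what a SEW-dependent SchedWrite would remove.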
// Strided and indexed loads and stores: scale with both LMUL and EEW
foreach eew = [8, 16, 32, 64] in {
  defvar EEWMultiplier = !div(eew, 8);
Hm, this seems backwards from what I expect with larger EEWs being more expensive? I would expect this to scale with the number of elements, and thus have smaller EEWs be more expensive.
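The two scaling rules under discussion can be compared with a small Python sketch. It sets the multiplier from the diff (`!div(eew, 8)`) next to the element count of a vector register group, which is `VLEN * LMUL / EEW`. The `VLEN_BITS = 256` value is an assumption about the X60 used only for illustration, and both function names are hypothetical.

```python
from fractions import Fraction

# Assumption: 256-bit vector registers; adjust if the target differs.
VLEN_BITS = 256


def eew_multiplier(eew):
    """Mirrors the diff's `!div(eew, 8)`: grows as the element width grows."""
    return eew // 8


def element_count(eew, lmul=1, vlen=VLEN_BITS):
    """Elements in one register group: VLEN * LMUL / EEW (LMUL may be fractional)."""
    return int(Fraction(vlen) * Fraction(lmul) / eew)


if __name__ == "__main__":
    for eew in (8, 16, 32, 64):
        print(f"e{eew}: multiplier={eew_multiplier(eew)}, "
              f"elements at LMUL=1: {element_count(eew)}")
```

The multiplier rises from 1 to 8 as EEW goes from 8 to 64, while the element count falls from 32 to 4 under the assumed VLEN. If a strided or indexed access costs roughly one operation per element, the two rules point in opposite directions, which is the concern raised in the comment.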
// Segmented loads and stores: base latency multiplied by number of fields
// TODO: These latencies are estimations and are not confirmed experimentally
foreach mx = SchedMxList in {
These don't seem right. I'd expect either LD + shuffle costing, or one-per-element costing.
  defm "" : LMULWriteResMX<"WriteVIMinMaxX", [SMX60_VIEU], mx, IsWorstCase>;
}

let Latency = Get44816Latency<mx>.c in {
Some of these have ReleaseAtCycle, and some don't, but I don't see any obvious pattern? Am I missing something here?
I talked w/Mikhail offline. I'm generally happy with the direction of this overall patch, but we need to split off smaller pieces to make them practically reviewable. Mikhail is going to post a much smaller patch, starting with the simple integer instructions. I plan to iterate back and forth with him quickly on the patch series with quick LGTMs on the sub-parts where possible for the "obvious" stuff. We'll slow down once we get to the harder parts, so we can focus attention on the interesting questions.
This PR adds hardware-measured latencies to the SpacemiT-X60 scheduling model for all instructions defined in Section 11 of the RVV specification, "Vector Integer Arithmetic Instructions". The code in this PR was extracted from PR #144564 so that it is smaller to review. Apart from a few small adjustments, the code is almost identical; the only change of note is the addition of ReleaseAtCycles to all instructions modified in this patch, except for the vmul, vdiv, and vrem ones.
Updates the SpacemiT X60 scheduling model with actual latencies measured on BPi-F3 hardware.
tl;dr: execution time is neutral on SPEC after this patch. There is a code size regression, described in issue #146407.
Changes:
Completed:
Missing:
Performance Impact:
Known Issues:
Planned follow-up PRs: