Conversation

@mikhailramalho mikhailramalho commented Jun 17, 2025

Updates the SpacemiT X60 scheduling model with actual latencies measured on BPi-F3 hardware.

tl;dr: execution time on SPEC is neutral after this patch. There is a code size regression, described in issue #146407.

Changes:

  • Added 10 new latency classes
  • Updated latencies for ~30 instruction categories based on hardware measurements

Completed:

  • Basic integer ALU, min/max, saturating/averaging arithmetic
  • Carry operations, mask operations, comparisons
  • Integer/FP division (split simple/complex based on LMUL)
  • Widening operations
  • FP operations including add/sub, mul, FMA
  • FP conversions (widening/narrowing)
  • FP reductions including vfredmax/min/usum (fixed fractional LMUL latencies)
  • FP ordered reductions vfredosum (split simple/complex)
  • FP widening reductions vfwredosum/vfwredusum (split simple/complex)
  • Integer reductions
  • Mask manipulation operations
  • Permutation operations (gather/compress/slide)
  • Narrowing shifts and clips (split simple/complex)

Missing:

  • All vector load/store uops are missing their measured latency values. The values in this PR are estimates while I collect the real numbers

Performance Impact:

  • https://lnt.lukelau.me/db_default/v4/nts/674?compare_to=673
  • This change is mostly NFC
  • The two benchmarks with execution-time improvements/regressions are known to be noisy
  • There is a code size increase in two benchmarks: reviewing the generated code, we see many more vector load/store instructions

Known Issues:

  • Code size regression on two SPEC benchmarks
  • Some grouped operations use worst-case latency
  • TableGen !cond expressions not working as expected for vector single-width FMA instructions
  • All compromises I've made are documented as TODO in the code

Planned follow-up PRs:

  • Debug the code size regressions
  • Add latencies for vector loads/stores
  • Work on the TODOs introduced by this PR. This should require splitting some WriteRes groups and changing other scheduling models, but it should be NFC for them

Signed-off-by: Mikhail R. Gadelha <[email protected]>
@mikhailramalho mikhailramalho changed the title [WIP][RISCV] Update SpacemiT X60 vector scheduling model with measured latencies [RISC-V] Update SpacemiT X60 vector scheduling model with measured latencies Jun 30, 2025
@mikhailramalho mikhailramalho requested a review from topperc June 30, 2025 19:36

// Pattern of vmacc, vmadd, vmul, vmulh, etc.: e8/e16 = 4/4/5/8, e32 = 5/5/5/8,
// e64 = 7/8/16/32. We use the worst case until we can split by SEW.
// TODO: change WriteVIMulV, etc to be defined with LMULSEWSchedWrites
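As a hypothetical sketch of the TODO above (it assumes WriteVIMulV has already been converted to the LMULSEWSchedWrites convention; the specific latency values are illustrative, not measured), splitting the multiply writes by SEW could look like:

```tablegen
// Hypothetical: split WriteVIMulV by SEW so e8/e16 no longer pay the
// e64 worst-case latency.
foreach mx = SchedMxList in {
  foreach sew = SchedSEWSet<mx>.val in {
    defvar IsWorstCase = SMX60IsWorstCaseMXSEW<mx, sew, SchedMxList>.c;
    // Illustrative latencies only: 4 for e8/e16, 5 for e32, 8 for e64.
    defvar Lat = !cond(!le(sew, 16) : 4, !eq(sew, 32) : 5, true : 8);
    let Latency = Lat in
    defm "" : LMULSEWWriteResMXSEW<"WriteVIMulV", [SMX60_VIEU], mx, sew,
                                   IsWorstCase>;
  }
}
```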
Personally I kind of agree that we can make multiplication's SchedWrite SEW-dependent

@LiqinWeng LiqinWeng self-requested a review July 2, 2025 02:19

// Strided and indexed loads and stores: scale with both LMUL and EEW
foreach eew = [8, 16, 32, 64] in {
defvar EEWMultiplier = !div(eew, 8);
Hm, this seems backwards from what I expect, with larger EEWs being more expensive? I would expect this to scale with the number of elements, and thus smaller EEWs to be more expensive.
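To make the reviewer's point concrete: for a fixed VLEN and LMUL, a vector register group holds VLEN*LMUL/EEW elements, so halving the EEW doubles the element count. A small sketch (the one-cycle-per-element cost model is an illustrative assumption, not the X60's actual behavior):

```python
def strided_access_elements(vlen_bits: int, lmul: int, eew_bits: int) -> int:
    """Number of elements in one vector register group at the given EEW."""
    return (vlen_bits * lmul) // eew_bits

# With VLEN=256 and LMUL=1: e8 touches 32 elements, e64 only 4 --
# so a one-cycle-per-element strided load would make *smaller* EEWs
# more expensive, not larger ones.
for eew in (8, 16, 32, 64):
    print(eew, strided_access_elements(256, 1, eew))
```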


// Segmented loads and stores: base latency multiplied by number of fields
// TODO: These latencies are estimations and are not confirmed experimentally
foreach mx = SchedMxList in {
These don't seem right. I'd expect either LD + shuffle costing, or one-per-element costing.
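For comparison, the two costings differ roughly like this (illustrative cycle counts, not measured X60 numbers; `base_cycles` and `cycles_per_elem` are assumed parameters for the sketch):

```python
def seg_cost_nf_multiplier(base_cycles: int, nf: int) -> int:
    """The PR's estimation: base load latency scaled by the number of fields."""
    return base_cycles * nf

def seg_cost_per_element(vl: int, nf: int, cycles_per_elem: int = 1) -> int:
    """One-per-element costing: every field of every element is a uop."""
    return vl * nf * cycles_per_elem

# e.g. a 4-field segmented load with vl=8 and an assumed base of 6 cycles:
print(seg_cost_nf_multiplier(6, 4))  # 24
print(seg_cost_per_element(8, 4))    # 32
```

The two models agree only when the base latency happens to match the element count, which is why measured numbers are needed before the estimates can be trusted.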

defm "" : LMULWriteResMX<"WriteVIMinMaxX", [SMX60_VIEU], mx, IsWorstCase>;
}

let Latency = Get44816Latency<mx>.c in {
Some of these have ReleaseAtCycle, and some don't, but I don't see any obvious pattern? Am I missing something here?
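For reference, the distinction the question is about looks roughly like this in TableGen (illustrative numbers and write names only; Latency models when the result is available, while ReleaseAtCycles models how long the uop occupies the unit):

```tablegen
// Pipelined op: result after 4 cycles, but the unit can accept a new
// uop every cycle (ReleaseAtCycles defaults to 1).
let Latency = 4 in
defm "" : LMULWriteResMX<"WriteVIALUV", [SMX60_VIEU], mx, IsWorstCase>;

// Long-occupancy op: the result takes 4 cycles and the unit is also
// busy for all 4 of them.
let Latency = 4, ReleaseAtCycles = [4] in
defm "" : LMULWriteResMX<"WriteVIMulV", [SMX60_VIEU], mx, IsWorstCase>;
```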

@preames

preames commented Jul 14, 2025

I talked with Mikhail offline. I'm generally happy with the direction of this overall patch, but we need to split off smaller pieces to make them practically reviewable. Mikhail is going to post a much smaller patch, starting with the simple integer instructions. I plan to iterate back and forth with him quickly on the patch series, with quick LGTMs on the sub-parts where possible for the "obvious" stuff. We'll slow down once we get to the harder parts, so we can focus attention on the interesting questions.

mikhailramalho added a commit that referenced this pull request Jul 24, 2025
This PR adds hardware-measured latencies for all instructions defined in
Section 11 of the RVV specification: "Vector Integer Arithmetic
Instructions" to the SpacemiT-X60 scheduling model.

The code in this PR was extracted from PR #144564, so it's smaller to
review. The code is almost identical; the only change was to add
ReleaseAtCycles to all instructions modified in this patch, except for
the vmul, vdiv, and vrem ones.
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025