@Leilongjie2000

In LLVM 21.1.0, the RegMask used for calls to scalar library functions on RISC-V does not mark any vector registers as preserved, so a vector value that is live across such a call is typically spilled and reloaded. This increases register spills, especially in functions that frequently invoke scalar library functions inside vectorized loops.

For the following C code:

1 void vector_silu(float *input, float *output, int *shape, int num_dims) {
2    int total_elements = 1;
3    for (int i = 0; i < num_dims; ++i) {
4        total_elements *= shape[i];
5    }
6    for (int i = 0; i < total_elements; ++i) {
7        float x = input[i];
8        output[i] = x / (1.0 + expf(-x));
9    }
10 }

The corresponding RISC-V assembly for line 8 (output[i] = x / (1.0 + expf(-x));) is shown below. The v8 spill occurs because expf, a scalar library function, has no vector registers in its RegMask.

        vsetivli        zero, 4, e32, m1, ta, ma
        vfslide1down.vf v8, v8, fa0
        csrr    a0, vlenb
        sh1add  a0, a0, sp
        addi    a0, a0, 32
        vs1r.v  v8, (a0)					# Spill
        csrr    a0, vlenb
        sh2add  a0, a0, a0
        add     a0, a0, sp
        addi    a0, a0, 32
        vl1r.v  v8, (a0)					# Reload
        vslidedown.vi   v8, v8, 3
        vfmv.f.s        fa0, v8
        call    expf

See the online code at https://godbolt.org/z/T7f7PxYor
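To make the mechanism concrete, here is a minimal C model. It is a sketch, not LLVM code: a call's RegMask is treated as a 32-bit set over v0 to v31, where a set bit means the callee preserves that register, and any live vector value held in a register whose bit is clear has to be spilled around the call. The names vmask_t, vbit, regs_to_spill and the two mask constants are hypothetical; the V-inclusive set (v1 to v7, v24 to v31) is an assumption taken from the vector calling convention variant discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not LLVM's RegMask layout: bit i set means vector
   register vi is preserved across the call (callee-saved). */
typedef uint32_t vmask_t;

/* Standard (scalar) calling convention: no vector register is preserved. */
#define SCALAR_CC_VMASK ((vmask_t)0u)

/* Vector calling convention variant (assumed): v1-v7 (0x000000FE) and
   v24-v31 (0xFF000000) are preserved. */
#define VECTOR_CC_VMASK ((vmask_t)0xFF0000FEu)

static vmask_t vbit(int i) {
    assert(i >= 0 && i < 32); /* only v0..v31 exist */
    return (vmask_t)1u << i;
}

/* Live vector values the callee may clobber: the caller has to spill these
   registers before the call and reload them afterwards. */
static vmask_t regs_to_spill(vmask_t live_across_call, vmask_t preserved) {
    return live_across_call & ~preserved;
}
```

For example, regs_to_spill(vbit(8), SCALAR_CC_VMASK) is vbit(8), matching the v8 spill in the listing. In this model v8 is not preserved even under the variant, so the saving relies on the register allocator placing the live value in a preserved register such as v24, for which regs_to_spill(vbit(24), VECTOR_CC_VMASK) is 0.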

RISCVRegisterInfo::getCallPreservedMask selects the call-preserved RegMask, including the one used for library calls such as expf. When we changed it to return CSR_ILP32D_LP64D_V_RegMask whenever the subtarget supports the V extension, library calls received a RegMask that includes vector registers, and the spill was eliminated.

We modified and tested only the RISCVABI::ABI_ILP32D and ABI_LP64D cases, but we recommend updating the remaining ABI cases as well. We ran correctness testing on SPEC CPU2006 and encountered no problems.
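The described change can be sketched as a small selection function. This is a hedged C model, not the actual RISCVRegisterInfo::getCallPreservedMask source: the enums are hypothetical stand-ins, the is_vector_call flag is an assumption about how a V-inclusive mask is already chosen for callees using the vector calling convention variant, and, as noted above, only the ILP32D and LP64D cases are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the two ABIs the patch touched are modeled. */
enum abi { ABI_ILP32D, ABI_LP64D, ABI_OTHER };
enum mask { MASK_ILP32D_LP64D, MASK_ILP32D_LP64D_V, MASK_UNMODELED };

/* Picks the call-preserved mask for a call.
   is_vector_call: callee uses the vector calling convention variant (assumed).
   patched:        apply the change proposed in this PR. */
static enum mask call_preserved_mask(enum abi abi, bool has_v,
                                     bool is_vector_call, bool patched) {
    switch (abi) {
    case ABI_ILP32D:
    case ABI_LP64D:
        if (is_vector_call || (patched && has_v))
            return MASK_ILP32D_LP64D_V;
        return MASK_ILP32D_LP64D;
    default:
        return MASK_UNMODELED; /* left unchanged by the patch */
    }
}
```

With patched set and has_v true, an ordinary scalar call such as expf gets the V-inclusive mask, which is exactly the behavior the review below questions.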

The XSCC compiler team developed this implementation.

… Spills

Co-Authored-By: buggfg <[email protected]>
Co-Authored-By: ict-ql <[email protected]>
Co-Authored-By: MissGou <[email protected]>

@topperc
Collaborator

topperc commented Oct 14, 2025

There's no guarantee in the ABI that a scalar function doesn't use vector registers. If glibc starts allowing vector code in memcpy or memset, it will be very easy for library code to break this.

Have you tried using a vector math library like sleef that contains a vectorized version of expf for RISC-V?

Or have you tried modifying the cost model to not vectorize functions with scalar library calls? Is it profitable to vectorize if you have to keep extracting elements?

@Leilongjie2000
Author

Hi, thanks for your reply! I’d like to clarify that our change fully follows the calling convention.

First, CSR_ILP32D_LP64D_V_RegMask is based on CSR_ILP32D_LP64D_RegMask (scalar callee-saved registers) and only adds a subset of vector callee-saved registers (v1–v7, v24–v31), not all vector registers.
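The "subset, not all vector registers" point can be checked by building the vector part of that mask from its two ranges and counting it. A minimal C sketch, assuming the vector portion is exactly v1 to v7 plus v24 to v31 as stated above and leaving the scalar part out; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits lo..hi set; bit i stands for vector register vi. */
static uint32_t vreg_range(int lo, int hi) {
    assert(0 <= lo && lo <= hi && hi < 32);
    uint32_t m = 0;
    for (int i = lo; i <= hi; ++i)
        m |= (uint32_t)1u << i;
    return m;
}

/* Vector part of the V-inclusive mask, as described: v1-v7 and v24-v31. */
static uint32_t v_callee_saved(void) {
    return vreg_range(1, 7) | vreg_range(24, 31);
}

/* Number of registers in a mask (Kernighan's bit-clearing loop). */
static int popcount32(uint32_t m) {
    int n = 0;
    for (; m != 0; m &= m - 1) /* clear the lowest set bit */
        ++n;
    return n;
}
```

So 15 of the 32 vector registers are preserved; v0 (the RVV mask register) and v8 to v23 remain caller-saved.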

Second, all library functions comply with the ABI design and restore callee-saved registers upon exit — even scalar functions like memset and memcpy that internally use vector instructions.

Therefore, when the subtarget supports the V extension, making vector callee-saved registers available is valid. Additionally, we verified correctness on SPEC CPU2006 and encountered no problems.

Happy to continue the discussion :)

@topperc
Collaborator

topperc commented Oct 15, 2025

> Second, all library functions comply with the ABI design and restore callee-saved registers upon exit — even scalar functions like memset and memcpy that internally use vector instructions.

I think the patches I've seen for vector memset and memcpy are written in assembly and don't do any save/restore.

> Therefore, when the subtarget supports the V extension, making vector callee-saved registers available is valid. Additionally, we verified correctness on SPEC CPU2006 and encountered no problems.

What if the library code is compiled with gcc? Or an older version of clang/gcc that doesn't have this patch?
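The failure mode raised here can be shown with a toy C simulation. It is purely illustrative: the register file is an array, and asm_memcpy_model is a hypothetical stand-in for a hand-written assembly routine that uses v24 as scratch without saving it; it does not reflect any specific glibc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy register file: one 64-bit value per vector register v0..v31. */
static uint64_t vreg[32];

/* Stand-in for a hand-written assembly memcpy that uses v24 as scratch.
   Under the standard calling convention all vector registers are
   caller-saved, so it is entitled to leave v24 modified. */
static void asm_memcpy_model(void *dst, const void *src, size_t n) {
    vreg[24] = 0xdeadbeefu; /* scratch use clobbers v24 */
    memcpy(dst, src, n);
}

/* Caller keeping a live value in v24 across the call. When it assumes v24
   is preserved (V-inclusive mask), the spill and reload are omitted. */
static uint64_t caller(bool assume_v24_preserved) {
    const char src[] = "payload";
    char dst[sizeof src];
    uint64_t spill_slot = 0;

    vreg[24] = 42; /* live value */
    if (!assume_v24_preserved)
        spill_slot = vreg[24]; /* spill */
    asm_memcpy_model(dst, src, sizeof src);
    if (!assume_v24_preserved)
        vreg[24] = spill_slot; /* reload */
    return vreg[24];
}
```

caller(false) returns 42, while caller(true) returns the callee's scratch value: the callee follows its own convention, and the caller's stronger assumption is what breaks.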

@Leilongjie2000
Author

According to the RISC-V ELF psABI documentation:

> the contents of all callee-saved registers must be restored to what was set on entry,

> It must ensure that the entire contents of v1–v7 and v24–v31 are preserved across the call,

> Functions that use vector registers to pass arguments and return values must follow this calling convention.

In other words, library functions must strictly follow the calling convention, saving and restoring all callee-saved registers.

This behavior is defined by the ISA, and is independent of any specific compiler or version :)

@jrtc27
Collaborator

jrtc27 commented Oct 15, 2025

> This behavior is defined by the ISA, and is independent of any specific compiler or version :)

No, it's specified by the ABI, which in the case of RISC-V is https://riscv-non-isa.github.io/riscv-elf-psabi-doc/.

@topperc
Collaborator

topperc commented Oct 15, 2025

The rules you are quoting are for the vector register calling convention variant. That variant applies to functions that have vector arguments or return values, or that are declared with the riscv_vector_cc attribute. It does not apply to scalar library functions.
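The rule stated here can be written as a predicate. A hedged C sketch: struct callee_info is a hypothetical summary of what the compiler knows about a callee's declaration, and the function encodes only the conditions named in this comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the compiler knows about a callee. */
struct callee_info {
    const char *name;
    bool has_vector_args;
    bool has_vector_return;
    bool has_riscv_vector_cc_attr;
};

/* True if the vector calling convention variant (v1-v7, v24-v31
   callee-saved) governs calls to this function. */
static bool uses_vector_cc_variant(const struct callee_info *f) {
    return f->has_vector_args || f->has_vector_return ||
           f->has_riscv_vector_cc_attr;
}
```

A plain declaration of expf sets none of the three flags, so calls to it follow the standard convention, under which no vector register is preserved.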

@Leilongjie2000
Author


> The rules you are quoting are for the vector register calling convention variant. This is for function that have vector arguments/returns or are declared with the riscv_vector_cc attribute. This does not apply to scalar library functions.

Got it. Our point is that when the callees do not actually violate the calling convention variant, using the variant can significantly reduce register spills; for the kernel above, it gave a 44% performance improvement on the SpacemiT Key Stone K1.

For statically linked binaries, could BOLT safely remove the spill and reload instructions when the callees' machine code does not modify the spilled registers? Alternatively, could we introduce a compiler option that lets programmers explicitly select the calling convention variant when they are confident the callees do not violate it? 🙂
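The condition the BOLT idea relies on can be sketched as a reachability check over the call graph: a spill/reload around a call is removable only if nothing reachable from the callee writes the spilled registers, and any indirect or unanalyzable call makes the answer "no". A minimal C sketch under those assumptions; struct func, collect_clobbers and spill_is_removable are hypothetical and are not BOLT APIs. In practice writes_vregs would also have to cover every register of an LMUL>1 group that an instruction writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FUNCS 16
#define MAX_CALLEES 4

/* Hypothetical per-function summary extracted from machine code. */
struct func {
    uint32_t writes_vregs;    /* bit i set: writes vi directly */
    int callees[MAX_CALLEES]; /* indices of direct callees */
    int ncallees;
    bool has_unknown_call;    /* indirect call or unanalyzable code */
};

/* Accumulates the vector registers possibly written by f and everything it
   calls. Returns false if some reachable code cannot be analyzed. */
static bool collect_clobbers(const struct func *fs, int f, bool *visited,
                             uint32_t *clobbers) {
    if (visited[f])
        return true; /* already accounted for (handles recursion) */
    visited[f] = true;
    if (fs[f].has_unknown_call)
        return false;
    *clobbers |= fs[f].writes_vregs;
    for (int i = 0; i < fs[f].ncallees; ++i)
        if (!collect_clobbers(fs, fs[f].callees[i], visited, clobbers))
            return false;
    return true;
}

/* The spill/reload of the registers in `spilled` around a call to `callee`
   is removable only if no reachable code writes any of them. */
static bool spill_is_removable(const struct func *fs, int nfuncs, int callee,
                               uint32_t spilled) {
    bool visited[MAX_FUNCS] = {false};
    uint32_t clobbers = 0;
    assert(nfuncs <= MAX_FUNCS && callee < nfuncs);
    if (!collect_clobbers(fs, callee, visited, &clobbers))
        return false;
    return (clobbers & spilled) == 0;
}
```

The transitive walk matters: a callee that never touches v1 itself can still reach a helper that does, which is why a per-function check on the direct callee alone would not be enough.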
