lvshuo2016
Contributor

Add prefetch and vsetvli support. On some real RISC-V hardware platforms, performance can improve by 100%.

configure.ac Outdated
if test "x$rvv" = "xyes"; then
CFLAGS+=" -march=rv64gcv"
CCASFLAGS+=" -march=rv64gcv"
CFLAGS+=" -march=rv64gcv_zicbop"
Contributor

It looks like this is going to be in conflict with CFLAGS+=" -march=rv64gcv_zbc_zvbc_zvbb" in https://github.com/intel/isa-l/pull/350/files.

Contributor Author

They shouldn't conflict because they are on different branches.

Contributor

I know, but what I meant is that the flags are different. Which one will stay? Or will it be a combination of both? I don't quite understand RISC-V flags.

Contributor Author

I did a simple compiler test with -march=rv64gcv_zicbop_zbc_zvbc_zvbb, which combines both sets of flags. It compiled successfully.

Contributor

It should include detection of zicbop at both compile time and runtime, rather than adding it directly.

Contributor Author

Updated.

@pablodelara
Contributor

Can you sign off the commit? Thanks!

@lvshuo2016
Contributor Author

Updated. Thanks.

@sunyuechi
Contributor

Similarly, https://github.com/ChristopherHX/github-act-runner/ encountered a panic and is currently under investigation...

@lvshuo2016
Contributor Author

Do you have any comments? If so, I can fix them.


- vsetvli a5, x0, e8, m1 /* Set vector length to maximum */
+ vsetvli a5, x0, e8, m1,ta,ma /* Set vector length to maximum */
Contributor

Please split the commits — the addition of ta and ma for vsetvli should be in a separate commit.

Contributor Author

lvshuo2016 commented Aug 11, 2025

Why do ta and ma for vsetvli need to be committed separately?

Contributor

The prefetch part and filling in the default ta and ma for vset are two unrelated matters, and they should generally not be combined in a single commit, as doing so makes the commit appear confusing.
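For background on why the `ta`/`ma` change is its own logical change: the RVV 1.0 specification treats `vsetvli` without `ta`/`ma` as deprecated syntax that historically meant `tu, mu` (while noting that agnostic would arguably be the better default). In the `vtype` immediate, `vlmul` sits in bits [2:0], `vsew` in bits [5:3], `vta` in bit 6 and `vma` in bit 7, so adding the flags changes the encoded instruction. A minimal sketch, using a hypothetical `vtypei_encode` helper that is not part of ISA-L and handles integer LMUL only:

```c
#include <stdint.h>

/* floor(log2(v)) for v >= 1. */
static int log2_u(unsigned v)
{
	int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical helper: build the vtypei immediate of vsetvli (RVV 1.0 layout).
 * sew_bits: 8, 16, 32 or 64; lmul: 1, 2, 4 or 8 (fractional LMUL not handled). */
uint32_t vtypei_encode(unsigned sew_bits, unsigned lmul, int ta, int ma)
{
	uint32_t vsew = (uint32_t)log2_u(sew_bits / 8); /* e8=0, e16=1, e32=2, e64=3 */
	uint32_t vlmul = (uint32_t)log2_u(lmul);        /* m1=0, m2=1, m4=2, m8=3 */

	return vlmul | (vsew << 3) | ((uint32_t)(ta != 0) << 6) | ((uint32_t)(ma != 0) << 7);
}
```

Under the historical `tu, mu` reading, `vsetvli a5, x0, e8, m1` corresponds to 0x00 and `vsetvli a5, x0, e8, m1, ta, ma` to 0xC0.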

Contributor Author

The vsetvli change is just a one-line modification, and its purpose is very clear. Is it necessary to make a separate commit?

Contributor

The purpose of splitting is to make things simpler — as long as the functionalities are separate, this applies whether it’s a single line or a single word.
Such split commits are actually a necessary requirement in most open-source projects. However, for isa-l specifically, it might be better to let pablodelara decide. I don’t want to say it must be done in a certain way.

Contributor

I agree with @sunyuechi, even if it is one line. And especially if it is only one line, it should be easy to split into two commits.

Contributor Author

Updated.

configure.ac Outdated
".insn i 0x0F, 0, x0, x0, 0x010" ::: "memory"
);
])],
[AC_DEFINE([HAVE_ZICBOP], [1], [Enable Zicbop instructions])
Contributor

What are the differences in code path if this instruction is available or not?

Contributor Author

The code should check the HAVE_ZICBOP macro to decide whether to use the prefetch instructions.
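One subtlety with that check: in the configure snippet from this PR, the unsupported branch still runs `AC_DEFINE([HAVE_ZICBOP], [0], ...)`, so `config.h` defines the macro as 0 instead of leaving it undefined. Code paths should therefore test `#if HAVE_ZICBOP`, not `#ifdef HAVE_ZICBOP`; the same applies to `.S` files, which go through the C preprocessor. A minimal C sketch follows; `EC_PREFETCH_R` and `sum_bytes` are hypothetical names for illustration, and the `#ifndef` fallback merely stands in for `config.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for config.h: configure defines HAVE_ZICBOP as 1 or 0. */
#ifndef HAVE_ZICBOP
#define HAVE_ZICBOP 0
#endif

#if HAVE_ZICBOP && defined(__riscv)
/* "ori x0, rs1, 1" is the encoding of prefetch.r 0(rs1); using .insn avoids
 * needing zicbop in the assembler's -march string. */
#define EC_PREFETCH_R(p) __asm__ volatile(".insn i 0x13, 6, x0, %0, 1" : : "r"(p))
#else
/* HAVE_ZICBOP == 0 (or not RISC-V): the prefetch compiles to nothing. */
#define EC_PREFETCH_R(p) ((void)(p))
#endif

/* Hypothetical consumer: sum a buffer, hinting the next 64-byte block. */
uint64_t sum_bytes(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < len; i++) {
		if (i % 64 == 0 && i + 64 < len)
			EC_PREFETCH_R(buf + i + 64);
		sum += buf[i];
	}
	return sum;
}
```

Either way the result of `sum_bytes` is the same; only the emitted instruction stream differs.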

Contributor Author

Updated.

@sunyuechi
Contributor

Regarding the claim that performance can improve by 100% on some real RISC-V hardware platforms: which hardware, specifically?

@sunyuechi
Contributor

The commits need to be reorganized; roughly three commits are needed: prefetch, vsetvli default value, and implementation.

For example, this change and other unrelated logic changes should be moved out of the prefetch commit:

- slli t6, x_vec, 5
+ slli t6, x_vec, 2

The HAVE_ZICBOP macro should be merged into the prefetch commit.

@lvshuo2016 lvshuo2016 force-pushed the riscv-optimize branch 3 times, most recently from 87ab99c to be89c37 on August 28, 2025 08:38
@lvshuo2016
Contributor Author

Updated.

@lvshuo2016
Contributor Author

Could this PR be merged?

configure.ac Outdated
[AC_DEFINE([HAVE_ZICBOP], [0], [Disable Zicbop instructions])
AM_CONDITIONAL([HAVE_ZICBOP], [false]) zicbop=no]
)
AC_MSG_RESULT([$zicbop])
Contributor

This should belong to the third commit, but it's currently in the first commit.

Contributor Author

It is currently in the third commit, not in the first commit.

Contributor

The line

+ AC_MSG_RESULT([$zicbop])

is in "erasure_code: add optimization implementation", but I think it should be in "erasure_code: add prefetch optimization".

Contributor Author

It is currently in erasure_code: add prefetch optimization.

commit eac4d0f (HEAD -> riscv-optimize, origin/riscv-optimize)
Author: lvshuo [email protected]
Date: Thu Aug 28 16:48:04 2025 +0800

    erasure_code: add prefetch optimaztion

    Signed-off-by: Shuo Lv <[email protected]>

diff --git a/configure.ac b/configure.ac
index 1a1476a..e89d660 100644
--- a/configure.ac
+++ b/configure.ac
@@ -38,6 +38,7 @@ AM_CONDITIONAL([CPU_PPC64LE], [test "$CPU" = "ppc64le"])
 AM_CONDITIONAL([CPU_RISCV64], [test "$CPU" = "riscv64"])
 AM_CONDITIONAL([CPU_UNDEFINED], [test "x$CPU" = "x"])
 AM_CONDITIONAL([HAVE_RVV], [false])
+AM_CONDITIONAL([HAVE_ZICBOP], [false])
 
 # Check for programs
 AC_PROG_CC_STDC
@@ -70,10 +71,28 @@ case "${CPU}" in
 		[AC_DEFINE([HAVE_RVV], [0], [Disable RVV instructions])
 		AM_CONDITIONAL([HAVE_RVV], [false]) rvv=no]
 		)
+		AC_MSG_CHECKING([Zicbop support])
+		AC_COMPILE_IFELSE(
+			[AC_LANG_PROGRAM([], [
+				__asm__ volatile(
+					".option arch, +zicbop\n"
+					".insn i 0x0F, 0, x0, x0, 0x010" ::: "memory"
+				);
+			])],
+			[AC_DEFINE([HAVE_ZICBOP], [1], [Enable Zicbop instructions])
+			AM_CONDITIONAL([HAVE_ZICBOP], [true]) zicbop=yes],
+			[AC_DEFINE([HAVE_ZICBOP], [0], [Disable Zicbop instructions])
+			AM_CONDITIONAL([HAVE_ZICBOP], [false]) zicbop=no]
+		)
+		AC_MSG_RESULT([$zicbop])
    

Contributor Author

It looks like the picture could not be displayed here.

Contributor

I used

gh pr checkout 351
git show bbbbf6c

and saw that it’s still in this commit.

Contributor Author

It seems there is a problem: my local rebase has a conflict.

Contributor Author

Updated.


- vsetvli a5, x0, e8, m1 /* Set vector length to maximum */
+ vsetvli a5, x0, e8, m1,ta,ma /* Set vector length to maximum */
Contributor

The newly added ta and ma in vset should use the same spacing as in the first half of the line.

Contributor Author

Updated

slli t_offset, x_vec, 5
slli x_vec, x_vec, 3

slli t_offset, x_vec, 2
Contributor

The commit message should include the reason for changing the implementation.

Contributor Author

Updated.

@lvshuo2016 lvshuo2016 force-pushed the riscv-optimize branch 2 times, most recently from 6a8fa6c to eac4d0f on September 4, 2025 08:27
@lvshuo2016
Contributor Author

Everything is updated.

@sunyuechi
Contributor

Regarding which hardware shows the improvement, the answer was: "an internal self-developed chip; it has no effect on the current development board, which has an in-order core."

Please try to explain the situation and/or describe the test environment in the commit message.

@lvshuo2016 lvshuo2016 force-pushed the riscv-optimize branch 3 times, most recently from 2f626d5 to 99d5360 on September 5, 2025 06:58
@lvshuo2016
Contributor Author

Everything is updated.

@lvshuo2016
Contributor Author

Could this PR be merged?

@pablodelara
Contributor

@sunyuechi what do you think of the latest code?

@pablodelara
Contributor

pablodelara commented Sep 10, 2025

@lvshuo2016 one more comment: could you review your commit messages? There are typos, and the "Signed-off-by" line should always be at the end of the message.

@pablodelara pablodelara reopened this Sep 10, 2025

vsetvli t0, x0, e8, m1 /* Set vector length to maximum */

# vsetvli t0, x0, e8, m1 /* Set vector length to maximum */
Contributor

Can this be removed?

Contributor Author

Removed.

@sunyuechi
Contributor

Please add the relevant description (the hardware and test environment) to the commit message. Also, can we change the vsetvli commit subject to indicate that it adds default values instead, since no performance optimization is involved?

@sunyuechi
Contributor

In "erasure_code: add prefetch optimization", a compile-time check was added, but a runtime check still needs to be implemented.

reduce one slli instructions and remove the dependence between vle8.v and ld instructions

Signed-off-by: Shuo Lv <[email protected]>
@lvshuo2016 lvshuo2016 force-pushed the riscv-optimize branch 2 times, most recently from e96d047 to aee9736 on September 11, 2025 08:07
@lvshuo2016
Contributor Author

Updated.

add vsetvli parameter to default values

Signed-off-by: Shuo Lv <[email protected]>
@lvshuo2016
Contributor Author

Updated.

@lvshuo2016
Contributor Author

By "runtime check", do you mean a getauxval check?

@sunyuechi
Contributor

Either getauxval or hwprobe. Currently, ec_riscv64_dispatcher.c only checks for V.

@lvshuo2016
Contributor Author

lvshuo2016 commented Sep 14, 2025

For prefetch, we could use code like the following for the check:

#include <sys/auxv.h>

/* Placeholder bit proposed here; upstream RISC-V AT_HWCAP only reports
 * single-letter ISA extensions, so this bit is not set by current kernels. */
#ifndef DETECT_RISCV64_HWCAP_ISA_ZICACHEOP
#define DETECT_RISCV64_HWCAP_ISA_ZICACHEOP (1UL << 28)
#endif

/* Returns 1 if the Zicacheop (prefetch) bit is set in AT_HWCAP, 0 otherwise. */
int supports_riscv_prefetch(void)
{
	unsigned long hwcap = getauxval(AT_HWCAP);

	return (hwcap & DETECT_RISCV64_HWCAP_ISA_ZICACHEOP) ? 1 : 0;
}

In the .S file, call supports_riscv_prefetch to decide whether to execute the prefetch: if it is supported, issue the prefetch; otherwise, skip it.

The check could also be implemented directly in assembly, but that would take too many instructions. Performing a runtime check before every prefetch instruction would also incur a lot of overhead. Prefetch is an extension, but it is a basic feature on most hardware, so there is little need to add a separate runtime check for each prefetch instruction. If the check simply skips the prefetch when it is not supported, the result is the same as if the prefetch could not be executed at all. What do you think?
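Background that bears on the "same result" argument: Zicbop prefetches are encoded in the ORI HINT space (`ori x0, rs1, imm`: opcode 0x13, funct3 6, rd x0). imm[4:0] selects the variant (0 = `prefetch.i`, 1 = `prefetch.r`, 3 = `prefetch.w`) and imm[11:5] carries offset[11:5], so the offset must be a multiple of 32. Because the destination is x0, a core without Zicbop executes these as no-ops. The encoder below is an illustrative sketch with a hypothetical name (`zicbop_encode`), useful for checking instruction words by hand:

```c
#include <stdint.h>

/* Variant selector stored in imm[4:0] of the ORI hint. */
enum zicbop_kind { PREFETCH_I = 0, PREFETCH_R = 1, PREFETCH_W = 3 };

/* Hypothetical helper: encode "prefetch.{i,r,w} offset(rs1)" as a 32-bit word.
 * Returns 0 (not a valid prefetch) if rs1 or offset is out of range. */
uint32_t zicbop_encode(enum zicbop_kind kind, unsigned rs1, int32_t offset)
{
	if (rs1 > 31 || offset % 32 != 0 || offset < -2048 || offset > 2016)
		return 0;

	/* imm[11:5] = offset[11:5] (two's complement), imm[4:0] = variant. */
	uint32_t imm = ((uint32_t)offset & 0xFE0u) | (uint32_t)kind;

	/* I-type layout: imm | rs1 | funct3=6 (ORI) | rd=x0 | opcode=0x13 (OP-IMM). */
	return (imm << 20) | (rs1 << 15) | (6u << 12) | (0u << 7) | 0x13u;
}
```

For example, `prefetch.r 0(a0)` (a0 is x10) encodes as 0x00156013. For comparison, the configure probe in this PR emits `.insn i 0x0F, 0, x0, x0, 0x010`, which uses the MISC-MEM opcode with funct3 0 (the FENCE encoding) rather than this ORI form, so in that probe it is the `.option arch, +zicbop` directive that actually exercises toolchain support.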

@sunyuechi
Contributor

According to riscv64_multibinary.h, the implementation is selected only on the first call, not on every instruction execution, so the overhead should be minimal.

For the hwprobe implementation, you can refer to #350. From the email addresses, it looks like you’re from the same parent company, so perhaps you could collaborate more closely.
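To make the `riscv64_multibinary.h` point concrete: in a resolve-once dispatcher, the CPU feature check runs on the first call only, and every later call goes straight through a cached function pointer, so a runtime Zicbop check would add no per-prefetch cost. This is a simplified C sketch of the pattern, not ISA-L's actual dispatcher; `ec_kernel`, `ec_kernel_base`, `ec_kernel_prefetch` and `cpu_has_zicbop` are hypothetical, and the feature check is a stub.

```c
typedef int (*ec_kernel_fn)(int);

/* Hypothetical kernels: same result, one would issue prefetches. */
static int ec_kernel_base(int x) { return x + 1; }
static int ec_kernel_prefetch(int x) { return x + 1; }

/* Stub: a real build would query getauxval() or hwprobe here. */
static int cpu_has_zicbop(void) { return 0; }

static int resolve_count; /* how many times the resolver has run */

static int ec_kernel_dispatch(int x);
static ec_kernel_fn ec_kernel_impl = ec_kernel_dispatch;

/* First call only: pick an implementation, cache it, then forward the call. */
static int ec_kernel_dispatch(int x)
{
	resolve_count++;
	ec_kernel_impl = cpu_has_zicbop() ? ec_kernel_prefetch : ec_kernel_base;
	return ec_kernel_impl(x);
}

/* Public entry point: every call goes through the cached pointer. */
int ec_kernel(int x)
{
	return ec_kernel_impl(x);
}

int ec_resolve_count(void)
{
	return resolve_count;
}
```

This naive version is not thread-safe; a production version would need atomic access to the cached pointer or would resolve eagerly during initialization.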

@lvshuo2016
Contributor Author

hwprobe does not support a runtime check for Zicacheop (prefetch), and there is no related macro in the kernel header files. I confirmed this with yinlenree, the author of #350.

@sunyuechi
Contributor

sunyuechi commented Sep 15, 2025

Yes, at the moment there is only a patch series to add hwprobe support for Zicbop
(e.g. https://lkml.org/lkml/2025/9/11/849), but it has not yet been merged upstream.

If there is no clear benefit shown on publicly available hardware yet, it might be more
appropriate to wait until the kernel side has merged the corresponding hwprobe support
before moving forward with this optimization in ISA-L.

Or first add hwprobe support based on the related patch?

@lvshuo2016
Contributor Author

If we add hwprobe support based on that patch, it may not compile against current kernel headers, so I will remove the prefetch patch for now.
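One way to keep the build working on kernels whose headers lack the new constant is to guard the hwprobe path on both the header and the constant, and fall back to "not supported" otherwise. A minimal sketch, assuming the patch series exposes Zicbop as a bit named `RISCV_HWPROBE_EXT_ZICBOP` under `RISCV_HWPROBE_KEY_IMA_EXT_0`; that name is an assumption and must be checked against what actually lands upstream. `riscv_hwprobe`, `struct riscv_hwprobe` and `RISCV_HWPROBE_KEY_IMA_EXT_0` are existing Linux UAPI.

```c
#include <stddef.h>

#if defined(__riscv) && defined(__linux__) && defined(__has_include)
#if __has_include(<asm/hwprobe.h>)
#include <asm/hwprobe.h>
#include <sys/syscall.h>
#include <unistd.h>
#define HAVE_ASM_HWPROBE_H 1
#endif
#endif

/* Returns 1 only if the running kernel reports Zicbop via hwprobe.
 * Compiles everywhere: without the header or the (assumed) constant it returns 0. */
int cpu_has_zicbop(void)
{
#if defined(HAVE_ASM_HWPROBE_H) && defined(RISCV_HWPROBE_EXT_ZICBOP)
	struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = 0 };

	/* cpusetsize = 0 and cpus = NULL: query all online CPUs. */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
		return 0;
	return (pair.value & RISCV_HWPROBE_EXT_ZICBOP) != 0;
#else
	return 0;
#endif
}
```

Once the constant exists in the installed headers, the hwprobe branch is compiled in automatically, with no change to the calling code.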

@lvshuo2016
Contributor Author

lvshuo2016 commented Sep 17, 2025

Regarding Zicacheop, there are two options. One is to add prefetch now: it has no side effects, and compile-time detection has already been added. The lack of runtime detection is not a very important issue here, and a follow-up patch can add it once the kernel supports it. The other is to remove the prefetch patch from this submission. Which one do you think is more suitable?

@lvshuo2016
Contributor Author

@pablodelara Hi Pablo, what do you think? Should we merge it or remove the prefetch patch?

@sunyuechi
Contributor

sunyuechi commented Sep 17, 2025

I think it’s reasonable to either wait until hwprobe is merged before adding prefetch, or to first add prefetch behind a compile-time option (default off) as an experimental feature.

If it’s to be merged now, there are some adjustments needed:

  1. Currently there are cases where the compiler supports it but the CPU does not, so it’s necessary to change the build to ensure prefetch is also disabled by default even if the compiler supports it.
  2. Since there is no publicly available hardware for validation, the commit message for prefetch should include an explanation: how it was tested, what effect it has, why prefetch is being added, and the current state of detection support for this extension.
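Point 1 amounts to making prefetch opt-in, so that compiler support alone never enables it. A minimal C-side sketch, assuming a hypothetical `ENABLE_EXPERIMENTAL_ZICBOP` macro that a configure option (off by default) would define to 1; `__riscv_zicbop` is the extension test macro that compilers following the RISC-V C API specification define when Zicbop is in `-march`.

```c
/* Hypothetical opt-in switch: stays 0 unless the build explicitly enables it. */
#ifndef ENABLE_EXPERIMENTAL_ZICBOP
#define ENABLE_EXPERIMENTAL_ZICBOP 0
#endif

/* Use prefetch only if the user opted in AND the compiler targets Zicbop. */
#if ENABLE_EXPERIMENTAL_ZICBOP && defined(__riscv_zicbop)
#define EC_USE_ZICBOP 1
#else
#define EC_USE_ZICBOP 0
#endif

/* Reports the build-time decision, e.g. for a unit test or a build banner. */
const char *ec_prefetch_build_mode(void)
{
	return EC_USE_ZICBOP ? "zicbop-prefetch" : "no-prefetch";
}
```

This only covers the compile-time side; the CPU-side detection discussed earlier in the thread would still be needed before enabling it by default.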

@pablodelara
Contributor

When you say wait until hwprobe is merged, do you mean https://lkml.org/lkml/2025/9/11/849?
Since this is an external dependency, it's going to make things tough...

@sunyuechi
Contributor

@pablodelara Yes, I’m okay with either merging quickly or waiting until hwprobe is merged — it mainly depends on how the author decides to adjust the patch.

@pablodelara
Contributor

Up to you both. I'm OK merging it if you are OK with it, with the changes required.

@lvshuo2016
Contributor Author

lvshuo2016 commented Sep 19, 2025

I removed the prefetch patch, so the current code can be merged. I will create a new pull request to track prefetch (including the runtime check).

3 participants