crc: Add vector-accelerated assembly code for RISC-V 64-bit architecture with Zvbc instruction set #350
base: master
cd1a52d
to
3259ef2
Compare
Hi @yinlenree. I suggest you include a commit title like
I'll review it carefully next week. For now, I've noticed a few issues: If the compiler doesn't support those -march options, the newly added CRC files will still be compiled, leading to build failures. Some files have trailing blank lines at the end; please remove all of them. Some files mix spaces and tabs for indentation; please make them consistent. (You can see the differences using git diff.)
Understood, thank you both for the suggestions. I'll first address the compilation-related issues. This might take some time since I haven't worked on this aspect before.
adef254
to
2379b73
Compare
configure.ac
Outdated
__asm__ volatile(
".option arch, +zbc\n"
"clmul zero, zero, zero\n"
"clmulh zero, zero, zero\n"
Please use either tabs or spaces consistently for indentation, and check other files as well.
include/riscv64_multibinary.h
Outdated
#include <sys/auxv.h>
#include <asm/hwprobe.h> // 包含 RISC-V 硬件探测相关的宏定义(如 RISCV_HWPROBE_EXT_ZBB)
#include <unistd.h> // 提供系统调用相关声明
#include <sys/syscall.h> // 定义 __NR_riscv_hwprobe 系统调用号
Please use English comments, and check other files as well.
crc/riscv64/crc_riscv64_dispatcher.c
Outdated
DEFINE_INTERFACE_DISPATCHER(crc32_gzip_refl)
{
#if HAVE_RVV && HAVE_ZBC && HAVE_ZVBC && HAVE_ZVBB
        struct riscv_hwprobe _probe = INIT_PROBE_STRUCT();
All these lines (from 131 to 135) are the same throughout this file. It would be better to move them into a separate function.
I wrote a function to determine experimental extensions, while the logic for standard extensions remains unchanged. Do you think this is acceptable?
DEFINE_INTERFACE_DISPATCHER(crc16_t10dif)
{
#if HAVE_RVV && HAVE_ZBC && HAVE_ZVBC && HAVE_ZVBB
        unsigned long auxval = getauxval(AT_HWCAP);
        if (auxval & HWCAP_RV('V') && CHECK_RISCV_EXTENSIONS("ZVBC", "ZVBB", "ZBC")) {
                return crc16_t10dif_vclmul;
        }
#endif
        return crc16_t10dif_base;
}
Looks tidier, yes
Understood. I will revise my code and comments according to the requirements. Thank you both for the review.
2379b73
to
740d8a4
Compare
include/riscv64_multibinary.h
Outdated
#define EXT_CODE(ext) ( \
        strcmp(ext, "ZBC") == 0 ? 7 : \
        strcmp(ext, "ZVBB") == 0 ? 17 : \
        strcmp(ext, "ZVBC") == 0 ? 18 : -1)
Please use macros instead of these numbers, for example: RISCV_HWPROBE_EXT_ZBC, RISCV_HWPROBE_EXT_ZVBB, ...
static inline int check_riscv_extensions(const char **extensions, size_t count)
{
        struct riscv_hwprobe _probe = INIT_PROBE_STRUCT();
        syscall(__NR_riscv_hwprobe, &_probe, 1, 0, NULL, 0);
It seems that using hwprobe requires checking the kernel version?
Okay, I will replace the numbers with macros. Additionally, regarding the extension check, I plan to use version 6.8 as the cutoff—versions below this will not utilize the ZVBC vector acceleration.
The linux/version.h header can change after a kernel upgrade, so the LINUX_VERSION_CODE macro cannot reliably tell whether the current kernel supports the RISC-V extension macros. Instead, I plan to check directly whether these extension macros are defined and use that to decide which code path is used. The code is as follows.
#if defined(RISCV_HWPROBE_EXT_ZBC) && defined(RISCV_HWPROBE_EXT_ZVBB) && defined(RISCV_HWPROBE_EXT_ZVBC)
#define EXT_CODE(ext) ( \
        strcmp(ext, "ZBC") == 0 ? RISCV_HWPROBE_EXT_ZBC : \
        strcmp(ext, "ZVBB") == 0 ? RISCV_HWPROBE_EXT_ZVBB : \
        strcmp(ext, "ZVBC") == 0 ? RISCV_HWPROBE_EXT_ZVBC : \
        -1)
#endif
...
static inline int check_riscv_extensions(const char **extensions, size_t count)
{
#ifdef EXT_CODE
        struct riscv_hwprobe _probe = INIT_PROBE_STRUCT();
        syscall(__NR_riscv_hwprobe, &_probe, 1, 0, NULL, 0);
        for (size_t i = 0; i < count; i++) {
                if (!(_probe.value & EXT_CODE(extensions[i]))) {
                        return 0;
                }
        }
        return 1;
#else
        return 0;
#endif
}
Do you think this is okay?
My understanding is that the detection works by checking whether asm/hwprobe.h exists in the kernel, since these files didn’t exist before.
I found that in the kernel code from versions 6.4 to 6.7, although there is an hwprobe.h file in the include/asm directory, it lacks macros for offsets of extensions like ZVBC, ZBC, etc. These macros were only defined after version 6.8.
Additionally, I noticed that the offset for the V standard extension was defined in version 6.5. Does this mean I also need to check the definition of the V extension offset?
I'm unsure whether there was an interface for detecting extensions like ZVBC in kernels from versions 6.4 to 6.7. I have little experience in this area. Do you have any suggestions?
Here's the URL for the hwprobe.h file in Linux kernel 6.7 from the official repository:
https://github.com/torvalds/linux/blob/v6.7/arch/riscv/include/asm/hwprobe.h
The macro definitions are located here.
https://github.com/torvalds/linux/blob/v6.7/arch/riscv/include/uapi/asm/hwprobe.h
Maybe we can directly detect this macro, similar to how DPDK introduced hwprobe, with a minimum requirement of kernel 6.8?
https://inbox.dpdk.org/dev/[email protected]/
Yes, the bitmasks for ZVBC and ZBC were defined starting from version 6.8. I plan to add detection macros for the hwprobe.h file during the compilation phase, as well as detection macros for the bitmasks of these three extensions: ZVBC, ZVBB, and ZBC.
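For reference, the detection approach discussed in this thread could look roughly like the sketch below. It assumes that asm/hwprobe.h is only included when it exists, that the extension bitmask macros are only used when the installed headers define them, and that a failing hwprobe syscall (for example on a kernel without hwprobe support) is treated as "extension not available". The function names riscv_has_ext0 and crc_vclmul_usable are hypothetical and are not part of this PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Only pull in the hwprobe headers on RISC-V Linux, and only if they exist. */
#if defined(__riscv) && defined(__linux__) && defined(__has_include)
#if __has_include(<asm/hwprobe.h>)
#include <asm/hwprobe.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif
#endif

/* Hypothetical helper: returns 1 only if every bit in `mask` is reported
 * in the IMA_EXT_0 bitmap by hwprobe, and 0 in every other case. */
int riscv_has_ext0(uint64_t mask)
{
#if defined(RISCV_HWPROBE_KEY_IMA_EXT_0) && defined(__NR_riscv_hwprobe)
        struct riscv_hwprobe probe = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = 0 };

        /* A kernel without the syscall makes this call fail: no acceleration. */
        if (syscall(__NR_riscv_hwprobe, &probe, 1, 0, NULL, 0) != 0)
                return 0;
        /* The kernel rewrites unknown keys, so make sure ours was accepted. */
        if (probe.key != RISCV_HWPROBE_KEY_IMA_EXT_0)
                return 0;
        return (probe.value & mask) == mask;
#else
        (void) mask;
        return 0;
#endif
}

/* Hypothetical helper: the bitmask macros are tested at compile time, so
 * older headers that lack them simply fall back to the base code. */
int crc_vclmul_usable(void)
{
#if defined(RISCV_HWPROBE_EXT_ZBC) && defined(RISCV_HWPROBE_EXT_ZVBC)
        return riscv_has_ext0(RISCV_HWPROBE_EXT_ZBC | RISCV_HWPROBE_EXT_ZVBC);
#else
        return 0;
#endif
}
```

On a non-RISC-V host, or with headers that predate the bitmask macros, both functions compile to a plain `return 0`, so a dispatcher built on them would always select the base implementation.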
fe7ba9c
to
0d79c34
Compare
Please test what happens if the kernel does not have hwprobe support, for example, by changing
0d79c34
to
ddd6706
Compare
@sunyuechi I forgot to add the conditional check before including the asm/hwprobe.h header file. I've now fixed it and optimized the code, replacing time-consuming instructions and removing the Zvbb extension.
Could you review this PR again, @sunyuechi? I am thinking of first merging the "prefetch" PR and then this PR (after they are reviewed, of course).
@pablodelara Okay, I'll check it tomorrow.
crc/riscv64/crc_common_vclmul.h
Outdated
#define vec_15 v15
#define vec_16 v16
#define vec_17 v17
#define vec_18 v18
#define tmp_0 t0
#define tmp_1 t1
#define tmp_2 t2
#define tmp_3 t3
#define tmp_4 t4
#define tmp_5 t5
#define vec_0 v0
#define vec_1 v1
...
#define vec_10 v10
#define vec_11 v11
#define vec_12 v12
Please remove these #define statements; they do not improve readability, because the t and v registers are already sufficiently clear on their own.
crc/riscv64/crc_common_vclmul.h
Outdated
#define len a2

// return
#define crc_ret a0
unused
Okay, understood. I will remove these redundant and unused #define statements.
ddd6706
to
a2ee08c
Compare
vxor.vv v0, v8, v3

addi sp, sp, -16
vse64.v v0, (sp)
t5 and t6 are unused and can be utilized to eliminate stack operations.
crc/riscv64/crc16_t10dif_vclmul.S
Outdated
vslideup.vi v8, v9, 1
vxor.vv v0, v8, v3

addi sp, sp, -16
t5 and t6 are unused and can be utilized to eliminate stack operations.
addi t4, t4, 16
crc_fold_512b_to_128b

addi sp, sp, -16
Please use t6 to reduce stack operations (and also check other files as well).
crc/riscv64/crc16_t10dif_vclmul.S
Outdated
ret

.crc_fold:
# Initialize vector registers
This comment seems to be describing the purpose of vset, but it feels a bit unclear. Could we adjust it slightly?
Okay, I will replace stack pointer (sp) operations with idle registers in all files and improve the readability of comments in the code.
crc/riscv64/crc_riscv64_dispatcher.c
Outdated
{
#if HAVE_RVV && HAVE_ZBC && HAVE_ZVBC
        unsigned long auxval = getauxval(AT_HWCAP);
        if (auxval & HWCAP_RV('V') && CHECK_RISCV_EXTENSIONS("ZVBC", "ZBC")) {
According to the RISC-V manual,
The Zvknhb and Zvbc Vector Crypto Extensions — and accordingly the composite extensions Zvkn and Zvks — require a Zve64x base, or application ("V") base Vector Extension.
Since the vector instructions you are using exist in both Zve64x and V, it seems that if Zvbc is detected, there’s no need to additionally check for the V extension. Please also review whether any changes are needed in the compilation part regarding this.
Got it. I will try to place the detection of Zvbc before that of the V extension, so as to omit the check for the V extension. Thank you.
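To illustrate the point about the redundant V check, here is a minimal sketch of a selection function that relies only on the Zbc and Zvbc bits. The capability bits CAP_ZBC and CAP_ZVBC, the enum crc_impl, and the function select_crc_impl are hypothetical names used for illustration; real code would use the RISCV_HWPROBE_EXT_* bitmasks and return the actual function pointers.

```c
#include <stdint.h>

/* Hypothetical capability bits for this sketch only. */
#define CAP_ZBC  (UINT64_C(1) << 0)
#define CAP_ZVBC (UINT64_C(1) << 1)

/* Hypothetical stand-in for the function pointers a dispatcher returns. */
enum crc_impl { CRC_IMPL_BASE, CRC_IMPL_VCLMUL };

/* Select the vector carry-less multiply path only when both Zbc and Zvbc
 * are reported. No separate V test is made: Zvbc requires a Zve64x or V
 * vector base, so its presence already implies usable vector support. */
enum crc_impl select_crc_impl(uint64_t caps)
{
        const uint64_t need = CAP_ZBC | CAP_ZVBC;

        return (caps & need) == need ? CRC_IMPL_VCLMUL : CRC_IMPL_BASE;
}
```

Comparing against the full mask with `== need` makes sure a single missing extension falls back to the base implementation, which a plain non-zero test would not.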
a2ee08c
to
ade80d5
Compare
I have retained the stack pointer operations when calling crc32_iscsi_refl_vclmul, which are used to save the call information.
It seems that the check in the file
Sorry, I forgot about this. Now I have removed the HAVE_RVV macro from both the dispatcher and the assembly file.
ade80d5
to
56e655a
Compare
.word 0x4ba80000, 0xc01f0000, 0xd7710000, 0x5cc60000, 0xf9ad0000, 0x721a0000, 0x65740000, 0xeec30000
.word 0xa4150000, 0x2fa20000, 0x38cc0000, 0xb37b0000, 0x16100000, 0x9da70000, 0x8ac90000, 0x017e0000
.word 0x1f650000, 0x94d20000, 0x83bc0000, 0x080b0000, 0xad600000, 0x26d70000, 0x31b90000, 0xba0e0000
.word 0xf0d80000, 0x7b6f0000, 0x6c010000, 0xe7b60000, 0x42dd0000, 0xc96a0000, 0xde040000, 0x55b30000
Can you extract a .h file, as was done for crc32/64, from crc/riscv64/crc16_t10dif_vclmul.S and crc/riscv64/crc16_t10dif_copy_vclmul.S? The data is completely identical, and many parts, such as the crc_fold_loop calculation, are also the same.
I did not create a .h header file for crc16_t10dif_copy because its procedural code involves memory copying compared to other algorithm implementations. I only extracted the data for the calculation. Do you think this is acceptable?
crc/riscv64/crc_riscv64_dispatcher.c
Outdated
{
#if HAVE_ZBC && HAVE_ZVBC
        unsigned long auxval = getauxval(AT_HWCAP);
        if (auxval & HWCAP_RV('V') && CHECK_RISCV_EXTENSIONS("ZVBC", "ZBC")) {
The runtime v check can also be removed.
crc/riscv64/crc16_t10dif_vclmul.S
Outdated
.crc_table_loop:
lbu a4, 0(a1)
add a1, a1, 1
This may cause errors in certain environments. For immediates, please use the standard addi whenever possible (and please check other files as well).
Okay, I will check my code.
The CRC module of ISA-L has been accelerated using the RISC-V V, Zbc, and Zvbc instruction sets, implementing data folding and Barrett reduction optimizations. Signed-off-by: Ji Dong <[email protected]>
56e655a
to
cf87b79
Compare
I have implemented vector-accelerated CRC modules (including CRC16, CRC32, and CRC64) using the RISC-V V, Zbc, Zvbc, and Zvbb instruction sets, with full functional verification and performance testing completed.
The implementation primarily leverages the vclmul.v and vclmulh.v (carry-less multiply) instructions for data folding. For big-endian processing, it additionally utilizes vrev8.v, vslideup.vi, and vslidedown.vi instructions for byte-order reversal. The final checksum is computed via Barrett reduction.
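As background for the folding step, the sketch below is a portable software model of a 64-bit carry-less multiply, producing the low and high halves of the 128-bit product. This is the arithmetic that the carry-less multiply instructions are understood to perform per 64-bit element; it is not the code from this PR, and the function names clmul64_lo and clmul64_hi are hypothetical.

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit carry-less (GF(2) polynomial) product a * b.
 * Partial products are combined with XOR instead of addition. */
uint64_t clmul64_lo(uint64_t a, uint64_t b)
{
        uint64_t r = 0;

        for (int i = 0; i < 64; i++) {
                if ((b >> i) & 1)
                        r ^= a << i;
        }
        return r;
}

/* High 64 bits of the same product. Bit 0 of b contributes nothing to the
 * high half, so the loop starts at 1 (this also avoids a shift by 64). */
uint64_t clmul64_hi(uint64_t a, uint64_t b)
{
        uint64_t r = 0;

        for (int i = 1; i < 64; i++) {
                if ((b >> i) & 1)
                        r ^= a >> (64 - i);
        }
        return r;
}
```

For example, `clmul64_lo(3, 3)` is 5, because (x + 1)^2 = x^2 + 1 over GF(2). A fold step would typically XOR the products of the current state with precomputed constants into the next block of input data, and Barrett reduction uses the same multiply to reduce the final value modulo the CRC polynomial.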