add RVV optimization for ZSTD_row_getMatchMask #4481
Co-authored-by: gong-flying <[email protected]>
Hi @camel-cdr @Cyan4973,
Look, I'm not going to help you fine-tune your prompts until the LLM generates a good implementation. I already told you how to implement it, and I've probably wasted more time on this now than it would have taken me to write the implementation myself.
lib/compress/zstd_lazy.c (outdated)
    int i;
    assert(nbChunks == 1 || nbChunks == 2 || nbChunks == 4);

    size_t vl = __riscv_vsetvl_e8m1(16);
Minor: this is not C90-compliant. The `size_t vl` declaration should come before the `assert()` expression.
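C90 requires every declaration in a block to appear before the block's first statement, and `assert(...);` is an expression statement, so declaring `vl` after it mixes declarations and code. The helper below is a hypothetical illustration of the ordering the reviewer asks for, not zstd code: it counts matching tags instead of building a match mask, and its name is made up. `size_t vl` is declared, with its initializer, alongside `int i` and ahead of the `assert()`. The RVV branch is assumed to compile only when the compiler defines both `__riscv_v` and `__riscv_v_intrinsic`; otherwise a scalar loop runs.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__riscv_v) && defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

/* Hypothetical helper, not zstd code: counts bytes equal to `tag` in
 * nbChunks 16-byte chunks. It only illustrates C90 declaration order. */
static int count_tag_matches(const unsigned char *src, unsigned char tag,
                             int nbChunks)
{
    /* C90: all declarations first, then statements such as assert(). */
    int i;
    int count = 0;
#if defined(__riscv_v) && defined(__riscv_v_intrinsic)
    /* The full V extension implies VLEN >= 128, so this yields vl == 16. */
    size_t vl = __riscv_vsetvl_e8m1(16);
#endif
    assert(nbChunks == 1 || nbChunks == 2 || nbChunks == 4);

    for (i = 0; i < nbChunks; i++) {
#if defined(__riscv_v) && defined(__riscv_v_intrinsic)
        vuint8m1_t chunk = __riscv_vle8_v_u8m1(src + 16 * i, vl);
        vbool8_t eq = __riscv_vmseq_vx_u8m1_b8(chunk, tag, vl);
        count += (int)__riscv_vcpop_m_b8(eq, vl); /* number of set mask bits */
#else
        int j; /* declared at the start of the loop body, as C90 requires */
        for (j = 0; j < 16; j++)
            count += (src[16 * i + j] == tag);
#endif
    }
    return count;
}
```

Gating on `__riscv_v` rather than on intrinsics support alone matters here: only the full V extension guarantees VLEN of at least 128 bits, which is what makes `__riscv_vsetvl_e8m1(16)` return 16. A smaller embedded `Zve*` configuration could return less.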
Is this new code path tested in CI?
Performance (vs. SWAR)
- 16-byte data: 5.87x speedup
- 32-byte data: 9.63x speedup
- 64-byte data: 17.98x speedup

Co-authored-by: gong-flying <[email protected]>
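The speedups in the commit message are measured against the generic SWAR fallback. To show roughly what that baseline does, here is a minimal, hypothetical C99 sketch of the SWAR technique, assumed to resemble zstd's generic path rather than copied from it. Each 8-byte word is XORed with the tag broadcast to every byte, non-zero bytes are turned into `0x80` flags, and a multiply gathers the eight flags into one byte. The function names, the portable little-endian load, and the result convention (bit k set when entry `(head + k) % rowEntries` matches, one bit per entry) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: portable little-endian 64-bit load. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int b;
    for (b = 7; b >= 0; b--)
        v = (v << 8) | p[b];
    return v;
}

/* Hypothetical SWAR sketch (not zstd's code). Bit k of the result is set
 * when tagRow[(head + k) % rowEntries] == tag. */
static uint64_t row_match_mask_swar(const uint8_t *tagRow, uint8_t tag,
                                    uint32_t head, uint32_t rowEntries)
{
    const uint64_t x01 = 0x0101010101010101ULL;
    const uint64_t x80 = 0x8080808080808080ULL;
    const uint64_t extractMagic = 0x0002040810204081ULL; /* bits 0, 7, ..., 49 */
    const uint64_t splat = tag * x01;                     /* tag in every byte */
    const uint64_t full = (rowEntries == 64) ? ~0ULL : ((1ULL << rowEntries) - 1);
    uint64_t nonMatches = 0;
    uint64_t matches;
    uint32_t c;

    assert(rowEntries == 16 || rowEntries == 32 || rowEntries == 64);
    assert(head < rowEntries);
    for (c = 0; c < rowEntries / 8; c++) {
        uint64_t chunk = load_le64(tagRow + 8 * c) ^ splat; /* matches become 0x00 */
        /* 0x80 in every non-zero (non-matching) byte, 0x00 elsewhere */
        chunk = (((chunk | x80) - x01) | chunk) & x80;
        /* the multiply moves byte i's flag to bit 56 + i without carries */
        nonMatches |= ((chunk * extractMagic) >> 56) << (8 * c);
    }
    matches = ~nonMatches & full;
    if (head == 0)
        return matches;
    return ((matches >> head) | (matches << (rowEntries - head))) & full;
}
```

The loop runs `rowEntries / 8` times, while a vector unit can compare a whole row in one or a few instructions. That is one plausible reason the measured speedup grows with row size, though the benchmark itself would have to confirm it.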
Yes, this looks proper now 👍

I've confirmed that the new RVV code path is covered.
This pull request introduces a RISC-V Vector (RVV)-specific optimization for the `ZSTD_row_getMatchMask` function. On RV64 platforms that support the V extension, it replaces the generic SWAR implementation. The goal is to use RVV's data-parallel computation to improve performance on the RISC-V architecture.

Performance
Microbenchmark Results
A microbenchmark isolating the `ZSTD_row_getMatchMask` function shows a significant speedup compared to the SWAR fallback.

Fullbench
The overall impact on the `fullbench` results is modest. However, the new implementation shows a consistent small improvement and, most importantly, no performance regression.

Validation
- `make check`
- `make test`
- `make staticAnalyze`
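The PR description says the new path replaces the SWAR implementation of `ZSTD_row_getMatchMask` on RV64 with V-extension code. The following is a hypothetical illustration of that idea, not the code merged in #4481. It loads a whole row of 16, 32, or 64 tag bytes using LMUL=8, compares every byte to the tag with `vmseq`, stores the resulting mask bits with `vsm`, and rotates the mask right by `head` so that bit k corresponds to entry `(head + k) % rowEntries`. The function name, the single-pass LMUL=8 choice, and the one-bit-per-entry result convention are assumptions. On non-RVV targets, a scalar loop computes the same mask.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__riscv_v) && defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

/* Hypothetical sketch, not the PR's code. Bit k of the result is set when
 * tagRow[(head + k) % rowEntries] == tag (one bit per entry assumed). */
static uint64_t row_match_mask_sketch(const uint8_t *tagRow, uint8_t tag,
                                      uint32_t head, uint32_t rowEntries)
{
    const uint64_t full = (rowEntries == 64) ? ~0ULL : ((1ULL << rowEntries) - 1);
    uint64_t matches = 0;
    assert(rowEntries == 16 || rowEntries == 32 || rowEntries == 64);
    assert(head < rowEntries);
#if defined(__riscv_v) && defined(__riscv_v_intrinsic)
    {
        /* V implies VLEN >= 128, so e8m8 holds at least 128 bytes: one pass. */
        uint8_t bits[8] = {0};
        int b;
        size_t vl = __riscv_vsetvl_e8m8(rowEntries);
        vuint8m8_t v = __riscv_vle8_v_u8m8(tagRow, vl);
        vbool1_t eq = __riscv_vmseq_vx_u8m8_b1(v, tag, vl);
        __riscv_vsm_v_b1(bits, eq, vl); /* element i -> bit i%8 of byte i/8 */
        for (b = 7; b >= 0; b--)
            matches = (matches << 8) | bits[b];
    }
#else
    {
        uint32_t i; /* scalar fallback: same mask, one byte at a time */
        for (i = 0; i < rowEntries; i++)
            if (tagRow[i] == tag)
                matches |= 1ULL << i;
    }
#endif
    if (head == 0)
        return matches;
    return ((matches >> head) | (matches << (rowEntries - head))) & full;
}
```

Using LMUL=8 trades register pressure for a single load and compare per row. An implementation that processes 16-byte chunks with LMUL=1, as the early `__riscv_vsetvl_e8m1(16)` snippet in the review suggests, would loop `rowEntries / 16` times instead.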