Skip to content

Conversation

@w1m024
Copy link
Contributor

@w1m024 w1m024 commented Sep 9, 2025

This pull request introduces a RISC-V Vector (RVV) specific optimization for the ZSTD_row_getMatchMask function, replacing the generic SWAR implementation on RV64 platforms with V-extension support. The goal is to leverage RVV's parallel computation capabilities to improve performance on the RISC-V architecture.

Performance

Microbenchmark Results

A microbenchmark isolating the ZSTD_row_getMatchMask function shows a significant speedup compared to the SWAR fallback.

rowEntries Speedup
16 bytes 5.87x
32 bytes 9.63x
64 bytes 17.98x

Fullbench

The overall impact on the fullbench is modest. However, the new implementation shows a consistent small improvement and, most importantly, no performance regression.

Validation

  • All quick checks passed (make check).
  • All long-running tests passed (make test).
  • Static analysis reports no new issues (make staticAnalyze).

@meta-cla meta-cla bot added the CLA Signed label Sep 9, 2025
@w1m024 w1m024 closed this Sep 9, 2025
@w1m024 w1m024 reopened this Sep 9, 2025
@w1m024
Copy link
Contributor Author

w1m024 commented Sep 9, 2025

Hi @camel-cdr @Cyan4973 ,
Following up on our previous discussion in PR #4478 , I have created this new pull request to address the feedback.
The new implementation resolves the portability issues by refactoring the code, as you suggested.
I would be grateful if you could take a look when you have a moment. Thanks again for your invaluable guidance!

@camel-cdr
Copy link

Look, I'm not going to help you finetune your prompts until the llm generates a good implementation.

I've already toled you how to implement it earlier and probably wasted more time on this now then it would've taken me to write the implementation my self.

int i;
assert(nbChunks == 1 || nbChunks == 2 || nbChunks == 4);

size_t vl = __riscv_vsetvl_e8m1(16);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:
this is not C90 compliant.

The size_t vl declaration should come before the assert() expression.

@Cyan4973
Copy link
Contributor

Is this new code path tested in CI ?
I know we have a RISC-V compilation test in CI,
but I am not sure about RVV.

Performance (vs. SWAR)
- 16-byte data: 5.87x speedup
- 32-byte data: 9.63x speedup
- 64-byte data: 17.98x speedup

Co-authored-by: gong-flying <[email protected]>
@camel-cdr
Copy link

yes, this looks proper now 👍

@w1m024
Copy link
Contributor Author

w1m024 commented Sep 12, 2025

Is this new code path tested in CI ? I know we have a RISC-V compilation test in CI, but I am not sure about RVV.

I've confirmed that the new RVV code path is covered
The dev-short-test.yml file uses QEMU to run make check. It sets CFLAGS="-march=rv64gcv" and configures the virtual CPU with -cpu rv64,v=true and vlen values of 128, 256, and 512.

@Cyan4973 Cyan4973 merged commit d190b38 into facebook:dev Sep 16, 2025
104 checks passed
@w1m024 w1m024 deleted the support-rvv-getmask branch December 10, 2025 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants