compress: vectorize ZSTD_count() with SSE2 by rmilkowski · Pull Request #4612 · facebook/zstd

rmilkowski · 2026-03-06T17:24:18Z

Summary

Add an SSE2 fast path to ZSTD_count() on x86/x86_64.

After the initial machine-word compare, the new path compares 16 bytes at a time with _mm_cmpeq_epi8() / _mm_movemask_epi8() and uses ZSTD_countTrailingZeros32() to jump to the first mismatch. Non-x86 targets and builds without SSE2 keep the existing scalar path.
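The 16-byte step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `sketch_count` and `sketch_ctz32` are hypothetical names standing in for ZSTD_count() and ZSTD_countTrailingZeros32(), the initial machine-word compare is omitted, and the tail is handled byte by byte for brevity. It assumes `match` has at least as many readable bytes as remain between `ip` and `iend`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define SKETCH_HAS_SSE2 1
#else
#  define SKETCH_HAS_SSE2 0
#endif

/* Hypothetical helper: index of the lowest set bit. v must be nonzero. */
static unsigned sketch_ctz32(unsigned v)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_ctz(v);
#else
    unsigned n = 0;
    while ((v & 1u) == 0) { v >>= 1; n++; }
    return n;
#endif
}

/* Hypothetical stand-in for the counting loop: returns the number of
 * leading bytes at which ip and match agree, never reading ip at or
 * beyond iend. Assumption: match has at least (iend - ip) readable bytes. */
static size_t sketch_count(const unsigned char* ip,
                           const unsigned char* match,
                           const unsigned char* iend)
{
    const unsigned char* const start = ip;
    assert(ip <= iend);
#if SKETCH_HAS_SSE2
    while ((size_t)(iend - ip) >= 16) {
        /* Unaligned loads; the void* hop avoids cast-align warnings. */
        __m128i const a = _mm_loadu_si128((const __m128i*)(const void*)ip);
        __m128i const b = _mm_loadu_si128((const __m128i*)(const void*)match);
        /* Bit i of eq is set when byte i of the two blocks is equal. */
        unsigned const eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(a, b));
        if (eq != 0xFFFFu) {
            /* The lowest clear bit of eq marks the first mismatching byte. */
            return (size_t)(ip - start) + sketch_ctz32(~eq);
        }
        ip += 16;
        match += 16;
    }
#endif
    /* Scalar tail; also the only path when SSE2 is unavailable. */
    while (ip < iend && *ip == *match) { ip++; match++; }
    return (size_t)(ip - start);
}
```

Inverting the mask before counting trailing zeros is safe here because the all-equal case (0xFFFF) is handled first, so the inverted value always has a set bit among its low 16 bits.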

Why

ZSTD_count() is a hot helper in match finding. When a match extends past the first machine word, comparing the remaining bytes 16 at a time reduces scalar work on match-heavy inputs.

Benchmark

Host: Intel Xeon Gold 6254, Linux x86_64

Method:

levels 1..22
3 runs per file / level
page cache warmed before each run

Overall:

mean compression delta: +16.1%
median compression delta: +8.3%
suite CPU time: -1.8%

Largest gains:

pattern_64kb_gap_0kb_128mb.bin: +98.3% at level 22
pattern_64kb_gap_0kb_128mb.bin: +86.0% at level 10
pattern_64kb_gap_0kb_128mb.bin: +75.3% at level 9
runlength_64kb_pad64_32mb.bin: +54.7% at level 10

Largest regressions observed:

pattern_1kb_gap_1023kb_128mb.bin: -6.7% at level 8
high_entropy_190sym_64mb.bin: -4.6% at level 5
pattern_1kb_gap_1023kb_128mb.bin: -2.7% at level 1

Overall the tradeoff looks favorable: large wins on repetitive / match-heavy inputs, with limited downside on sparse or high-entropy inputs.

Testing

playTests.sh
CLI tests
test-fullbench
test-fuzzer
test-zstream
test-invalidDictionaries
test-legacy
test-decodecorpus
test-pool
test-longmatch
CFLAGS='-Werror -O2' make -j1 lib zstd
make c89build
make gnu90build
make c99build
make cxxtest
valgrind checks
CMake configure / build

Notes

No API change.
No format change.

compress: vectorize ZSTD_count() with SSE2

2fb0de7

meta-cla bot added the CLA Signed label Mar 6, 2026

compress: fix SSE2 cast-align warnings

5fb1974

rmilkowski wants to merge 2 commits into facebook:dev from rmilkowski:upstream/sse2-count-2
