
compress: vectorize ZSTD_count() with SSE2 #4612

Open
rmilkowski wants to merge 2 commits into facebook:dev from rmilkowski:upstream/sse2-count-2


@rmilkowski

Summary

Add an SSE2 fast path to ZSTD_count() on x86/x86_64.

After the initial machine-word compare, the new path compares 16 bytes at a time with _mm_cmpeq_epi8() / _mm_movemask_epi8() and uses ZSTD_countTrailingZeros32() to jump to the first mismatch. Non-x86 targets and builds without SSE2 keep the existing scalar path.
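For readers who want to see the shape of the technique, here is a minimal standalone sketch of a 16-byte compare loop. It is not the code from this PR: `sketch_count()` and `sketch_ctz32()` are hypothetical names, the `__SSE2__` guard is an assumed way of gating the fast path, the initial machine-word compare is omitted, and a portable bit-scan loop stands in for `ZSTD_countTrailingZeros32()`.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical stand-in for ZSTD_countTrailingZeros32():
 * index of the lowest set bit. The caller guarantees mask != 0. */
static unsigned sketch_ctz32(unsigned mask)
{
    unsigned n = 0;
    while ((mask & 1u) == 0) { mask >>= 1; n++; }
    return n;
}
#endif

/* Hypothetical helper: number of leading bytes that are equal in a and b,
 * reading at most `limit` bytes from each buffer. */
static size_t sketch_count(const unsigned char* a, const unsigned char* b, size_t limit)
{
    size_t pos = 0;

#if defined(__SSE2__)
    /* Fast path: compare 16 bytes per iteration with unaligned loads. */
    while (pos + 16 <= limit) {
        __m128i const va = _mm_loadu_si128((const __m128i*)(const void*)(a + pos));
        __m128i const vb = _mm_loadu_si128((const __m128i*)(const void*)(b + pos));
        /* One mask bit per byte lane: 1 where the bytes are equal. */
        unsigned const eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (eq != 0xFFFFu) {
            /* The lowest zero bit of eq marks the first mismatching byte. */
            return pos + sketch_ctz32(~eq & 0xFFFFu);
        }
        pos += 16;
    }
#endif

    /* Scalar path: handles the tail, and everything when SSE2 is unavailable. */
    while (pos < limit && a[pos] == b[pos]) pos++;
    return pos;
}
```

Inverting the equality mask before the bit scan turns "first mismatch" into "lowest set bit", which is why a trailing-zero count lands directly on the mismatch offset within the 16-byte block.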

Why

ZSTD_count() is a hot helper in match finding. When matches extend past the first word, counting them 16 bytes at a time reduces scalar work on match-heavy inputs.

Benchmark

Host: Intel Xeon Gold 6254, Linux x86_64

Method:

  • levels 1..22
  • 3 runs per file / level
  • page cache warmed before each run

Overall:

  • mean compression delta: +16.1%
  • median compression delta: +8.3%
  • suite CPU time: -1.8%

Largest gains:

  • pattern_64kb_gap_0kb_128mb.bin: +98.3% at level 22
  • pattern_64kb_gap_0kb_128mb.bin: +86.0% at level 10
  • pattern_64kb_gap_0kb_128mb.bin: +75.3% at level 9
  • runlength_64kb_pad64_32mb.bin: +54.7% at level 10

Largest regressions observed:

  • pattern_1kb_gap_1023kb_128mb.bin: -6.7% at level 8
  • high_entropy_190sym_64mb.bin: -4.6% at level 5
  • pattern_1kb_gap_1023kb_128mb.bin: -2.7% at level 1

Overall the tradeoff looks favorable: large wins on repetitive / match-heavy inputs, with limited downside on sparse or high-entropy inputs.

Testing

  • playTests.sh
  • CLI tests
  • test-fullbench
  • test-fuzzer
  • test-zstream
  • test-invalidDictionaries
  • test-legacy
  • test-decodecorpus
  • test-pool
  • test-longmatch
  • CFLAGS='-Werror -O2' make -j1 lib zstd
  • make c89build
  • make gnu90build
  • make c99build
  • make cxxtest
  • valgrind checks
  • CMake configure / build

Notes

  • No API change.
  • No format change.

@meta-cla meta-cla bot added the CLA Signed label Mar 6, 2026