
Conversation

@LuminolT LuminolT commented Nov 21, 2025

Purpose

This PR introduces xxHash-based hash algorithms to vLLM’s prefix caching, providing a faster and collision-resistant alternative to the SHA-256 variants added in #23673.

Why xxHash? Prefix caching does not require cryptographic security:

  • It never stores secret data.
  • It never leaves process boundaries.
  • It is used exclusively to identify structural equivalence of prefix blocks.

Therefore, cryptographic collision resistance is unnecessary.

xxHash strikes the right balance:

  • Much better collision resistance than Python's builtin hash(), which is salted and unstable across processes.
  • Much faster than the SHA-256 and SHA-256+CBOR hashes used in current prefix caching.

This change is lightweight: it only introduces the xxhash dependency and adds minimal adapter code to integrate xxHash into the existing hashing interface. We added two small benchmarking scripts to validate correctness and performance, but these can be removed in follow-up cleanup if needed.
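The adapter is conceptually small. A minimal sketch of the pattern (function names here are illustrative, not vLLM's exact interface):

```python
import hashlib
import pickle
from typing import Any

try:
    # Optional dependency: install with `pip install xxhash`.
    import xxhash
except ImportError:
    xxhash = None


def sha256_pickle(obj: Any) -> bytes:
    """Current default: SHA-256 over the pickled object (32-byte digest)."""
    return hashlib.sha256(pickle.dumps(obj)).digest()


def xxhash_pickle(obj: Any) -> bytes:
    """Opt-in fast path: xxHash3-128 over the pickled object (16-byte digest)."""
    if xxhash is None:
        raise ImportError(
            "xxhash is required for --prefix-caching-hash-algo xxhash"
        )
    return xxhash.xxh3_128_digest(pickle.dumps(obj))
```

Both variants return stable digests across processes, unlike the builtin hash(), whose output is randomized per interpreter via PYTHONHASHSEED.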

⚠️ The only drawback is that xxHash introduces a third-party dependency: SHA-256 is available via the standard library's hashlib, while xxHash requires the additional xxhash package.

As a result, this PR adds xxHash as an optional high-performance hash. Users can opt in with:

--prefix-caching-hash-algo xxhash
--prefix-caching-hash-algo xxhash-cbor

Test Plan

1. Micro benchmark of hash functions

benchmarks/benchmark_hash.py is added to benchmark the hash functions. It can be removed in follow-up cleanup once the PR is merged.

This evaluation is the same as in #23673, except for some xxHash-specific configurations.

python benchmarks/benchmark_hash.py --iterations 10000

This compares Python's builtin hash(), sha256 and xxHash.
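The script's core measurement can be reproduced with a few lines of stdlib timeit (a simplified sketch, not the script itself; it omits the xxHash case since that package is optional):

```python
import hashlib
import pickle
import timeit

# Payload shaped like the benchmark's test data: a 32-byte bytes object
# plus a 32-int tuple.
payload = (b"\x00" * 32, tuple(range(32)))
pickled = pickle.dumps(payload)

n = 10_000
sha_s = timeit.timeit(lambda: hashlib.sha256(pickled).digest(), number=n)
builtin_s = timeit.timeit(lambda: hash(payload), number=n)

print(f"SHA256 (pickle): {sha_s / n * 1e6:.2f} us/op")
print(f"built-in hash(): {builtin_s / n * 1e6:.2f} us/op")
```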

2. Block-level hash throughput

benchmarks/benchmark_prefix_block_hash.py is added for block-level hash throughput evaluation. It can be removed in follow-up cleanup once the PR is merged.

This test is model-agnostic, so it reflects performance on larger models as well, rather than only on the small OPT-125M model evaluated below.

python benchmarks/benchmark_prefix_block_hash.py --num-blocks 20000 --block-size 64
python benchmarks/benchmark_prefix_block_hash.py --num-blocks 50000 --block-size 16

This measures token hashing throughput for: sha256, sha256_cbor, xxhash and xxhash_cbor.
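For context, prefix caching hashes fixed-size token blocks, chaining each block's hash to its parent so that identical blocks at different prefix positions hash differently. A simplified model of that loop (illustrative, not vLLM's actual code; shown with the sha256 variant):

```python
import hashlib
import pickle
import time


def hash_blocks(token_ids: list[int], block_size: int) -> list[bytes]:
    """Hash full token blocks, folding each parent hash into the next block."""
    parent = b""
    hashes = []
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = tuple(token_ids[start:start + block_size])
        # The (parent, block) pair makes the hash position-dependent.
        parent = hashlib.sha256(pickle.dumps((parent, block))).digest()
        hashes.append(parent)
    return hashes


tokens = list(range(64 * 1000))  # 1000 blocks of 64 tokens
t0 = time.perf_counter()
hashes = hash_blocks(tokens, block_size=64)
dt = time.perf_counter() - t0
print(f"{len(hashes)} blocks, {len(tokens) / dt / 1e6:.2f}M tokens/s")
```

Swapping hashlib.sha256 for an xxHash digest in this loop is essentially what the xxhash variant measures.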

3. Prefix caching evaluation

This is the benchmark currently provided by vLLM. We tested it using different hash algorithms and compared their results.

The following evaluations were conducted on a single-A100 server.

python benchmarks/benchmark_prefix_caching.py \
  --model facebook/opt-125m \
  --enable-prefix-caching \
  --prefix-caching-hash-algo <algo> \
  --input-length-range "128:256" \
  --output-len 128 \
  --num-prompts 200 \
  --repeat-count 5 \
  --seed 42

4. Long-running throughput & GC behavior

Same as #23673. @Jialin's Jialin#4 is used for GC analysis. Since SHA-256 already significantly reduced GC overhead, our goal is to match or further reduce GC time compared to SHA-256.

Input/Output configurations:

  • input_len=1500, output_len=1
  • input_len=1500, output_len=150

Each run generates logs:

throughput_sha256_out1_*.log
throughput_sha256_cbor_out1_*.log
throughput_xxhash_out1_*.log
throughput_xxhash_cbor_out1_*.log
...

Test Result

1. Micro benchmark of hash functions

# python benchmarks/benchmark_hash.py --iterations 10000
============================================================
HASH FUNCTION MICRO BENCHMARK
============================================================
Test data: (32-byte bytes object, 32-int tuple)
Iterations: 10,000
============================================================

Results:
  SHA256 (pickle) :     1.45 ±   0.24 μs
  xxHash (pickle) :     0.99 ±   0.16 μs
  built-in hash() :     0.19 ±   0.07 μs

============================================================
SUMMARY (relative to built-in hash())
============================================================
• SHA256 (pickle) is 7.8x slower than built-in hash()
• xxHash (pickle) is 5.3x slower than built-in hash()

👉 Among non-builtin hashers, xxHash is ~1.5× faster than SHA-256.

2. Block-level hash throughput

20k × 64 tokens and 50k × 16 tokens.

# python benchmarks/benchmark_prefix_block_hash.py --num-blocks 20000 --block-size 64
Benchmarking 4 algorithms on 20000 blocks (block size=64).
sha256         avg: 0.044319s  best: 0.043920s  throughput: 29.14M tokens/s
sha256_cbor    avg: 0.195202s  best: 0.194645s  throughput: 6.58M tokens/s
xxhash         avg: 0.035225s  best: 0.035059s  throughput: 36.51M tokens/s
xxhash_cbor    avg: 0.184370s  best: 0.183651s  throughput: 6.97M tokens/s

👉 xxHash improves block hashing throughput by 25–40%.

3. Prefix caching evaluation

Algo          Time (s)
sha256        ~4.17
sha256_cbor   ~4.25
xxhash        ~4.12
xxhash_cbor   ~4.23

👉 Performance improves 1–2% with xxHash.

Note that our experiments were conducted only on the small OPT-125M model. In larger production services, where prefix-hit rates are substantially higher, xxHash is expected to show even greater performance advantages.

4. Long-running throughput & GC behavior

output_len = 1 (prefill-dominant)

  • sha256: 219.9 req/s
  • xxhash: 223.4 req/s (+1.6%)

GC:

  • Both ~300 events
  • GC runtime ≈ 1.3–1.4% of wall clock
  • Main GC distribution almost identical
  • No new long-tail events from xxHash

output_len = 150 (decode-heavy)

  • sha256: 77.9 req/s
  • xxhash: 78.9 req/s (+1.2%)

GC:

  • sha256: total GC ~1.92 s
  • xxhash: total GC ~1.68 s (−12.8%)
  • xxHash shows fewer and shorter >400ms GC outliers

CBOR variants


Similar behavior:

  • xxHash_CBOR slightly faster than SHA256_CBOR
  • GC profile remains ~1–2% of wall clock
  • No regressions or long-tail amplification

👉 xxHash does not increase GC pressure and may even slightly reduce tail events.



@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces xxHash as a high-performance, non-cryptographic hashing option for prefix caching, which is a sensible optimization. The implementation is well-structured and includes comprehensive benchmarks to validate the performance gains. My review focuses on improving the robustness of handling the new optional dependency, xxhash. I've suggested a change to ensure compatibility with older versions of the xxhash library and another to make the test suite resilient to its absence. Overall, this is a solid contribution.

@mergify mergify bot commented Nov 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LuminolT.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 21, 2025
@mergify mergify bot removed the tpu and needs-rebase labels Nov 21, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

…dling for invalid hash algorithms and introduce xxhash support

Signed-off-by: LuminolT <[email protected]>

@ywang96 ywang96 (Member) left a comment

@LuminolT Very much appreciate the contribution, especially all the detailed analysis you put out in this PR!

cc @russellb regarding any security concerns.

Member

Maybe it's worth mentioning these scripts in the documentation.

Author

Thanks for pointing this out! It will be helpful for evaluating the hash algorithms used in prefix caching. I will update the relevant documentation accordingly.

@mergify mergify bot commented Nov 21, 2025

Documentation preview: https://vllm--29163.org.readthedocs.build/en/29163/

@russellb russellb (Member) left a comment

First, I appreciate the PR, including the detailed description and performance numbers.

Why xxHash? Prefix Caching does not require cryptographic security:

It never stores secret data.
It never leaves process boundaries.
It is used exclusively to identify structural equivalence of prefix blocks.

Therefore, cryptographic collision resistance is unnecessary.

I disagree with this premise.

Prefix cache contents are secret data in a multi-tenant environment. A hash collision would cause undefined behavior and, at worst, leak private information. Consider this scenario:

  • Company A sends a request
  • Company B sends a request that encounters a hash collision, causing Company B's request to use cache based on Company A's request.

We had one security vulnerability where it was possible to get this to occur predictably: GHSA-rm76-4mrf-v9r8

This is a bit different, but it is another example of prefix cache security concerns. We had another case where someone demonstrated how to infer information about the contents of the prefix cache by using a timing attack to detect prefix cache hits. See this paper: https://arxiv.org/html/2411.18191v1, this PR: #17045, and this advisory: GHSA-4qjh-9fv9-r85r.

I'm not opposed to supporting this if it demonstrates a meaningful performance benefit, but I would also like to enhance our help text for the configuration to say that choosing something other than our current default of SHA256 is theoretically less secure in a multi-tenant environment and has a slightly increased risk of undefined behavior from hash collisions otherwise. Further, the code should include an additional comment clarifying the importance of the current default. I can post a separate PR adding this text, but I'd prefer it to be in place before we add other algorithms, as the tradeoffs are important.
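For scale, the accidental-collision side of this trade-off can be estimated with a standard birthday bound; note that it says nothing about the adversarial collisions described above, which non-cryptographic hashes like xxHash do not resist:

```python
def collision_prob(n_hashes: int, bits: int) -> float:
    # Birthday bound: P(any collision among n random b-bit hashes) ~ n^2 / 2^(b+1).
    return n_hashes ** 2 / 2 ** (bits + 1)


# For a billion cached blocks with a 128-bit digest, accidental collisions
# are negligible; the risk is an attacker deliberately engineering
# collisions, which only a cryptographic hash like SHA-256 rules out.
print(f"{collision_prob(10**9, 128):.1e}")  # ~1.5e-21
```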

@github-project-automation github-project-automation bot moved this from To Triage to In progress in gpt-oss Issues & Enhancements Nov 21, 2025
@NickLucche (Collaborator)

Good point. I suppose an experimental flag should've been the prerequisite for this kind of security trade-off change in any case.

ninja # Required for xgrammar, rocm, tpu, xpu
pybase64 # fast base64 implementation
cbor2 # Required for cross-language serialization of hashable objects
xxhash # Required for fast hashing for prefix caching
Member

I would prefer this not be included in our main requirements file, given that it's optional. In fact, it MUST remain optional to avoid getting flagged in environments with strict security controls.

import cbor2

try:
import xxhash as _xxhash
Member

It would be helpful to add a comment here that it is important for this to remain an optional dependency, as it would be considered problematic to include at all in environments with strict security controls.

Member

Suggested change
import xxhash as _xxhash
# It is important that this remains an optional dependency.
# It would not be allowed in environments with strict security controls,
# so it's best not to have it installed when not in use.
import xxhash as _xxhash

@russellb russellb (Member) left a comment

Instead of a separate PR, I just included some suggested text additions here.

ninja # Required for xgrammar, rocm, tpu, xpu
pybase64 # fast base64 implementation
cbor2 # Required for cross-language serialization of hashable objects
xxhash # Required for fast hashing for prefix caching
Member

Suggested change
xxhash # Required for fast hashing for prefix caching

import cbor2

try:
import xxhash as _xxhash
Member

Suggested change
import xxhash as _xxhash
# It is important that this remains an optional dependency.
# It would not be allowed in environments with strict security controls,
# so it's best not to have it installed when not in use.
import xxhash as _xxhash

- "xxhash" uses Pickle serialization with xxHash (128-bit) for faster,
non-cryptographic hashing. Requires the optional ``xxhash`` package.\n
- "xxhash_cbor" combines canonical CBOR serialization with xxHash for
reproducible hashing. Requires the optional ``xxhash`` package."""
Member

Suggested change
reproducible hashing. Requires the optional ``xxhash`` package."""
"""Set the hash algorithm for prefix caching:\n
- "sha256" uses Pickle for object serialization before hashing. This is the
current default, as SHA256 is the most secure choice to avoid potential
hash collisions.\n
- "sha256_cbor" provides a reproducible, cross-language compatible hash. It
serializes objects using canonical CBOR and hashes them with SHA-256.\n
- "xxhash" uses Pickle serialization with xxHash (128-bit) for faster,
non-cryptographic hashing. Requires the optional ``xxhash`` package.
IMPORTANT: Use of a hashing algorithm that is not considered
cryptographically secure theoretically increases the risk of hash collisions,
which can cause undefined behavior or even leak private information in
multi-tenant environments. Even if collisions are still very unlikely, it is
important to consider your security risk tolerance against the performance
benefits before turning this on.\n
- "xxhash_cbor" combines canonical CBOR serialization with xxHash for
reproducible hashing. Requires the optional ``xxhash`` package."""
