[Core] Add xxHash as a high-performance hash option for accelerating prefix caching #29163
Code Review
This pull request introduces xxHash as a high-performance, non-cryptographic hashing option for prefix caching, which is a sensible optimization. The implementation is well-structured and includes comprehensive benchmarks to validate the performance gains. My review focuses on improving the robustness of handling the new optional dependency, xxhash. I've suggested a change to ensure compatibility with older versions of the xxhash library and another to make the test suite resilient to its absence. Overall, this is a solid contribution.
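One way to make a test suite resilient to a missing `xxhash` install, as this review suggests, is to skip the affected tests when the package cannot be found. The sketch below uses only the standard library; the test class and test name are hypothetical and are not taken from this PR.

```python
import importlib.util
import unittest

# True only when the optional package is importable in this environment.
HAS_XXHASH = importlib.util.find_spec("xxhash") is not None


@unittest.skipUnless(HAS_XXHASH, "xxhash is not installed")
class XXHashSmokeTest(unittest.TestCase):  # hypothetical test case
    def test_digest_is_128_bits(self):
        import xxhash  # imported lazily so test collection never fails

        # XXH3 128-bit one-shot digest returns 16 bytes.
        self.assertEqual(len(xxhash.xxh3_128_digest(b"prefix block")), 16)


def run_suite() -> unittest.TestResult:
    """Run the smoke test and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(XXHashSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this shape the suite reports a skip rather than an import error when `xxhash` is absent.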
This pull request has merge conflicts that must be resolved before it can be
…pdate tests Signed-off-by: LuminolT <[email protected]>
…d xxHash from [23673](vllm-project#23673) Signed-off-by: LuminolT <[email protected]>
…tribute Signed-off-by: LuminolT <[email protected]>
…dling for invalid hash algorithms and introduce xxhash support Signed-off-by: LuminolT <[email protected]>
ywang96
left a comment
Maybe it's worth mentioning these scripts in the documentation.
Thanks for pointing this out! It will be helpful for evaluating the hash algorithms used in prefix caching. I will update the relevant documentation accordingly.
Signed-off-by: LuminolT <[email protected]>
Documentation preview: https://vllm--29163.org.readthedocs.build/en/29163/
…ching Signed-off-by: LuminolT <[email protected]>
Signed-off-by: LuminolT <[email protected]>
russellb
left a comment
First, I appreciate the PR, including the detailed PR description, including performance numbers.
Why xxHash? Prefix Caching does not require cryptographic security:
It never stores secret data.
It never leaves process boundaries.
It is used exclusively to identify structural equivalence of prefix blocks.
Therefore, cryptographic collision resistance is unnecessary.
I disagree with this premise.
Prefix cache contents are secret data in a multi-tenant environment. A hash collision would cause undefined behavior and, at worst, leak private information. Consider this scenario:
- Company A sends a request
- Company B sends a request that encounters a hash collision, causing Company B's request to use cache based on Company A's request.
We had one security vulnerability where it was possible to get this to occur predictably: GHSA-rm76-4mrf-v9r8
This is a bit different, but it is another example of prefix cache security concerns. We had another case where someone demonstrated how to infer information about the contents of the prefix cache by using a timing attack to detect prefix cache hits. See this paper: https://arxiv.org/html/2411.18191v1, this PR: #17045, and this advisory: GHSA-4qjh-9fv9-r85r.
I'm not opposed to supporting this if it demonstrates a meaningful performance benefit, but I would also like to enhance our help text for the configuration to say that choosing something other than our current default of SHA256 is theoretically less secure in a multi-tenant environment and has a slightly increased risk of undefined behavior from hash collisions otherwise. Further, the code should include an additional comment clarifying the importance of the current default. I can post a separate PR adding this text, but I'd prefer it to be in place before we add other algorithms, as the tradeoffs are important.
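For a sense of scale on accidental collisions, the union (birthday) bound gives p <= n(n-1)/2^(b+1) for n hashed blocks and a b-bit digest. The sketch below computes only that bound. The block count is an arbitrary assumption, and the bound says nothing about deliberately crafted collisions, which is where a non-cryptographic hash such as xxHash offers no guarantee.

```python
from fractions import Fraction


def birthday_collision_bound(num_items: int, digest_bits: int) -> Fraction:
    """Upper bound on the chance that any two of num_items uniformly
    random digests of digest_bits bits collide: n(n-1) / 2^(b+1)."""
    return Fraction(num_items * (num_items - 1), 2 ** (digest_bits + 1))


NUM_BLOCKS = 10**9  # assumed number of hashed blocks, for illustration only

for name, bits in (("xxHash (128-bit)", 128), ("SHA-256", 256)):
    bound = birthday_collision_bound(NUM_BLOCKS, bits)
    print(f"{name}: p <= {float(bound):.3e}")
```

Both bounds are tiny for random inputs, which matches the "collisions are still very unlikely" framing; the security argument in this comment rests on the adversarial case instead.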
Good point, I suppose an experimental flag should've been the prerequisite for this kind of security trade-off in any case.
ninja  # Required for xgrammar, rocm, tpu, xpu
pybase64  # fast base64 implementation
cbor2  # Required for cross-language serialization of hashable objects
xxhash  # Required for fast hashing for prefix caching
I would prefer this not be included in our main requirements file, given that it's optional. In fact, it MUST remain optional to avoid getting flagged in environments with strict security controls.
import cbor2

try:
    import xxhash as _xxhash
It would be helpful to add a comment here that it is important for this to remain an optional dependency, as it would be considered problematic to include at all in environments with strict security controls.
Suggested change:
-    import xxhash as _xxhash
+    # It is important that this remains an optional dependency.
+    # It would not be allowed in environments with strict security controls,
+    # so it's best not to have it installed when not in use.
+    import xxhash as _xxhash
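A common shape for this kind of optional import is sketched below: the import failure is swallowed at module load and surfaced only when the xxHash path is actually requested. The helper name `xxhash_128_digest` is hypothetical and is not the PR's code; `xxhash.xxh3_128_digest` is the one-shot 128-bit function from the python-xxhash package.

```python
try:
    # Optional dependency: kept out of the base requirements so that
    # environments with strict security controls never need to install it.
    import xxhash as _xxhash
except ImportError:
    _xxhash = None


def xxhash_128_digest(data: bytes) -> bytes:
    """Return a 16-byte XXH3 digest, or raise if xxhash is not installed."""
    if _xxhash is None:
        raise ModuleNotFoundError(
            "xxhash is required for the 'xxhash' prefix caching algorithms; "
            "install it with `pip install xxhash`."
        )
    return _xxhash.xxh3_128_digest(data)
```

Deferring the error to call time keeps the SHA-256 defaults usable on installs that never opt in.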
russellb
left a comment
Instead of a separate PR, I just included some suggested text additions here.
ninja  # Required for xgrammar, rocm, tpu, xpu
pybase64  # fast base64 implementation
cbor2  # Required for cross-language serialization of hashable objects
xxhash  # Required for fast hashing for prefix caching
Suggested change:
- xxhash  # Required for fast hashing for prefix caching
import cbor2

try:
    import xxhash as _xxhash
Suggested change:
-    import xxhash as _xxhash
+    # It is important that this remains an optional dependency.
+    # It would not be allowed in environments with strict security controls,
+    # so it's best not to have it installed when not in use.
+    import xxhash as _xxhash
    - "xxhash" uses Pickle serialization with xxHash (128-bit) for faster,
    non-cryptographic hashing. Requires the optional ``xxhash`` package.\n
    - "xxhash_cbor" combines canonical CBOR serialization with xxHash for
    reproducible hashing. Requires the optional ``xxhash`` package."""
Suggested change:
-    reproducible hashing. Requires the optional ``xxhash`` package."""
+    """Set the hash algorithm for prefix caching:\n
+    - "sha256" uses Pickle for object serialization before hashing. This is the
+    current default, as SHA256 is the most secure choice to avoid potential
+    hash collisions.\n
+    - "sha256_cbor" provides a reproducible, cross-language compatible hash. It
+    serializes objects using canonical CBOR and hashes them with SHA-256.\n
+    - "xxhash" uses Pickle serialization with xxHash (128-bit) for faster,
+    non-cryptographic hashing. Requires the optional ``xxhash`` package.
+    IMPORTANT: Use of a hashing algorithm that is not considered
+    cryptographically secure theoretically increases the risk of hash collisions,
+    which can cause undefined behavior or even leak private information in
+    multi-tenant environments. Even if collisions are still very unlikely, it is
+    important to consider your security risk tolerance against the performance
+    benefits before turning this on.\n
+    - "xxhash_cbor" combines canonical CBOR serialization with xxHash for
+    reproducible hashing. Requires the optional ``xxhash`` package."""
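To make the option names concrete, here is a minimal sketch of mapping an algorithm name to a serialize-then-hash function and rejecting unknown names. It is an illustration under assumptions, not vLLM's implementation: `get_hash_fn`, `sha256_pickle` and `xxhash_pickle` are hypothetical names, the hashed tuple shape is assumed, and the CBOR variants are left out to keep the example dependency-free.

```python
import hashlib
import pickle
from typing import Any, Callable

try:
    import xxhash as _xxhash  # optional dependency
except ImportError:
    _xxhash = None


def sha256_pickle(obj: Any) -> bytes:
    """Pickle the object, then hash the bytes with SHA-256 (32-byte digest)."""
    return hashlib.sha256(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)).digest()


def xxhash_pickle(obj: Any) -> bytes:
    """Pickle the object, then hash the bytes with 128-bit XXH3 (16-byte digest)."""
    return _xxhash.xxh3_128_digest(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def get_hash_fn(name: str) -> Callable[[Any], bytes]:
    """Resolve an algorithm name; unknown names and missing packages fail loudly."""
    if name == "sha256":
        return sha256_pickle
    if name == "xxhash":
        if _xxhash is None:
            raise ModuleNotFoundError("the 'xxhash' algorithm needs the xxhash package")
        return xxhash_pickle
    raise ValueError(f"Unsupported hash algorithm: {name!r}")


# Assumed input shape for illustration: (parent digest, token IDs of the block).
digest = get_hash_fn("sha256")((b"", (1, 2, 3)))
print(digest.hex())
```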
Purpose
This PR introduces xxHash-based hash algorithms to vLLM’s prefix caching, providing a faster and collision-resistant alternative to the SHA-256 variants added in #23673.
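Why xxHash? Prefix Caching does not require cryptographic security:
- It never stores secret data.
- It never leaves process boundaries.
- It is used exclusively to identify structural equivalence of prefix blocks.
Therefore, cryptographic collision resistance is unnecessary.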
xxHash strikes the right balance:
- Much better collision resistance compared to Python builtin hash(), which is salted and unstable across processes.
- Much faster than SHA-256 and SHA-256+CBOR used in current prefix caching.
This change is lightweight: it only introduces the xxhash dependency and adds minimal adapter code to integrate xxHash into the existing hashing interface. We added two small benchmarking scripts to validate correctness and performance, but these can be removed in follow-up cleanup if needed.
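The remark about builtin hash() being salted can be checked directly. The sketch below runs `hash()` on a string in fresh interpreters with different `PYTHONHASHSEED` values and contrasts it with a content hash from `hashlib`. The helper name is hypothetical, and the salting shown applies to `str` and `bytes` inputs.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_subprocess(text: str, seed: str) -> int:
    """Evaluate hash(text) in a new interpreter started with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


def stable_digest(text: str) -> str:
    """Content hash that does not depend on process or seed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print("seed 1:", builtin_hash_in_subprocess("prefix", "1"))
print("seed 2:", builtin_hash_in_subprocess("prefix", "2"))
print("sha256:", stable_digest("prefix"))
```

The two builtin values differ because the seed differs, while the SHA-256 digest is the same in every process.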
As a result this PR adds xxHash as an optional high-performance hash. Users can opt in with:
Test Plan
1. Micro benchmark of hash functions
benchmarks/benchmark_hash.py is added to benchmark hash functions. This could be removed if the PR is merged.
This evaluation is the same as #23673, except for some configurations for xxHash.
This compares Python's builtin hash(), sha256 and xxHash.
2. Block-level hash throughput
benchmarks/benchmark_prefix_block_hash.py is added for block-level hash throughput evaluation. This could be removed if the PR is merged.
This test is model-agnostic, so it reflects performance on larger models as well, rather than only on the small OPT-125M model evaluated below.
This measures token hashing throughput for: sha256, sha256_cbor, xxhash and xxhash_cbor.
3. Prefix caching evaluation
This is the benchmark currently provided by vLLM. We tested it using different hash algorithms and compared their results.
The following evaluations were conducted on a single-A100 server.
4. Long-running throughput & GC behavior
Same as #23673. @Jialin 's Jialin#4 is used for GC analysis. Since SHA-256 significantly reduces GC overhead, our goal is to match or further reduce GC time compared to SHA-256.
Input/Output configurations:
Each run generates logs:
Test Result
1. Micro benchmark of hash functions
👉 Among non-builtin hashers, xxHash is ~1.5× faster than SHA-256.
2. Block-level hash throughput
20k × 64 tokens and 50k × 16 tokens.
# python benchmarks/benchmark_prefix_block_hash.py --num-blocks 20000 --block-size 64
Benchmarking 4 algorithms on 20000 blocks (block size=64).
sha256       avg: 0.044319s  best: 0.043920s  throughput: 29.14M tokens/s
sha256_cbor  avg: 0.195202s  best: 0.194645s  throughput: 6.58M tokens/s
xxhash       avg: 0.035225s  best: 0.035059s  throughput: 36.51M tokens/s
xxhash_cbor  avg: 0.184370s  best: 0.183651s  throughput: 6.97M tokens/s
👉 xxHash improves block hashing throughput by 25–40%.
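The throughput column above is consistent with num_blocks × block_size divided by the best time (20000 × 64 / 0.043920 s ≈ 29.14M tokens/s). The sketch below shows that arithmetic together with a tiny timing loop. It is not the PR's benchmark script: the function names, the chained-hash input shape and the synthetic token IDs are all assumptions.

```python
import hashlib
import pickle
import time
from typing import Callable, Tuple


def hash_block_sha256(parent: bytes, token_ids: Tuple[int, ...]) -> bytes:
    """Hash one block chained to its parent digest (assumed input shape)."""
    payload = pickle.dumps((parent, token_ids), protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha256(payload).digest()


def time_chain(
    hash_fn: Callable[[bytes, Tuple[int, ...]], bytes],
    num_blocks: int,
    block_size: int,
    trials: int = 3,
) -> Tuple[float, float]:
    """Return (average, best) wall-clock seconds to hash all blocks in order."""
    blocks = [
        tuple(range(i * block_size, (i + 1) * block_size)) for i in range(num_blocks)
    ]
    timings = []
    for _ in range(trials):
        parent = b""
        start = time.perf_counter()
        for block in blocks:
            parent = hash_fn(parent, block)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings), min(timings)


def throughput_mtokens(num_blocks: int, block_size: int, seconds: float) -> float:
    """Millions of tokens hashed per second."""
    return num_blocks * block_size / seconds / 1e6


avg, best = time_chain(hash_block_sha256, num_blocks=2000, block_size=64)
print(f"sha256 avg: {avg:.6f}s best: {best:.6f}s "
      f"throughput: {throughput_mtokens(2000, 64, best):.2f}M tokens/s")
```

Applying the same formula to the reported best times gives 36.51 / 29.14 ≈ 1.25, i.e. the 25% end of the stated range for the 20k × 64 configuration.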
3. Prefix caching evaluation
👉 Performance improves 1–2% with xxHash.
Note that our experiments were conducted only on the small OPT-125M model. In larger production services, where prefix-hit rates are substantially higher, xxHash is expected to show even greater performance advantages.
4. Long-running throughput & GC behavior
output_len = 1 (prefill-dominant)
GC:
output_len = 150 (decode-heavy)
GC:
CBOR variants
Similar behavior:
👉 xxHash does not increase GC pressure and may even slightly reduce tail events.
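GC time of the kind compared above can be observed from inside a Python process with `gc.callbacks`. The sketch below is a generic measurement helper and is not the analysis tooling from Jialin#4; the class name `GCTimer` and the synthetic workload are assumptions.

```python
import gc
import time


class GCTimer:
    """Accumulate wall-clock time spent in garbage collections."""

    def __init__(self) -> None:
        self.total_seconds = 0.0
        self.collections = 0
        self._start = None

    def _callback(self, phase: str, info: dict) -> None:
        # The interpreter calls this with phase "start" before a collection
        # and "stop" after it.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.total_seconds += time.perf_counter() - self._start
            self.collections += 1
            self._start = None

    def __enter__(self) -> "GCTimer":
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc_info) -> bool:
        gc.callbacks.remove(self._callback)
        return False


with GCTimer() as timer:
    junk = [[i] for i in range(100_000)]  # synthetic allocation pressure
    gc.collect()

print(f"collections: {timer.collections}, GC time: {timer.total_seconds:.6f}s")
```

Wrapping a benchmark run in such a timer gives per-run totals that can be compared across hash algorithms.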