7 changes: 5 additions & 2 deletions vllm/v1/engine/core.py
@@ -177,14 +177,17 @@ def __init__(
             self.vllm_config.cache_config.enable_prefix_caching
             or self.scheduler.get_kv_connector() is not None
         ):
-            block_size = vllm_config.cache_config.block_size
+            hash_block_size = (
+                vllm_config.cache_config.block_size
+                * vllm_config.parallel_config.decode_context_parallel_size
+            )
Review comment from a Contributor (severity: high):

This calculation for hash_block_size duplicates logic from the Scheduler's __init__ method, where its block_size attribute is also adjusted for decode_context_parallel_size. This duplication is what led to the original bug this PR is fixing.

To avoid this, you can reuse the block_size from the scheduler instance. This makes the code more robust by using a single source of truth.

While block_size is not on the SchedulerInterface, using it with a type: ignore is a pragmatic way to remove the duplication within this file. A more complete solution would involve exposing the logical block size through the interface or a shared utility, which could be addressed in a follow-up.

            hash_block_size = self.scheduler.block_size  # type: ignore
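
For the follow-up, one possible shape (a sketch only; logical_block_size is a hypothetical helper, not an existing vLLM function, though the config fields it reads are the ones already used in this diff):

# Hypothetical shared utility: derive the logical hash block size in one
# place so the Scheduler and EngineCore cannot drift apart again.
from vllm.config import VllmConfig


def logical_block_size(vllm_config: VllmConfig) -> int:
    """Token span covered by one request block hash.

    With decode context parallelism, a logical KV block spans
    block_size * decode_context_parallel_size tokens.
    """
    return (
        vllm_config.cache_config.block_size
        * vllm_config.parallel_config.decode_context_parallel_size
    )

Both the Scheduler's __init__ and the hashing setup here could then call the same helper, keeping a single source of truth without the type: ignore.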

             caching_hash_fn = get_hash_fn_by_name(
                 vllm_config.cache_config.prefix_caching_hash_algo
             )
             init_none_hash(caching_hash_fn)
 
             self.request_block_hasher = get_request_block_hasher(
-                block_size, caching_hash_fn
+                hash_block_size, caching_hash_fn

Review comment (P1): Avoid scaling block hashes when only KV connectors use them

The new hash setup multiplies cache_config.block_size by decode_context_parallel_size before computing request.block_hashes. That makes each hash correspond to block_size × dcp_world_size tokens (engine/core.py lines 175‑190). KV connectors, however, still interpret request.block_hashes in units of the original GPU block size; for example OffloadingConnectorScheduler.get_num_new_matched_tokens asserts len(request.block_hashes) // self.block_size_factor == num_blocks where self.block_size_factor is derived from cache_config.block_size (offloading_connector.py lines 179‑181 and spec.py lines 33‑38). With DCP>1 and a KV connector enabled but prefix caching disabled, len(request.block_hashes) shrinks by the DCP factor and the assertion will fail or the connector will mis-index blocks. Either keep hashes at the GPU block granularity when only a connector is present or update the connector code to handle the larger hash stride.
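
A minimal sketch of the first option (illustrative only; select_hash_block_size is not an existing vLLM function): widen the hash stride only when prefix caching actually consumes the hashes, and keep the GPU block granularity when just a KV connector is attached.

def select_hash_block_size(
    gpu_block_size: int,
    dcp_world_size: int,
    enable_prefix_caching: bool,
) -> int:
    """Pick the token stride used for request block hashing (sketch)."""
    if enable_prefix_caching:
        # Prefix caching under DCP hashes logical blocks spanning
        # gpu_block_size * dcp_world_size tokens.
        return gpu_block_size * dcp_world_size
    # Only a KV connector consumes the hashes: keep GPU-block granularity so
    # connector-side bookkeeping such as block_size_factor stays consistent.
    return gpu_block_size

For example, with gpu_block_size=16 and dcp_world_size=2, a 256-token request yields 16 GPU blocks but only 8 hashes at the widened stride; keeping the stride at 16 when prefix caching is off preserves the hash count the connector expects.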

             )

         self.step_fn = (