
Commit 7bbc7a0

[bugfix] Fix incorrect block_size setting in dsv3.2 (vllm-project#7630)
### What this PR does / why we need it?

vllm-project/vllm#35122 in the vllm community refactored how block_size is updated. As a result, when the user does not specify `--block-size`, dsv3.2 ends up with an incorrect block_size.

**Root cause, traced through the block_size update flow:**

1. In NPUPlatform, `check_and_update_config` calls `refresh_block_size`, which sets block_size to 128.
2. During ModelRunner initialization, the `self.block_size` attribute is captured while block_size is still 128. This value is used for operations such as KV-cache initialization.
3. `update_block_size_for_backend` then updates block_size to the size reported by the attn_backend.

dsv3.2 breaks because it has an additional attn_backend, `DeepseekV32IndexerBackend`, and this hook is not overridden for it, so the block_size obtained from the backend is 64. At that point only `vllm_config.cache_config.block_size` is updated; the copies held elsewhere are not, leaving block_size inconsistent across the whole network.

**Fix:**

Skip `update_block_size_for_backend` and set block_size only in the `check_and_update_config` method, so that every block_size value across the network is updated consistently. In the future, the block_size update logic can be migrated into `update_block_size_for_backend`.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

- vLLM version: v0.18.0
- vLLM main: vllm-project/vllm@ed359c4

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
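The three-step flow above can be reduced to a toy reproduction. All class and function names here are illustrative stand-ins, not the real vLLM API; the point is only that a value snapshotted in step 2 goes stale when step 3 mutates the config:

```python
# Toy reproduction of the block_size inconsistency described above.
# Names are illustrative, not the actual vLLM classes.

class CacheConfig:
    def __init__(self):
        self.block_size = None

class ToyPlatform:
    @staticmethod
    def check_and_update_config(cache_config):
        # Step 1: the NPU platform pins block_size to 128.
        cache_config.block_size = 128

class ToyModelRunner:
    def __init__(self, cache_config):
        # Step 2: the runner snapshots block_size for KV-cache setup.
        self.block_size = cache_config.block_size

def update_block_size_for_backend(cache_config, backend_block_size):
    # Step 3: only the config copy is updated; the runner's snapshot is not.
    cache_config.block_size = backend_block_size

cfg = CacheConfig()
ToyPlatform.check_and_update_config(cfg)
runner = ToyModelRunner(cfg)
update_block_size_for_backend(cfg, 64)  # the indexer backend reports 64

print(runner.block_size, cfg.block_size)  # 128 64 -> inconsistent
```

The runner now builds its KV cache for 128-token blocks while the config claims 64, which is exactly the mismatch the patch removes.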
1 parent dba34d4 commit 7bbc7a0

File tree: 2 files changed, +9 −17 lines


vllm_ascend/platform.py (3 additions, 11 deletions)

@@ -224,17 +224,9 @@ def inference_mode(cls):

     @classmethod
     def update_block_size_for_backend(cls, vllm_config: VllmConfig) -> None:
-        cache_config = vllm_config.cache_config
-        if cache_config.user_specified_block_size:
-            # User specified --block-size; keep it.
-            return
-        model_config = vllm_config.model_config
-        if model_config is not None and model_config.is_hybrid:
-            # Hybrid attention+mamba models rely on the model-specific sizing
-            # logic rather than the generic platform default.
-            return
-
-        super().update_block_size_for_backend(vllm_config)
+        # TODO: NPU still sets block_size in check_and_update_config.
+        # Move that logic here so block_size is chosen by the backend.
+        pass

     @classmethod
     def set_device(cls, device: torch.device):
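The platform.py change turns the hook into a no-op so the base-class backend logic never runs. The subclassing pattern can be sketched as follows; `Platform`/`NPUPlatform` here are minimal stand-ins with a dict config, not the real vLLM classes:

```python
# Sketch of overriding a platform hook with a no-op, so the generic
# backend-driven block_size update is skipped. Illustrative names only.

class Platform:
    @classmethod
    def update_block_size_for_backend(cls, config):
        # Generic path: adopt the backend's preferred block size.
        config["block_size"] = config.get("backend_block_size",
                                          config["block_size"])

class NPUPlatform(Platform):
    @classmethod
    def update_block_size_for_backend(cls, config):
        # No-op: block_size was already fixed in check_and_update_config,
        # so the backend's differing preference is deliberately ignored.
        pass

cfg = {"block_size": 128, "backend_block_size": 64}
NPUPlatform.update_block_size_for_backend(cfg)
print(cfg["block_size"])  # 128, unchanged
```

Because the override does not call `super()`, no code path can silently swap in the backend's 64-token block size after the KV cache has been sized for 128.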

vllm_ascend/utils.py (6 additions, 6 deletions)

@@ -1101,12 +1101,12 @@ def refresh_block_size(vllm_config):
     if not scheduler_config or not model_config:
         return

-    # TODO(MengqingCao): Remove the model_type check, after resolving the hidden error in get_kv_cache_groups.
-    if (
-        "qwen3_next" not in model_config.hf_text_config.model_type
-        and "qwen3_5" not in model_config.hf_text_config.model_type
-        and cache_config.block_size != 128
-    ):
+    if model_config.is_hybrid:
+        # Hybrid attention+mamba models rely on the model-specific sizing
+        # logic rather than the generic platform default.
+        return
+
+    if cache_config.block_size != 128:
         if cache_config.enable_prefix_caching or scheduler_config.enable_chunked_prefill:
             logger.info("Block size is set to 128 if prefix cache or chunked prefill is enabled.")
             cache_config.block_size = 128
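The resulting `refresh_block_size` decision flow can be exercised in isolation. This is a standalone sketch with plain dataclasses in place of the real vLLM config objects (the field names mirror the diff, but the wrapper types are assumptions):

```python
# Standalone sketch of the refresh_block_size logic from the diff above,
# using simple dataclasses instead of vLLM's config objects.
from dataclasses import dataclass

@dataclass
class ModelCfg:
    is_hybrid: bool

@dataclass
class CacheCfg:
    block_size: int
    enable_prefix_caching: bool

def refresh_block_size(model: ModelCfg, cache: CacheCfg,
                       enable_chunked_prefill: bool) -> None:
    if model.is_hybrid:
        # Hybrid attention+mamba models pick their own block size.
        return
    if cache.block_size != 128:
        if cache.enable_prefix_caching or enable_chunked_prefill:
            cache.block_size = 128  # force the NPU default

cache = CacheCfg(block_size=64, enable_prefix_caching=True)
refresh_block_size(ModelCfg(is_hybrid=False), cache, False)
print(cache.block_size)  # 128
```

Note the early return for hybrid models: replacing the previous per-model-type string checks with `is_hybrid` keeps the exemption generic instead of hard-coding model names.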
