[bugfix] ascend schedule encountered an incorrect req block length in… (#2394)

liziyu179 · web-flow · commit dd00969beb5b · 2025-08-16T18:32:29.000+08:00
… the check_watermark_for_prefill function

### What this PR does / why we need it?
The Ascend scheduler used an incorrect request block length in the check_watermark_for_prefill function: as written before this change, the length it computed was always 1.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Before: http://image.huawei.com/tiny-lts/v1/images/mdstorm/c6cff7cf33d500a3833f5f80352df373_1183x377.png
After: http://image.huawei.com/tiny-lts/v1/images/mdstorm/57207a490d8ac0a70fc87dd08d02dee6_1470x954.png

Signed-off-by: liziyu <liziyu16@huawei.com>
diff --git a/docs/source/developer_guide/performance/distributed_dp_server_with_large_ep.md b/docs/source/developer_guide/performance/distributed_dp_server_with_large_ep.md
@@ -173,7 +173,7 @@ In the PD separation scenario, we provide a recommended optimized configuration.
 - **prefiller node**
 
 1. set HCCL_BUFFSIZE=256
-2. add '--enforce-eager' commond to 'vllm serve'
+2. add '--enforce-eager' command to 'vllm serve'
 3. Take '--additional-config' as follow
 
 ```shell
@@ -231,7 +231,7 @@ python load_balance_proxy_server_example.py \
 ```
 
 :::{note}
-Each node local ip should repeat the same times as its '**dp_size_local**', at the same time, each node has the same number of ports as '**dp_size_local**', and ther ports increase sequentially starting from '**engine_port**'.
+Each node local ip should repeat the same times as its '**dp_size_local**', at the same time, each node has the same number of ports as '**dp_size_local**', and their ports increase sequentially starting from '**engine_port**'.
 :::
 
 You can get the proxy program in the repository's examples, [load\_balance\_proxy\_server\_example.py](https://github.com/vllm-project/vllm-ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)
diff --git a/vllm_ascend/core/scheduler.py b/vllm_ascend/core/scheduler.py
@@ -433,7 +433,7 @@ def _check_watermark_for_prefill(self,
                                    self.block_size)
         req_blocks = self.kv_cache_manager.coordinator.get_blocks(
             request.request_id)
-        num_new_blocks = (num_required_blocks - len(req_blocks) -
+        num_new_blocks = (num_required_blocks - len(req_blocks[0]) -
                           len(computed_blocks))
         num_evictable_computed_blocks = sum(1 for blk in computed_blocks
                                             if blk.ref_cnt == 0)
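
The effect of the scheduler change can be illustrated with a small standalone sketch. It assumes, based only on the fix and the commit message, that `get_blocks` returns a tuple containing one list of blocks per KV cache group, so that `len(req_blocks)` counts groups (1 here) rather than blocks. `Block` and `get_blocks_stub` are hypothetical stand-ins, not the real vLLM classes.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for a KV cache block."""
    block_id: int
    ref_cnt: int = 1


def get_blocks_stub(num_blocks: int) -> tuple:
    """Assumed return shape: a tuple with one list of blocks per KV cache group."""
    return ([Block(i) for i in range(num_blocks)],)


def num_new_blocks_before(num_required_blocks, req_blocks, computed_blocks):
    # Old expression: len(req_blocks) is the number of groups, not blocks.
    return num_required_blocks - len(req_blocks) - len(computed_blocks)


def num_new_blocks_after(num_required_blocks, req_blocks, computed_blocks):
    # Fixed expression: count the blocks held in the first (only) group.
    return num_required_blocks - len(req_blocks[0]) - len(computed_blocks)


req_blocks = get_blocks_stub(4)        # the request already holds 4 blocks
computed_blocks = [Block(100, ref_cnt=0), Block(101)]

before = num_new_blocks_before(10, req_blocks, computed_blocks)
after = num_new_blocks_after(10, req_blocks, computed_blocks)
print(before, after)
```

Under these assumptions the old expression overestimates the number of new blocks whenever the request already holds more than one block, which would make the watermark check stricter than intended.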