Commit 76f90f5 (parent 72aeaec)

[Fix] fix some error

Signed-off-by: Csrayz <[email protected]>

3 files changed, +6 −4 lines

docs/source/user_guide/configuration/additional_config.md (1 addition, 1 deletion)

@@ -55,7 +55,7 @@ The details of each config option are as follows:
 | ---- | ---- | ------- | ----------- |
 | `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine|
 | `max_long_partial_prefills` | Union[int, float] | `float('inf')` | the maximum number of prompts longer than long_prefill_token_threshold that will be prefilled concurrently. |
-| `long_prefill_token_threshold` | Union[int, float] | `False` | a request is considered long if the prompt is longer than this number of tokens. |
+| `long_prefill_token_threshold` | Union[int, float] | `float('inf')` | a request is considered long if the prompt is longer than this number of tokens. |

 ascend_scheduler_config also support the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `enable_chunked_prefill: True` to ascend_scheduler_config as well.
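A hedged sketch of why the documented default matters. The table previously listed `False` as the default for `long_prefill_token_threshold`; if that value ever reached the length comparison, Python would evaluate `num_tokens > False` as `num_tokens > 0`, classifying every non-empty prompt as long. With `float('inf')` no prompt is long unless the user sets a finite threshold. The function name below is a stand-in for illustration, not the real vllm-ascend API:

```python
def is_long_prompt(num_tokens: int,
                   long_prefill_token_threshold: float = float('inf')) -> bool:
    """A request is considered long if its prompt exceeds the threshold."""
    return num_tokens > long_prefill_token_threshold

# Old documented default: `False` compares like 0, so everything is "long".
assert (1 > False) == (1 > 0)
# Corrected default: nothing is long until a finite threshold is configured.
assert not is_long_prompt(1_000_000)
# With an explicit threshold, only longer prompts are flagged.
assert is_long_prompt(5000, long_prefill_token_threshold=4096)
```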

vllm_ascend/core/schedule_config.py (1 addition, 1 deletion)

@@ -76,7 +76,7 @@ def __post_init__(self) -> None:
         else:
             if self.long_prefill_token_threshold is None:
                 self.long_prefill_token_threshold = \
-                    int(self.max_model_len * 0.04)
+                    max(1, int(self.max_model_len * 0.04))

         assert (self.max_long_partial_prefills > 0)
         assert (self.long_prefill_token_threshold > 0)
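A minimal sketch of what the `max(1, ...)` clamp fixes: for small `max_model_len`, `int(max_model_len * 0.04)` truncates to 0, which would then trip the `assert (self.long_prefill_token_threshold > 0)` check immediately below. The helper name is illustrative, not the real method:

```python
def default_threshold(max_model_len: int) -> int:
    # Derive the default long-prefill threshold as 4% of the model length,
    # clamped so it can never be zero (which would fail the > 0 assertion).
    return max(1, int(max_model_len * 0.04))

assert int(16 * 0.04) == 0           # old expression truncates to 0
assert default_threshold(16) == 1    # clamped to the smallest valid value
assert default_threshold(8192) == 327  # large models are unaffected
```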

vllm_ascend/core/scheduler.py (4 additions, 2 deletions)

@@ -88,6 +88,7 @@ def schedule(self) -> SchedulerOutput:
         # Skip long prompt requests in prefill stage.
         # long_prefill_budget is float('inf') if not use.
         long_prefill_budget = self.vllm_config.scheduler_config.max_long_partial_prefills
+        long_prefill_token_threshold = self.vllm_config.scheduler_config.long_prefill_token_threshold

         # Schedule prefill requests first.
         while self.waiting and token_budget > 0:
@@ -187,7 +188,7 @@ def skip_cur_request():
                 skip_cur_request()
                 continue

-            if num_new_tokens > self.vllm_config.scheduler_config.long_prefill_token_threshold \
+            if num_new_tokens > long_prefill_token_threshold \
                     and long_prefill_budget <= 0:
                 skip_cur_request()
                 continue
@@ -244,7 +245,8 @@ def skip_cur_request():
             # Update request info.
             num_scheduled_tokens[request.request_id] = num_new_tokens
             token_budget -= num_new_tokens
-            long_prefill_budget -= 1
+            if num_new_tokens > long_prefill_token_threshold:
+                long_prefill_budget -= 1
             request.status = RequestStatus.RUNNING
             request.num_computed_tokens = num_computed_tokens
             # Count the number of prefix cached tokens.
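A hedged sketch of the corrected accounting, with standalone stand-ins for the scheduler's state (requests reduced to their new-token counts). Before this commit, every scheduled prefill decremented `long_prefill_budget`; after it, only requests exceeding `long_prefill_token_threshold` consume the budget, so short prefills no longer exhaust the slots reserved for long ones:

```python
def schedule_prefills(requests, long_prefill_budget, long_prefill_token_threshold):
    """Toy model: schedule prefills, charging the budget only for long ones."""
    scheduled = []
    for num_new_tokens in requests:
        is_long = num_new_tokens > long_prefill_token_threshold
        if is_long and long_prefill_budget <= 0:
            continue  # skip: no budget left for another long prefill
        scheduled.append(num_new_tokens)
        if is_long:  # the fix: short prefills no longer consume the budget
            long_prefill_budget -= 1
    return scheduled

# Budget of 2 long prefills, threshold 4096: the third long prompt (7000)
# is skipped, while short prompts pass through regardless of the budget.
assert schedule_prefills([100, 5000, 6000, 7000, 200], 2, 4096) == [100, 5000, 6000, 200]
```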
