Commit 72aeaec

[Doc] new ascend_scheduler_config
Signed-off-by: Csrayz <[email protected]>
1 parent 3ca22f7 commit 72aeaec

1 file changed: +4 -0 lines changed

docs/source/user_guide/configuration/additional_config.md

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,8 @@ The details of each config option are as follows:
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine|
+| `max_long_partial_prefills` | Union[int, float] | `float('inf')` | the maximum number of prompts longer than long_prefill_token_threshold that will be prefilled concurrently. |
+| `long_prefill_token_threshold` | Union[int, float] | `False` | a request is considered long if the prompt is longer than this number of tokens. |

 ascend_scheduler_config also support the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `enable_chunked_prefill: True` to ascend_scheduler_config as well.

@@ -74,6 +76,8 @@ An example of additional configuration is as follows:
     "ascend_scheduler_config": {
         "enabled": True,
         "enable_chunked_prefill": True,
+        "max_long_partial_prefills": 1,
+        "long_prefill_token_threshold": 4096,
     },
     "refresh": False,
 }
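
To illustrate how the two new options relate to each other, here is a minimal sketch based only on the descriptions in the table above. It is not the actual ascend scheduler code: `LongPrefillLimits`, `is_long`, and `admit` are hypothetical names introduced for illustration, and the policy of deferring extra long prompts to a later step is an assumption. The values come from the example configuration in this diff.

```python
from dataclasses import dataclass
from typing import List, Union

Number = Union[int, float]


@dataclass
class LongPrefillLimits:
    """Hypothetical holder for the two options added in this commit."""

    max_long_partial_prefills: Number
    long_prefill_token_threshold: Number

    def is_long(self, prompt_len: int) -> bool:
        # Per the docs: a request is long if its prompt is longer than
        # long_prefill_token_threshold (strictly greater).
        return prompt_len > self.long_prefill_token_threshold

    def admit(self, prompt_lens: List[int]) -> List[int]:
        """Return indices of prompts allowed to prefill concurrently.

        Assumption: long prompts beyond the cap are skipped for this
        step, while short prompts are always admitted.
        """
        admitted = []
        long_in_flight = 0
        for i, n in enumerate(prompt_lens):
            if self.is_long(n):
                if long_in_flight >= self.max_long_partial_prefills:
                    continue  # defer this long prompt
                long_in_flight += 1
            admitted.append(i)
        return admitted


# Values taken from the example configuration in the diff.
additional_config = {
    "ascend_scheduler_config": {
        "enabled": True,
        "enable_chunked_prefill": True,
        "max_long_partial_prefills": 1,
        "long_prefill_token_threshold": 4096,
    },
}

cfg = additional_config["ascend_scheduler_config"]
limits = LongPrefillLimits(
    max_long_partial_prefills=cfg["max_long_partial_prefills"],
    long_prefill_token_threshold=cfg["long_prefill_token_threshold"],
)

# Prompt lengths in tokens; 8000 and 5000 exceed the 4096 threshold.
admitted = limits.admit([128, 8000, 5000, 4096, 300])
```

With `max_long_partial_prefills` set to 1, only the first long prompt is admitted alongside the short ones. Passing `float('inf')`, the documented default for that option, removes the cap in this sketch.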
