docs/source/user_guide/configuration/additional_config.md
4 lines changed: 4 additions & 0 deletions
@@ -54,6 +54,8 @@ The details of each config option are as follows:
| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
|`enabled`| bool |`False`| Whether to enable ascend scheduler for V1 engine|
+|`max_long_partial_prefills`| Union[int, float]|`float('inf')`| The maximum number of prompts longer than `long_prefill_token_threshold` that will be prefilled concurrently. |
+|`long_prefill_token_threshold`| Union[int, float]|`False`| A request is considered long if its prompt is longer than this number of tokens. |
`ascend_scheduler_config` also supports the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `enable_chunked_prefill: True` to `ascend_scheduler_config` as well.
@@ -74,6 +76,8 @@ An example of additional configuration is as follows:
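To make the two new options concrete, here is a minimal sketch of an `ascend_scheduler_config` entry that combines them with `enable_chunked_prefill`, as the table and the note above describe. The numeric values (`1` and `4096`) are assumptions picked for illustration, not recommendations from the diff. The helper `is_long_prompt` is hypothetical and not part of vllm-ascend: it only mirrors the table's wording that a request counts as long when its prompt exceeds the threshold. How the dictionary is handed to the engine is not shown in this diff, so the sketch stops at building and serializing it.

```python
import json

# Illustrative values only; the diff does not recommend specific settings.
additional_config = {
    "ascend_scheduler_config": {
        "enabled": True,
        # At most one "long" prompt is prefilled concurrently (assumed value).
        "max_long_partial_prefills": 1,
        # Prompts longer than this many tokens count as "long" (assumed value).
        "long_prefill_token_threshold": 4096,
        # An option taken from the vLLM scheduler config, as the note describes.
        "enable_chunked_prefill": True,
    }
}


def is_long_prompt(num_prompt_tokens: int, config: dict) -> bool:
    """Hypothetical helper, not a vllm-ascend API.

    Mirrors the table's description: a request is long if its prompt is
    longer than `long_prefill_token_threshold`.
    """
    threshold = config["ascend_scheduler_config"]["long_prefill_token_threshold"]
    return num_prompt_tokens > threshold


# Serialize the configuration, e.g. for inspection or for passing it on as JSON.
print(json.dumps(additional_config, indent=2))
```

With these assumed values, a 5000-token prompt would be treated as long, while a prompt of exactly 4096 tokens would not, because the comparison is strict.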