
Commit 4f4d427

simon-mo and mgoin authored
Disable chunked prefill and/or prefix caching when MLA is enabled (#12642)
From @mgoin in #12638. I cannot push to that branch, therefore a new PR to unblock the release.

---------

Signed-off-by: mgoin <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Co-authored-by: mgoin <[email protected]>
1 parent 1e36983 commit 4f4d427

File tree

1 file changed: +10 −0 lines changed


vllm/config.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -3252,6 +3252,16 @@ def __post_init__(self):

         current_platform.check_and_update_config(self)

+        # If MLA is enabled, force disable chunked prefill and prefix caching
+        if self.model_config and self.model_config.use_mla:
+            logger.info("MLA is enabled; forcing chunked prefill and prefix "
+                        "caching to be disabled.")
+            self.scheduler_config.enable_chunked_prefill = False
+            self.scheduler_config.chunked_prefill_enabled = False
+
+            if self.cache_config is not None:
+                self.cache_config.enable_prefix_caching = False
+
         if not self.instance_id:
             self.instance_id = random_uuid()[:5]
```
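The override pattern used here (a `__post_init__` hook that force-disables incompatible features after platform checks) can be sketched in isolation. The `ModelConfig`, `SchedulerConfig`, and `CacheConfig` classes below are minimal hypothetical stand-ins for illustration, not vLLM's real config classes:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    use_mla: bool = False  # whether Multi-head Latent Attention is enabled


@dataclass
class SchedulerConfig:
    enable_chunked_prefill: bool = True
    chunked_prefill_enabled: bool = True


@dataclass
class CacheConfig:
    enable_prefix_caching: bool = True


@dataclass
class VllmLikeConfig:
    """Simplified stand-in for a top-level config with a post-init override."""
    model_config: Optional[ModelConfig] = None
    scheduler_config: SchedulerConfig = field(default_factory=SchedulerConfig)
    cache_config: Optional[CacheConfig] = None

    def __post_init__(self):
        # Mirrors the committed logic: MLA forces chunked prefill and
        # prefix caching off, regardless of what the user requested.
        if self.model_config and self.model_config.use_mla:
            self.scheduler_config.enable_chunked_prefill = False
            self.scheduler_config.chunked_prefill_enabled = False
            if self.cache_config is not None:
                self.cache_config.enable_prefix_caching = False


# With MLA on, both features are disabled even though their defaults are True.
cfg = VllmLikeConfig(model_config=ModelConfig(use_mla=True),
                     cache_config=CacheConfig())

# Without MLA, the user-facing defaults are left untouched.
cfg_no_mla = VllmLikeConfig(model_config=ModelConfig(use_mla=False),
                            cache_config=CacheConfig())
```

Doing this in `__post_init__` means every construction path goes through the guard, so callers cannot accidentally combine MLA with the unsupported features.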

0 commit comments