
Commit 35afe1b

Pradyun92, Pradyun Ramadorai, and NickLucche authored
[BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D (#22317)
Signed-off-by: Pradyun Ramadorai <[email protected]>
Signed-off-by: Pradyun92 <[email protected]>
Co-authored-by: Pradyun Ramadorai <[email protected]>
Co-authored-by: Nicolò Lucchesi <[email protected]>
1 parent 81c57f6 commit 35afe1b


1 file changed, +11 -1 lines changed

vllm/v1/core/sched/scheduler.py

Lines changed: 11 additions & 1 deletion
@@ -437,14 +437,24 @@ def schedule(self) -> SchedulerOutput:
                     # The request cannot be scheduled.
                     break
 
+                # Handles an edge case when P/D Disaggregation
+                # is used with Spec Decoding where an
+                # extra block gets allocated which
+                # creates a mismatch between the number
+                # of local and remote blocks.
+                effective_lookahead_tokens = (0 if request.num_computed_tokens
+                                              == 0 else
+                                              self.num_lookahead_tokens)
+
                 new_blocks = self.kv_cache_manager.allocate_slots(
                     request,
                     num_new_tokens + num_external_computed_tokens,
                     num_new_local_computed_tokens,
                     new_computed_blocks,
-                    num_lookahead_tokens=self.num_lookahead_tokens,
+                    num_lookahead_tokens=effective_lookahead_tokens,
                     delay_cache_blocks=load_kv_async,
                 )
+
                 if new_blocks is None:
                     # The request cannot be scheduled.
                     break
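The sketch below shows how lookahead tokens on a request's first allocation can produce one extra local block. It is a simplified model built on three assumptions. First, a request needs `ceil(tokens / block_size)` KV-cache blocks. Second, the remote prefill (P) worker allocated blocks for the prompt tokens only. Third, on the decode (D) side the request still has `num_computed_tokens == 0` when its externally computed tokens are first allocated. The helpers `effective_lookahead_tokens`, `blocks_needed`, and `local_blocks_for_first_allocation` are hypothetical and are not vLLM APIs. The first one only mirrors the conditional added in the diff.

```python
import math


def effective_lookahead_tokens(num_computed_tokens: int,
                               num_lookahead_tokens: int) -> int:
    # Hypothetical helper mirroring the conditional in the diff:
    # reserve no lookahead slots while nothing has been computed yet.
    return 0 if num_computed_tokens == 0 else num_lookahead_tokens


def blocks_needed(num_tokens: int, block_size: int) -> int:
    # Assumption: each block holds block_size tokens, so round up.
    return math.ceil(num_tokens / block_size)


def local_blocks_for_first_allocation(prompt_len: int, block_size: int,
                                      num_lookahead_tokens: int,
                                      apply_fix: bool) -> int:
    # Assumption: on the decode side the prompt KV comes from the remote
    # prefill worker, and the request has num_computed_tokens == 0 here.
    if apply_fix:
        lookahead = effective_lookahead_tokens(0, num_lookahead_tokens)
    else:
        lookahead = num_lookahead_tokens
    return blocks_needed(prompt_len + lookahead, block_size)


if __name__ == "__main__":
    block_size = 16
    prompt_len = 32           # fills exactly two blocks on the prefill side
    num_lookahead_tokens = 3  # e.g. the number of speculative tokens

    remote = blocks_needed(prompt_len, block_size)
    before = local_blocks_for_first_allocation(
        prompt_len, block_size, num_lookahead_tokens, apply_fix=False)
    after = local_blocks_for_first_allocation(
        prompt_len, block_size, num_lookahead_tokens, apply_fix=True)
    print(remote, before, after)  # 2 3 2
```

In this model, a 32-token prompt with 16-token blocks and 3 lookahead tokens needs 3 local blocks on the unpatched path but only 2 on the remote side. The patched path allocates 2, which matches the remote count. The condition tests `request.num_computed_tokens == 0` rather than a P/D-specific flag, so it appears to skip lookahead slots on the first allocation of any waiting request, not only P/D ones. Later allocations for running requests presumably still reserve them.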
