
Conversation

@WoosukKwon (Collaborator) commented Nov 21, 2025

This PR updates the input-preparation and bookkeeping logic for the V2 model runner. The primary goal is to shift as much work as possible onto the GPU, laying the groundwork for async scheduling + speculative decoding.

Key changes

  • num_computed_tokens is now GPU-only.
    The CPU no longer tracks this value. Instead, it maintains num_computed_prefill_tokens, which tracks the progress of chunked prefills.

  • Prefill and decode input preparation are now decoupled.

    • Prefill: inputs and their statuses (e.g., num_computed_prefill_tokens) are read from NumPy arrays.
    • Decode: uses last_sampled_tokens and num_computed_tokens directly from the GPU.
  • Preparation for speculative decoding.
    We now maintain num_sampled_tokens as a GPU tensor for upcoming spec-decode support (where the number of accepted tokens is dynamic); see the sketch after this list.
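
To make the GPU/CPU split concrete, here is a minimal sketch of how the bookkeeping described above could be laid out. Only the names num_computed_tokens, num_computed_prefill_tokens, last_sampled_tokens, num_sampled_tokens, and req_states come from this PR; the class shape, function names, and signatures are hypothetical illustrations, not the actual model-runner code.

import numpy as np
import torch


class RequestStates:
    """Per-request bookkeeping split between GPU and CPU (sketch)."""

    def __init__(self, max_num_reqs: int, device: torch.device):
        # GPU-only: updated in place on device each step and never read
        # back to the CPU, so the CPU can schedule the next step before
        # the GPU finishes the current one (async scheduling).
        self.num_computed_tokens = torch.zeros(
            max_num_reqs, dtype=torch.int32, device=device)
        # Last sampled token per request, consumed directly by the next
        # decode step without a device-to-host copy.
        self.last_sampled_tokens = torch.zeros(
            max_num_reqs, dtype=torch.int64, device=device)
        # Tokens sampled in the previous step. Constant 1 today; with
        # speculative decoding it becomes 1 + number of accepted drafts.
        self.num_sampled_tokens = torch.ones(
            max_num_reqs, dtype=torch.int32, device=device)
        # CPU-side: prefill progress only. Prompt lengths are known up
        # front, so chunked-prefill progress is deterministic on the CPU.
        self.num_computed_prefill_tokens = np.zeros(
            max_num_reqs, dtype=np.int32)


def prepare_decode_inputs(
    states: RequestStates,
    idx_mapping: torch.Tensor,  # GPU tensor of request indices
) -> tuple[torch.Tensor, torch.Tensor]:
    # Decode path: gather input ids and positions entirely on the GPU.
    input_ids = states.last_sampled_tokens[idx_mapping]
    positions = states.num_computed_tokens[idx_mapping].to(torch.int64)
    return input_ids, positions


def prepare_prefill_inputs(
    states: RequestStates,
    idx_mapping_np: np.ndarray,       # CPU array of request indices
    num_scheduled_tokens: np.ndarray  # chunk sizes for this step
) -> np.ndarray:
    # Prefill path: read chunk offsets from NumPy arrays and advance them.
    starts = states.num_computed_prefill_tokens[idx_mapping_np]
    states.num_computed_prefill_tokens[idx_mapping_np] = (
        starts + num_scheduled_tokens)
    return starts  # per-request offset of the chunk to embed this step

The point of the split is that the decode path never forces a device-to-host sync, while the prefill path stays on cheap CPU arrays because prompt lengths are static.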

Signed-off-by: Woosuk Kwon <[email protected]>
Comment on lines +473 to +480
# Get num_computed_tokens.
# HACK(woosuk): Here, we use num_computed_tokens on GPU instead of
# num_computed_tokens_cpu. This works for most cases.
num_computed_tokens = self.req_states.num_computed_tokens[idx_mapping]
# HACK(woosuk): Only GPU has the exact seq_lens because at this point
# CPU does not know how many draft tokens are accepted/rejected in the
# previous step. Therefore, we use max_model_len to be safe.
seq_lens_np = np.full(num_reqs, self.max_model_len, dtype=np.int32)
@WoosukKwon (Collaborator, Author) commented:
@LucasWilkinson This is my current hack, which is totally undesirable. I plan to use a tighter upper bound for seq_lens_np, but I'd like to keep this hack as-is if the refactoring is likely to happen in the near future.
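
For reference, one possible tighter bound, sketched under two assumptions: the CPU-side count (num_computed_tokens_cpu, mentioned in the snippet above) lags the GPU by at most one step, and a step accepts at most num_spec_tokens drafts plus one bonus token. All names other than seq_lens_np, max_model_len, and num_computed_tokens_cpu are hypothetical.

import numpy as np

def upper_bound_seq_lens(num_computed_tokens_cpu: np.ndarray,
                         num_scheduled_tokens: np.ndarray,
                         num_spec_tokens: int,
                         max_model_len: int) -> np.ndarray:
    # Worst case (assumed): every draft from the previous step was
    # accepted, plus the bonus token, plus everything scheduled now.
    bound = (num_computed_tokens_cpu
             + (num_spec_tokens + 1)
             + num_scheduled_tokens)
    # Never exceed the model's context window.
    return np.minimum(bound, max_model_len).astype(np.int32)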
