
Commit 17c1504

SnowCharm authored and njhill committed
[Perf] Optimize Preparing Inputs for GPU Model Runner (vllm-project#16484)
Signed-off-by: snowcharm <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Yang Wang <[email protected]>
1 parent 893f61d commit 17c1504

File tree

1 file changed: +4 -8 lines


vllm/v1/worker/gpu_model_runner.py

Lines changed: 4 additions & 8 deletions
```diff
@@ -484,14 +484,10 @@ def _prepare_inputs(
         self.input_batch.block_table.commit(num_reqs)

         # Get the number of scheduled tokens for each request.
-        # TODO: The Python loop can be slow. Optimize.
-        num_scheduled_tokens = np.empty(num_reqs, dtype=np.int32)
-        max_num_scheduled_tokens = 0
-        for i, req_id in enumerate(self.input_batch.req_ids):
-            num_tokens = scheduler_output.num_scheduled_tokens[req_id]
-            num_scheduled_tokens[i] = num_tokens
-            max_num_scheduled_tokens = max(max_num_scheduled_tokens,
-                                           num_tokens)
+        req_ids = self.input_batch.req_ids
+        tokens = [scheduler_output.num_scheduled_tokens[i] for i in req_ids]
+        num_scheduled_tokens = np.array(tokens, dtype=np.int32)
+        max_num_scheduled_tokens = max(tokens)

         # Get request indices.
         # E.g., [2, 5, 3] -> [0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
```
