Commit 8c742a6

[Misc] Avoid redundant copy for encoder-only models (#24012)
Signed-off-by: Woosuk Kwon <[email protected]>
1 parent 183a709 commit 8c742a6

1 file changed: +7 -7 lines changed

vllm/v1/worker/gpu_model_runner.py

Lines changed: 7 additions & 7 deletions
@@ -827,13 +827,13 @@ def _prepare_inputs(
                 blk_table_tensor = torch.zeros(
                     (num_reqs, 1),
                     dtype=torch.int32,
-                    pin_memory=self.pin_memory,
-                    device="cpu").to(self.device, non_blocking=True)
-                slot_mapping = torch.zeros((total_num_scheduled_tokens, ),
-                                           dtype=torch.int32,
-                                           pin_memory=self.pin_memory,
-                                           device="cpu").to(self.device,
-                                                            non_blocking=True)
+                    device=self.device,
+                )
+                slot_mapping = torch.zeros(
+                    (total_num_scheduled_tokens, ),
+                    dtype=torch.int64,
+                    device=self.device,
+                )
                 num_common_prefix_blocks = 0
             else:
                 blk_table = self.input_batch.block_table[kv_cache_group_id]
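
The hunk above replaces a host allocation followed by a host-to-device copy with a direct allocation on the target device, and it also changes the `slot_mapping` dtype from `torch.int32` to `torch.int64`. The following is a minimal standalone sketch of the two allocation patterns, not the vLLM code itself. The helper names `zeros_via_cpu` and `zeros_on_device` and the sizes are hypothetical, and the sketch falls back to the CPU when no GPU is present.

```python
import torch


def zeros_via_cpu(shape, dtype, device, pin_memory):
    """Old pattern: allocate zeros on the host, then copy them to the device."""
    host = torch.zeros(shape, dtype=dtype, pin_memory=pin_memory, device="cpu")
    return host.to(device, non_blocking=True)


def zeros_on_device(shape, dtype, device):
    """New pattern: allocate the zeros directly on the target device."""
    return torch.zeros(shape, dtype=dtype, device=device)


# Fall back to the CPU so the sketch also runs on machines without a GPU.
has_cuda = torch.cuda.is_available()
device = torch.device("cuda" if has_cuda else "cpu")
pin_memory = has_cuda  # only request pinned host memory when CUDA is usable

# Hypothetical sizes, chosen only for illustration.
num_reqs, total_num_scheduled_tokens = 4, 16

old_blk = zeros_via_cpu((num_reqs, 1), torch.int32, device, pin_memory)
new_blk = zeros_on_device((num_reqs, 1), torch.int32, device)
new_slot_mapping = zeros_on_device(
    (total_num_scheduled_tokens, ), torch.int64, device)

# Both patterns produce the same values; only the allocation path differs.
same_values = torch.equal(old_blk, new_blk)
```

Because the tensors are all zeros, nothing computed on the host needs to be transferred, so the intermediate CPU tensor in the old pattern carries no information that the device-side allocation lacks.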
