
Commit 4b5bcf8

faster startup of vLLM (#982)

Authored by ri938 and robirv938
Co-authored-by: Robert Irvine <[email protected]>

1 parent 852ef5b

File tree

1 file changed: +3 −2 lines changed


vllm/model_executor/layers/attention.py

Lines changed: 3 additions & 2 deletions
@@ -259,8 +259,9 @@ def __init__(
         self.is_neox_style = is_neox_style

         # Create the cos and sin cache.
-        inv_freq = 1.0 / (base**(torch.arange(0, rotary_dim, 2) / rotary_dim))
-        t = torch.arange(max_position).float()
+        inv_freq = 1.0 / (base**(
+            torch.arange(0, rotary_dim, 2, device="cuda") / rotary_dim))
+        t = torch.arange(max_position, device="cuda").float()
         freqs = torch.einsum("i,j -> ij", t, inv_freq.float())
         cos = freqs.cos()
         sin = freqs.sin()
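The change creates the rotary position-embedding cache tensors directly on the GPU (`device="cuda"`) rather than building them on the CPU and moving them over, which avoids a host-to-device copy at startup. The cached values themselves are just the cosine and sine of an outer product of positions and inverse frequencies. As a rough illustration of what the cache holds (not vLLM code; the helper `rotary_cache` is hypothetical and uses plain Python so it runs without PyTorch):

```python
import math

def rotary_cache(max_position, rotary_dim, base=10000.0):
    # inv_freq[j] = 1 / base^(2j / rotary_dim), one entry per pair of dims
    inv_freq = [base ** (-(2 * j) / rotary_dim) for j in range(rotary_dim // 2)]
    # Outer product of positions t and inverse frequencies, then cos/sin,
    # mirroring torch.einsum("i,j -> ij", t, inv_freq) followed by .cos()/.sin()
    cos = [[math.cos(t * f) for f in inv_freq] for t in range(max_position)]
    sin = [[math.sin(t * f) for f in inv_freq] for t in range(max_position)]
    return cos, sin

cos, sin = rotary_cache(max_position=4, rotary_dim=8)
```

In the actual commit the same tables are built as `torch` tensors; the only behavioral difference introduced here is where they are allocated.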

0 commit comments
