Commit 30f522d

Fix dummy cache allocation (#574)
* Fix dummy cache allocation
* Try mps device selecting
* Rechain reloc
1 parent d6f4f80 commit 30f522d

File tree

1 file changed

+1
-1
lines changed


src/petals/server/throughput.py

Lines changed: 1 addition & 1 deletion
@@ -206,7 +206,7 @@ def measure_compute_rps(
     block = block.to(dtype)
     block = convert_block(block, 0, config, tensor_parallel_devices, device, quant_type=quant_type, freeze=True)
 
-    cache = (DUMMY_KEY_PAST.to(dtype), DUMMY_KEY_PAST.to(dtype))
+    cache = (DUMMY_KEY_PAST.to(dtype=dtype, device=device), DUMMY_KEY_PAST.to(dtype=dtype, device=device))
     elapsed = 0
     dummy_input = torch.randn(1, n_tokens, config.hidden_size, device=device, dtype=dtype)
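The point of the fix: `tensor.to(dtype)` only casts the tensor and leaves it on its current device, so the dummy cache stayed on CPU even when the block was measured on an accelerator. Passing `device=device` as well casts and moves in a single call. A minimal sketch of the difference, using a hypothetical stand-in for `DUMMY_KEY_PAST` (the real constant lives in petals; the shape here is illustrative only):

```python
import torch

# Hypothetical stand-in for DUMMY_KEY_PAST: an empty past-key tensor
# allocated on CPU, as in the original code.
dummy_key_past = torch.zeros(1, 1, 0, 64)

# Use an accelerator if one is available; falls back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float16

# Before the fix: .to(dtype) casts but keeps the tensor on its current
# device, so the cache could end up on CPU while the block is elsewhere.
cpu_cache = dummy_key_past.to(dtype)

# After the fix: cast and move in one call, matching the patched line.
cache = (
    dummy_key_past.to(dtype=dtype, device=device),
    dummy_key_past.to(dtype=dtype, device=device),
)

print(cpu_cache.device)                 # stays on cpu
print(cache[0].device, cache[0].dtype)  # target device, float16
```

On a CUDA or MPS machine, `cpu_cache.device` and `cache[0].device` differ, which is exactly the device mismatch the commit removes.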