This repository was archived by the owner on Sep 4, 2025. It is now read-only.

Commit 48c0cb4

With chunked prefill, for large prompts, the sampler can encounter a zero-sized tensor, on which skinny gemm fails (#204)
1 parent 57ea101 commit 48c0cb4

1 file changed: +1 -1 lines changed

vllm/model_executor/layers/tuned_gemm.py

@@ -62,7 +62,7 @@ def apply_skinny(self, m, n, k, inp_view, weights):
             return None
         if inp_view.dtype != torch.float16 or k % 8 != 0:
             return None
-        if m > 8 and n <= 4:
+        if m > 8 and 0 < n <= 4:
             out = torch.empty(inp_view.shape[0],
                               weights.shape[0],
                               dtype=inp_view.dtype,
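
The effect of the changed condition can be illustrated with a minimal, torch-free sketch. The helper name `skinny_gemm_eligible` and the `is_fp16` flag are hypothetical stand-ins for the checks visible in the hunk above; the earlier early-return whose condition is not shown in the diff is omitted. It assumes `n` is the dimension that becomes zero when the sampler passes an empty tensor.

```python
def skinny_gemm_eligible(m: int, n: int, k: int, is_fp16: bool = True) -> bool:
    """Hypothetical helper mirroring only the checks shown in the diff hunk."""
    # Mirrors: if inp_view.dtype != torch.float16 or k % 8 != 0: return None
    if not is_fp16 or k % 8 != 0:
        return False
    # Patched condition: n == 0 no longer qualifies for the skinny path.
    return m > 8 and 0 < n <= 4


def skinny_gemm_eligible_old(m: int, n: int, k: int, is_fp16: bool = True) -> bool:
    """Same checks with the pre-patch condition, for comparison only."""
    if not is_fp16 or k % 8 != 0:
        return False
    return m > 8 and n <= 4


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the commit.
    for n in (0, 1, 4, 5):
        print(n, skinny_gemm_eligible_old(4096, n, 4096),
              skinny_gemm_eligible(4096, n, 4096))
```

Under the old condition an empty input (`n == 0`) satisfies `n <= 4` and is routed to the skinny kernel; with the lower bound added, `apply_skinny` would presumably fall through so the caller can take another path.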
