Commit 64da65b

Prefix Caching: fix T4 Triton error (#2517)
1 parent 5255d99 commit 64da65b

File tree

1 file changed: +3 −1 lines changed

1 file changed

+3
-1
lines changed

vllm/model_executor/layers/triton_kernel/prefix_prefill.py

Lines changed: 3 additions & 1 deletion
@@ -618,7 +618,9 @@ def context_attention_fwd(q,
                           b_ctx_len,
                           max_input_len,
                           alibi_slopes=None):
-    BLOCK = 128
+
+    cap = torch.cuda.get_device_capability()
+    BLOCK = 128 if cap[0] >= 8 else 64
     # shape constraints
     Lq, Lk, Lv = q.shape[-1], k.shape[-1], v.shape[-1]
     assert Lq == Lk and Lk == Lv
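The fix is small: the prefix-prefill kernel previously hard-coded a Triton block size of 128, which can exceed the per-SM shared-memory budget of pre-Ampere GPUs such as the T4 (compute capability 7.5), so the kernel now falls back to 64 on those devices. A minimal sketch of the same capability check, with `pick_prefill_block` as a hypothetical helper name used only for illustration:

import torch

def pick_prefill_block() -> int:
    # (major, minor) compute capability of the current CUDA device,
    # e.g. (7, 5) for a T4 (Turing) or (8, 0) for an A100 (Ampere).
    cap = torch.cuda.get_device_capability()
    # Pre-Ampere parts have less shared memory per SM, so a 128-wide
    # Triton tile can fail to launch there; drop to 64 instead.
    return 128 if cap[0] >= 8 else 64

Gating on the major version keeps the faster 128-wide tile on Ampere and newer hardware while restoring correct launches on Turing.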
