
Commit 39cd09d

zyongye and mgoin authored
[Bugfix] use flash attn on sm90 (#22933)
Signed-off-by: Yongye Zhu <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
1 parent 919234f commit 39cd09d

File tree

1 file changed: +1, -1 lines changed

vllm/platforms/cuda.py

Lines changed: 1 addition & 1 deletion
@@ -316,7 +316,7 @@ def get_attn_backend_cls(cls, selected_backend, head_size, dtype,
 
         # FlashAttention is the default for SM 8.0+ GPUs
         if cls.has_device_capability(80):
-            if has_sink:
+            if has_sink and not cls.is_device_capability(90):
                 logger.info_once("Using Triton backend on V1 engine.")
                 return TRITON_ATTN_VLLM_V1
         if is_default_backend_supported := is_attn_backend_supported(
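
For context, the sketch below mirrors the changed control flow in a self-contained form: with attention sinks (`has_sink`), the old condition fell back to the Triton backend on every SM 8.0+ GPU, while the fixed condition keeps FlashAttention on SM 9.0 (Hopper). This is a minimal illustration, not vLLM's actual implementation; only `has_device_capability`, `is_device_capability`, `has_sink`, and `TRITON_ATTN_VLLM_V1` come from the diff, and the `CudaPlatformSketch` class and backend strings are hypothetical stand-ins.

```python
# Minimal sketch (not vLLM's actual code) of the backend choice this commit changes.
# Assumption: capability is encoded as major*10 + minor, matching the 80/90 values
# passed to has_device_capability()/is_device_capability() in the diff above.

TRITON_ATTN_VLLM_V1 = "TritonAttentionBackend"  # placeholder, not the real class path
FLASH_ATTN_V1 = "FlashAttentionBackend"         # placeholder, not the real class path


class CudaPlatformSketch:
    def __init__(self, major: int, minor: int) -> None:
        self.capability = major * 10 + minor  # e.g. 90 for SM 9.0 (Hopper)

    def has_device_capability(self, minimum: int) -> bool:
        # "At least this capability": 80 means SM 8.0 or newer.
        return self.capability >= minimum

    def is_device_capability(self, exact: int) -> bool:
        # Exact match only: 90 means precisely SM 9.0.
        return self.capability == exact

    def get_attn_backend_cls(self, has_sink: bool) -> str:
        # FlashAttention is the default for SM 8.0+ GPUs
        if self.has_device_capability(80):
            # After this commit, sinks force Triton only on non-Hopper GPUs;
            # SM 9.0 stays on FlashAttention.
            if has_sink and not self.is_device_capability(90):
                return TRITON_ATTN_VLLM_V1
            return FLASH_ATTN_V1
        return TRITON_ATTN_VLLM_V1  # simplified fallback for older GPUs


if __name__ == "__main__":
    h100 = CudaPlatformSketch(9, 0)
    a100 = CudaPlatformSketch(8, 0)
    print(h100.get_attn_backend_cls(has_sink=True))  # FlashAttentionBackend (new behavior)
    print(a100.get_attn_backend_cls(has_sink=True))  # TritonAttentionBackend (unchanged)
```

In the real function, the non-sink path continues into the `is_attn_backend_supported` check visible at the bottom of the diff; that part is simplified away in this sketch.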
