Commit 5efd3af

[https://nvbugs/5740075][fix] Fix sm120 speculation
Signed-off-by: Mike Iovine <miovine@nvidia.com>
1 parent bd13957

File tree

1 file changed: +2 -1 lines

tensorrt_llm/_torch/speculative/interface.py

Lines changed: 2 additions & 1 deletion
@@ -136,8 +136,9 @@ def extend_ctx(self, attention_backend: Type[AttentionBackend]):
             # 1-model has separate logic for handling draft tokens
             return False
 
+        xqa_supported = get_sm_version() in (90, 100)
         return not issubclass(attention_backend,
-                              TrtllmAttention) or get_sm_version() < 90
+                              TrtllmAttention) or not xqa_supported
 
     def attention_need_spec_dec_mode(
             self,
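
For context, a minimal sketch of the check after this change (hypothetical stand-ins, not the actual TensorRT-LLM source; get_sm_version, AttentionBackend, and TrtllmAttention are simplified versions of the names that appear in the diff): the XQA-based speculation path is only taken for the TRT-LLM attention backend on SM 90 or SM 100, so SM 120 now falls back to extending the context, whereas the previous get_sm_version() < 90 check let SM 120 through.

# Minimal, self-contained sketch (stand-ins only, not the real TensorRT-LLM
# classes) showing the gating logic after this commit.
from typing import Type


def get_sm_version() -> int:
    # Stand-in for the real helper; pretend we are running on an SM 120 GPU.
    return 120


class AttentionBackend:
    """Simplified stand-in for the attention backend base class."""


class TrtllmAttention(AttentionBackend):
    """Simplified stand-in for the TRT-LLM attention backend."""


def extend_ctx(attention_backend: Type[AttentionBackend]) -> bool:
    # Old check: `get_sm_version() < 90` assumed every SM >= 90 supports the
    # XQA speculation path, which does not hold for SM 120.
    # New check: XQA speculation is only taken on SM 90 and SM 100.
    xqa_supported = get_sm_version() in (90, 100)
    return not issubclass(attention_backend, TrtllmAttention) or not xqa_supported


print(extend_ctx(TrtllmAttention))  # True on SM 120: fall back to extending ctx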
