Commit 56f697b

[None][feat] Add fmha_v2 kernel for head_dim=80 and sm=100 to support VLM (#8392)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
1 parent bc833d3 commit 56f697b

File tree

2 files changed: +14 −1 lines changed

cpp/kernels/fmha_v2/setup.py

Lines changed: 10 additions & 0 deletions
@@ -6379,6 +6379,16 @@ def enumerate_kernels():
                 and kspec.version == 2
                 and kspec.cross_mha == False
                 and kspec.flash_attention == False)
+            # Clip/SigLip support.
+            or (kspec.sm == 100
+                and kspec.dtype in ['fp16', 'bf16', 'fp16_fp32', 'e4m3', 'e4m3_fp32']
+                and kspec.head_size == 80
+                and kspec.head_size_v == 0
+                and kspec.sage_block_sizes is None
+                and kspec.version == 2
+                and kspec.cross_mha == False
+                and kspec.flash_attention == True
+                and kspec.input_layout != InputLayout.SEPARATE_Q_K_V)
             # Deepseek MLA (generation 576/512 paged)
             or (kspec.sm in [90, 100, 120]
                 and kspec.dtype in ['bf16', 'e4m3_fp32']
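The added clause in enumerate_kernels() can be read as a standalone predicate over a kernel spec. The sketch below paraphrases it; KernelSpec and the SEPARATE_Q_K_V stand-in are illustrative simplifications, not the actual types in setup.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Stand-in for InputLayout.SEPARATE_Q_K_V in the real code (assumption).
SEPARATE_Q_K_V = "separate_q_k_v"

@dataclass
class KernelSpec:
    """Minimal stand-in for the kernel spec enumerated by setup.py."""
    sm: int
    dtype: str
    head_size: int
    head_size_v: int
    sage_block_sizes: Optional[Tuple[int, ...]]
    version: int
    cross_mha: bool
    flash_attention: bool
    input_layout: str

def matches_clip_siglip_clause(kspec: KernelSpec) -> bool:
    # Mirrors the clause added for head_dim=80 flash-attention kernels on sm=100.
    return (kspec.sm == 100
            and kspec.dtype in ['fp16', 'bf16', 'fp16_fp32', 'e4m3', 'e4m3_fp32']
            and kspec.head_size == 80
            and kspec.head_size_v == 0
            and kspec.sage_block_sizes is None
            and kspec.version == 2
            and kspec.cross_mha == False
            and kspec.flash_attention == True
            and kspec.input_layout != SEPARATE_Q_K_V)
```

Note the clause only selects v2 flash-attention kernels and explicitly excludes the separate-Q/K/V layout, so the new head_dim=80 path stays narrow.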

cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp

Lines changed: 4 additions & 1 deletion
@@ -46,7 +46,10 @@ QkvLayout AttentionInputLayoutToQkvLayout(AttentionInputLayout layout)
 FmhaDispatcher::FmhaDispatcher(MHARunnerFixedParams fixedParams)
     : mFixedParams(fixedParams)
-    , mUseTllmGen(tensorrt_llm::common::isSM100Family())
+    // TRTLLM-GEN only supports power-of-2 head sizes.
+    // Other head sizes fall back to fmha v2.
+    // Please update fmha_v2/setup.py if you want to add more supported head sizes.
+    , mUseTllmGen(tensorrt_llm::common::isSM100Family() && fixedParams.headSize != 80)
 {
     if (mUseTllmGen)
     {
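The dispatcher change gates the TRTLLM-GEN backend on both the SM100 family check and the head size: 80 is not a power of two, so those kernels now route to the fmha v2 path added in setup.py. A minimal Python sketch of that decision (function names here are illustrative, not the actual C++ API):

```python
def is_power_of_two(n: int) -> bool:
    """True iff n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

def use_tllm_gen(is_sm100_family: bool, head_size: int) -> bool:
    # Mirrors the updated mUseTllmGen initializer: TRTLLM-GEN is used on
    # the SM100 family, except head_size == 80 (not a power of 2),
    # which falls back to fmha v2.
    return is_sm100_family and head_size != 80
```

With this predicate, head_dim=128 on SM100 still uses TRTLLM-GEN, while head_dim=80 (e.g. CLIP/SigLIP vision encoders) takes the fmha v2 fallback.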
