
Commit 78f2cbb

achartier authored and dominicshanshan committed
[None][fix] Disable DeepGEMM for Qwen3 MoE Attention layers (NVIDIA#8087)
Signed-off-by: Aurelien Chartier <[email protected]>
1 parent 19421f4 commit 78f2cbb

File tree

2 files changed: +3 -0 lines changed


tensorrt_llm/_torch/models/modeling_qwen3.py

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ def __init__(
         fuse_qk_norm_rope: bool = True,
         attn_output_gate: bool = False,
         use_gemma_rms_norm: bool = False,
+        disable_deep_gemm: bool = False,
     ):
         config = model_config.pretrained_config
         self.pretrained_config = config
@@ -71,6 +72,7 @@ def __init__(
             config=model_config,
             attn_output_gate=self.attn_output_gate,
             use_gemma_rms_norm=use_gemma_rms_norm,
+            disable_deep_gemm=disable_deep_gemm,
         )


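For context, this diff follows a common flag-threading pattern: the attention module accepts a boolean in its constructor and forwards it down to wherever the GEMM backend is actually chosen. The consumer side is not shown in this commit, so the sketch below is only a minimal illustration of the pattern; apart from `disable_deep_gemm` itself, every name (`pick_gemm_backend`, `deep_gemm_matmul`, `AttentionSketch`, `MoeDecoderLayerSketch`) is hypothetical and not TensorRT-LLM's real API.

import torch


def deep_gemm_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Stand-in for a DeepGEMM kernel call; the real library would run a
    # specialized GEMM here. Hypothetical, for illustration only.
    return torch.matmul(a, b)


def pick_gemm_backend(disable_deep_gemm: bool):
    # The flag decides which GEMM implementation the layer will use.
    if disable_deep_gemm:
        return torch.matmul  # plain ATen/cuBLAS fallback path
    return deep_gemm_matmul


class AttentionSketch(torch.nn.Module):
    def __init__(self, hidden: int, disable_deep_gemm: bool = False):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(hidden, hidden))
        # As in the diff above, the constructor merely forwards the flag
        # to the point where the backend is selected.
        self.gemm = pick_gemm_backend(disable_deep_gemm)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gemm(x, self.weight)


class MoeDecoderLayerSketch(torch.nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        # Mirrors modeling_qwen3_moe.py below: the MoE decoder layer
        # hard-codes the flag to True so its attention never routes
        # through DeepGEMM.
        self.self_attn = AttentionSketch(hidden, disable_deep_gemm=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.self_attn(x)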

tensorrt_llm/_torch/models/modeling_qwen3_moe.py

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@ def __init__(self, model_config: ModelConfig[Qwen3MoeConfig],
         self.self_attn = Qwen3Attention(
             model_config,
             layer_idx=layer_idx,
+            disable_deep_gemm=True,
         )
         self.mapping = model_config.mapping
         self.enable_attention_dp = self.mapping.enable_attention_dp
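A quick sanity check of the sketch shown after the first file's diff (same hypothetical names): the MoE layer's hard-coded flag should force its attention onto the fallback GEMM path.

# Exercise the sketch: the hard-coded flag forces the fallback backend.
layer = MoeDecoderLayerSketch(hidden=64)
x = torch.randn(2, 64)
out = layer(x)
assert out.shape == (2, 64)
assert layer.self_attn.gemm is torch.matmul  # DeepGEMM path disabled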
