
Commit 5a99c97

[TRTLLM-8777][feat] Update DeepGEMM to the latest commit to include optimizations for DeepSeek-v3.2 (#9380)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
1 parent 786d308 commit 5a99c97


2 files changed: +6 -6 lines


3rdparty/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ FetchContent_Declare(
 FetchContent_Declare(
 deepgemm
 GIT_REPOSITORY https://github.com/ruoqianguo/DeepGEMM
-GIT_TAG 9fa5965e265e27995f539e0dd73a06351a8a9eaf
+GIT_TAG 6cb8161516302550785d9af924d2778afef1f3f6 # swapab_sm100 branch
 GIT_SUBMODULES_RECURSE
 ON
 SOURCE_SUBDIR
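The hunk above pins DeepGEMM to a full commit hash in `GIT_TAG`, while the trailing comment names the branch (`swapab_sm100`) that commit came from. As a minimal sketch, assuming you want a quick sanity check that a `FetchContent_Declare` block pins an immutable 40-character SHA rather than a movable branch name, the hypothetical helpers `extract_git_tag` and `is_pinned_to_commit` below use a plain regex. This is not a CMake parser and not part of the repository.

```python
import re

# Hypothetical helpers for illustration only; a regex is enough for this
# sketch but does not understand full CMake syntax.
GIT_TAG_RE = re.compile(r"GIT_TAG\s+(\S+)")
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def extract_git_tag(cmake_text: str) -> str:
    """Return the first GIT_TAG value; a trailing '# comment' is not captured."""
    match = GIT_TAG_RE.search(cmake_text)
    if match is None:
        raise ValueError("no GIT_TAG found")
    return match.group(1)


def is_pinned_to_commit(tag: str) -> bool:
    # A full 40-hex-digit SHA-1 names one commit; a branch name can move.
    return FULL_SHA_RE.fullmatch(tag) is not None


# Example input modeled on the new side of the hunk above.
snippet = """
FetchContent_Declare(
  deepgemm
  GIT_REPOSITORY https://github.com/ruoqianguo/DeepGEMM
  GIT_TAG 6cb8161516302550785d9af924d2778afef1f3f6 # swapab_sm100 branch
  GIT_SUBMODULES_RECURSE ON
)
"""

tag = extract_git_tag(snippet)
print(tag, is_pinned_to_commit(tag))
print(is_pinned_to_commit("swapab_sm100"))
```

Pinning by hash keeps builds reproducible even if the `swapab_sm100` branch later advances; the comment only records where the hash came from.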

tests/unittest/_torch/attention/sparse/test_dsa_indexer.py

Lines changed: 5 additions & 5 deletions
@@ -308,9 +308,9 @@ def test_deepgemm_fp8_mqa_logits_basic():
     """
     torch.manual_seed(0)
 
-    num_heads, head_dim = 32, 128
-    seq_len = 512
-    seq_len_kv = 1024
+    num_heads, head_dim = 64, 128
+    seq_len = 2048
+    seq_len_kv = 4096
     #[seq_len, num_heads, head_dim]
     q = torch.randn(
         seq_len,
@@ -335,8 +335,8 @@ def test_deepgemm_fp8_mqa_logits_basic():
     )
     # ks[i] -> ke[i] for each q[i]
     ks = torch.zeros(seq_len, dtype=torch.int, device="cuda")
-    ke = torch.arange(seq_len, dtype=torch.int, device="cuda") + (
-        seq_len_kv - seq_len) + 1 # +1 for exclusive end
+    ke = torch.arange(seq_len, dtype=torch.int,
+                      device="cuda") + (seq_len_kv - seq_len)
 
     # Convert to FP8
     q_fp8 = q.to(torch.float8_e4m3fn)
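The test gives each query row `i` a key window from `ks[i]` to `ke[i]`, and this commit drops the `+ 1` that the old comment attributed to an exclusive end. The sketch below is illustrative only: it ASSUMES the window is the half-open interval `[ks[i], ke[i])`, which is an assumption and not a statement of DeepGEMM's kernel contract. It builds the boolean mask on CPU with the test's new sizes and counts how many keys each row can see under the old and new `ke` formulas; `window_mask` is a hypothetical helper.

```python
import torch

# New sizes from the updated test.
seq_len, seq_len_kv = 2048, 4096
offset = seq_len_kv - seq_len

# Per-query window bounds as in the test, built on CPU for illustration.
ks = torch.zeros(seq_len, dtype=torch.int)
ke_new = torch.arange(seq_len, dtype=torch.int) + offset      # after this commit
ke_old = torch.arange(seq_len, dtype=torch.int) + offset + 1  # before this commit


def window_mask(ks: torch.Tensor, ke: torch.Tensor, seq_len_kv: int) -> torch.Tensor:
    # Hypothetical helper. ASSUMES half-open windows [ks[i], ke[i]).
    # Result shape: [seq_len, seq_len_kv], True where key k is visible to row i.
    k = torch.arange(seq_len_kv, dtype=torch.int)
    return (k[None, :] >= ks[:, None]) & (k[None, :] < ke[:, None])


# Number of visible keys per query row (bool sums come back as int64).
visible_new = window_mask(ks, ke_new, seq_len_kv).sum(dim=1)
visible_old = window_mask(ks, ke_old, seq_len_kv).sum(dim=1)
print(visible_new[0].item(), visible_new[-1].item())
print(visible_old[0].item(), visible_old[-1].item())
```

Under the half-open reading, the old formula lets the last query row see all `seq_len_kv` keys and the new one stops one key short. If the updated kernel instead treats `ke` as inclusive, the new formula covers exactly the keys the old one did. The diff alone does not say which reading the updated DeepGEMM uses.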
