[pull] main from NVIDIA:main by pull[bot] · Pull Request #86 · LarryXFly/TensorRT-LLM

pull · 2025-05-12T09:47:29Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

* Replace sanity test for nemotron h with a correctness test
* Add prefill+decode reference logprobs from initial implementation + batched forward test
* Add testing that decode matches prefill - compare decode vs all prefilling the decoded tokens

…4428)

* add test
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* Fix TRTLLMSampler.
  Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Added type hint.
  Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* chore: Partition context requests in MicroBatchScheduler
  Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* fixup! chore: Partition context requests in MicroBatchScheduler
  Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

…r and fix EarlyStopDecoder unsqueeze bug (#4290)

* add bidirectional support and fix EarlyStopDecoder unsqueeze to be compatible with LogitsStorage
  Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* run pre-commit
  Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* instead of bidirectional flag use ModelConfig.is_generation
  Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* fix unit test to extract logits from correct dim
  Signed-off-by: Rohan Varma <rohanv@nvidia.com>

---------

Signed-off-by: Rohan Varma <rohanv@nvidia.com>

* waives closed bugs
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waives
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update cubins
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add chunked-attention kernels on blackwell
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
  fix
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

---------

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* v1.5
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
  v1.5.4 Add back draft_overhead to spec dec stats
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v1.5.5: fix CI error
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.6: fix CI error 8196 > 8192
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Address reviewer concerns
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* Address reviewer concerns
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* precommit run
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v2.0: Address reviewer concerns
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v2.1: add fix from wili
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Revert changes that require use of TypeAlias because that requires python version >= 3.10
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

---------

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

* feat: add deepgemm_swapab
  feat: add fp8_gemm_kernel_swapab
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
  feat: set threshold for deepgemm and deepgemmswapab
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* docs: update README.md
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* fix: std::runtime_error needs #include <stdexcept>
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* chores: remove the redundant code
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* feat: support for dense deep_gemm swapab
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* chores: remove redundant code
  Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

---------

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>

* unwaive some disagg test
  Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* pytest.mark.skip_less_device(4)
  Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* add llama 3.3 70b 2 nodes tests
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* remove enable_overlap_scheduler parameter
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

…_chunked prefill to be true for isl>2048 cases (#4285)

1. remove enable_overlap_schedule in pytorch config
2. rename model_yaml_config.py to pytorch_model_config.py and set enable_chunked_prefill to be true for cases with isl>2048

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

#4478)

* update sanity test list
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update test list
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

* chore: Improve formatting of DisaggExecutorTest
  Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Typed InstanceRole param in DisaggExecutorTest
  Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Skip DisaggExecutorTest based on device count
  Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

…Implementation of Large-scale EP) (#4958) Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com> Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

tomeras91 and others added 30 commits May 20, 2025 17:55

[Docs] - Add date and commit info (#4448)

bc6a69e

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

fix: replace the image links in the blog (#4489)

a98e7ea

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

Build Triton for arm (#4456)

e4fa856

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

test(perf): Add remaining Phi-4-mini-instruct perf tests (#4443)

9a8c3ec

add remaining 2 phi cpp perf tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

feat: conditional disaggregation in disagg server (#3974)

77a0189

perf: Fuse gemm setup function for SM90/SM100 MOE plugin path (#4146)

a030a89

Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>

fix: skip weights defined in create_weights for pp. (#4447)

62c16b6

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

fix: llmapi-launch add add trtllm-bench test with engine building (#4091)

9199793

* add trtllm-bench mgmn test Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

test: NIXL single process test (#4486)

3d62727

Chore: waive torch compile test cases of deepseek v3 lite (#4508)

2372589

waive torch compile test cases of deepseek v3 lite Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Clean: fmha codes (#4496)

6a35c59

clean codes Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

CI: waive test_fp8_block_scales_4gpus of deepseek v3 lite (#4520)

15317ec

waive test_fp8_block_scales_4gpus of deepseek v3 lite Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

docs: update the introduction for scaffolding (#4360)

a201ce9

Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>

test: add failed case in waive list and fix some test script issue for perf test (#4527)

83f1933

add failed case in waive list and fix some test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

feat: large-scale EP(part 3 - refactor: FusedMoe for redundant expert) (#4495)

4018806

refactor fused_moe for redundant expert Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

chore: clean ucx and nixl mirror. (#4531)

3b12e46

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

ixlmar and others added 29 commits June 4, 2025 11:03

chore: introduce KvCacheCreator (#4581)

2bbb6b5

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

tests: Update gb200 test case (#4754)

1fca654

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

fix: Fix broken vanilla moe since FusedMoE refactor. (#4897)

6b32426

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

fix: LLM invalid arg in a test (#4922)

8e0d96f

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

[AutoDeploy] deprecate CI post-merge tests and keep them for local testing (#4892)

f9d45e0

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

[infra] Unwaive unittests/_torch (#4919)

8433091

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

[TRTLLM-4647][fix] Fix the no fusion allreduce hanging (#4594)

b0d287c

Signed-off-by: Shiyu Li <shili@nvidia.com>

tests: fix 5273697 (#4685)

50a74a1

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

Waive L0 tests (#4927)

9ceef98

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

Only pass fast_build=true to non-pytorch backend (#4920)

ddbaa5e

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

tests: [TRTQA-2906] add benchmark serving tests (#4901)

1c3091c

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

fix: handle OOMs during KV cache estimation (#4690)

6437756

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

CI: waive test_llm_get_queued_stats (#4945)

91e8d43

Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

[AutoDeploy] _AutoDeployLlmArgs as primary config object (#4891)

743fb0a

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Revert "[infra] Unwaive unittests/_torch" (#4950)

d5a8079

Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949)

b8c5e38

Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests (#4859)

a152635

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

Add disaggregated unittest (#4899)

3eae58c

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

Waive L0 tests (#4953)

7e921c7

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

fix a bug of global cuda graph dummy request (#4894)

154f7cc

Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

Fix: fix autodeploy (#4957)

bfa877a

Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

feat : add PositionEmbeddingType=0 to xqa support (#4934)

51652b9

Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>

update fmha_v2 (#4895)

180b91f

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

infra: update jnlp version in container image (#4944)

d2c311c

doc: expose Large-scale EP design and implementation tech blog in the main… (#4960)

37ac564

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

Revert "fix a bug of global cuda graph dummy request" (#4970)

ec50684

doc: refinement based on Julien's feedbacks (#4967)

a761cc2

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

test: [CI] Add failed cases into waives.txt (#4966)

5644721

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

pull bot merged commit 5644721 into LarryXFly:main Jun 6, 2025
1 check passed

pull[bot] merged 414 commits into LarryXFly:main from NVIDIA:main

pull bot commented May 12, 2025

20 participants
