Skip to content

[pull] main from NVIDIA:main#86

Merged
pull[bot] merged 414 commits intoLarryXFly:mainfrom
NVIDIA:main
Jun 6, 2025
Merged

[pull] main from NVIDIA:main#86
pull[bot] merged 414 commits intoLarryXFly:mainfrom
NVIDIA:main

Conversation

@pull
Copy link

@pull pull bot commented May 12, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

tomeras91 and others added 30 commits May 20, 2025 17:55
* Replace sanity test for nemotron h with a correctness test

* Add prefill+decode reference logprobs from initial implementation + batched forward test

* Add testing that decode matches prefill - compare decode vs all prefilling the decoded tokens
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
…4428)

* add test

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
* Fix TRTLLMSampler.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Added type hint.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* chore: Partition context requests in MicroBatchScheduler

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! chore: Partition context requests in MicroBatchScheduler

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
…r and fix EarlyStopDecoder unsqueeze bug (#4290)

* add bidirectional support and fix EarlyStopDecoder unsqueeze to be compatible with LogitsStorage

Signed-off-by: Rohan Varma <rohanv@nvidia.com>

* run pre-commit

Signed-off-by: Rohan Varma <rohanv@nvidia.com>

* instead of bidirectional flag use ModelConfig.is_generation

Signed-off-by: Rohan Varma <rohanv@nvidia.com>

* fix unit test to extract logits from correct dim

Signed-off-by: Rohan Varma <rohanv@nvidia.com>

---------

Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* waives closed bugs

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
add remaining 2 phi cpp perf tests

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
* update cubins

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* add chunked-attention kernels on blackwell

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

fix

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

---------

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
)

* add trtllm-bench mgmn test

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* v1.5

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

v1.5.4 Add back draft_overhead to spec dec stats

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

* v1.5.5: fix CI error

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.6: fix CI error 8196 > 8192

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* Address reviewer concerns

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

* Address reviewer concerns

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

* precommit run

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

* v2.0: Address reviewer concerns

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v2.1: add fix from wili

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* Revert changes that require use of TypeAlias because that requires python version >= 3.10

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

---------

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
waive torch compile test cases of deepseek v3 lite

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* feat: add deepgemm_swapab

feat: add fp8_gemm_kernel_swapab

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

feat: set threshold for deepgemm and deepgemmswapab

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

* docs: update README.md

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

* fix: std::runtime_error needs #include <stdexcept>

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

* chores: remove the redundant code

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

* feat: support for dense deep_gemm swapab

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

* chores: remove redundant code

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>

---------

Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
* unwaive some disagg test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* pytest.mark.skip_less_device(4)

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
clean codes

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add llama 3.3 70b 2 nodes tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* remove enable_overlap_scheduler parameter

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
waive test_fp8_block_scales_4gpus of deepseek v3 lite

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
…_chunked prefill to be true for isl>2048 cases (#4285)

1.remove enable_overlap_schedule in pytorch config
2.rename model_yaml_config.py to pytorch_model_config.py and set enable_chunked_prefill to be true for cases with isl>2048

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
…r perf test (#4527)

add failed case in waive list and fix some test script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
#4478)

* update sanity test list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update test list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
#4495)

refactor fused_moe for redundant expert

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* chore: Improve formatting of DisaggExecutorTest

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Typed InstanceRole param in DisaggExecutorTest

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Skip DisaggExecutorTest based on device count

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
ixlmar and others added 29 commits June 4, 2025 11:03
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
…sting (#4892)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Shiyu Li <shili@nvidia.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
)

Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
…Implementation of Large-scale EP) (#4958)

Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
… main… (#4960)

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
@pull pull bot merged commit 5644721 into LarryXFly:main Jun 6, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comments