[pull] main from NVIDIA:main #86
Merged
pull[bot] merged 414 commits into LarryXFly:main from NVIDIA:main on Jun 6, 2025
* Replace sanity test for nemotron h with a correctness test
* Add prefill+decode reference logprobs from initial implementation + batched forward test
* Add testing that decode matches prefill - compare decode vs all prefilling the decoded tokens
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
…4428) * add test Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
* Fix TRTLLMSampler. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Added type hint. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* chore: Partition context requests in MicroBatchScheduler Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fixup! chore: Partition context requests in MicroBatchScheduler Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
…r and fix EarlyStopDecoder unsqueeze bug (#4290)
* add bidirectional support and fix EarlyStopDecoder unsqueeze to be compatible with LogitsStorage
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* run pre-commit
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* instead of bidirectional flag use ModelConfig.is_generation
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* fix unit test to extract logits from correct dim
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
---------
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* waives closed bugs Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
add remaining 2 phi cpp perf tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add chunked-attention kernels on blackwell
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
fix
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* v1.5
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
v1.5.4 Add back draft_overhead to spec dec stats
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v1.5.5: fix CI error
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.6: fix CI error 8196 > 8192
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* precommit run
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v2.0: Address reviewer concerns
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v2.1: add fix from wili
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Revert changes that require use of TypeAlias because that requires python version >= 3.10
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
---------
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
waive torch compile test cases of deepseek v3 lite Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* feat: add deepgemm_swapab
feat: add fp8_gemm_kernel_swapab
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
feat: set threshold for deepgemm and deepgemmswapab
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* docs: update README.md
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* fix: std::runtime_error needs #include <stdexcept>
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* chores: remove the redundant code
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* feat: support for dense deep_gemm swapab
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* chores: remove redundant code
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
---------
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
* unwaive some disagg test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * pytest.mark.skip_less_device(4) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
clean codes Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add llama 3.3 70b 2 nodes tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * remove enable_overlap_scheduler parameter Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
waive test_fp8_block_scales_4gpus of deepseek v3 lite Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
…_chunked prefill to be true for isl>2048 cases (#4285) 1.remove enable_overlap_schedule in pytorch config 2.rename model_yaml_config.py to pytorch_model_config.py and set enable_chunked_prefill to be true for cases with isl>2048 Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
…r perf test (#4527) add failed case in waive list and fix some test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
#4478) * update sanity test list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update test list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
#4495) refactor fused_moe for redundant expert Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* chore: Improve formatting of DisaggExecutorTest
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Typed InstanceRole param in DisaggExecutorTest
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Skip DisaggExecutorTest based on device count
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
…sting (#4892) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Shiyu Li <shili@nvidia.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
…Implementation of Large-scale EP) (#4958) Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com> Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
… main… (#4960) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )