[None][chroe] Mass integration of release/1.2 - 4th #11500
base: main
Conversation
Force-pushed from ea938ad to 3e0daf0
📝 Walkthrough
This PR simplifies the speculative decoding logic by removing the MLA parameter dependency, adds GPU architecture-aware MOE backend selection in stress tests, updates documentation for NUMA-aware CPU affinity configuration and hardware support, modifies the Qwen3VL model placeholder configuration, and adjusts various test configurations and timeouts.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 3 failed
❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/defs/stress_test/stress_test.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor: Update the copyright year to include 2026.
The copyright header reads 2022-2024, but this file is being meaningfully modified in 2026. Per coding guidelines, the year of the latest meaningful modification should be reflected.
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification."
🤖 Fix all issues with AI agents
In `@docs/source/supported-hardware.md`:
- Around lines 3-7: The entry under "NVIDIA Blackwell" mixes a system name with GPU model names. Update the NVIDIA Blackwell bullet so only GPU models remain listed (B200, GB200, B300, GB300) and remove "DGX Spark" from that list. Then add a separate bullet or a parenthetical note clarifying that "DGX Spark" is a system/platform that houses Blackwell GPUs (e.g., a separate line "Systems: DGX Spark (uses Blackwell GPUs)") so that the GPU-model entries stay consistent.
In `@tests/integration/defs/accuracy/references/mmlu.yaml`:
- Around lines 251-254: Remove the duplicated YAML mapping key: keep a single accuracy entry and delete the redundant one. In the shown mapping (keys: quant_algo, kv_cache_quant_algo, accuracy), ensure only one accuracy: 85.5 remains so the mapping contains unique keys.
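Most YAML loaders silently keep only one of the duplicated values rather than raising an error, which is why such duplicates are easy to miss. Below is a minimal sketch of how duplicate keys could be caught, using PyYAML with a stricter loader; the loader subclass and the sample document are illustrative, not part of this PR:

```python
import yaml


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate mapping keys instead of
    silently keeping the last occurrence."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise ValueError(f"duplicate key {key!r} at {key_node.start_mark}")
            seen.add(key)
        return super().construct_mapping(node, deep=deep)


# Illustrative document with the keys named in the review comment; the
# values are placeholders, not the real reference numbers.
sample = """\
quant_algo: FP8
kv_cache_quant_algo: FP8
accuracy: 85.5
accuracy: 85.5
"""

# Raises ValueError on the second 'accuracy' entry; plain yaml.safe_load()
# would accept it and keep only the last value.
yaml.load(sample, Loader=UniqueKeyLoader)
```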
🧹 Nitpick comments (3)
tensorrt_llm/_torch/models/modeling_qwen3vl_moe.py (1)
1-1: Missing NVIDIA copyright header.
This file lacks the required NVIDIA copyright header. While this appears to be a pre-existing issue (not introduced by this PR), since the file is being modified, consider adding the header. As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification."
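For reference, a minimal sketch of what the header could look like; the year and the license identifier are assumptions, so match the convention used elsewhere in the repository:

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0  # assumption: mirror the identifier used by sibling files
```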
tests/integration/defs/stress_test/stress_test.py (2)
527-528: Awkward line break splits `config.tp_size` across two lines.
While valid Python (the expression is inside parentheses), splitting `config.tp_size` into `config.` and `tp_size,` harms readability and could be misread as two separate items.
✏️ Suggested formatting fix:
- ep_size=config.
- tp_size,  # ep_size matches tp_size for DeepSeek models
+ ep_size=config.tp_size,  # ep_size matches tp_size for DeepSeek models
599-626: GPU architecture detection logic is sound.
The MOE backend selection correctly handles Blackwell (DEEPGEMM/CUTEDSL) and implicitly defaults to CUTLASS for Hopper by not setting `moe_config`. The fallback on failure is safe.
One minor refinement: the broad `except Exception` (flagged by Ruff BLE001) could be narrowed to `(ImportError, RuntimeError)` to specifically catch the expected failure modes (torch not installed, CUDA unavailable), while still keeping the safe fallback.
✏️ Optional: narrow the exception scope:
- except Exception as e:
+ except (ImportError, RuntimeError) as e:
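For illustration, here is a minimal sketch of architecture-aware backend selection with the narrowed exception scope. The helper name, backend strings, and compute-capability threshold are assumptions inferred from the review comment, not the actual stress_test.py implementation:

```python
def select_moe_backend(device: int = 0) -> str:
    """Pick a MOE backend string based on the GPU's compute capability.

    Blackwell-class GPUs (SM 10.x) get DEEPGEMM; anything else, or any
    detection failure, falls back to the CUTLASS default.
    """
    try:
        # Imported lazily so a missing torch simply triggers the fallback.
        import torch
        major, _minor = torch.cuda.get_device_capability(device)
    except (ImportError, RuntimeError):
        # torch not installed or CUDA unavailable: keep the safe default.
        return "CUTLASS"
    return "DEEPGEMM" if major >= 10 else "CUTLASS"
```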
Force-pushed from 3e0daf0 to ffdc6d0
/bot run --disable-fail-fast
PR_Github #35869 [ run ] triggered by Bot. Commit:
Force-pushed from ffdc6d0 to 5afecec
/bot run --disable-fail-fast
PR_Github #35879 [ run ] triggered by Bot. Commit:
…10880) (NVIDIA#11056) Signed-off-by: qqiao <[email protected]> Signed-off-by: Emma Qiao <[email protected]> Co-authored-by: Yanchao Lu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: linquanh <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Patrice Castonguay <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…VIDIA#11066) Signed-off-by: ziyixiong-nv <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…c with graceful fallbacks (NVIDIA#11042) (NVIDIA#11090) Signed-off-by: Ludwig Schneider <[email protected]> Co-authored-by: Ludwig Schneider <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…NVIDIA#10678) Signed-off-by: Dan Hansen <[email protected]> Signed-off-by: dhansen-nvidia <[email protected]> Co-authored-by: Dan Hansen <[email protected]> Co-authored-by: Kaiyu Xie <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
… nvfp4 checkpoint in stress test (NVIDIA#10920) Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Xin He (SW-GPU) <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Pengyun Lin <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…11182) Signed-off-by: qqiao <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Mike Iovine <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…VIDIA#11134) Signed-off-by: yechank <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…ngFace downloads in `with_mocked_hf_download` (NVIDIA#11201) Signed-off-by: Anish Shanbhag Signed-off-by: Wangshanshan <[email protected]>
…requirements (NVIDIA#10996) Signed-off-by: Pengbo Wang <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…IA#11270) Signed-off-by: Stefan Niebler <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…A#11214) Signed-off-by: Ivy Zhang <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: yingguo-trt <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Force-pushed from 5afecec to 46980e6
/bot run --disable-fail-fast
PR_Github #35889 [ run ] triggered by Bot. Commit:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores
Description
This is the weekly Mass Integration (MI) for release/1.2. The following PRs will not be cherry-picked back to main:
#11099 (duplicate with #11100)
#11188 (duplicate with #10471)
and seven infra PRs with the title: [None][infra] Check in most recent lock file from nightly pipeline
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`
Provides a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
Details
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]`
Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline plus the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.
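For example, several of the documented flags can be combined in a single comment; the invocation below is illustrative and uses only flags and example values listed above:

```
/bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"
```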
kill
`kill`: Kill all running builds associated with the pull request.
skip
`skip --comment COMMENT`: Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test"is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipelineReuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.