
[None][infra] Waive failed multinode tests in GB200 stage #12553

Open
yuanjingx87 wants to merge 1 commit into NVIDIA:main from yuanjingx87:user/yuanjingx/waive_failed_gb200_test

Conversation

@yuanjingx87
Collaborator

yuanjingx87 commented Mar 26, 2026

Summary by CodeRabbit

  • Tests
    • Marked accuracy test as skipped for specific multi-GPU configuration due to a known issue under investigation.

Description

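Waive the failing GB200 multinode test accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_nvfp4_multi_gpus[throughput_mtp], tracked in https://nvbugs/6021482.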

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
@yuanjingx87
Collaborator Author

/bot run --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-1,GB200-8_GPUs-2_Nodes-PyTorch-2"

@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Adds a single entry to the integration test waives list, marking the DeepSeek R1 test case test_nvfp4_multi_gpus[throughput_mtp] as skipped with a bug reference.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Waiver Configuration: tests/integration/test_lists/waives.txt | Added one waived test entry for accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_nvfp4_multi_gpus[throughput_mtp] marked as SKIP with bug reference https://nvbugs/6021482. |
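
For illustration only, the waiver described above can be read back with a small parser. This is a minimal sketch, not the repository's own tooling: it assumes a waive entry has the shape `<test id> SKIP (<reason>)`, which the PR diff itself is not shown here to confirm, and the names `Waiver` and `parse_waive_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Waiver(NamedTuple):
    """One parsed waive entry (hypothetical helper type)."""
    test_id: str
    action: str
    reason: str


# Assumed line shape: "<test id> SKIP (<reason>)"; the reason part is optional.
_WAIVE_RE = re.compile(
    r"^(?P<test_id>\S.*?)\s+(?P<action>SKIP)\s*(?:\((?P<reason>.*)\))?\s*$"
)


def parse_waive_line(line: str) -> Optional[Waiver]:
    """Return a Waiver for a well-formed entry, or None for blanks, comments, or non-matching lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _WAIVE_RE.match(line)
    if match is None:
        return None
    return Waiver(match["test_id"], match["action"], match["reason"] or "")


# The entry as summarized in this PR, written in the assumed format.
example = (
    "accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::"
    "test_nvfp4_multi_gpus[throughput_mtp] SKIP (https://nvbugs/6021482)"
)
waiver = parse_waive_line(example)
```

A check like this could be used to confirm that every waived test carries a bug reference before merging, if the real file follows the assumed format.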

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | PR description is minimal and lacks required sections. Title is missing proper formatting with ticket reference and type, and key sections like Description and Test Coverage are incomplete or empty. | Add properly formatted title with ticket reference [https://nvbugs/6021482][infra], provide detailed explanation of why the test is being waived, and document test coverage or affected tests. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: waiving failed multinode tests in the GB200 stage, which matches the changeset's addition of a waived test entry. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@tensorrt-cicd
Collaborator

PR_Github #40413 [ run ] triggered by Bot. Commit: 53ce35f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40413 [ run ] completed with state SUCCESS. Commit: 53ce35f
/LLM/main/L0_MergeRequest_PR pipeline #31506 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation
