[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests by shuyixiong · Pull Request #9939 · NVIDIA/TensorRT-LLM

shuyixiong · 2025-12-12T04:56:58Z

Summary by CodeRabbit

Examples
- Added a new example script to emulate the performance of running TensorRT-LLM with Ray orchestrator for Reinforcement Learning (RL) workloads. It creates multiple AsyncLLM instances distributed across GPUs using Ray placement groups, enabling parallel generation for RL training scenarios.
Tests
- Added a test that guards against port-conflict failures when launching multiple TensorRT-LLM instances concurrently.
- Isolated Ray cluster state between tests to prevent cross-stage interference. The Ray cluster is now created locally and destroyed within the scope of each individual test. Scenarios that must attach to an existing Ray cluster are required to use RAY_ADDRESS, so they connect only to the specified cluster and Ray stages no longer interfere with each other through shared Ray state.
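The port-conflict scenario in the first Tests item can be illustrated with a small, generic sketch. It is not the test added in this PR: `reserve_free_ports` is a hypothetical helper that asks the operating system for several distinct free TCP ports by binding to port 0 and holding all sockets open until every port has been read.

```python
import socket
from typing import List


def reserve_free_ports(count: int) -> List[int]:
    """Hypothetical helper: return `count` distinct free TCP ports on localhost."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Port 0 lets the OS pick an unused ephemeral port.
            sock.bind(("127.0.0.1", 0))
            sockets.append(sock)
        # All sockets are still bound here, so the ports cannot repeat.
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()


if __name__ == "__main__":
    print(reserve_free_ports(4))
```

Another process can still claim a port between the moment it is released and the moment an instance binds it, so a sketch like this narrows the conflict window rather than removing it.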
CI
- Added a missing Ray stage with 4 H100 GPUs to L0_test, which was omitted in PR #9353.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai · 2025-12-12T05:01:24Z

Walkthrough

A new Python script implements a Ray-based distributed inference workflow for TensorRT LLMs, featuring a remote worker class that manages async LLM initialization and token generation, alongside orchestration logic for resource validation, placement group configuration, prompt distribution in round-robin fashion, and async result collection with throughput reporting.
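The round-robin distribution and async result collection described in the walkthrough can be sketched in plain `asyncio`. This is an illustration under stated assumptions, not the code in `rl_perf_repro.py`: `FakeInstance` and `run_round_robin` are hypothetical names, and the worker only simulates latency instead of calling a Ray actor.

```python
import asyncio
import time
from typing import List, Sequence, Tuple


class FakeInstance:
    """Stand-in for a remote LLM worker (hypothetical, no Ray involved)."""

    def __init__(self, name: str):
        self.name = name

    async def generate(self, prompt: List[int]) -> Tuple[str, List[int]]:
        await asyncio.sleep(0.01)  # simulate generation latency
        return self.name, [token + 1 for token in prompt]


async def run_round_robin(instances: Sequence[FakeInstance],
                          prompts: Sequence[List[int]]):
    """Send prompt i to instance i % N and gather results in prompt order."""
    start = time.perf_counter()
    pending = [
        instances[i % len(instances)].generate(prompt)
        for i, prompt in enumerate(prompts)
    ]
    results = await asyncio.gather(*pending)
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    workers = [FakeInstance("a"), FakeInstance("b")]
    outputs, seconds = asyncio.run(run_round_robin(workers, [[1], [2], [3]]))
    print(f"{len(outputs)} prompts in {seconds:.3f}s "
          f"({len(outputs) / seconds:.1f} prompts/s)")
```

`asyncio.gather` returns results in the order the awaitables were passed, so results stay aligned with the original prompt order even when instances finish at different times.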

Changes

Cohort / File(s)	Summary
Ray-based LLM orchestration script `examples/ray_orchestrator/rl_perf_repro.py`	New file introducing `trtllm_instance` remote worker class with async LLM initialization and token generation; `setup_rl_llm()` function orchestrating GPU validation, Ray placement group creation, actor instantiation, and async prompt distribution; `add_rl_llm_args()` and `parse_arguments()` for CLI option handling; `main()` entry point coordinating argument parsing and async execution.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Parser
    participant Main as main()
    participant Setup as setup_rl_llm()
    participant Validation as Resource Validator
    participant Ray as Ray Environment
    participant PG as Placement Group
    participant Actors as LLM Instances
    participant AsyncCollector as Async Collector

    CLI->>Main: parse_arguments()
    Main->>Setup: setup_rl_llm(args)
    
    Setup->>Validation: Validate GPU availability
    Validation-->>Setup: Validation result
    
    alt Validation Success
        Setup->>Ray: Configure Ray environment
        Ray-->>Setup: Ray initialized
        
        Setup->>PG: Create STRICT_PACK placement group
        PG-->>Setup: Placement group ready
        
        Setup->>Actors: Create num_instances Ray actors (trtllm_instance)
        Actors->>Actors: Await actor readiness
        Actors-->>Setup: All actors ready
        
        Setup->>Actors: init_llm() on each actor
        Actors->>Actors: Initialize AsyncLLM, build SamplingParams
        Actors-->>Setup: LLMs initialized
        
        Setup->>AsyncCollector: Distribute prompts round-robin to actors
        AsyncCollector->>Actors: generate(prompt)
        Actors-->>AsyncCollector: token_ids, logprobs
        AsyncCollector-->>Setup: All results collected
        
        Setup->>Setup: Report timing & throughput
    end
    
    Setup->>PG: Clean up placement groups
    Setup->>Ray: Shutdown Ray
    Setup-->>Main: Execution complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Ray and placement group configuration: Verify STRICT_PACK strategy is appropriate and that placement group creation/cleanup is robust.
GPU validation logic: Ensure GPU availability checks correctly account for tensor parallelism and requested instance counts.
Async orchestration and round-robin distribution: Confirm prompt distribution pattern is fair and result collection handles edge cases (failures, timeouts).
Actor initialization and lifecycle: Validate that trtllm_instance async initialization sequence and async LLM setup are properly awaited and error-handled.
Resource cleanup in finally blocks: Ensure placement groups and Ray shutdown occur reliably to prevent resource leaks.
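One way to approach the "failures, timeouts" concern in the list above is to wrap each request in a timeout and collect exceptions instead of letting the first one abort the batch. The sketch below is generic `asyncio` and makes no claim about how the script handles these cases; `collect_with_timeout` and the three demo coroutines are hypothetical.

```python
import asyncio
from typing import Any, Coroutine, List, Sequence


async def collect_with_timeout(coros: Sequence[Coroutine[Any, Any, Any]],
                               timeout_s: float) -> List[Any]:
    """Await all coroutines; failures and timeouts come back as exception objects."""
    guarded = [asyncio.wait_for(coro, timeout=timeout_s) for coro in coros]
    # return_exceptions=True keeps one failing request from cancelling the rest.
    return await asyncio.gather(*guarded, return_exceptions=True)


async def ok_request() -> int:
    return 1


async def slow_request() -> int:
    await asyncio.sleep(1.0)
    return 2


async def failing_request() -> int:
    raise ValueError("simulated generation failure")


if __name__ == "__main__":
    outcome = asyncio.run(
        collect_with_timeout(
            [ok_request(), slow_request(), failing_request()], timeout_s=0.05))
    for item in outcome:
        print(type(item).__name__, item)
```

The caller can then decide per request whether to retry, skip, or fail the whole run, which also keeps any cleanup in a `finally` block reachable.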

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Description check	⚠️ Warning	PR description is blank/unfilled except for the template boilerplate, with no actual implementation details, rationale, or test coverage provided.	Fill in the Description section explaining what the RLPerf reproduce script does and why it's needed. Add Test Coverage details explaining how the new functionality is validated. Ensure the PR description contains substantive content beyond the template.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately describes the main change: adding an RL performance reproduction script using Ray, which directly aligns with the changeset that introduces the new rl_perf_repro.py file with Ray orchestration.

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (6)

examples/ray_orchestrator/rl_perf_repro.py (6)
21-27: Add class and method docstrings.

For Python interfaces that may be used outside a file, the coding guidelines prefer docstrings over comments. Consider adding Google-style docstrings to document the class and its parameters.

Example:
import ray


@ray.remote
class TRTLLMInstance:
    """Ray remote worker that manages AsyncLLM initialization and generation.

    Attributes:
        async_llm_kwargs: Configuration dict for AsyncLLM initialization.
        sampling_kwargs: Configuration dict for SamplingParams.
        llm: The AsyncLLM instance (set after init_llm is called).
        sampling_params: The SamplingParams instance (set after init_llm is called).
    """

    def __init__(self, async_llm_kwargs: dict, sampling_kwargs: dict):
        """Initialize the TRTLLMInstance with configuration dictionaries.

        Args:
            async_llm_kwargs: Configuration parameters for AsyncLLM.
            sampling_kwargs: Configuration parameters for SamplingParams.
        """
        # Initialize every documented attribute in the constructor.
        self.async_llm_kwargs = async_llm_kwargs
        self.sampling_kwargs = sampling_kwargs
        self.llm = None  # populated by init_llm()
        self.sampling_params = None  # populated by init_llm()
42-48: Consider making hardcoded AsyncLLM parameters configurable.

Several AsyncLLM parameters are hardcoded (enable_sleep=True, batch_wait_timeout_iters=32, batch_wait_max_tokens_ratio=0.5), which may limit flexibility for different performance reproduction scenarios.

70-70: Simplify logprobs extraction.

The single-element slice can be replaced with next(iter(...)) for clarity.

Apply this diff:
-            log_probs = [list(d.values())[0].logprob for d in outputs.outputs[0].logprobs]
+            log_probs = [next(iter(d.values())).logprob for d in outputs.outputs[0].logprobs]
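The two forms in the diff above return the same values. A small check with stand-in data shows this; `FakeLogprob` is a hypothetical substitute for the real per-token logprob object, assumed here only to expose a `logprob` attribute.

```python
from dataclasses import dataclass


@dataclass
class FakeLogprob:
    """Hypothetical stand-in for a per-token logprob entry."""
    logprob: float


# One dict per generated step, keyed by token id; the first entry is read.
step_logprobs = [
    {11: FakeLogprob(-0.1)},
    {42: FakeLogprob(-1.5), 7: FakeLogprob(-2.0)},
]

via_index = [list(d.values())[0].logprob for d in step_logprobs]
via_iter = [next(iter(d.values())).logprob for d in step_logprobs]

print(via_index == via_iter)
```

The `next(iter(...))` form avoids building a temporary list for every step, which is the reason Ruff flags the original as RUF015.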
74-78: Add function docstring.

This is the main orchestration function and should have a comprehensive Google-style docstring documenting its purpose, parameters, and behavior.

84-87: Hardcoded single-node GPU limit.

The script enforces a hardcoded limit of 8 GPUs for single-node operation. Consider making this configurable via CLI argument or documenting this limitation more prominently.
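A minimal sketch of the suggestion, assuming the limit is exposed as a command-line option. The flag names `--num_instances`, `--tp_size`, and `--max_gpus_per_node` are illustrative and are not taken from the script's actual argument parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a configurable single-node GPU limit (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="GPU limit sketch")
    parser.add_argument("--num_instances", type=int, default=1)
    parser.add_argument("--tp_size", type=int, default=1)
    parser.add_argument("--max_gpus_per_node", type=int, default=8,
                        help="Upper bound on GPUs used on a single node.")
    return parser


def required_gpus(args: argparse.Namespace) -> int:
    """Return the GPUs needed, or raise if the request exceeds the node limit."""
    total = args.num_instances * args.tp_size
    if total > args.max_gpus_per_node:
        raise ValueError(
            f"Requested {total} GPUs, limit is {args.max_gpus_per_node}")
    return total


if __name__ == "__main__":
    cli_args = build_parser().parse_args(["--num_instances", "2", "--tp_size", "2"])
    print(required_gpus(cli_args))
```

Keeping 8 as the default preserves the current behavior while letting nodes with a different GPU count override it.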

268-270: Add main function docstring.

Consider adding a docstring to document the entry point's purpose.
def main():
    """Entry point for RL performance reproduction script."""
    args = parse_arguments()
    asyncio.run(setup_rl_llm(args))

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 110820b and 98e600f.

📒 Files selected for processing (1)

examples/ray_orchestrator/rl_perf_repro.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

examples/ray_orchestrator/rl_perf_repro.py

**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

examples/ray_orchestrator/rl_perf_repro.py

🧠 Learnings (4)

📓 Common learnings

Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 7520
File: tensorrt_llm/_torch/pyexecutor/resource_manager.py:605-613
Timestamp: 2025-09-24T03:31:28.908Z
Learning: In TensorRT-LLM Ray orchestrator mode, ProcessGroups are initialized with both Gloo and NCCL backends (e.g., "cuda:nccl,cpu:gloo"), allowing PyTorch distributed to automatically route CPU tensors through Gloo and GPU tensors through NCCL. This eliminates the need for manual device placement when performing allreduce operations on base types.

Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/modules/rms_norm.py:17-17
Timestamp: 2025-08-27T14:23:55.566Z
Learning: The TensorRT-LLM project requires Python 3.10+ as evidenced by the use of TypeAlias from typing module, match/case statements, and union type | syntax throughout the codebase, despite some documentation still mentioning Python 3.8+.

Learnt from: achartier
Repo: NVIDIA/TensorRT-LLM PR: 6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.

Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 7763
File: cpp/tensorrt_llm/CMakeLists.txt:297-301
Timestamp: 2025-09-16T09:30:09.716Z
Learning: In the TensorRT-LLM project, NCCL libraries are loaded earlier by PyTorch libraries or the bindings library, so the main shared library doesn't need NCCL paths in its RPATH - the libraries will already be available in the process address space when needed.

Learnt from: yibinl-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:42-49
Timestamp: 2025-09-23T14:58:05.372Z
Learning: In TensorRT-LLM NCCL device kernels (cpp/tensorrt_llm/kernels/nccl_device/), the token partitioning intentionally uses ceil-like distribution (same token_per_rank for all ranks) to ensure all ranks launch the same number of blocks. This is required for optimal NCCL device API barrier performance, even though it may launch extra blocks for non-existent tokens on later ranks. Runtime bounds checking in the kernel (blockID validation) handles the overshoot cases.

Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/thop/allreduceOp.cpp:352-446
Timestamp: 2025-09-23T15:12:38.312Z
Learning: In TensorRT-LLM NCCL device implementation, NCCL version 2.28+ requirements are handled at runtime in the nccl_device/config layer rather than with compile-time guards. This allows the allreduceOp to remain version-agnostic and delegates version compatibility validation to the appropriate lower-level components that can gracefully handle unsupported configurations.

Learnt from: dcampora
Repo: NVIDIA/TensorRT-LLM PR: 6867
File: tensorrt_llm/_torch/pyexecutor/sampler.py:67-72
Timestamp: 2025-08-13T16:20:37.987Z
Learning: In TensorRT-LLM sampler code, performance is prioritized over additional validation checks. The beam_width helper method intentionally returns the first request's beam_width without validating consistency across all requests to avoid performance overhead from iterating through the entire batch.

Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

📚 Learning: 2025-07-28T17:06:08.621Z

Applied to files:

examples/ray_orchestrator/rl_perf_repro.py

📚 Learning: 2025-09-24T03:31:28.908Z

Applied to files:

examples/ray_orchestrator/rl_perf_repro.py

📚 Learning: 2025-11-27T09:23:18.742Z

Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 9511
File: tests/integration/defs/examples/serve/test_serve.py:136-186
Timestamp: 2025-11-27T09:23:18.742Z
Learning: In TensorRT-LLM testing, when adding test cases based on RCCA commands, the command format should be copied exactly as it appears in the RCCA case, even if it differs from existing tests. For example, some RCCA commands for trtllm-serve may omit the "serve" subcommand while others include it.

Applied to files:

examples/ray_orchestrator/rl_perf_repro.py

🧬 Code graph analysis (1)

examples/ray_orchestrator/rl_perf_repro.py (3)

tensorrt_llm/_torch/async_llm.py (2)

AsyncLLM (7-106)

setup_async (40-46)

tensorrt_llm/llmapi/llm_args.py (1)

CudaGraphConfig (107-164)

tensorrt_llm/llmapi/llm.py (1)

prompt (87-88)

🪛 Ruff (0.14.8)

examples/ray_orchestrator/rl_perf_repro.py

70-70: Prefer next(iter(d.values())) over single element slice

Replace with next(iter(d.values()))

(RUF015)

85-87: Avoid specifying long messages outside the exception class

(TRY003)

89-91: Avoid specifying long messages outside the exception class

(TRY003)

185-185: Local variable results is assigned to but never used

Remove assignment to unused variable results

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (4)

examples/ray_orchestrator/rl_perf_repro.py (4)
64-64: Verify Python version compatibility for type annotation syntax.

The type annotation list[int] uses Python 3.9+ syntax, but the coding guidelines state "The code developed for TensorRT-LLM should conform to Python 3.8+". For Python 3.8 compatibility, use List[int] from the typing module instead.

However, based on learnings, TensorRT-LLM may actually require Python 3.10+ in practice. Please verify the project's actual Python version requirement.

If Python 3.8 compatibility is required, apply this diff:
+from typing import List
+
 ...
-    async def generate(self, prompt: list[int]):
+    async def generate(self, prompt: List[int]):
135-135: Clarify conditional max_batch_size logic.

The logic "max_batch_size": 0 if args.batch_sizes else args.max_batch_size sets max_batch_size to 0 when custom batch_sizes are provided. Please verify this is the intended behavior for CudaGraphConfig, as setting it to 0 seems unusual.

Based on the relevant code snippets, CudaGraphConfig has a validate_cuda_graph_max_batch_size method that "ensures max_batch_size is non-negative." Setting it to 0 might bypass CUDA graph generation or use a default behavior. Please confirm this aligns with the intended configuration.

155-155: Hardcoded end_id may cause issues.

The end_id is hardcoded to -1, which typically means "no end token". Verify this is appropriate for all models, as some models may require a valid end token ID for proper generation termination.

127-127: Worker extension class exists and is appropriate.

The WorkerExtension class is properly defined in tensorrt_llm/llmapi/rlhf_utils.py and is designed specifically for this purpose with methods for updating weights and checking weight updates—both essential for RLHF workflows.

tests/integration/defs/ray_orchestrator/RL/run_rl_perf_reproduce.py

examples/ray_orchestrator/rl_perf_repro.py

shuyixiong · 2025-12-17T12:17:46Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1"

tensorrt-cicd · 2025-12-17T12:23:27Z

PR_Github #28764 [ run ] triggered by Bot. Commit: 53e8c12

tensorrt-cicd · 2025-12-17T12:55:59Z

PR_Github #28764 [ run ] completed with state FAILURE. Commit: 53e8c12
/LLM/main/L0_MergeRequest_PR pipeline #22011 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

shuyixiong · 2025-12-17T13:15:26Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1"

tensorrt-cicd · 2025-12-17T13:20:50Z

PR_Github #28774 [ run ] triggered by Bot. Commit: ece848b

tensorrt-cicd · 2025-12-17T13:59:27Z

PR_Github #28774 [ run ] completed with state FAILURE. Commit: ece848b
/LLM/main/L0_MergeRequest_PR pipeline #22021 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

shuyixiong · 2025-12-17T15:10:36Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1"

tensorrt-cicd · 2025-12-17T15:16:45Z

PR_Github #28795 [ run ] triggered by Bot. Commit: 7324d4f

tburt-nv

Approving the pipeline change, although it looks like the tests are failing and may need more work.

tensorrt-cicd · 2025-12-17T17:16:07Z

PR_Github #28795 [ run ] completed with state SUCCESS. Commit: 7324d4f
/LLM/main/L0_MergeRequest_PR pipeline #22042 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

shuyixiong · 2025-12-23T04:58:59Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1"

shuyixiong · 2025-12-23T04:59:59Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1" --disable-reuse-test

tensorrt-cicd · 2025-12-23T05:06:02Z

PR_Github #29517 [ run ] triggered by Bot. Commit: 1d6ffe2

tensorrt-cicd · 2025-12-23T05:07:19Z

PR_Github #29518 [ run ] triggered by Bot. Commit: 1d6ffe2

tensorrt-cicd · 2025-12-23T06:48:24Z

PR_Github #29518 [ run ] completed with state SUCCESS. Commit: 1d6ffe2
/LLM/main/L0_MergeRequest_PR pipeline #22694 (Partly Tested) completed with status: 'SUCCESS'

Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>

shuyixiong · 2025-12-23T07:02:30Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1"

tensorrt-cicd · 2025-12-23T07:08:34Z

PR_Github #29543 [ run ] triggered by Bot. Commit: 7abfddd

tensorrt-cicd · 2025-12-23T07:32:25Z

PR_Github #29543 [ run ] completed with state FAILURE. Commit: 7abfddd
/LLM/main/L0_MergeRequest_PR pipeline #22717 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

tests/integration/defs/examples/test_ray.py

Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>

shuyixiong · 2025-12-23T08:36:26Z

/bot run --stage-list "DGX_H100-2_GPUs-PyTorch-Ray-1,DGX_H100-4_GPUs-PyTorch-Ray-1"

tensorrt-cicd · 2025-12-23T08:45:51Z

PR_Github #29570 [ run ] triggered by Bot. Commit: 91674f2

Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>

tensorrt-cicd · 2025-12-23T10:53:06Z

PR_Github #29570 [ run ] completed with state SUCCESS. Commit: 91674f2
/LLM/main/L0_MergeRequest_PR pipeline #22740 (Partly Tested) completed with status: 'SUCCESS'

shuyixiong · 2025-12-23T10:58:39Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-12-23T11:15:02Z

PR_Github #29605 [ run ] triggered by Bot. Commit: 97e4e74

Superjomn

LGTM

tensorrt-cicd · 2025-12-23T19:05:02Z

PR_Github #29605 [ run ] completed with state FAILURE. Commit: 97e4e74
/LLM/main/L0_MergeRequest_PR pipeline #22773 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

shuyixiong · 2025-12-24T04:00:47Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-12-24T04:07:13Z

PR_Github #29721 [ run ] triggered by Bot. Commit: 97e4e74

tensorrt-cicd · 2025-12-24T06:49:38Z

PR_Github #29721 [ run ] completed with state SUCCESS. Commit: 97e4e74
/LLM/main/L0_MergeRequest_PR pipeline #22834 completed with status: 'SUCCESS'

…ustness of Ray tests (NVIDIA#9939) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>

…ustness of Ray tests (NVIDIA#9939) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com> Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>

…ustness of Ray tests (NVIDIA#9939) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>

…ustness of Ray tests (NVIDIA#9939) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com> Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>

shuyixiong requested a review from a team as a code owner December 12, 2025 04:56

shuyixiong requested review from Shixiaowei02 and kaiyux December 12, 2025 04:57

coderabbitai bot reviewed Dec 12, 2025

shuyixiong requested review from Superjomn, hchings and joyang-nv December 12, 2025 05:12

shuyixiong force-pushed the user/shuyix/rl_perf_repro branch from 98e600f to fdd2dcb Compare December 16, 2025 09:49

shuyixiong requested a review from a team as a code owner December 16, 2025 09:49

shuyixiong requested review from a team as code owners December 17, 2025 02:47

shuyixiong requested a review from tburt-nv December 17, 2025 02:47

shuyixiong mentioned this pull request Dec 17, 2025

[None][fix] Add ray stage with 4 h100 gpus to CI and fix sampled logprobs in TRTLLM sampler #9970

Closed

1 task

shuyixiong self-assigned this Dec 17, 2025

Superjomn reviewed Dec 17, 2025

examples/ray_orchestrator/rl_perf_repro.py (outdated)

shuyixiong force-pushed the user/shuyix/rl_perf_repro branch from 575877c to 53e8c12 Compare December 17, 2025 12:16

shuyixiong removed request for a team, Shixiaowei02 and kaiyux December 17, 2025 13:16

tburt-nv approved these changes Dec 17, 2025

Independently initialize and tear down the Ray cluster in tests

7abfddd

Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
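The per-test isolation this commit describes depends on `RAY_ADDRESS` not leaking from one test into the next. A minimal sketch of that idea follows, using only the standard library; `scoped_ray_address` is a hypothetical helper and not the fixture used in `test_ray.py`. To my understanding, `ray.init()` consults `RAY_ADDRESS` when no address is passed explicitly, which is why the variable is scoped here.

```python
import contextlib
import os
from typing import Iterator, Optional


@contextlib.contextmanager
def scoped_ray_address(address: Optional[str]) -> Iterator[None]:
    """Temporarily set RAY_ADDRESS (or clear it when address is None)."""
    previous = os.environ.get("RAY_ADDRESS")
    if address is None:
        os.environ.pop("RAY_ADDRESS", None)
    else:
        os.environ["RAY_ADDRESS"] = address
    try:
        # A test body would start and shut down its Ray cluster here.
        yield
    finally:
        # Restore the caller's value so later tests see an unchanged environment.
        if previous is None:
            os.environ.pop("RAY_ADDRESS", None)
        else:
            os.environ["RAY_ADDRESS"] = previous


if __name__ == "__main__":
    with scoped_ray_address("127.0.0.1:6379"):
        print(os.environ["RAY_ADDRESS"])
    print(os.environ.get("RAY_ADDRESS"))
```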

tongyuantongyu reviewed Dec 23, 2025

tests/integration/defs/examples/test_ray.py (outdated)

Use popen in trt_test_alternative for automatic process tree cleanup

91674f2

Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
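Process-tree cleanup of the kind this commit title mentions can be sketched with the standard library alone. The example below does not reproduce the `popen` helper in `trt_test_alternative`, whose implementation is not shown in this conversation; `popen_with_group_cleanup` is a hypothetical, POSIX-only illustration that starts the child in its own session and signals the whole process group on exit.

```python
import contextlib
import os
import signal
import subprocess
import sys
from typing import Iterator, List


@contextlib.contextmanager
def popen_with_group_cleanup(cmd: List[str]) -> Iterator[subprocess.Popen]:
    """Run cmd in a new session and terminate its process group on exit (POSIX)."""
    # start_new_session=True makes the child a group leader, so pgid == pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        yield proc
    finally:
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the whole group has already exited
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # Escalate if the group ignored SIGTERM.
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    with popen_with_group_cleanup(sleeper) as child:
        print("running:", child.poll() is None)
    print("return code:", child.returncode)
```

Signalling the group rather than the single PID is what reaches grandchildren, such as worker processes that a launcher spawns and would otherwise leave behind.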

shuyixiong requested a review from tongyuantongyu December 23, 2025 08:24

Apply changes to more ray tests

97e4e74

Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>

shuyixiong changed the title ~~[TRTLLM-9737][chore] Add rl perf reproduce script~~ [TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests Dec 23, 2025

Superjomn approved these changes Dec 23, 2025

shuyixiong merged commit f4f0fe8 into NVIDIA:main Dec 24, 2025
7 checks passed

shuyixiong mentioned this pull request Dec 25, 2025

[None][chore] Force local cluster in integration test and skip all disagg ray tests #10198

Closed

1 task
