
[None][feat] add skip condition in AutoDeploy's triton fused moe kernel #8632

Merged
suyoggupta merged 8 commits into NVIDIA:main from nv-auto-deploy:sg/triton-fix-rebased on Oct 24, 2025
Conversation

suyoggupta (Collaborator) commented Oct 23, 2025

Skip compute when kernel has no valid work

Summary by CodeRabbit

  • Performance Improvements

    • Optimized MOE kernel to efficiently skip padded tokens during processing, reducing unnecessary computation and improving overall throughput for sequences with padding.
  • Bug Fixes

    • Improved boundary condition handling to prevent out-of-bounds memory access in padded data processing.
  • Refactor

    • Enhanced kernel parameter passing for better control over padding handling.

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
@suyoggupta suyoggupta requested a review from a team as a code owner October 23, 2025 20:32
@suyoggupta suyoggupta requested a review from MrGeva October 23, 2025 20:32
@suyoggupta suyoggupta mentioned this pull request Oct 23, 2025
1 task
suyoggupta (Collaborator, Author):

/bot run

coderabbitai bot (Contributor) commented Oct 23, 2025

📝 Walkthrough

Walkthrough

Adds a num_tokens_post_padded_ptr parameter to the Triton MOE kernel to track the actual token count. The kernel exits early when processing padded tokens, and the parameter is propagated through _invoke_kernel and the grid calculations to prevent out-of-bounds operations.
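The skip condition can be illustrated with a host-side sketch. This is a hypothetical pure-Python model of which program instances would survive the kernel's early return (the function name and shape of the grid are illustrative, not the actual kernel code):

```python
# Each Triton program instance handles BLOCK_SIZE_M rows of the padded
# token grid and exits early when its block lies entirely in the padded
# tail, mirroring: if pid_m * BLOCK_SIZE_M >= num_tokens_post_padded: return
def blocks_with_work(num_tokens_post_padded: int,
                     padded_len: int,
                     BLOCK_SIZE_M: int) -> list:
    """Return the pid_m values that would NOT early-return."""
    num_blocks = (padded_len + BLOCK_SIZE_M - 1) // BLOCK_SIZE_M
    return [
        pid_m for pid_m in range(num_blocks)
        # Skip blocks that start at or beyond the actual token count.
        if pid_m * BLOCK_SIZE_M < num_tokens_post_padded
    ]

# With 70 real tokens padded to 128 and BLOCK_SIZE_M=16, blocks 0-4
# contain real tokens and blocks 5-7 are pure padding and get skipped.
print(blocks_with_work(70, 128, 16))  # [0, 1, 2, 3, 4]
```

Because the check runs before any token or index computation, skipped blocks issue no loads or stores at all, which is what removes both the wasted compute and the out-of-bounds risk.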

Changes

Cohort / File(s) Summary
Triton MOE Kernel Parameter Addition
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
Added num_tokens_post_padded_ptr parameter to fused_mlp_moe_kernel function. Kernel loads the value and performs early return when pid_m * BLOCK_SIZE_M >= num_tokens_post_padded to skip padded token processing. Updated _invoke_kernel to accept and forward num_tokens_post_padded argument. Added guard check before token/index computations. Updated kernel call sites in grid calculation path to supply the new parameter.

Sequence Diagram

sequenceDiagram
    participant Caller as Grid/Caller
    participant _invoke_kernel
    participant fused_mlp_moe_kernel
    
    Caller->>_invoke_kernel: _invoke_kernel(..., num_tokens_post_padded)
    _invoke_kernel->>fused_mlp_moe_kernel: launch kernel with num_tokens_post_padded_ptr
    
    rect rgb(200, 220, 240)
        Note over fused_mlp_moe_kernel: Load num_tokens_post_padded
    end
    
    fused_mlp_moe_kernel->>fused_mlp_moe_kernel: Check: pid_m * BLOCK_SIZE_M >= num_tokens_post_padded?
    
    alt Padded token range
        fused_mlp_moe_kernel->>fused_mlp_moe_kernel: Early return (skip processing)
    else Valid token range
        rect rgb(220, 240, 200)
            Note over fused_mlp_moe_kernel: Process token computations
        end
        fused_mlp_moe_kernel->>fused_mlp_moe_kernel: Execute kernel logic
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check ⚠️ Warning — The pull request description is significantly incomplete compared to the required template. The author provided only the brief statement "Skip compute when kernel has no valid work," which lacks the template's essential sections: a detailed explanation of the issue and solution, a Test Coverage section listing relevant tests, and a completed PR Checklist. The description should be expanded to explain what goes wrong without the skip condition, how num_tokens_post_padded is used to skip padding tokens, which tests validate the change, and the checklist items.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed — The title "[None][feat] add skip condition in AutoDeploy's triton fused moe kernel" clearly describes the main change: an early return in fused_mlp_moe_kernel that avoids processing padding tokens when pid_m * BLOCK_SIZE_M >= num_tokens_post_padded. It follows the repository's template with the [None] ticket identifier and [feat] type designation, and is concise without unnecessary noise.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py (1)

67-77: Consider updating the docstring.

The function docstring could be updated to document the new num_tokens_post_padded_ptr parameter and mention the early-exit optimization for padded blocks.
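A Google-style Args entry along these lines would address the nitpick. This is a hypothetical sketch of the wording, attached to a stub rather than the real kernel:

```python
def fused_mlp_moe_kernel_stub():
    """Sketch of a Google-style docstring entry for the new parameter.

    Args:
        num_tokens_post_padded_ptr: Pointer to a single-element integer
            tensor holding the actual token count after expert-block
            padding. Program instances whose row offset
            (pid_m * BLOCK_SIZE_M) is at or beyond this value cover only
            padding and return early without touching memory.
    """

# The parameter name appears in the rendered docstring.
print("num_tokens_post_padded_ptr" in fused_mlp_moe_kernel_stub.__doc__)  # True
```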

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32e1ad6 and b20a340.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Check PR Checklist Resolution
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py (3)

44-44: LGTM: Parameter addition is clean.

The new num_tokens_post_padded_ptr parameter is correctly added to track the actual token count after padding.


88-90: LGTM: Early-exit logic is correct and safe.

The early return correctly skips blocks that only contain padded tokens. Loading the scalar value from the pointer is safe (all threads read the same value), and returning before any token processing or memory operations prevents out-of-bounds access.


248-248: LGTM: Parameter propagation is correct.

The num_tokens_post_padded parameter is properly threaded through _invoke_kernel and passed to the kernel. The comment about CUDA graph compatibility helpfully explains why this is a tensor rather than a scalar.

Also applies to: 278-278
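The note above about CUDA graph compatibility is worth unpacking: graph capture bakes scalar launch arguments into the recorded graph, whereas a tensor argument is a pointer whose pointed-to value can be updated in place between replays. A plain-Python analogy (FakeGraph and kernel are illustrative stand-ins, not the CUDA graph API):

```python
class FakeGraph:
    """Toy model of capture/replay: arguments are frozen at capture time."""
    def __init__(self):
        self.recorded = None

    def capture(self, kernel, *args):
        self.recorded = (kernel, args)  # args baked into the "graph"

    def replay(self):
        kernel, args = self.recorded
        return kernel(*args)

def kernel(num_tokens_buf):
    # Dereference the "pointer" at run time, as tl.load does on device.
    return num_tokens_buf[0]

graph = FakeGraph()
num_tokens = [70]          # one-element buffer standing in for a device tensor
graph.capture(kernel, num_tokens)
print(graph.replay())      # 70
num_tokens[0] = 42         # update in place; no re-capture needed
print(graph.replay())      # 42
```

Had a Python int been captured instead, every replay would see the original value; passing a buffer is what lets the token count vary across graph replays.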

tensorrt-cicd (Collaborator):

PR_Github #22321 [ run ] triggered by Bot. Commit: b20a340

@github-project-automation github-project-automation bot moved this from Backlog to In review in AutoDeploy Board Oct 23, 2025
@suyoggupta suyoggupta enabled auto-merge (squash) October 23, 2025 20:49
tensorrt-cicd (Collaborator):

PR_Github #22321 [ run ] completed with state SUCCESS. Commit: b20a340
/LLM/main/L0_MergeRequest_PR pipeline #16828 completed with status: 'FAILURE'

suyoggupta (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #22325 [ run ] triggered by Bot. Commit: 2597388

tensorrt-cicd (Collaborator):

PR_Github #22325 [ run ] completed with state SUCCESS. Commit: 2597388
/LLM/main/L0_MergeRequest_PR pipeline #16831 completed with status: 'FAILURE'

suyoggupta (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #22342 [ run ] triggered by Bot. Commit: 588eccc

tensorrt-cicd (Collaborator):

PR_Github #22342 [ run ] completed with state SUCCESS. Commit: 588eccc
/LLM/main/L0_MergeRequest_PR pipeline #16844 completed with status: 'FAILURE'

suyoggupta (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #22373 [ run ] triggered by Bot. Commit: 1745541

tensorrt-cicd (Collaborator):

PR_Github #22373 [ run ] completed with state SUCCESS. Commit: 1745541
/LLM/main/L0_MergeRequest_PR pipeline #16864 completed with status: 'FAILURE'

suyoggupta (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #22402 [ run ] triggered by Bot. Commit: ff6c06f

tensorrt-cicd (Collaborator):

PR_Github #22402 [ run ] completed with state SUCCESS. Commit: ff6c06f
/LLM/main/L0_MergeRequest_PR pipeline #16885 completed with status: 'FAILURE'

suyoggupta (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #22423 [ run ] triggered by Bot. Commit: a90fa7b

tensorrt-cicd (Collaborator):

PR_Github #22423 [ run ] completed with state SUCCESS. Commit: a90fa7b
/LLM/main/L0_MergeRequest_PR pipeline #16900 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@suyoggupta suyoggupta merged commit f512dda into NVIDIA:main Oct 24, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in AutoDeploy Board Oct 24, 2025
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Oct 24, 2025
…el (NVIDIA#8632)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 1, 2025
…el (NVIDIA#8632)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025
…el (NVIDIA#8632)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>