
Add tests for fusedmoe 2694938553 #2566

Draft

shaunkotek wants to merge 11 commits into flashinfer-ai:main from shaunkotek:add-tests-for-fusedmoe-2694938553

Conversation

@shaunkotek

📌 Description

Add more test cases to the Cutlass fusedMoE kernel to cover Nemotron use cases, in order to make sure they do not break.

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

For larger cases, I got a lot of rounding error in the quantized tests, so I scaled all the numbers down even further (previously by 10, now by 100). I checked that this does not create degenerate matrices, but if you have any suggestions on how to do it better, I'm open.
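
A minimal sketch of that scaling change, assuming the tests draw random expert weights with torch.randn (the shapes, tensor names, and flat layout here are illustrative, not the PR's exact code):

import torch

# Illustrative Nemotron-like shapes; only the divisor change matters here.
NUM_EXPERTS, HIDDEN_SIZE, INTERMEDIATE_SIZE = 8, 1024, 2048

# Dividing the random weights by 100 (previously 10) keeps the FP8/NVFP4
# matmul outputs in a range where per-tensor scales lose less precision,
# while torch.randn still yields full-rank (non-degenerate) matrices
# with probability 1.
w31 = torch.randn(
    NUM_EXPERTS, 2 * INTERMEDIATE_SIZE, HIDDEN_SIZE, device="cuda"
) / 100.0
w2 = torch.randn(
    NUM_EXPERTS, HIDDEN_SIZE, INTERMEDIATE_SIZE, device="cuda"
) / 100.0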

@coderabbitai
Contributor

coderabbitai bot commented Feb 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@gemini-code-assist
Contributor

Summary of Changes

Hello @shaunkotek, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the fusedMoE kernel by expanding its test suite. The changes introduce new test configurations and activation types, specifically Relu2, to better align with Nemotron use cases. Furthermore, it addresses numerical precision issues in quantized tests by fine-tuning scaling factors, ensuring the kernel's stability and accuracy under various operational conditions.

Highlights

  • Expanded Test Coverage: Added comprehensive test cases for the fusedMoE kernel, specifically incorporating Relu2 activation type support across various quantization and parallelism scenarios.
  • Test Configuration Refactoring: Introduced a namedtuple (MoeConfig) to streamline and organize test parameters, making test definitions cleaner and more scalable for different MoE configurations (a sketch follows this list).
  • Quantization Precision Improvements: Adjusted scaling factors in FP8 and NVFP4 quantized tests to mitigate rounding errors, enhancing the numerical stability and accuracy of the tests.
  • Device Placement Consistency: Ensured all newly created tensors within the test suite are explicitly placed on the CUDA device, promoting consistent and correct GPU utilization.
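
As a rough illustration of the MoeConfig refactor described above (the field names follow the parameters listed in the changelog below; MOE_CONFIGS, the example shapes, and test_fused_moe are hypothetical):

from collections import namedtuple

import pytest

MoeConfig = namedtuple(
    "MoeConfig",
    ["batch_size", "hidden_size", "num_experts", "top_k", "intermediate_size"],
)

# Illustrative configurations: one object per scenario replaces five
# separate parametrize axes.
MOE_CONFIGS = [
    MoeConfig(batch_size=4, hidden_size=1024, num_experts=8, top_k=2,
              intermediate_size=2048),
    MoeConfig(batch_size=128, hidden_size=2048, num_experts=16, top_k=4,
              intermediate_size=4096),
]

@pytest.mark.parametrize("config", MOE_CONFIGS)
def test_fused_moe(config):
    ...  # body elided; each test reads config.hidden_size and friends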


Changelog
  • tests/moe/test_trtllm_cutlass_fused_moe.py
    • Introduced namedtuple for structured MoE test configurations.
    • Refactored test parameterization to use MoeConfig objects, consolidating batch_size, hidden_size, num_experts, top_k, and intermediate_size.
    • Expanded test coverage to include Relu2 activation type across various MoE tests (FP8, NVFP4, expert parallel, tensor parallel, and combined parallel).
    • Modified the compute_with_experts reference function to support different activation_type and fc2_quant parameters (a sketch follows this list).
    • Adjusted scaling factors for weights in FP8 and NVFP4 quantization tests to improve numerical stability and reduce rounding errors.
    • Ensured explicit CUDA device placement for newly created tensors in test setups.
    • Updated parallelism tests (expert_parallel, tensor_parallel, tensor_expert_parallel) to correctly handle Relu2 activation type and use the new configuration structure.
    • Added conditional logic for TP_SIZES to include 8 if enough CUDA devices are available.
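
For context, a sketch of how a reference implementation can dispatch on the activation type (the function name, string-based dispatch, and gate/up split convention are assumptions for illustration; the PR's compute_with_experts may differ):

import torch
import torch.nn.functional as F

def apply_activation(x: torch.Tensor, activation_type: str) -> torch.Tensor:
    if activation_type == "swiglu":
        # Gated path: split the fused w3/w1 FC1 output in half; the gate
        # half goes through SiLU and scales the linear half.
        gate, up = x.chunk(2, dim=-1)
        return F.silu(gate) * up
    if activation_type == "relu2":
        # Nemotron-style squared ReLU: relu(x) ** 2, no gating, so FC1
        # produces intermediate_size columns instead of 2x.
        return F.relu(x).square()
    raise ValueError(f"unsupported activation_type: {activation_type}")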
Activity
  • The author has completed all pre-commit checks.
  • All existing and new tests are passing.
  • The author identified and addressed rounding errors in larger quantized tests by scaling down numbers, confirming this does not create degenerate matrices.

Contributor

@gemini-code-assist bot left a comment


Code Review

The pull request significantly improves the test coverage for the fusedmoe kernel by adding support for various configurations and activation types (SwiGLU and ReLU2). It also addresses numerical stability issues in quantized tests by adjusting the scaling factors. My feedback focuses on improving the efficiency and correctness of quantization scale calculations in the tests, as well as removing redundant operations and commented-out code.

else:
    w1_scales = w31_scales
x_quant, hidden_states_scale = dynamic_per_tensor_fp8_quant(x)
hidden_states_scale = torch.tensor(hidden_states_scale[0], device="cuda")

medium

The hidden_states_scale tensor returned by dynamic_per_tensor_fp8_quant is already on the same device as the input x (which is CUDA). Re-creating it with torch.tensor and explicitly moving it to CUDA is redundant and slightly less efficient.

Suggested change
- hidden_states_scale = torch.tensor(hidden_states_scale[0], device="cuda")
+ hidden_states_scale = hidden_states_scale[0]

Comment on lines 597 to 600
a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(
    torch.float32
).cuda()
a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
# a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)

medium

The .cuda() call is redundant here as x is already on the CUDA device. Additionally, the commented-out code should be removed to keep the test file clean.

Suggested change
- a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(
-     torch.float32
- ).cuda()
- a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
- # a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
+ a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(
+     torch.float32
+ )

@shaunkotek force-pushed the add-tests-for-fusedmoe-2694938553 branch from 48a9c59 to 2460343 on February 15, 2026 at 09:34