Add tests for fusedmoe #2566
shaunkotek wants to merge 11 commits into flashinfer-ai:main
Conversation
Code Review
The pull request significantly improves the test coverage for the fusedmoe kernel by adding support for various configurations and activation types (SwiGLU and ReLU2). It also addresses numerical stability issues in quantized tests by adjusting the scaling factors. My feedback focuses on improving the efficiency and correctness of quantization scale calculations in the tests, as well as removing redundant operations and commented-out code.
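For context, the two activation variants called out above are commonly defined as follows. This is a minimal reference sketch, assuming the usual fused gate/up layout for SwiGLU (gate/up ordering differs between implementations) and plain squared ReLU for ReLU2; the function names are illustrative, not the kernel's API.

```python
import torch
import torch.nn.functional as F


def swiglu_ref(gate_up: torch.Tensor) -> torch.Tensor:
    # SwiGLU: split the fused gate/up projection and gate with SiLU.
    gate, up = gate_up.chunk(2, dim=-1)
    return F.silu(gate) * up


def relu2_ref(x: torch.Tensor) -> torch.Tensor:
    # ReLU2 (squared ReLU): relu(x) ** 2
    return torch.square(F.relu(x))
```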
```python
else:
    w1_scales = w31_scales
x_quant, hidden_states_scale = dynamic_per_tensor_fp8_quant(x)
hidden_states_scale = torch.tensor(hidden_states_scale[0], device="cuda")
```
The hidden_states_scale tensor returned by dynamic_per_tensor_fp8_quant is already on the same device as the input x (which is CUDA). Re-creating it with torch.tensor and explicitly moving it to CUDA is redundant and slightly less efficient.
Suggested change:

```diff
-hidden_states_scale = torch.tensor(hidden_states_scale[0], device="cuda")
+hidden_states_scale = hidden_states_scale[0]
```
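To see why the extra wrap is unnecessary: a dynamic per-tensor FP8 quantizer typically derives the scale from the input's absolute maximum, so the scale tensor is created on the same device as `x`. Below is a minimal sketch of that pattern; the real `dynamic_per_tensor_fp8_quant` may differ in details such as the exact formula or return shape (hence the `[0]` indexing in the test).

```python
import torch

FLOAT8_E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0


def dynamic_per_tensor_fp8_quant_sketch(x: torch.Tensor):
    # The reduction runs on x's device, so `scale` is already a CUDA tensor
    # when x is on CUDA; no torch.tensor(..., device="cuda") round-trip needed.
    amax = x.abs().max().to(torch.float32)
    scale = (amax / FLOAT8_E4M3_MAX).clamp(min=1e-12)
    x_fp8 = (x.to(torch.float32) / scale).clamp(
        -FLOAT8_E4M3_MAX, FLOAT8_E4M3_MAX
    ).to(torch.float8_e4m3fn)
    return x_fp8, scale
```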
```python
a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(
    torch.float32
).cuda()
a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
# a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
```
The .cuda() call is redundant here as x is already on the CUDA device. Additionally, the commented-out code should be removed to keep the test file clean.
Suggested change:

```diff
-a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(
-    torch.float32
-).cuda()
-a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
-# a1_gs = torch.tensor(1.0, device="cuda", dtype=torch.float32)
+a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(
+    torch.float32
+)
```
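A quick way to confirm the trailing `.cuda()` is a no-op: reductions and dtype casts keep the device of their input, so the global scale already lives on CUDA. A small hedged check of that fact (the constant values mirror the usual FP8 E4M3 and FP4 E2M1 maxima; the tensor shape is arbitrary):

```python
import torch

FLOAT8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value
FLOAT4_E2M1_MAX = 6.0    # largest representable FP4 E2M1 magnitude

x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
a1_gs = (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / torch.abs(x).max().to(torch.float32)
assert a1_gs.device.type == "cuda"  # already on the GPU, no explicit .cuda() needed
```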
Force-pushed from 48a9c59 to 2460343 (compare).
📌 Description
Add more test cases to the Cutlass fusedMoE kernel to match Nemotron use cases, in order to make sure it will not break.
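As a rough illustration of the kind of coverage this adds (a sketch only; the actual test names, activation identifiers, and shape list in this PR may differ):

```python
import pytest
import torch

# Shapes loosely modeled on Nemotron-style MoE configurations (illustrative values).
MOE_CASES = [
    # (num_tokens, hidden_size, intermediate_size, num_experts, top_k)
    (1, 4096, 14336, 8, 2),
    (128, 4096, 14336, 8, 2),
    (1024, 2048, 8192, 64, 6),
]


@pytest.mark.parametrize("case", MOE_CASES)
@pytest.mark.parametrize("activation", ["swiglu", "relu2"])
@pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float16])
def test_cutlass_fused_moe_nemotron_like(case, activation, dtype):
    # Placeholder body: build random inputs/weights, run the fused MoE kernel,
    # and compare against a reference implementation within a tolerance.
    ...
```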
🚀 Pull Request Checklist
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- Tests are passing (`unittest`, etc.).

Reviewer Notes
For larger cases, I got a lot of rounding error in the quantized tests, so I scaled all the numbers down even further (was by 10, changed to by 100). I checked that this does not create degenerate matrices, but if you have any suggestions on how to do it better, I'm open.
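One way to sanity-check that the more aggressive down-scaling (dividing by 100 instead of 10) does not produce degenerate weight matrices is to look at the rank of the scaled weights. This is a hedged sketch of such a check, not necessarily what the tests in this PR do, and the shape is illustrative:

```python
import torch

torch.manual_seed(0)
# Scaled-down random test weights (smaller shape to keep the check cheap).
w = torch.randn(512, 1024, dtype=torch.float32) / 100.0

# Scaling by a nonzero constant does not change rank; the matrix should stay full rank.
rank = torch.linalg.matrix_rank(w)
assert rank == min(w.shape), f"unexpected rank deficiency: {rank}"
```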