Implement SM120 blockwise FP8 scaled matrix multiplication kernels using CUTLASS v4.x while maintaining CUTLASS v3.9.2 for the SM89/SM90/SM100 archs. Changes:
- Add CUTLASS v4.x FetchContent for SM120 kernel compilation
- Add enable_sm120_only guard in common.hpp
- Add cutlass_3x_gemm_sm120 template using Sm120 collective builders
- Add SM120 per-tensor and blockwise FP8 kernels and dispatch logic
- Add runtime dispatch to route SM120 GPUs to dedicated kernels
- Configure CMake to build SM120 sources with v4.x includes

This enables FP8 quantization on RTX PRO 6000 / RTX 5060 Ti GPUs.
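The runtime dispatch described above can be sketched as a capability-keyed lookup. This is a hedged illustration, not the PR's actual C++ code: `select_fp8_gemm_kernel` is a hypothetical helper, and only the kernel names (`cutlass_3x_gemm_sm120`, etc.) come from the change description.

```python
def select_fp8_gemm_kernel(capability: tuple[int, int]) -> str:
    """Route a GPU compute capability to the matching FP8 GEMM kernel.

    Hypothetical sketch: mirrors the per-arch dispatch this PR adds,
    where SM120 (compute capability 12.0) gets kernels built against
    CUTLASS v4.x and older archs keep the CUTLASS v3.9.2 builds.
    """
    if capability == (12, 0):   # RTX PRO 6000 / RTX 50-series (SM120)
        return "cutlass_3x_gemm_sm120"
    if capability == (10, 0):   # SM100
        return "cutlass_3x_gemm_sm100"
    if capability == (9, 0):    # SM90 (Hopper)
        return "cutlass_3x_gemm_sm90"
    if capability == (8, 9):    # SM89 (Ada)
        return "cutlass_3x_gemm_sm89"
    return "torch_fallback"     # unsupported arch: plain PyTorch path
```

At runtime the capability would come from `torch.cuda.get_device_capability()`; keeping the decision in one pure function makes the routing easy to unit-test without a GPU.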
Upgrade pytorch_triton to >=3.6.0 from PyTorch nightly to enable Triton MoE kernels on Blackwell (SM120) GPUs. Tested: fused_moe_kernel compiles and runs successfully on RTX 5060 Ti.
Triton's TritonGPUAccelerateMatmul MLIR pass crashes on SM120. Detect SM120 and fall back to the PyTorch iterative MoE implementation. Performance note: this is ~2-4x slower than Triton fused MoE, but it allows MoE models to run on Blackwell GPUs until Triton is fixed.
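The iterative PyTorch path used as the fallback can be sketched roughly as an expert-by-expert loop. This is a simplified, hypothetical version (each expert reduced to a single linear layer, `iterative_moe` is not vLLM's actual function), shown only to illustrate why it is slower than the fused Triton kernel: every expert launches its own gather/GEMM/scatter instead of one fused kernel.

```python
import torch

def iterative_moe(hidden: torch.Tensor,
                  expert_weights: list[torch.Tensor],
                  topk_ids: torch.Tensor,
                  topk_vals: torch.Tensor) -> torch.Tensor:
    """Run MoE one expert at a time (simplified sketch).

    hidden:         [T, H] token activations
    expert_weights: one [H, H] weight per expert (real experts are MLPs)
    topk_ids:       [T, K] routed expert ids per token
    topk_vals:      [T, K] router weights per token
    """
    out = torch.zeros_like(hidden)
    for eid, w in enumerate(expert_weights):
        mask = (topk_ids == eid)                   # [T, K] bool
        token_idx, slot_idx = mask.nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue                               # no tokens routed here
        contrib = hidden[token_idx] @ w.t()        # run expert on its tokens
        scale = topk_vals[token_idx, slot_idx, None]
        out.index_add_(0, token_idx, contrib * scale)
    return out
```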
Extend the SM120 fallback to Fp8MoEMethod (not just UnquantizedFusedMoEMethod). Remove the Triton 3.6.0 upgrade, as it breaks PyTorch inductor compatibility.
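Extending the fallback to both MoE methods amounts to sharing one capability guard. A minimal sketch, assuming hypothetical method bodies (only the class names `UnquantizedFusedMoEMethod` and `Fp8MoEMethod` and the guard condition come from the PR description):

```python
def triton_fused_moe_supported(capability: tuple[int, int]) -> bool:
    # SM120 (12, 0) crashes Triton's TritonGPUAccelerateMatmul pass,
    # so those GPUs take the slower iterative PyTorch path.
    return capability != (12, 0)

class _MoEMethodBase:
    """Shared dispatch: both MoE methods consult the same SM120 guard."""
    def apply(self, hidden, capability):
        if triton_fused_moe_supported(capability):
            return self._fused_triton(hidden)
        return self._iterative_torch(hidden)

class UnquantizedFusedMoEMethod(_MoEMethodBase):
    # Placeholder bodies: they just report which path was taken.
    def _fused_triton(self, h): return ("triton", h)
    def _iterative_torch(self, h): return ("torch", h)

class Fp8MoEMethod(_MoEMethodBase):
    def _fused_triton(self, h): return ("triton-fp8", h)
    def _iterative_torch(self, h): return ("torch-fp8", h)
```

Centralizing the check means a future Triton fix only needs to relax `triton_fused_moe_supported` in one place.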
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

Purpose
Test Plan
Test Result
(Optional) Documentation Update