Conversation

@djmmoss djmmoss commented Aug 21, 2025

Purpose

This PR adds support for FlashInfer's CUTLASS-based BF16 x MXFP4 fused MoE kernel on Hopper.

A separate PR for Blackwell MXFP8 x MXFP4 support will follow soon.
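
For context on the weight format named above: MXFP4, as defined in the OCP Microscaling (MX) specification, stores 4-bit E2M1 elements in blocks of 32 that share a single 8-bit power-of-two (E8M0) scale. The snippet below is a minimal pure-Python reference sketch of dequantizing such blocks to ordinary floats. It is not the FlashInfer kernel or any code from this PR; the function names are hypothetical, and it assumes the 4-bit codes have already been unpacked, one code per list entry.

```python
# Reference sketch only: dequantize MXFP4 blocks (E2M1 elements + E8M0 scale).
# Function names are hypothetical and do not come from vLLM or FlashInfer.

# E2M1 magnitudes indexed by the low three bits of a 4-bit code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements that share one scale in the MX format


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code; bit 3 is the sign bit."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 scale: 2 ** (byte - 127), with 0xFF reserved for NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequantize_mxfp4(codes: list[int], scales: list[int]) -> list[float]:
    """Dequantize unpacked 4-bit codes using one scale byte per 32 elements."""
    if len(codes) != len(scales) * BLOCK_SIZE:
        raise ValueError("expected exactly one scale per block of 32 codes")
    return [
        decode_e2m1(code) * decode_e8m0(scales[i // BLOCK_SIZE])
        for i, code in enumerate(codes)
    ]
```

In a BF16 x MXFP4 kernel the activations stay in BF16 while the expert weights are stored in this compact form; how the dequantization is fused into the GEMM is specific to the kernel and is not shown here.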

Test Plan

Basic tests pass; full accuracy tests will follow.

Test Result

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Aug 21, 2025
djmmoss and others added 28 commits August 21, 2025 12:25
…22849)

Signed-off-by: ilmarkov <[email protected]>
Co-authored-by: ilmarkov <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>
Signed-off-by: Louie Tsai <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: iAmir97 <[email protected]>
Signed-off-by: iAmir97 <[email protected]>
Co-authored-by: iAmir97 <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Daniele Trifirò <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
…llm-project#22428)

Signed-off-by: rongfu.leng <[email protected]>
Signed-off-by: Jinzhen Lin <[email protected]>
Signed-off-by: Huzaifa Sidhpurwala <[email protected]>
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Animesh Jain <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Xiongfei Wei <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: yewentao256 <[email protected]>
Signed-off-by: kf <[email protected]>
Signed-off-by: vllmellm <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Dipika Sikka <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: tjtanaavllm <[email protected]>
Signed-off-by: Yong Hoon Shin <[email protected]>
Signed-off-by: Chih-Chieh-Yang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Vadim Gimpelson <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: zRzRzRzRzRzRzR <[email protected]>
Signed-off-by: Chih-Chieh Yang <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: yan <[email protected]>
Signed-off-by: Yan Ma <[email protected]>
Signed-off-by: Xiao Liu <[email protected]>
Signed-off-by: jiahanc <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Ye (Charlotte) Qi <[email protected]>
Signed-off-by: LopezCastroRoberto <[email protected]>
Signed-off-by: Andy Xie <[email protected]>
Signed-off-by: Haibin Lin <[email protected]>
Signed-off-by: David Ben-David <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: jiang1.li <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: zitian.zhao <[email protected]>
Signed-off-by: 22quinn <[email protected]>
Signed-off-by: Abirdcfly <[email protected]>
Signed-off-by: Giancarlo Delfin <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: huangweixiao <[email protected]>
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Eric Hanley <[email protected]>
Signed-off-by: Abatom <[email protected]>
Signed-off-by: CLFutureX <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: tlipoca9 <[email protected]>
Signed-off-by: elvischenv <[email protected]>
Signed-off-by: zitian zhao <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Siyuan Liu <[email protected]>
Signed-off-by: Benjamin Chislett <[email protected]>
Signed-off-by: isotr0py <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Signed-off-by: LucasWilkinson <[email protected]>
Signed-off-by: Zhang Jason <[email protected]>
Signed-off-by: Yongye Zhu <[email protected]>
Signed-off-by: asafg <[email protected]>
Signed-off-by: Siyuan Fu <[email protected]>
Signed-off-by: Lain <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Tao He <[email protected]>
Signed-off-by: Michael Goin <[email protected]>
Signed-off-by: QscQ <[email protected]>
Signed-off-by: qingjun <[email protected]>
Signed-off-by: Syed Muhammad Bin Asif <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: ycyaw66 <[email protected]>
Signed-off-by: David Chen <[email protected]>
Signed-off-by: Linkun <[email protected]>
Signed-off-by: Moritz Sanft <[email protected]>
Signed-off-by: Ming Yang <[email protected]>
Signed-off-by: Adrian Garcia <[email protected]>
Signed-off-by: shaojunqi <[email protected]>
Signed-off-by: Ricardo Decal <[email protected]>
Signed-off-by: Andrew Chan <[email protected]>
Signed-off-by: Felix Marty <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Zhiyu Cheng <[email protected]>
Signed-off-by: Shu Wang <[email protected]>
Signed-off-by: Po-Han Huang <[email protected]>
Signed-off-by: Shu Wang. <[email protected]>
Signed-off-by: XIn Li <[email protected]>
Signed-off-by: Junhao Li <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: iAmir97 <[email protected]>
Signed-off-by: iAmir97 <[email protected]>
Signed-off-by: <[email protected]>
Signed-off-by: Guy Stone <[email protected]>
Signed-off-by: <[email protected]>
Signed-off-by: yyw <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Pradyun Ramadorai <[email protected]>
Signed-off-by: Pradyun92 <[email protected]>
Signed-off-by: Jinzhen Lin <[email protected]>
Co-authored-by: rongfu.leng <[email protected]>
Co-authored-by: Huzaifa Sidhpurwala <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Animesh Jain <[email protected]>
Co-authored-by: Rui Qiao <[email protected]>
Co-authored-by: XiongfeiWei <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
Co-authored-by: JartX <[email protected]>
Co-authored-by: fhl2000 <[email protected]>
Co-authored-by: vllmellm <[email protected]>
Co-authored-by: kf <[email protected]>
Co-authored-by: Nicolò Lucchesi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Sage Moore <[email protected]>
Co-authored-by: tjtanaavllm <[email protected]>
Co-authored-by: Yong Hoon Shin <[email protected]>
Co-authored-by: Chih-Chieh Yang <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Vadim Gimpelson <[email protected]>
Co-authored-by: Yuxuan Zhang <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Thomas Parnell <[email protected]>
Co-authored-by: Yan Ma <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: jiahanc <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Ye (Charlotte) Qi <[email protected]>
Co-authored-by: Roberto L. Castro <[email protected]>
Co-authored-by: Ning Xie <[email protected]>
Co-authored-by: H <[email protected]>
Co-authored-by: David Ben-David <[email protected]>
Co-authored-by: David Ben-David <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: TankNee <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Seiji Eicher <[email protected]>
Co-authored-by: ZiTian.Zhao <[email protected]>
Co-authored-by: 22quinn <[email protected]>
Co-authored-by: Abirdcfly <[email protected]>
Co-authored-by: Giancarlo Delfin <[email protected]>
Co-authored-by: Chenxi Yang <[email protected]>
Co-authored-by: Chenxi Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Weixiao Huang <[email protected]>
Co-authored-by: Raghav Ravishankar <[email protected]>
Co-authored-by: ericehanley <[email protected]>
Co-authored-by: Zhonghua Deng <[email protected]>
Co-authored-by: Po-Han Huang (NVIDIA) <[email protected]>
Co-authored-by: PiteXChen <[email protected]>
Co-authored-by: lkchen <[email protected]>
Co-authored-by: TJian <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: tlipoca9 <[email protected]>
Co-authored-by: elvischenv <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Co-authored-by: Benji Beck <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Siyuan Liu <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
Co-authored-by: LiuXiaoxuanPKU <[email protected]>
Co-authored-by: simon-mo <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: Hongxia Yang <[email protected]>
Co-authored-by: Minseok Lee <[email protected]>
Co-authored-by: Yongye Zhu <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Zhang Jason <[email protected]>
Co-authored-by: Asaf Joseph Gardin <[email protected]>
Co-authored-by: asafg <[email protected]>
Co-authored-by: Lain <[email protected]>
Co-authored-by: tc-mb <[email protected]>
Co-authored-by: imning3 <[email protected]>
Co-authored-by: Maximilien de Bayser <[email protected]>
Co-authored-by: Kunshang Ji <[email protected]>
Co-authored-by: Tao He <[email protected]>
Co-authored-by: qscqesze <[email protected]>
Co-authored-by: Syed Muhammad Bin Asif <[email protected]>
Co-authored-by: Lionel Villard <[email protected]>
Co-authored-by: WeiQing Chen <[email protected]>
Co-authored-by: ycyaw66 <[email protected]>
Co-authored-by: Moritz Sanft <[email protected]>
Co-authored-by: Ming Yang <[email protected]>
Co-authored-by: Adrián García García <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: JaceyShao <[email protected]>
Co-authored-by: shaojunqi <[email protected]>
Co-authored-by: Ricardo Decal <[email protected]>
Co-authored-by: Andrew Chan <[email protected]>
Co-authored-by: fxmarty-amd <[email protected]>
Co-authored-by: Andrew Sansom <[email protected]>
Co-authored-by: Zhiyu <[email protected]>
Co-authored-by: Shu Wang <[email protected]>
Co-authored-by: XIn Li <[email protected]>
Co-authored-by: Junhao Li <[email protected]>
Co-authored-by: Chauncey <[email protected]>
Co-authored-by: iAmir97 <[email protected]>
Co-authored-by: iAmir97 <[email protected]>
Co-authored-by: Hong Hanh <[email protected]>
Co-authored-by: Daniel Serebrenik <[email protected]>
Co-authored-by: yewentao256 <[email protected]>
Co-authored-by: Guy Stone <[email protected]>
Co-authored-by: yyweiss <[email protected]>
Co-authored-by: Pradyun92 <[email protected]>
Co-authored-by: Pradyun Ramadorai <[email protected]>
Co-authored-by: Nicolò Lucchesi <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Julien Lin <[email protected]>
Signed-off-by: mgoin <[email protected]>
Co-authored-by: mgoin <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Yongye Zhu <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
@djmmoss djmmoss force-pushed the dmoss/flashinfer-cutlass-mxfp4-fused-moe branch from cd08720 to c98c1db Compare August 21, 2025 19:30
@mergify mergify bot added documentation Improvements or additions to documentation ci/build deepseek Related to DeepSeek models frontend llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models performance Performance-related issues qwen Related to Qwen models rocm Related to AMD ROCm structured-output speculative-decoding v1 tpu Related to Google TPUs labels Aug 21, 2025
@mergify mergify bot added the tool-calling label Aug 21, 2025
mergify bot commented Aug 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @djmmoss.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 21, 2025
Signed-off-by: Duncan Moss <[email protected]>
@mergify mergify bot removed tpu Related to Google TPUs needs-rebase labels Aug 21, 2025
@djmmoss djmmoss closed this Aug 21, 2025
Labels
ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models performance Performance-related issues qwen Related to Qwen models rocm Related to AMD ROCm speculative-decoding structured-output tool-calling v1