feat: bf16 x mxfp4 cutlass fused moe for gpt-oss of hopper #23368
Signed-off-by: Duncan Moss <[email protected]>
Purpose
This PR adds support for the bf16 x mxfp4 CUTLASS-based fused MoE kernel from FlashInfer on Hopper.
A separate PR for Blackwell mxfp8 x mxfp4 support will follow soon.
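For context on the weight format, here is a minimal sketch of how one MXFP4 block dequantizes to regular floats, assuming the OCP Microscaling layout: 32 E2M1 elements sharing one E8M0 scale. This is not the FlashInfer/CUTLASS kernel, which consumes the packed weights directly inside the fused GEMM. The function names are hypothetical, and the low-nibble-first packing order is an assumption that may not match the layout this PR uses.

```python
import math

# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code (bit 3 is the sign).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements that share one scale in the MX formats


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code (sign bit + 3 magnitude bits)."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 shared scale: a power of two with bias 127; 0xFF is NaN."""
    if scale_byte == 0xFF:
        return math.nan
    return 2.0 ** (scale_byte - 127)


def dequantize_mxfp4_block(packed: bytes, scale_byte: int) -> list[float]:
    """Dequantize one block of 32 FP4 values packed two per byte.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError(f"expected {BLOCK_SIZE // 2} bytes, got {len(packed)}")
    scale = decode_e8m0(scale_byte)
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F) * scale)  # low nibble first (assumed)
        values.append(decode_e2m1(byte >> 4) * scale)    # then high nibble
    return values
```

In the bf16 x mxfp4 path, activations stay in bf16 while the expert weights stay in this packed 4-bit form, so a block of 32 weights costs 16 bytes plus one scale byte instead of 64 bytes in bf16.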
Test Plan
Basic tests work; full accuracy tests will follow.
Test Result