Commit 1e62f1a

yzh119 and yyihuang authored
feat: masked layout fp4 gemm using cute-dsl (#1331)
<!-- .github/pull_request_template.md -->

## 📌 Description

Implement FP4 GEMM (with masked layout), as requested in sgl-project/sglang#7994.

Adapted from CUTLASS's [dense_blockscaled_gemm_persistent](https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/blackwell/dense_blockscaled_gemm_persistent.py) example, with a [DeepGEMM-style tile scheduler](https://github.com/deepseek-ai/DeepGEMM/blob/187656694f7f69e3e7975617a68bc3387680a7e1/deep_gemm/include/deep_gemm/common/scheduler.cuh).

## 🔍 Related Issues

<!-- Link any related issues here -->

sgl-project/sglang#7994

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

cc @fzyzcjy

Co-authored-by: Avery Huang <[email protected]>

---------

Co-authored-by: Yingyi Huang <[email protected]>
1 parent 0e724bf commit 1e62f1a

7 files changed: +3004 -5 lines changed
