feat: masked layout fp4 gemm using cute-dsl #1331
Conversation
There is still a lot of work to be done; I've left those items for future PRs. Let's unblock users and test functionality first.
return self._num_tiles_executed

"""
I think it's safe to delete this docstring?
a_tensor = cute.make_tensor(
    a_ptr,
    layout=cute.make_ordered_layout(
        (self._m, self._k, self._l),
This is a non-blocking comment. This assumes static shapes if m/k/l are passed via members; just double-check that's what we expect here. To support dynamic shapes, m/k/l must be passed through run_cute_ptr's argument list as Int32, I believe.
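A rough, untested sketch of the two options being discussed, in the spirit of the snippet above. The function names, the order argument, and the exact CuTe DSL signatures are my assumptions, not the PR's actual code:

```python
import cutlass
import cutlass.cute as cute


@cute.jit
def run_cute_static(a_ptr: cute.Pointer):
    # Shapes captured from plain Python ints (e.g. self._m/_k/_l) are baked
    # into the compiled kernel, so each distinct (m, k, l) needs its own
    # compilation.
    m, k, l = 256, 7168, 8  # illustrative values only
    a_tensor = cute.make_tensor(
        a_ptr, layout=cute.make_ordered_layout((m, k, l), order=(1, 0, 2))
    )
    # ... kernel body elided


@cute.jit
def run_cute_dynamic(
    a_ptr: cute.Pointer, m: cutlass.Int32, k: cutlass.Int32, l: cutlass.Int32
):
    # Shapes arrive as runtime Int32 arguments, so one compiled kernel serves
    # any (m, k, l), typically at the cost of less specialized SASS.
    a_tensor = cute.make_tensor(
        a_ptr, layout=cute.make_ordered_layout((m, k, l), order=(1, 0, 2))
    )
    # ... kernel body elided
```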
I think it's currently okay to assume static shapes: the number of groups depends on the TP/EP size, N/K are fixed, and we can compile one kernel per cudagraph configuration. For M we can just set a maximum possible value; the kernel execution time will only depend on the values in the mask_m tensor, not on M.
cc @kaixih for confirmation.
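A tiny plain-Python sketch of the scheduling argument above (not the PR's tile scheduler; the tile size of 128 is just an assumption): the number of M-tiles launched per expert group is derived from mask_m, so compiling with a large static maximum M adds no extra work.

```python
def num_m_tiles_per_group(mask_m, tile_m=128):
    # Only ceil(mask_m[g] / tile_m) M-tiles are scheduled for each group.
    return [(m + tile_m - 1) // tile_m for m in mask_m]


# M may be compiled as a static maximum (e.g. 4096), but the scheduled work
# scales with the per-group masks.
print(num_m_tiles_per_group([96, 257, 0, 1024]))  # -> [1, 3, 0, 8]
```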
Cool. I think that's also one of the advantages of using jit here: you can selectively choose static shapes, which usually ends up producing better SASS.
📌 Description
Implement FP4 GEMM (with masked layout) requested in sgl-project/sglang#7994.
Adapted from CUTLASS's dense_blockscaled_gemm_persistent example, with a DeepGEMM-style tile scheduler.
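For reference, a hedged sketch of the masked-layout semantics, with FP4/block-scale quantization omitted and plain float tensors in its place. The shape convention ([num_groups, max_m, k] activations, [num_groups, n, k] weights, per-group mask_m of valid rows) is inferred from the discussion above, not taken from the PR's API:

```python
import torch


def masked_grouped_gemm_ref(a, b, mask_m):
    # a: [num_groups, max_m, k], b: [num_groups, n, k], mask_m: [num_groups].
    # Only the first mask_m[g] rows of each group are valid; outputs for the
    # padded rows are left as zeros.
    num_groups, max_m, _ = a.shape
    n = b.shape[1]
    out = torch.zeros(num_groups, max_m, n, dtype=a.dtype, device=a.device)
    for g in range(num_groups):
        m = int(mask_m[g])
        out[g, :m] = a[g, :m] @ b[g].transpose(0, 1)
    return out
```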
🔍 Related Issues
sgl-project/sglang#7994
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- I have installed the pre-commit hooks with pre-commit install.
- I have run pre-commit run --all-files and fixed any reported issues.
🧪 Tests
- Tests have been added or updated as needed.
- All tests are passing (unittest, etc.).
Reviewer Notes
cc @fzyzcjy
Co-authored-by: Avery Huang [email protected]