Support bf16 in blackwell cutlass decode attention kernel #4916

Aya-ZIbra · 2025-09-23T02:43:53Z

Summary:

1. Reduce the number of pipeline stages so the kernel does not exceed the shared memory (smem) limit.
2. Add a static_assert so that a smem capacity violation is raised at compile time rather than at runtime.
3. Select the TMEM intrinsics based on sizeof(Element).
4. Update the unit test to include bf16.
5. Label each decode kernel test name with its corresponding test parameters.
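
For readers unfamiliar with the pattern, the sketch below illustrates the stage reduction, the static_assert, and the sizeof(Element) dispatch in plain C++. It is not the code from this diff. Every name in it (`kSmemCapacityBytes`, `SharedStorage`, `DecodeKernelConfig`, `TmemOp8Bit`, `TmemOp16Bit`) is hypothetical, the capacity value and tile sizes are assumed for illustration, and `std::uint8_t` / `std::uint16_t` merely stand in for an 8-bit element type and for bf16 so that the example compiles without CUDA headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumed per-block shared memory budget, for illustration only.
// The real limit depends on the target GPU and is not stated in this PR.
constexpr std::size_t kSmemCapacityBytes = 227 * 1024;

// Toy stand-in for a kernel's shared storage: `Stages` pipeline buffers,
// each holding one K tile and one V tile of TileN x HeadDim elements.
template <typename Element, int TileN, int HeadDim, int Stages>
struct SharedStorage {
  Element k[Stages][TileN][HeadDim];
  Element v[Stages][TileN][HeadDim];
};

// True when the toy storage fits in the assumed budget.
template <typename Element, int TileN, int HeadDim, int Stages>
constexpr bool fits_in_smem =
    sizeof(SharedStorage<Element, TileN, HeadDim, Stages>) <=
    kSmemCapacityBytes;

// Hypothetical tags standing in for width-specific TMEM intrinsics.
struct TmemOp8Bit {};
struct TmemOp16Bit {};

// Choose the tag from the element width at compile time.
template <typename Element>
using TmemOp = std::conditional_t<sizeof(Element) == 1, TmemOp8Bit, TmemOp16Bit>;

template <typename Element, int Stages>
struct DecodeKernelConfig {
  static constexpr int kTileN = 128;    // assumed tile size
  static constexpr int kHeadDim = 128;  // assumed head dimension
  using Storage = SharedStorage<Element, kTileN, kHeadDim, Stages>;

  // Instantiating a config that overflows the budget fails to compile,
  // instead of failing when the kernel is launched.
  static_assert(sizeof(Storage) <= kSmemCapacityBytes,
                "shared storage exceeds smem capacity; reduce pipeline stages");

  using TmemLoadOp = TmemOp<Element>;
};

// 8-bit elements with 6 stages and 16-bit elements with 3 stages both
// occupy 192 KiB in this toy layout, so both instantiations compile.
using Config8Bit = DecodeKernelConfig<std::uint8_t, 6>;
using Config16Bit = DecodeKernelConfig<std::uint16_t, 3>;
```

Under these assumed sizes, `DecodeKernelConfig<std::uint16_t, 6>` would need 384 KiB and trip the static_assert as soon as it is instantiated, which is why a wider element type has to come with fewer pipeline stages.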

Differential Revision: D82991495

netlify · 2025-09-23T02:43:57Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`0887844`
🔍 Latest deploy log	https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68d46815002a910008efbd3d
😎 Deploy Preview	https://deploy-preview-4916--pytorch-fbgemm-docs.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

facebook-github-bot · 2025-09-23T02:44:13Z

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating diff in D82991495.

Summary:

X-link: facebookresearch/FBGEMM#1940

1. Reduce the number of pipeline stages so the kernel does not exceed the shared memory (smem) limit.
2. Add a static_assert so that a smem capacity violation is raised at compile time rather than at runtime.
3. Select the TMEM intrinsics based on sizeof(Element).
4. Update the unit test to include bf16.
5. Label each decode kernel test name with its corresponding test parameters.

Differential Revision: D82991495

facebook-github-bot · 2025-09-24T21:52:32Z

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating diff in D82991495.

meta-cla bot added the cla signed label Sep 23, 2025

facebook-github-bot added fb-exported meta-exported labels Sep 23, 2025

Aya-ZIbra force-pushed the export-D82991495 branch from 0c09cb2 to 0887844 Compare September 24, 2025 21:52
