Commit 2167679

[README] Update HGEMM/SGEMM Supported Matrix (#118)
* Update hgemm_mma_stage.cu
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
1 parent c78c247 commit 2167679

1 file changed: +2 -2 lines changed


README.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@
 |✔️|✔️|✔️|✔️|
 |**Copy Async**|**Tile MMA(More Threads)**|**Tile Warp(More Values)**|**Multi Stages**|
 |✔️|✔️|✔️|✔️|
-|**Reg Double Buffers**|**Block Swizzle**|**Warp Swizzle**|**Collective Store(Reg Reuse&Warp Shfl)**|
+|**Reg Double Buffers**|**Block Swizzle**|**Warp Swizzle**|**Collective Store(Shfl)**|
 |✔️|✔️|✔️|✔️|
 |**Row Major(NN)**|**Col Major(TN)**|**SGEMM TF32**|**SMEM Swizzle/Permuted**|
 |✔️|✔️|✔️||
@@ -173,7 +173,7 @@
 | ✔️ [hgemv_k16_f16](./hgemv/hgemv.cu)|f16|f16|[link](./hgemv/)|⭐️⭐️⭐️|
 | ✔️ [flash_attn_1_fwd_f32](./flash-attn/flash_attn.cu)|f32|f32|[link](./flash-attn)|⭐️⭐️⭐️|
 | ✔️ [flash_attn_2_fwd_f16_m16n8k16*](./flash-attn/flash_attn_mma.cu)|f16|f16|[link](./flash-attn)|⭐️⭐️⭐️|
-| ✔️ [nms_kernel](./nms/nms.cu)|f32|/|[link](./nms)|⭐️⭐️|
+| ✔️ [nms_f32](./nms/nms.cu)|f32|/|[link](./nms)|⭐️⭐️|
 | ✔️ [notes v1(deprecated)](./notes-v1.cu)|f32|f32|/|⭐️|
 
 👉TIPS: * means using **Tensor Cores(MMA/WMMA)**, otherwise, using CUDA Cores by default.
