1 file changed: +2 -2 lines changed
@@ -22,7 +22,7 @@
 |✔️|✔️|✔️|✔️|
 |**Copy Async**|**Tile MMA(More Threads)**|**Tile Warp(More Values)**|**Multi Stages**|
 |✔️|✔️|✔️|✔️|
-|**Reg Double Buffers**|**Block Swizzle**|**Warp Swizzle**|**Collective Store(Reg Reuse&Warp Shfl)**|
+|**Reg Double Buffers**|**Block Swizzle**|**Warp Swizzle**|**Collective Store(Shfl)**|
 |✔️|✔️|✔️|✔️|
 |**Row Major(NN)**|**Col Major(TN)**|**SGEMM TF32**|**SMEM Swizzle/Permuted**|
 |✔️|✔️|✔️|❔|
@@ -173,7 +173,7 @@
 |✔️ [hgemv_k16_f16](./hgemv/hgemv.cu)|f16|f16|[link](./hgemv/)|⭐️⭐️⭐️|
 |✔️ [flash_attn_1_fwd_f32](./flash-attn/flash_attn.cu)|f32|f32|[link](./flash-attn)|⭐️⭐️⭐️|
 |✔️ [flash_attn_2_fwd_f16_m16n8k16*](./flash-attn/flash_attn_mma.cu)|f16|f16|[link](./flash-attn)|⭐️⭐️⭐️|
-|✔️ [nms_kernel](./nms/nms.cu)|f32|/|[link](./nms)|⭐️⭐️|
+|✔️ [nms_f32](./nms/nms.cu)|f32|/|[link](./nms)|⭐️⭐️|
 |✔️ [notes v1(deprecated)](./notes-v1.cu)|f32|f32|/|⭐️|
 
 👉TIPS: * means using **Tensor Cores(MMA/WMMA)**, otherwise, using CUDA Cores by default.