Commit d2a59fd

Update README.md (#242)
1 parent 7255d25 commit d2a59fd

1 file changed: +1 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's d
 |✔️WMMA(m16n16k16)|✔️MMA(m16n8k16)|✔️Pack LDST(128 bits)|✔️SMEM Padding|
 |✔️Copy Async|✔️Tile MMAs|✔️Tile Warps|✔️**Multi Stages(2~4)**|
 |✔️Register Double Buffers|✔️**Block Swizzle**|✔️**Warp Swizzle**|✔️**SMEM Swizzle**(CuTe/MMA)|
-|✔️Collective Store(Shfl)|️Row Major(NN)|✔️Col Major(TN)|✔️SGEMM FP32/TF32|
+|✔️Collective Store(Shfl)|️Layout NN|✔️Layout TN|✔️SGEMM FP32/TF32|
 
 ## 📖 FA2-MMA Benchmark 🎉🎉
 
