
Commit a2934b9 (1 parent: 0aeb450)

[HGEMM] Add MMA 16816 swizzle, Up to 115 TFLOPS (#98)

* Update hgemm_mma.cu
* Update README.md
* Update hgemm.py
* Update hgemm.cu
* Update hgemm_mma.cu
* Update hgemm.cu
* Update hgemm.py
* Update README.md
* Update hgemm_mma.cu
* Update hgemm.py
* Update hgemm.cu
* Update hgemm_mma.cu
* Update README.md
* Update hgemm.py
* Update README.md
* Update README.md
* Update hgemm_mma_stage.cu
* Update hgemm.py
* Update hgemm.cu
* Update README.md
* Update README.md
* Update hgemm_mma_stage.cu
* Update hgemm_mma_stage.cu
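The "MMA 16816" in the commit title refers to the warp-level `m16n8k16` Tensor Core shape: each warp multiplies a 16x16 f16 tile of A by a 16x8 f16 tile of B into a 16x8 accumulator via the `mma.sync` PTX instruction. A minimal sketch of how such kernels typically wrap that instruction is below; this is a hypothetical illustration (the function name and register packing convention are assumptions, not code from this commit), requiring sm_80 or newer:

```
#include <cstdint>
#include <cuda_fp16.h>

// Hypothetical wrapper around the m16n8k16 f16 MMA used by hgemm_mma-style
// kernels. Per thread, fragments live in registers packed as 32-bit pairs
// of halfs: A = 4 regs, B = 2 regs, C/D = 2 regs.
__device__ __forceinline__ void mma_m16n8k16_f16(
    uint32_t d[2], const uint32_t a[4], const uint32_t b[2],
    const uint32_t c[2]) {
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%8, %9};\n"
      : "=r"(d[0]), "=r"(d[1])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]),
        "r"(c[0]), "r"(c[1]));
}
```

In a full kernel, fragments are loaded from shared memory with `ldmatrix` and the call sits inside the K-loop, accumulating `d` back into `c` each iteration.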

File tree: 6 files changed (+1247, -314 lines)

README.md — 5 additions, 1 deletion

```diff
@@ -147,6 +147,10 @@
 | ✔️ [hgemm_wmma_m32n8k16....dbuf*](./hgemm/hgemm_wmma.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
 | ✔️ [hgemm_wmma_m16n16k16...stages*](./hgemm/hgemm_wmma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
 | ✔️ [hgemm_wmma_m16n16k16...swizzle*](./hgemm/hgemm_wmma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...naive*](./hgemm/hgemm_mma.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...mma2x4*](./hgemm/hgemm_mma.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...stages*](./hgemm/hgemm_mma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...swizzle*](./hgemm/hgemm_mma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
 | ✔️ [sgemv_k32_f32](./sgemv/sgemv.cu)|f32|f32|[link](./sgemv/)|⭐️⭐️⭐️|
 | ✔️ [sgemv_k128_f32x4](./sgemv/sgemv.cu)|f32|f32|[link](./sgemv/)|⭐️⭐️⭐️|
 | ✔️ [sgemv_k16_f32](./sgemv/sgemv.cu)|f32|f32|[link](./sgemv/)|⭐️⭐️⭐️|
@@ -158,7 +162,7 @@
 | ✔️ [hard_nms cpp only](./nms/nms.cc)|f32|/|/|⭐️|
 | ✔️ [notes v1(deprecated)](./notes-v1.cu)|f32|f32|/|⭐️|
 
-👉TIPS: * means using **Tensor Cores(MMA PTX)**, otherwise, using CUDA Cores by default.
+👉TIPS: * means using **Tensor Cores(MMA/WMMA)**, otherwise, using CUDA Cores by default.
 
 ## 0x01 📖 博客目录
 
```
0 commit comments