Commit 077096a

[Misc] Automated submodule update (#257)
* Automated submodule update
* update hgemm docs
* Automated submodule update
1 parent a9e2d17 commit 077096a

6 files changed: +11 -12 lines changed

.gitmodules

Lines changed: 3 additions & 4 deletions
@@ -4,7 +4,6 @@
 [submodule "ffpa-attn-mma"]
 path = ffpa-attn-mma
 url = https://github.com/DefTruth/ffpa-attn-mma.git
-[submodule "hgemm-tensorcores-mma"]
-path = hgemm-tensorcores-mma
-url = https://github.com/DefTruth/hgemm-tensorcores-mma.git
-
+[submodule "hgemm-mma"]
+path = hgemm-mma
+url = https://github.com/DefTruth/hgemm-mma.git
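
The rename above can be sanity-checked by reading the submodule entries back out of `.gitmodules`. The following is only a minimal sketch, not part of the commit: the helper name `parse_gitmodules` and the inline `GITMODULES_TEXT` sample are hypothetical, and the parser handles only the simple `[submodule "name"]` / `key = value` layout shown in this diff, not the full Git config syntax.

```python
import re

# Hypothetical sample mirroring the post-commit entries shown in the diff above.
GITMODULES_TEXT = """\
[submodule "ffpa-attn-mma"]
    path = ffpa-attn-mma
    url = https://github.com/DefTruth/ffpa-attn-mma.git
[submodule "hgemm-mma"]
    path = hgemm-mma
    url = https://github.com/DefTruth/hgemm-mma.git
"""

# Matches a section header such as: [submodule "hgemm-mma"]
SECTION_RE = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def parse_gitmodules(text):
    """Return {submodule name: {key: value}} for a simple .gitmodules file."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        match = SECTION_RE.match(line)
        if match:
            current = modules.setdefault(match.group("name"), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


modules = parse_gitmodules(GITMODULES_TEXT)
for name, entry in modules.items():
    print(name, entry["path"], entry["url"])
```

In a real checkout one would pass the contents of the repository's own `.gitmodules` file instead of the inline sample.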

README.md

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@
 <img src='https://github.com/user-attachments/assets/65a8d564-8fa7-4d66-86b9-e238feb86143' height="170px" width="270px">
 </div>

-- [2024-12-02]: HGEMM MMA kernels has been refactored into 🤖[hgemm-tensorcores-mma](https://github.com/DefTruth/hgemm-tensorcores-mma): ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance.
+- [2024-12-02]: HGEMM MMA kernels has been refactored into 🤖[hgemm-mma](https://github.com/DefTruth/hgemm-mma): ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance.

 <div align='center'>
 <img src='https://github.com/user-attachments/assets/71927ac9-72b3-4ce9-b0e2-788b5885bc99' height="170px" width="270px">
@@ -43,7 +43,7 @@

 <div id="hgemm-mma-bench"></div>

-Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's default Tensor Cores algorithm, the `HGEMM (WMMA/MMA/CuTe)` in this repo (`blue`🔵) can achieve `98%~100%` of its (`orange`🟠) performance. Please check [toy-hgemm library⚡️⚡️](./kernels/hgemm) or [hgemm-tensorcores-mma⚡️⚡️](https://github.com/DefTruth/hgemm-tensorcores-mma) repo for more details.
+Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's default Tensor Cores algorithm, the `HGEMM (WMMA/MMA/CuTe)` in this repo (`blue`🔵) can achieve `98%~100%` of its (`orange`🟠) performance. Please check [toy-hgemm library⚡️⚡️](./kernels/hgemm) or [hgemm-mma⚡️⚡️](https://github.com/DefTruth/hgemm-mma) repo for more details.

 ![toy-hgemm-library](https://github.com/user-attachments/assets/962bda14-b494-4423-b8eb-775da9f5503d)

hgemm-mma

Submodule hgemm-mma added at afa0d0c

hgemm-tensorcores-mma

Lines changed: 0 additions & 1 deletion
This file was deleted.

kernels/hgemm/README.md

Lines changed: 4 additions & 4 deletions
@@ -27,10 +27,10 @@ Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's d
 ## ©️Citations🎉🎉

 ```BibTeX
-@misc{hgemm-tensorcores-mma@2024,
-  title={hgemm-tensorcores-mma: Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API.},
-  url={https://github.com/DefTruth/hgemm-tensorcores-mma},
-  note={Open-source software available at https://github.com/DefTruth/hgemm-tensorcores-mma},
+@misc{hgemm-mma@2024,
+  title={hgemm-mma: Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API.},
+  url={https://github.com/DefTruth/hgemm-mma},
+  note={Open-source software available at https://github.com/DefTruth/hgemm-mma},
   author={DefTruth etc},
   year={2024}
 }

third-party/cutlass

Submodule cutlass updated 29 files
