[Misc] Automated submodule update (#257)

DefTruth · web-flow · commit 077096a82fa4 · 2025-03-04T10:09:14.000+08:00
* Automated submodule update

* update hgemm docs

* Automated submodule update
diff --git a/.gitmodules b/.gitmodules
@@ -4,7 +4,6 @@
 [submodule "ffpa-attn-mma"]
 	path = ffpa-attn-mma
 	url = https://github.com/DefTruth/ffpa-attn-mma.git
-[submodule "hgemm-tensorcores-mma"]
-	path = hgemm-tensorcores-mma
-	url = https://github.com/DefTruth/hgemm-tensorcores-mma.git
-
+[submodule "hgemm-mma"]
+	path = hgemm-mma
+	url = https://github.com/DefTruth/hgemm-mma.git
diff --git a/README.md b/README.md
@@ -30,7 +30,7 @@
   <img src='https://github.com/user-attachments/assets/65a8d564-8fa7-4d66-86b9-e238feb86143' height="170px" width="270px">
 </div> 
 
-- [2024-12-02]: HGEMM MMA kernels has been refactored into 🤖[hgemm-tensorcores-mma](https://github.com/DefTruth/hgemm-tensorcores-mma): ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance.
+- [2024-12-02]: HGEMM MMA kernels have been refactored into 🤖[hgemm-mma](https://github.com/DefTruth/hgemm-mma): ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance.
 
 <div align='center'>
   <img src='https://github.com/user-attachments/assets/71927ac9-72b3-4ce9-b0e2-788b5885bc99' height="170px" width="270px">
@@ -43,7 +43,7 @@
 
 <div id="hgemm-mma-bench"></div>  
 
-Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's default Tensor Cores algorithm, the `HGEMM (WMMA/MMA/CuTe)` in this repo (`blue`🔵) can achieve `98%~100%` of its (`orange`🟠) performance. Please check [toy-hgemm library⚡️⚡️](./kernels/hgemm) or [hgemm-tensorcores-mma⚡️⚡️](https://github.com/DefTruth/hgemm-tensorcores-mma) repo for more details.
+Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's default Tensor Cores algorithm, the `HGEMM (WMMA/MMA/CuTe)` in this repo (`blue`🔵) can achieve `98%~100%` of its (`orange`🟠) performance. Please check [toy-hgemm library⚡️⚡️](./kernels/hgemm) or [hgemm-mma⚡️⚡️](https://github.com/DefTruth/hgemm-mma) repo for more details.
 
 ![toy-hgemm-library](https://github.com/user-attachments/assets/962bda14-b494-4423-b8eb-775da9f5503d)
 
diff --git a/hgemm-mma b/hgemm-mma
@@ -0,0 +1 @@
+Subproject commit afa0d0ca5c48e210e8c5fbd5d9da71fe5d21d9b7
diff --git a/hgemm-tensorcores-mma b/hgemm-tensorcores-mma
diff --git a/kernels/hgemm/README.md b/kernels/hgemm/README.md
@@ -27,10 +27,10 @@ Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's d
 ## ©️Citations🎉🎉
 
 ```BibTeX
-@misc{hgemm-tensorcores-mma@2024,
-  title={hgemm-tensorcores-mma: Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API.},
-  url={https://github.com/DefTruth/hgemm-tensorcores-mma},
-  note={Open-source software available at https://github.com/DefTruth/hgemm-tensorcores-mma},
+@misc{hgemm-mma@2024,
+  title={hgemm-mma: Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API.},
+  url={https://github.com/DefTruth/hgemm-mma},
+  note={Open-source software available at https://github.com/DefTruth/hgemm-mma},
   author={DefTruth etc},
   year={2024}
 }
diff --git a/third-party/cutlass b/third-party/cutlass
@@ -1 +1 @@
-Subproject commit eefa171318b79cbe2e78514d4cce5cd0fe919d0c
+Subproject commit df18f5e4f5de76bed8be1de8e4c245f2f5ec3020