
Commit d527537: Update README.md
Parent: ffa027f

File tree: 1 file changed (+15, -14 lines)


README.md (15 additions, 14 deletions)
@@ -34,6 +34,18 @@
 </p>
 </div>
 
+## ©️Citations🎉🎉
+
+```BibTeX
+@misc{LeetCUDA@2025,
+  title={LeetCUDA: A Modern CUDA Learn Notes with PyTorch for Beginners},
+  url={https://github.com/xlite-dev/LeetCUDA.git},
+  note={Open-source software available at https://github.com/xlite-dev/LeetCUDA.git},
+  author={DefTruth and Many Others},
+  year={2025}
+}
+```
+
 
 ## 📖 News 🔥🔥
 <div id="news"></div>
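For readers who want to use the new citation, here is a minimal LaTeX sketch. The file names `main.tex` and `refs.bib` are illustrative, and note that some strict BibTeX tools reject the `@` character inside a citation key such as `LeetCUDA@2025`:

```latex
% main.tex -- cite the LeetCUDA repository using the @misc entry above,
% assumed to be saved locally as refs.bib.
\documentclass{article}
\begin{document}
The CUDA kernel examples follow LeetCUDA~\cite{LeetCUDA@2025}.
\bibliographystyle{plain}
\bibliography{refs}  % refs.bib holds the @misc{LeetCUDA@2025, ...} entry
\end{document}
```

Compiling with the usual `pdflatex main && bibtex main && pdflatex main && pdflatex main` cycle should resolve the reference.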
@@ -54,6 +66,7 @@
 <img src='https://github.com/user-attachments/assets/9472e970-c083-4b31-9252-3eeecc761078' height="170px" width="270px">
 </div>
 
+
 ## 📖 Contents
 <div id="contents"></div>
 <!---
@@ -98,7 +111,7 @@
 - [📚 Hard++ ⭐⭐⭐️⭐️⭐️](#cuda-kernel-hard-plus)
 - [📚 Triton ⭐⭐⭐️](#triton-kernel)
 - [📚 CUTLASS ⭐⭐⭐️](#cutlass-kernel)
-- [📖 100+ 高性能计算文章 💡💡](#my-blogs-part-1)
+- [📖 100+ LLM/CUDA Blogs 🔥](#my-blogs-part-1)
 - [📖 How to Contribute 👀👇](#contribute)
 
 
@@ -225,18 +238,6 @@ flash_attn_mma_stages_split_q_tiling_qkv_kernel(half* Q, half* K, half* V, half*
 
 💡NOTE: [📚Split Q + Fully QKV Fine-grained Tiling](#mma-tiling-qkv) has been refactored into 🤖[ffpa-attn](https://github.com/xlite-dev/ffpa-attn).
 
-## ©️Citations🎉🎉
-
-```BibTeX
-@misc{LeetCUDA@2025,
-  title={LeetCUDA: A Modern CUDA Learn Notes with PyTorch for Beginners},
-  url={https://github.com/xlite-dev/LeetCUDA},
-  note={Open-source software available at https://github.com/xlite-dev/LeetCUDA},
-  author={DefTruth etc},
-  year={2025}
-}
-```
-
 ## 📖 200+ CUDA Kernels 🔥🔥 (Easy -> Hard++) ([©️back👆🏻](#contents))
 
 <div id="cuda-kernel"></div>
@@ -481,7 +482,7 @@ The kernels listed here will guide you through a step-by-step progression, rangi
 
 💡NOTE: 🤖[ffpa-attn](https://github.com/xlite-dev/ffpa-attn): 📚FFPA - Yet another Faster Flash Prefill Attention with O(1)🎉SRAM complexity for headdim > 256, **1.8x~3x**🎉faster than SDPA EA: [📈L20 ~1.9x↑🎉](https://github.com/xlite-dev/ffpa-attn?tab=readme-ov-file#L1-bench-l20), [📈 A30 ~1.8x↑🎉](https://github.com/xlite-dev/ffpa-attn?tab=readme-ov-file#L1-bench-a30), [📈3080 ~2.9x↑🎉](https://github.com/xlite-dev/ffpa-attn?tab=readme-ov-file#L1-bench-3080), [📈4090 ~2.1x↑🎉](https://github.com/xlite-dev/ffpa-attn?tab=readme-ov-file#L1-bench-4090).
 
-### 📚 Triton Kernel (OpenAI Triton) ([©️back👆🏻](#cuda-kernel))
+### 📚 Triton Kernel (OpenAI Triton) ⭐️⭐️⭐️ ([©️back👆🏻](#cuda-kernel))
 
 <div id="triton-kernel"></div>
 
