<div align="center">
<p align="center">
<h2>📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑</h2>
<img src=https://img.shields.io/github/stars/xlite-dev/LeetCUDA.svg?style=social >
<img src=https://img.shields.io/badge/Release-v3.0.6-brightgreen.svg >
<img src=https://img.shields.io/badge/License-GPLv3.0-turquoise.svg >
  </div>
</div>

📚 **LeetCUDA**: covers **Tensor/CUDA Cores, TF32/F16/BF16/F8**, [📖200+ CUDA Kernels🔥](#cuda-kernel) with PyTorch, [📖100+ LLM/CUDA🔥](#my-blogs-part-1) blogs, [📖HGEMM⚡️](./kernels/hgemm), which achieves `98%~100%` of **cuBLAS** TFLOPS, and [📖flash-attn⚡️](./kernels/flash-attn) built on Tensor Cores with pure MMA PTX. ♥️ Please consider leaving a ⭐️ Star to support me, my bro ~ ♥️

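For a concrete feel of what "pure MMA PTX" means above, here is a minimal, illustrative sketch of the `m16n8k16` FP16 `mma.sync` building block that the HGEMM and flash-attn kernels are organized around. This is not code from this repo: the helper name `hmma_m16n8k16` is made up for this example, and it assumes `sm_80+`.

```cuda
#include <cstdint>

// One warp-synchronous m16n8k16 FP16 tensor-core MAC: C += A * B.
// Each of the 32 threads holds a register slice of the fragments:
// A as 4 x .b32 (8 halves), B as 2 x .b32 (4 halves), C as 2 x .b32 (4 halves).
__device__ __forceinline__ void hmma_m16n8k16(uint32_t (&c)[2],
                                              const uint32_t (&a)[4],
                                              const uint32_t (&b)[2]) {
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%0, %1};\n"
      : "+r"(c[0]), "+r"(c[1])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]));
}
```

A real HGEMM kernel issues many of these per warp per iteration; the tiling, `ldmatrix` fragment loads, shared-memory swizzling, and multi-stage pipelining around them are exactly what the linked kernels optimize.
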
<div align="center">
<p align="center">
<a href="#contribute">🔥🔥 PR Welcome: Add Your Kernel to LeetCUDA! Let's make it Awesome together! 🎉🎉</a> <br>
<a href=https://github.com/xlite-dev/LeetCUDA/graphs/contributors > <img src=https://opencollective.com/leetcuda/contributors.svg height=40px > </a>
</p>
</div>

## 📖 News 🔥🔥
<div id="news"></div>

- [2025-06-16]: [🤗CacheDiT](https://github.com/vipshop/cache-dit) has been released! A **Training-free** and **Easy-to-use** cache acceleration toolbox for Diffusion Transformers (**DBCache**, **DBPrune**, **FBCache**, etc.)🔥. Feel free to give it a try!

<div align='center'>
<img src='https://github.com/user-attachments/assets/a5ec4320-d2f9-4254-888a-170b2d9e3784' height=170px>

## 📖 Contents
<div id="contents"></div>

- [📖 HGEMM-MMA 🎉🎉](#HGEMM-bench)
- [📖 FlashAttention-MMA 🎉🎉](#fa-mma-bench)

|📖 Type-Title|📖 Author|📖 Rating|
|:---|:---|:---|
| [[Diffusion Inference]📖A Survey of DiT Inference Acceleration: Caching](https://zhuanlan.zhihu.com/p/711223667)|@DefTruth|⭐️⭐️⭐|
| [[Triton Programming][Basics]📖A Minimal Triton Primer: Triton Vector Add](https://zhuanlan.zhihu.com/p/1902778199261291694)|@DefTruth|⭐️⭐️⭐|
| [[Triton Programming][Basics]📖Triton Fused Softmax Kernel Explained: From Python Source to PTX](https://zhuanlan.zhihu.com/p/1899562146477609112)|@DefTruth|⭐️⭐️⭐|
| [[Triton Programming][Basics]📖vLLM Triton Merge Attention States Kernel Explained](https://zhuanlan.zhihu.com/p/1904937907703243110)|@DefTruth|⭐️⭐️⭐|
| [[Tensor Cores]📖Getting Started with Nvidia Tensor Core MMA PTX Programming](https://zhuanlan.zhihu.com/p/621855199)|@木子知|⭐️⭐️⭐️|
| [[Tensor Cores]📖CUDA Ampere Tensor Core HGEMM Matrix Multiplication Optimization](https://zhuanlan.zhihu.com/p/555339335)|@nicholaswilde|⭐️⭐️⭐️|
| [[GPU Communication Architecture][Deep Dive]📖NVIDIA GPGPU (Part 4): Communication Architecture](https://zhuanlan.zhihu.com/p/680262016)|@Bruce|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Torch.compile Pipeline Explained: Introduction](https://zhuanlan.zhihu.com/p/9418379234)|@StarCap|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Torch.compile Pipeline Explained: TorchDynamo](https://zhuanlan.zhihu.com/p/9640728231)|@StarCap|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Torch.compile Pipeline Explained: AOTAutograd](https://zhuanlan.zhihu.com/p/9997263922)|@StarCap|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Torch.compile Pipeline Explained: TorchInductor](https://zhuanlan.zhihu.com/p/11224299472)|@StarCap|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Torch.compile Pipeline Explained: Operator Fusion](https://zhuanlan.zhihu.com/p/21053905491)|@StarCap|⭐️⭐️⭐️|
| [[torch.compile][Practice]📖A Torch.compile Usage Guide](https://zhuanlan.zhihu.com/p/620163218)|@jhang|⭐️⭐️⭐️|
| [[torch.compile][Practice]📖A Detailed Torch.compile Tutorial with Worked Examples](https://zhuanlan.zhihu.com/p/855291863)|@Bbuf|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Understanding TorchDynamo Internals in One Article](https://zhuanlan.zhihu.com/p/630933479)|@吾乃阿尔法|⭐️⭐️⭐️|
| [[torch.compile][Internals]📖Understanding torch.compile: Fundamentals and Usage](https://zhuanlan.zhihu.com/p/12712224407)|@俯仰|⭐️⭐️⭐️|
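
The torch.compile entries above walk through TorchDynamo (graph capture), AOTAutograd (forward/backward tracing), and TorchInductor (kernel codegen). As a companion, here is a minimal usage sketch of the standard PyTorch 2.x `torch.compile` API; the pointwise function is an arbitrary example for illustration, not taken from those posts.

```python
import torch

# A chain of pointwise ops that TorchInductor can fuse into a single kernel.
def gelu_like(x: torch.Tensor) -> torch.Tensor:
    return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x * x * x)))

# TorchDynamo captures the graph, AOTAutograd handles autograd,
# TorchInductor generates the kernels; "inductor" is the default backend.
compiled_gelu = torch.compile(gelu_like)

x = torch.randn(4096, 4096)
# Compiled and eager results should agree up to small floating-point noise.
torch.testing.assert_close(compiled_gelu(x), gelu_like(x), rtol=1e-4, atol=1e-4)
```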

## ©️License ([©️back👆🏻](#contents))
