Commit 76d2867

feat: triton layer-norm (#351)
* fix comments
* fix comments
* fix comments
* fix comments
* add torch.compile blogs
* add torch.compile blogs
* add torch.compile blogs
* feat: triton layer-norm
* feat: triton layer-norm
* feat: triton layer-norm
* feat: triton layer-norm
* feat: triton layer-norm
* feat: triton layer-norm
* feat: triton layer-norm
* Update README.md
* Update README.md
1 parent 497b8c3 commit 76d2867

28 files changed, +740 -529 lines changed

README.md

Lines changed: 5 additions & 3 deletions
@@ -10,7 +10,7 @@
 <img src=https://img.shields.io/github/watchers/xlite-dev/LeetCUDA?color=9cc >
 <img src=https://img.shields.io/github/forks/xlite-dev/LeetCUDA.svg?style=social >
 <img src=https://img.shields.io/github/stars/xlite-dev/LeetCUDA.svg?style=social >
-<img src=https://img.shields.io/badge/Release-v3.0.6-brightgreen.svg >
+<img src=https://img.shields.io/badge/Release-v3.0.12-brightgreen.svg >
 <img src=https://img.shields.io/badge/License-GPLv3.0-turquoise.svg >
 </div>
 </div>
@@ -459,8 +459,10 @@ The kernels listed here will guide you through a step-by-step progression, rangi

 |📖 Triton Kernel| 📖 Elem DType| 📖 Acc DType| 📖 Docs | 📖 Level |
 |:---|:---|:---|:---|:---|
-| ✔️ [triton_vector_add_kernel](./kernels/openai-triton/elementwise/)|all|all|[link](./kernels/openai-triton/elementwise/)|⭐️⭐️|
-| ✔️ [triton_fused_softmax(multi-stages)](./kernels/openai-triton/fused-softmax/)|f16/bf16/f32|f32|[link](./kernels/openai-triton/fused-softmax//)|⭐️⭐️⭐️|
+| ✔️ [triton_vector_add_kernel](./kernels/openai-triton/vector-add/)|all|all|[link](./kernels/openai-triton/vector-add/)|⭐️⭐️|
+| ✔️ [triton_fused_softmax(multi-stages)](./kernels/openai-triton/fused-softmax/)|f16/bf16/f32|f32|[link](./kernels/openai-triton/fused-softmax/)|⭐️⭐️⭐️|
+| ✔️ [triton_fused_layer_norm(forward-pass)](./kernels/openai-triton/layer-norm/)|f16/bf16/f32|f32|[link](./kernels/openai-triton/layer-norm/)|⭐️⭐️⭐️|
+| ✔️ [triton_fused_layer_norm(backward-pass)](./kernels/openai-triton/layer-norm/)|f16/bf16/f32|f32|[link](./kernels/openai-triton/layer-norm/)|⭐️⭐️⭐️|
 | ✔️ [triton_merge_attn_states_kernel(w/ CUDA)](./kernels/openai-triton/merge-attn-states/)|f16/bf16/f32|f32|[link](./kernels/openai-triton/merge-attn-states/)|⭐️⭐️⭐️|

 ### 📚 CUTLASS/CuTe Kernel ⭐️⭐️⭐️ ([©️back👆🏻](#cuda-kernel))
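
Note on the new layer-norm rows: the table lists f16/bf16/f32 element types with an f32 accumulator. The sketch below is not the kernel added by this commit and is not Triton code. It is a plain-Python illustration, under stated assumptions, of what "narrow elements, wider accumulator" means for a layer-norm forward pass. The names `to_f16` and `layer_norm_row` are hypothetical, and Python's 64-bit float stands in for the wider accumulator.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def layer_norm_row(x, weight, bias, eps=1e-5):
    """Layer-norm forward for one row (hypothetical reference, not the commit's kernel).

    Mean and variance are accumulated in wide precision; only the final
    output is rounded down to f16, mirroring the Elem DType / Acc DType split.
    """
    n = len(x)
    mean = sum(x) / n
    # Biased variance (divide by n), the usual layer-norm convention.
    var = sum((v - mean) ** 2 for v in x) / n
    rstd = 1.0 / math.sqrt(var + eps)
    return [to_f16((v - mean) * rstd * w + b) for v, w, b in zip(x, weight, bias)]


row = [1.0, 2.0, 3.0, 4.0]
out = layer_norm_row(row, weight=[1.0] * 4, bias=[0.0] * 4)
```

With unit weights and zero bias the output is centred on zero, so the first and last elements of `out` are negatives of each other.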

kernels/openai-triton/elementwise/cache/ZARIVSGCNM2WWDVKCRVGVJENDT5COGJCEQYAY47GLLIBDH2FTW2A/__grp__add_kernel.json

Lines changed: 0 additions & 1 deletion
This file was deleted.

kernels/openai-triton/elementwise/cache/ZARIVSGCNM2WWDVKCRVGVJENDT5COGJCEQYAY47GLLIBDH2FTW2A/add_kernel.json

Lines changed: 0 additions & 1 deletion
This file was deleted.

kernels/openai-triton/elementwise/cache/ZARIVSGCNM2WWDVKCRVGVJENDT5COGJCEQYAY47GLLIBDH2FTW2A/add_kernel.llir

Lines changed: 0 additions & 115 deletions
This file was deleted.
