Commit e24bf5b

Update README.md
1 parent ea956b2 commit e24bf5b


README.md: 5 additions & 5 deletions

@@ -15,7 +15,7 @@
 -->
 
 ## 0x00 Preface
-A while ago I did a number of **LLM AI Infra** interviews, and almost all of them required hand-writing **CUDA**⚡️ kernels, so I went through a full review of **CUDA** optimization and wrote up basic implementations of the frequently asked problems. The notes are shared here and updated from time to time; they also make it easier for me to review later. For **LLM AI Infra**, I also recommend my curated list: 📖[Awesome-LLM-Inference](https://github.com/DefTruth/Awesome-LLM-Inference) ![](https://img.shields.io/github/stars/DefTruth/Awesome-LLM-Inference.svg?style=social)
+A while ago I did a number of **LLM AI Infra** interviews, and almost all of them required hand-writing **CUDA**⚡️ kernels, so I went through a full review of **CUDA** optimization and wrote up basic implementations of the frequently asked problems. The notes are shared here and updated from time to time. For **LLM AI Infra**, I also recommend my curated list: 📖[Awesome-LLM-Inference](https://github.com/DefTruth/Awesome-LLM-Inference) ![](https://img.shields.io/github/stars/DefTruth/Awesome-LLM-Inference.svg?style=social)
 
 
 
@@ -35,11 +35,11 @@
 - [x] 📖[relu, relu + vec4](#relu)
 - [x] 📖[layer_norm, layer_norm + vec4](#layernorm)
 - [x] 📖[rms_norm, rms_norm + vec4](#rmsnorm)
-- [x] 📖[flash attention forward pass](./flash_attn_1_fwd_f32.cu)
-- [x] 📖[nms](#NMS)
-- [ ] 📖sgemm + double buffer
+- [x] 📖[flash_attn_1_fwd_f32](./flash_attn_1_fwd_f32.cu)
+- [ ] 📖flash_attn_2_fwd_f32
+- [ ] 📖flash_attn_2_fwd_f16
+- [ ] 📖flash_attn_2_fwd_f8
 - [ ] 📖sgemm + fp16
-- [ ] ...
 
 ## 0x02 sgemm naive, sgemm + block-tile + k-tile + vec4 ([©️back👆🏻](#kernellist))
 <div id="sgemm"></div>
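Several entries in the list above share the same `+ vec4` suffix. As a point of reference, here is a minimal sketch of what that pattern typically looks like for the `relu` case, assuming f32 data allocated with `cudaMalloc` (so the 16-byte `float4` alignment holds); the kernel name `relu_f32x4` and the launch configuration are illustrative and not taken from this repository's source files.

```cuda
#include <cuda_runtime.h>

// Sketch of a "relu + vec4" kernel: each thread processes 4 consecutive
// floats with a single float4 load/store to cut the number of global-memory
// transactions. Names and launch parameters here are hypothetical.
__global__ void relu_f32x4(const float* __restrict__ x, float* __restrict__ y, int N) {
  int idx = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 3 < N) {
    float4 v = reinterpret_cast<const float4*>(x + idx)[0];  // one 128-bit load
    v.x = fmaxf(v.x, 0.0f);
    v.y = fmaxf(v.y, 0.0f);
    v.z = fmaxf(v.z, 0.0f);
    v.w = fmaxf(v.w, 0.0f);
    reinterpret_cast<float4*>(y + idx)[0] = v;               // one 128-bit store
  } else {
    // Scalar tail for the last few elements when N is not a multiple of 4.
    for (int i = idx; i < N; ++i) y[i] = fmaxf(x[i], 0.0f);
  }
}

// Example launch: one thread per 4 elements.
// int threads = 256;
// int blocks  = (N + threads * 4 - 1) / (threads * 4);
// relu_f32x4<<<blocks, threads>>>(d_x, d_y, N);
```

Presumably the other `+ vec4` entries apply the same vectorized-access idea on top of their scalar versions, with only the per-element math changing.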
