Commit 613300d

🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference (#88)
1 parent 4184e26 commit 613300d


README.md

Lines changed: 3 additions & 2 deletions
@@ -54,7 +54,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 * 📖[Parallel Decoding/Sampling](#Parallel-Decoding-Sampling)🔥
 * 📖[Structured Prune/KD/Weight Sparse](#Structured_Pruning_KD_Weight_Sparse)
 * 📖[Mixture-of-Experts(MoE) LLM Inference](#Mixture_of_Experts_LLM_Inference)🔥
-* 📖[CPU/Single GPU/FPGA/Mobile Inference](#CPU-Single-GPU-Inference)
+* 📖[CPU/NPU/FPGA/Mobile Inference](#CPU-Single-GPU-Inference)
 * 📖[Non Transformer Architecture](#Non-Transformer-Architecture)🔥
 * 📖[GEMM/Tensor Cores/WMMA/Parallel](#GEMM-Tensor-Cores-WMMA)
 * 📖[VLM/Position Embed/Others](#Others)
@@ -373,7 +373,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.06| [MoE] A Survey on Mixture of Experts(@HKU) | [[pdf]](https://arxiv.org/pdf/2407.06204)| ⚠️ |⭐️|
 
 
-### 📖CPU/Single GPU/FPGA/Mobile Inference ([©️back👆🏻](#paperlist))
+### 📖CPU/Single GPU/FPGA/NPU/Mobile Inference ([©️back👆🏻](#paperlist))
 <div id="CPU-Single-GPU-Inference"></div>
 
 |Date|Title|Paper|Code|Recom|
@@ -387,6 +387,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.07|🔥🔥[**xFasterTransformer**] Inference Performance Optimization for Large Language Models on CPUs(@Intel) | [[pdf]](https://arxiv.org/pdf/2407.07304)|[[xFasterTransformer]](https://github.com/intel/xFasterTransformer) ![](https://img.shields.io/github/stars/intel/xFasterTransformer.svg?style=social) |⭐️ |
 |2024.07| [Summary] Inference Optimization of Foundation Models on AI Accelerators(@AWS AI) | [[pdf]](https://arxiv.org/pdf/2407.09111)|⚠️|⭐️ |
 |2024.10| Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation(@SYSU) | [[pdf]](https://arxiv.org/pdf/2410.03613)|⚠️|⭐️ |
+|2024.10|🔥🔥[**FastAttention**] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference(@huawei etc)| [[pdf]](https://arxiv.org/pdf/2410.16663)|⚠️|⭐️ |
 
 
 ### 📖Non Transformer Architecture ([©️back👆🏻](#paperlist))
