@@ -387,6 +387,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.07|🔥🔥[**xFasterTransformer**] Inference Performance Optimization for Large Language Models on CPUs(@Intel) |[[pdf]](https://arxiv.org/pdf/2407.07304)|[[xFasterTransformer]](https://github.com/intel/xFasterTransformer)|⭐️ |
 |2024.07|[Summary] Inference Optimization of Foundation Models on AI Accelerators(@AWS AI) |[[pdf]](https://arxiv.org/pdf/2407.09111)|⚠️|⭐️ |
 |2024.10|Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation(@SYSU) |[[pdf]](https://arxiv.org/pdf/2410.03613)|⚠️|⭐️ |
+|2024.10|🔥🔥[**FastAttention**] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference(@Huawei etc.)|[[pdf]](https://arxiv.org/pdf/2410.16663)|⚠️|⭐️ |