@@ -92,6 +93,13 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2024.11|🔥🔥🔥[**SP: Star-Attention, ~11x speedup**] Star Attention: Efficient LLM Inference over Long Sequences (@NVIDIA)|[[pdf]](https://arxiv.org/pdf/2411.17116)|[[Star-Attention]](https://github.com/NVIDIA/Star-Attention)|⭐️⭐️ |
|2024.12|🔥🔥[**SP: TokenRing**] TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication (@SJTU)|[[pdf]](https://arxiv.org/pdf/2412.20501)|[[token-ring]](https://github.com/ACA-Lab-SJTU/token-ring)|⭐️⭐️ |
|2024.01|🔥🔥[**DistServe**] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (@PKU)|[[pdf]](https://arxiv.org/pdf/2401.09670)|[[DistServe]](https://github.com/LLMServe/DistServe)|⭐️⭐️ |