v2.6.7
What's Changed
- 🔥[Star-Attention: ~11x speedup] Star Attention: Efficient LLM Inference over Long Sequences by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/101 (see the first sketch below)
- 🔥[KV Cache Recomputation] Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/102 (see the second sketch below)
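
For context on the first entry: Star Attention (per the paper) shards a long context across hosts, runs blockwise local attention in phase one, and in phase two lets the query attend to every host's cache by merging locally normalized partial results via their log-sum-exp statistics. The NumPy sketch below shows only that merge step, in a simplified single-query, single-head setup of my own; `local_attention` and `merge_hosts` are hypothetical names, not code from the paper or this repo.

```python
import numpy as np

def local_attention(q, K, V):
    # Hypothetical per-host step: locally normalized attention for one
    # query vector, plus the log-sum-exp (lse) of its scores so that
    # partial results can later be merged exactly.
    scores = K @ q / np.sqrt(q.shape[-1])                      # (n,)
    lse = np.log(np.exp(scores - scores.max()).sum()) + scores.max()
    return np.exp(scores - lse) @ V, lse                       # (d,), scalar

def merge_hosts(partials):
    # Distributed-softmax merge: rescale each host's locally normalized
    # output by its share of the global attention mass.
    outs, lses = zip(*partials)
    lses = np.array(lses)
    global_lse = np.log(np.exp(lses - lses.max()).sum()) + lses.max()
    weights = np.exp(lses - global_lse)                        # one per host
    return sum(w * o for w, o in zip(weights, outs))

# Toy check: merging per-block partials matches full-context attention.
rng = np.random.default_rng(0)
d = 64
q = rng.standard_normal(d)
Ks = [rng.standard_normal((128, d)) for _ in range(4)]
Vs = [rng.standard_normal((128, d)) for _ in range(4)]
merged = merge_hosts([local_attention(q, K, V) for K, V in zip(Ks, Vs)])
reference, _ = local_attention(q, np.concatenate(Ks), np.concatenate(Vs))
assert np.allclose(merged, reference)
```

The assert at the end checks the key property: merging per-host partials with their log-sum-exp weights reproduces full-context attention exactly, which is why the distributed phase costs no accuracy.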
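
For the second entry: judging from the title, the idea is to hide CPU-to-GPU I/O by recomputing part of an offloaded KV cache on the GPU while the remainder streams back over PCIe. The PyTorch sketch below illustrates only that generic compute/transfer overlap with CUDA streams, not the paper's actual partitioning or scheduling policy; every name in it (`load_kv_overlapped`, `wk`, `wv`, and so on) is hypothetical.

```python
import torch

def load_kv_overlapped(kv_rest_cpu, x_prefix, wk, wv):
    # Hypothetical overlap pattern: stream the non-recomputed part of the
    # KV cache from CPU memory on a side stream, while the default stream
    # recomputes the prefix's K/V projections on the GPU. The copy is only
    # truly asynchronous if kv_rest_cpu lives in pinned memory.
    copy_stream = torch.cuda.Stream()
    with torch.cuda.stream(copy_stream):
        kv_rest_gpu = kv_rest_cpu.to("cuda", non_blocking=True)
    k_prefix = x_prefix @ wk   # recomputation runs concurrently
    v_prefix = x_prefix @ wv   # with the PCIe transfer above
    torch.cuda.current_stream().wait_stream(copy_stream)
    return k_prefix, v_prefix, kv_rest_gpu
```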
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.6.6...v2.6.7