v2.0
What's Changed
- 🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/33
- 🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/34
- KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/35
- Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/36
- 🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/37
- [Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/38
- Bump up to v2.0 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/39
Full Changelog: https://github.com/DefTruth/Awesome-LLM-Inference/compare/v1.9...v2.0