v2.1
What's Changed
- Update README.md by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/40
- 🔥[Speculative Decoding] Parallel Speculative Decoding with Adaptive Draft Length by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/41
- 🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/42
- 🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/43
- 🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/44
- Add ABQ-LLM code link by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/46
- 🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/47
- 🔥[1-bit LLMs] Matmul or No Matmal in the Era of 1-bit LLMs by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/48
- 🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/49
- Bump up to v2.1 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/50
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.0...v2.1