Release v2.1 · xlite-dev/Awesome-LLM-Inference

What's Changed

Update README.md by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/40
🔥[Speculative Decoding] Parallel Speculative Decoding with Adaptive Draft Length by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/41
🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/42
🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/43
🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/44
Add ABQ-LLM code link by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/46
🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/47
🔥[1-bit LLMs] Matmul or No Matmal in the Era of 1-bit LLMs by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/48
🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/49
Bump up to v2.1 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/50

Full Changelog: DefTruth/Awesome-LLM-Inference@v2.0...v2.1