v2.6
What's Changed
- 🔥[VPTQ] VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/70
- fix typo by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/71
- 🔥🔥[INT-FlashAttention] INT-FlashAttention: Enabling Flash Attention for INT8 Quantization by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/72
- [Low-bit] A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/73
- 🔥🔥[HiFloat8] Ascend HiFloat8 Format for Deep Learning by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/74
- 🔥[AlignedKV] AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/75
- 🔥🔥[Tensor Cores] Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/76
- 🔥[KV-Compress] KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/77
- 🔥[LayerKV] Optimizing Large Language Model Serving with Layer-wise KV Cache Management by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/78
- Bump up to v2.6 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/79
Full Changelog: https://github.com/DefTruth/Awesome-LLM-Inference/compare/v2.5...v2.6