v2.6
What's Changed
- 🔥[VPTQ] VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/70
- fix typo by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/71
- 🔥🔥[INT-FlashAttention] INT-FlashAttention: Enabling Flash Attention for INT8 Quantization by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/72
- [Low-bit] A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/73
- 🔥🔥[HiFloat8] Ascend HiFloat8 Format for Deep Learning by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/74
- 🔥[AlignedKV] AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/75
- 🔥🔥[Tensor Cores] Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/76
- 🔥[KV-Compress] KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/77
- 🔥[LayerKV] Optimizing Large Language Model Serving with Layer-wise KV Cache Management by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/78
- Bump up to v2.6 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/79
Full Changelog: https://github.com/DefTruth/Awesome-LLM-Inference/compare/v2.5...v2.6