Releases · xlite-dev/Awesome-LLM-Inference
v2.6
What's Changed
- 🔥[VPTQ] VPTQ: EXTREME LOW-BIT VECTOR POST-TRAINING QUANTIZATION FOR LARGE LANGUAGE MODELS by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/70
- fix typo by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/71
- 🔥🔥[INT-FLASHATTENTION] INT-FLASHATTENTION: ENABLING FLASH ATTENTION FOR INT8 QUANTIZATION by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/72
- [Low-bit] A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/73
- 🔥🔥[HiFloat8] Ascend HiFloat8 Format for Deep Learning by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/74
- 🔥[AlignedKV] AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/75
- 🔥🔥[Tensor Cores] Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/76
- 🔥[KV-COMPRESS] PAGED KV-CACHE COMPRESSION WITH VARIABLE COMPRESSION RATES PER ATTENTION HEAD by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/77
- 🔥[LayerKV] Optimizing Large Language Model Serving with Layer-wise KV Cache Management by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/78
- Bump up to v2.6 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/79
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.5...v2.6
v2.5
What's Changed
- 🔥[InstInfer] InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/65
- Update codebase of paper "parallel speculative decoding with adaptive draft length" by @smart-lty in https://github.com/DefTruth/Awesome-LLM-Inference/pull/66
- move RetrievalAttention -> long context by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/67
- 🔥🔥[CRITIPREFILL] CRITIPREFILL: A SEGMENT-WISE CRITICALITY-BASED APPROACH FOR PREFILLING ACCELERATION IN LLMS by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/68
- Bump up to v2.5 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/69
New Contributors
- @smart-lty made their first contribution in https://github.com/DefTruth/Awesome-LLM-Inference/pull/66
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.4...v2.5
v2.4
What's Changed
- 🔥[RetrievalAttention] Accelerating Long-Context LLM Inference via Vector Retrieval by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/62
- 🔥[Inf-MLLM] Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/63
- Bump up to v2.4 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/64
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.3...v2.4
v2.3
What's Changed
- 🔥[CHESS] CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/59
- 🔥[SpMM] High Performance Unstructured SpMM Computation Using Tensor Cores by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/60
- Bump up to v2.3 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/61
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.2...v2.3
v2.2
What's Changed
- Add NanoFlow code link by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/51
- 🔥[ACTIVATION SPARSITY] TRAINING-FREE ACTIVATION SPARSITY IN LARGE LANGUAGE MODELS by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/52
- 🔥[Decentralized LLM] Decentralized LLM Inference over Edge Networks with Energy Harvesting by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/53
- 🔥[SJF Scheduling] Efficient LLM Scheduling by Learning to Rank by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/54
- 🔥[Speculative Decoding] Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/55
- 🔥🔥[Prompt Compression] Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/56
- 🔥🔥[Context Distillation] Efficient LLM Context Distillation by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/57
- Bump up to v2.2 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/58
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.1...v2.2
v2.1
What's Changed
- Update README.md by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/40
- 🔥[Speculative Decoding] Parallel Speculative Decoding with Adaptive Draft Length by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/41
- 🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/42
- 🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/43
- 🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/44
- Add ABQ-LLM code link by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/46
- 🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/47
- 🔥[1-bit LLMs] Matmul or No Matmul in the Era of 1-bit LLMs by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/48
- 🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/49
- Bump up to v2.1 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/50
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.0...v2.1
v2.0
What's Changed
- 🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/33
- 🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/34
- KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/35
- Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/36
- 🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/37
- [Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/38
- Bump up to v2.0 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/39
Full Changelog: DefTruth/Awesome-LLM-Inference@v1.9...v2.0
v1.9
What's Changed
- 🔥[DynamoLLM] DynamoLLM: Designing LLM Inference Clusters for Performa… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/28
- 🔥[Zero-Delay QKV Compression] Zero-Delay QKV Compression for Mitigati… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/29
- 🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Servin… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/30
- 🔥🔥[500xCompressor] 500xCompressor: Generalized Prompt Compression for… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/31
- Bump up to v1.9 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/32
Full Changelog: DefTruth/Awesome-LLM-Inference@v1.8...v1.9
v1.8
What's Changed
- 🔥[flashinfer] FlashInfer: Kernel Library for LLM Serving (@flashinfer-ai) by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/24
- 🔥[Palu] Palu: Compressing KV-Cache with Low-Rank Projection (@nycu.edu… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/25
- 🔥[SentenceVAE] SentenceVAE: Faster, Longer and More Accurate Inferenc… by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/26
- Bump up to v1.8 by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/27
Full Changelog: DefTruth/Awesome-LLM-Inference@v1.7...v1.8
v1.7
What's Changed
- Add paper "Internal Consistency and Self-Feedback in Large Language Models: A Survey" by @fan2goa1 in https://github.com/DefTruth/Awesome-LLM-Inference/pull/21
- Update README.md by @clevercool in https://github.com/DefTruth/Awesome-LLM-Inference/pull/22
New Contributors
- @fan2goa1 made their first contribution in https://github.com/DefTruth/Awesome-LLM-Inference/pull/21
- @clevercool made their first contribution in https://github.com/DefTruth/Awesome-LLM-Inference/pull/22
Full Changelog: DefTruth/Awesome-LLM-Inference@v1.6...v1.7