v2.6.14
What's Changed
- [feat] Add DeepSeek FlashMLA by @shaoyuyoung in #120
- Add our ICLR2025 work Dynamic-LLaVA by @Blank-z0 in #121
- 🔥[MHA2MLA] Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs by @DefTruth in #122
- Update the title of SageAttention2 and add SpargeAttn by @jt-zhang in #123
- Add DeepSeek open-source modules by @DefTruth in #124
- Update DeepSeek/MLA Topics by @DefTruth in #125
- Add CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse by @skejriwal44 in #126
- 🔥[X-EcoMLA] Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression by @DefTruth in #127
- Add download_pdfs.py by @DefTruth in #128
- Update README.md by @DefTruth in #129
- Update Mooncake-v3 paper link by @DefTruth in #130
New Contributors
- @Blank-z0 made their first contribution in #121
- @jt-zhang made their first contribution in #123
- @skejriwal44 made their first contribution in #126
Full Changelog: v2.6.13...v2.6.14