v2.6.15
What's Changed
- MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism by @DefTruth in #131
- TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operators by @DefTruth in #132
- 🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching by @DefTruth in #133
- Add SeerAttention and SlimAttention Papers by @sunshinemyson in #135
New Contributors
- @sunshinemyson made their first contribution in #135
Full Changelog: v2.6.14...v2.6.15