v2.6.15

DefTruth released this 17 Apr 08:08

73d8740

What's Changed

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism by @DefTruth in #131
TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operators by @DefTruth in #132
🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching by @DefTruth in #133
Add SeerAttention and SlimAttention papers by @sunshinemyson in #135

New Contributors

@sunshinemyson made their first contribution in #135

Full Changelog: v2.6.14...v2.6.15

Contributors

sunshinemyson and DefTruth