
v2.6.15


@DefTruth released this on 17 Apr 08:08
73d8740

What's Changed

  • MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism by @DefTruth in #131
  • TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operators by @DefTruth in #132
  • 🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching by @DefTruth in #133
  • Add SeerAttention and SlimAttention papers by @sunshinemyson in #135

New Contributors

Full Changelog: v2.6.14...v2.6.15