
v2.6.16


@DefTruth released this 27 Apr 08:33
Commit: 2889533

What's Changed

  • Add PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters by @Lizonghang in #137
  • 🔥🔥[SGLang] Efficiently Programming Large Language Models using SGLang by @DefTruth in #138
  • 🔥[FSDP 1/2] PyTorch FSDP: Getting Started with Fully Sharded Data Parallel (FSDP) by @DefTruth in #139
  • 🔥[MMInference] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention by @DefTruth in #140
  • Update Multi-GPU/Multi-Node Parallelism by @DefTruth in #141
  • 🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives by @DefTruth in #142

New Contributors

Full Changelog: v2.6.15...v2.6.16