v2.6.16
What's Changed
- Add PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters by @Lizonghang in #137
- 🔥🔥[SGLang] Efficiently Programming Large Language Models using SGLang by @DefTruth in #138
- 🔥[FSDP 1/2] PyTorch FSDP: Getting Started with Fully Sharded Data Parallel (FSDP) by @DefTruth in #139
- 🔥[MMInference] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention by @DefTruth in #140
- Update Multi-GPUs/Multi-Nodes Parallelism by @DefTruth in #141
- 🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives by @DefTruth in #142
New Contributors
- @Lizonghang made their first contribution in #137
Full Changelog: v2.6.15...v2.6.16