v2.6.16

DefTruth released this 27 Apr 08:33

2889533

What's Changed

Add PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters by @Lizonghang in #137
🔥🔥[SGLang] Efficiently Programming Large Language Models using SGLang by @DefTruth in #138
🔥[FSDP 1/2] PyTorch FSDP: Getting Started with Fully Sharded Data Parallel (FSDP) by @DefTruth in #139
🔥[MMInference] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention by @DefTruth in #140
Update Multi-GPUs/Multi-Nodes Parallelism by @DefTruth in #141
🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives by @DefTruth in #142

New Contributors

@Lizonghang made their first contribution in #137

Full Changelog: v2.6.15...v2.6.16

Contributors

Lizonghang and DefTruth
