Commit 2889533

🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives (#142)
1 parent 195ef13 commit 2889533

1 file changed (+1, -0 lines)

README.md

Lines changed: 1 addition & 0 deletions
@@ -509,6 +509,7 @@ python3 download_pdfs.py # The code is generated by Doubao AI
|2024.12| 🔥🔥[**HADACORE**] HADACORE: TENSOR CORE ACCELERATED HADAMARD TRANSFORM KERNEL(@Meta)|[[pdf]](https://arxiv.org/pdf/2407.09621)|[[hadamard_transform]](https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/inference/hadamard_transform) ![](https://img.shields.io/github/stars/pytorch-labs/applied-ai.svg?style=social)|⭐️ |
|2024.10| 🔥🔥[**FLASH-ATTENTION RNG**] Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM(@Princeton University)|[[pdf]](https://arxiv.org/pdf/2410.07531)|⚠️|⭐️ |
|2025.02| 🔥🔥[**TRITONBENCH**] TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operators(@thunlp) | [[pdf]](https://arxiv.org/pdf/2502.14752) | [[TritonBench]](https://github.com/thunlp/TritonBench) ![](https://img.shields.io/github/stars/thunlp/TritonBench.svg?style=social)|⭐️⭐️ |
+|2025.04| 🔥🔥[**Triton-distributed**] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives(@ByteDance-Seed) | [[pdf]](https://arxiv.org/pdf/2503.20313) | [[Triton-distributed]](https://github.com/ByteDance-Seed/Triton-distributed) ![](https://img.shields.io/github/stars/ByteDance-Seed/Triton-distributed.svg?style=social)|⭐️⭐️ |

### 📖VLM/Position Embed/Others ([©️back👆🏻](#paperlist))
<div id="Others"></div>
