Commit d353d5d

updated magiattn.md for arxiv paper, pnp typo and new ack
1 parent 6771e09 commit d353d5d

2 files changed: +30, -18 lines changed

_pages/magiattn.md

Lines changed: 20 additions & 18 deletions
@@ -21,7 +21,7 @@ typograms: true

 external-links:
   github: https://github.com/SandAI-org/MagiAttention
-  arxiv: https://static.magi.world/static/files/MAGI_1.pdf
+  arxiv: https://arxiv.org/pdf/2505.13211

 authors:
   - name: Zewei Tao
@@ -33,7 +33,7 @@ authors:
     url: "https://github.com/Strivin0311"
     email: yunpenghuang@sand.ai
     affiliations:
-      name: SandAI, Nanjing University
+      name: SandAI

 bibliography: magiattn.bib

@@ -91,7 +91,7 @@ _styles: >
 </div>
 </div>

-Training large-scale models for video generation presents two major challenges: (1) The extremely long context length of video tokens, which reaching up to 4 million during training, results in prohibitive computational and memory overhead. (2) The combination of block-causal attention and Packing-and-Padding (PnP) introduces highly complex attention mask patterns.
+Training large-scale models for video generation presents two major challenges: (1) The extremely long context length of video tokens, which reaching up to 4 million during training, results in prohibitive computational and memory overhead. (2) The combination of block-causal attention and Patch-and-Pack (PnP) introduces highly complex attention mask patterns.

 To address these challenges, we propose [MagiAttention](https://github.com/SandAI-org/MagiAttention), which aims to support a wide variety of attention mask types with **kernel-level flexibility**, while achieving **linear scalability** with respect to context-parallel (CP) size across a broad range of scenarios, particularly suitable for training tasks involving <u><em>ultra-long, heterogeneous mask</em></u> training like video-generation for [Magi-1](https://github.com/SandAI-org/MAGI-1).

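To make the second challenge concrete, below is a minimal, illustrative sketch, not code from this commit or from the MagiAttention codebase, of the mask that arises when variable-length samples are packed into one sequence under PnP and each sample attends block-causally. The helper name `block_causal_varlen_mask`, the sample lengths, and `block_size` are made-up for illustration.

```python
import torch


def block_causal_varlen_mask(seqlens: list[int], block_size: int) -> torch.Tensor:
    """Boolean attention mask for variable-length samples packed into one sequence.

    Within each packed sample, attention is causal at the granularity of
    fixed-size blocks: a query token may attend to keys in its own block or any
    earlier block. Tokens never attend across sample boundaries.
    """
    total = sum(seqlens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in seqlens:
        blk = torch.arange(n) // block_size              # block index per token
        allowed = blk.unsqueeze(1) >= blk.unsqueeze(0)   # query block >= key block
        mask[start:start + n, start:start + n] = allowed
        start += n
    return mask


# Three packed samples of different lengths: the result is neither purely causal
# nor purely block-diagonal, i.e. the "highly complex" pattern referred to above.
print(block_causal_varlen_mask([5, 3, 7], block_size=4).int())
```

Fused kernels that only recognize full, causal, or simple varlen masks cannot express such patterns directly, which is what motivates the kernel-level flexibility goal above.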
@@ -102,7 +102,7 @@ Training large-scale autoregressive diffusion models like \magi for video genera

 - The extremely long context length of video tokens, which reaching up to 4 million during training, results in prohibitive computational and memory overhead. Context-Parallelism (CP) is designed for dealing such long context challenge, but existing state-of-the-art CP methods<d-cite key="jacobs2023deepspeed,liu2023ringattentionblockwisetransformers,fang2024uspunifiedsequenceparallelism,gu2024loongtrainefficienttraininglongsequence,chen2024longvilascalinglongcontextvisual"></d-cite> face scalability limitations that face scalability limitations due to size constraints or the high communication overhead inherent in inefficient ring-style point-to-point (P2P) patterns. While recent efforts<d-cite key="wang2024datacentricheterogeneityadaptivesequenceparallelism,zhang2024dcp,ge2025bytescaleefficientscalingllm"></d-cite> dynamically adjust CP sizes to avoid unnecessary sharding and redundant communication for shorter sequences, they still incur extra memory overhead for NCCL buffers and involve complex scheduling to balance loads and synchronize across different subsets of ranks.

-- The combination of block-causal attention and Packing-and-Padding (PnP) introduces highly complex attention mask patterns with variable sequence lengths, which cannot be efficiently handled by existing attention implementations.
+- The combination of block-causal attention and Patch-and-Pack (PnP)<d-cite key="dehghani2023patchnpacknavit"></d-cite> introduces highly complex attention mask patterns with variable sequence lengths, which cannot be efficiently handled by existing attention implementations.


 To address the aforementioned challenges, we propose MagiAttention, which aims to support a wide variety of attention mask types (\emph{i.e.} kernel flexibility) while achieving linear scalability with respect to context-parallel (CP) size across a broad range of scenarios. Achieving this goal depends on meeting the following fundamental conditions:
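As a rough illustration of the ring-style P2P pattern criticized in the first bullet above, the toy, single-process simulation below (an illustrative sketch only, not code from this commit or from any CP library) steps through how each context-parallel rank passes the KV shard it currently holds one hop around the ring for `cp_size - 1` rounds, so every rank ends up moving nearly the full sequence's KV no matter how much of it the attention mask actually requires.

```python
def ring_schedule(cp_size: int) -> list[list[int]]:
    """For each communication step, which KV shard every CP rank currently holds."""
    holding = list(range(cp_size))        # step 0: rank r holds its own shard r
    steps = [list(holding)]
    for _ in range(cp_size - 1):          # pass shards one hop around the ring
        holding = [holding[(r + 1) % cp_size] for r in range(cp_size)]
        steps.append(list(holding))
    return steps


if __name__ == "__main__":
    for step, holding in enumerate(ring_schedule(cp_size=4)):
        print(f"step {step}: shard held by ranks 0..3 = {holding}")
    # Each rank sends and receives cp_size - 1 shards in total (almost the whole
    # KV of the sequence), and every extra hop adds latency that must be overlapped.
```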
@@ -389,29 +389,31 @@ comming soon ...

 ## Future Work

-comming soon ...
+For now, please check [RoadMap](https://github.com/SandAI-org/MagiAttention?tab=readme-ov-file#roadmap-%EF%B8%8F).

 ## FAQ

 comming soon ...

+
 ## Acknowledgement

 We are grateful to the contributors listed below for their valuable contributions during the early stages of MagiAttention.

-| Member | Affiliations | Email | GitHub Account |
-|:-----------|:-------------|:----------------------------|:---------------|
-| Zewei Tao | SandAI | zeweitao@sand.ai | littsk |
-| Yunpeng Huang | SandAI, Nanjing University | yunpenghuang@sand.ai,hyp@smail.nju.edu.cn | Strivin0311 |
-| Qiangang Wang | Nanjing University | 522024330081@smail.nju.edu.cn | WT1W |
-| Hanwen Sun | SandAI, Peking University | sunhanwen@stu.pku.edu.cn | hanwen-sun |
-| Tao Bu | Nanjing University | 502024330002@smail.nju.edu.cn | Big-TRex |
-| WenYang Fang | Nanjing University | fwy@smail.nju.edu.cn | kagami4243 |
-| Siyuang Yan | Nanjing University | siyuanyan@smail.nju.edu.cn | FibonaccciYan |
-| Zixu Jiang | Nanjing University | 522023330040@smail.nju.edu.cn | 191220042 |
-| Dingkun Xu | Nanjing University | 211220090@smail.nju.edu.cn | PureDimension |
-| Mingyu Liang | Nanjing University | mingyuliang518@gmail.com | gaomusiki |
-| Jingwei Xu | Nanjing University | jingweix@nju.edu.cn | paragonlight |
+| Member | Affiliations | Email | GitHub Account |
+| :------------ | :-------------------------- | :------------------------------ | :------------- |
+| Zewei Tao | SandAI | <zeweitao@sand.ai> | littsk |
+| Yunpeng Huang | SandAI | <yunpenghuang@sand.ai> | Strivin0311 |
+| Qiangang Wang | SandAI, Nanjing University | <522024330081@smail.nju.edu.cn> | WT1W |
+| Hanwen Sun | SandAI, Peking University | <sunhanwen@stu.pku.edu.cn> | hanwen-sun |
+| Jin Li | SandAI, Tsinghua University | <2609835176@qq.com> | lijinnn |
+| Tao Bu | Nanjing University | <502024330002@smail.nju.edu.cn> | Big-TRex |
+| WenYang Fang | Nanjing University | <fwy@smail.nju.edu.cn> | kagami4243 |
+| Siyuang Yan | Nanjing University | <siyuanyan@smail.nju.edu.cn> | FibonaccciYan |
+| Zixu Jiang | Nanjing University | <522023330040@smail.nju.edu.cn> | 191220042 |
+| Dingkun Xu | Nanjing University | <211220090@smail.nju.edu.cn> | PureDimension |
+| Mingyu Liang | Nanjing University | <mingyuliang518@gmail.com> | gaomusiki |
+| Jingwei Xu | Nanjing University | <jingweix@nju.edu.cn> | paragonlight |


 ## Citation

assets/bibliography/magiattn.bib

Lines changed: 10 additions & 0 deletions
@@ -230,4 +230,14 @@ @article{xu2024chatqa
   author={Xu, Peng and Ping, Wei and Wu, Xianchao and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan},
   journal={arXiv preprint arXiv:2407.14482},
   year={2024}
+}
+
+@misc{dehghani2023patchnpacknavit,
+  title={Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution},
+  author={Mostafa Dehghani and Basil Mustafa and Josip Djolonga and Jonathan Heek and Matthias Minderer and Mathilde Caron and Andreas Steiner and Joan Puigcerver and Robert Geirhos and Ibrahim Alabdulmohsin and Avital Oliver and Piotr Padlewski and Alexey Gritsenko and Mario Lučić and Neil Houlsby},
+  year={2023},
+  eprint={2307.06304},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2307.06304},
 }
