We disable all speedup methods by default. Here are details of some key arguments for training:

- `--nproc_per_node`: The number of GPUs to use on the current node.
- `--plugin`: The booster plugin used by ColossalAI; `zero2` and `ddp` are supported. The default value is `zero2`, which we recommend enabling.
- `--mixed_precision`: The data type for mixed precision training. The default value is `bf16`.
- `--load`: Load a previously saved checkpoint directory and continue training.
- `--num_classes`: The number of label classes. Should be 10 for CIFAR10 and 1000 for ImageNet. Only used for label-to-image generation.

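To make these flags concrete, here is a minimal sketch of a multi-GPU launch that combines them (the `train.py` entry point and the checkpoint path are assumptions for illustration, not necessarily the repository's exact interface):

```
# Illustrative sketch only: the script name and checkpoint path are placeholders.
# Launch on 8 GPUs with ColossalAI's zero2 plugin and bf16 mixed precision,
# resuming from a previously saved checkpoint directory.
torchrun --standalone --nproc_per_node 8 train.py \
    --plugin zero2 \
    --mixed_precision bf16 \
    --num_classes 1000 \
    --load ./checkpoints/epochXX-global_stepXX
```
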
For more details on the configuration of the training process, please visit our code.
<b>Multi-Node Training.</b>
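As a rough sketch, one common way to run the same script across several nodes is with torchrun's rendezvous flags (the master address, node count, and script name below are placeholders, and the project may provide its own launcher):

```
# Illustrative multi-node launch: run this on every node, changing --node_rank.
# The master address, node count, and script name are placeholders.
torchrun --nnodes 2 --node_rank 0 \
    --master_addr 10.0.0.1 --master_port 29500 \
    --nproc_per_node 8 train.py \
    --plugin zero2 \
    --mixed_precision bf16
```
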
```
python sample.py \
    --num_classes 10 \
    --ckpt ckpt_path
```

Here are details of some additional key arguments for inference:

- `--ckpt`: The weights of the EMA model, `ema.pt`. To check your training progress, it can also point to our saved base model `epochXX-global_stepXX/model`, which produces better results than the EMA model in the early training stage (see the sketch after this list).
- `--num_classes`: The number of label classes. Should be 10 for CIFAR10 and 1000 for ImageNet (both for the official checkpoint and for ours).

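As a quick illustration of the `--ckpt` option, the following sketch samples from the saved base model rather than the EMA weights (the checkpoint path is a placeholder):

```
# Illustrative sketch: sample with the base model checkpoint to inspect
# early-training progress; replace the placeholder path with your own.
python sample.py \
    --num_classes 10 \
    --ckpt ./outputs/epochXX-global_stepXX/model
```
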
### Video
<b>Training.</b> We currently support `VDiT` and `Latte` for video generation. VDiT adopts the DiT structure and uses video as input data. Latte further uses more efficient spatial & temporal blocks built on top of VDiT (not exactly aligned with the original [Latte](https://github.com/Vchitect/Latte)).
Our video training pipeline is a faithful implementation, and we encourage you to explore your own strategies using OpenDiT. You can train the video DiT model by executing a command along the lines of the following:
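A minimal sketch of such a launch, assuming a `train_video.py` entry point and a `--model` flag that selects between `VDiT` and `Latte` (both of these names are assumptions, not a confirmed interface):

```
# Illustrative sketch only: the script name and --model value are assumptions.
# Launch video DiT training on 8 GPUs with the zero2 plugin and bf16 precision.
torchrun --standalone --nproc_per_node 8 train_video.py \
    --model VDiT \
    --plugin zero2 \
    --mixed_precision bf16
```
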
In the realm of visual generation models such as DiT, sequence parallelism is indispensable for effective long-sequence training and low-latency inference. Two key features summarize the distinctive nature of these tasks:

- The model parameters are smaller than those of LLMs, but the sequences can be very long, making communication a bottleneck.
- As the model size is relatively small, sequence parallelism is only needed within a node.

However, existing methods like DeepSpeed-Ulysses and Megatron-LM Sequence Parallelism face limitations when applied to such tasks. They either introduce excessive sequence communication or lack efficiency in handling small-scale sequence parallelism.
## DiT Reproduction Result
We have trained DiT using the original method with OpenDiT to verify our accuracy, training the model from scratch on ImageNet for 80k steps on 8xA100 GPUs. Here are some results generated by our trained DiT:
We extend our gratitude to [Zangwei Zheng](https://zhengzangw.github.io/) for providing valuable insights into algorithms and aiding in the development of the video pipeline. Additionally, we acknowledge [Shenggan Cheng](https://shenggan.github.io/) for his guidance on code optimization and parallelism. Our appreciation also goes to [Fuzhao Xue](https://xuefuzhao.github.io/), [Shizun Wang](https://littlepure2333.github.io/home/), [Yuchao Gu](https://ycgu.site/), [Shenggui Li](https://franklee.xyz/), and [Haofan Wang](https://haofanwang.github.io/) for their invaluable advice and contributions.
If you encounter problems using OpenDiT or have a feature request, feel free to create an issue! We also welcome pull requests from the community.
## Citation
```
@misc{zhao2024opendit,
    author = {Xuanlei Zhao, Zhongkai Zhao, Ziming Liu, Haotian Zhou, Qianli Ma, and Yang You},
    title = {OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference},
    year = {2024},
    howpublished = {\url{https://github.com/NUS-HPC-AI-Lab/OpenDiT}},
}
```
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=NUS-HPC-AI-Lab/OpenDiT&type=Date)](https://star-history.com/#NUS-HPC-AI-Lab/OpenDiT&Date)