<div align="center">

# NeMo DFM: Diffusion Foundation Models

<!-- We are still using the Mbridge CICD for NeMo. @pablo can we get our own? And the same for the stargazers badge. -->

<!-- Not including codecov for now since we have not worked on it extensively. -->

[![CICD NeMo](https://github.com/NVIDIA-NeMo/DFM/actions/workflows/cicd-main.yml/badge.svg)](https://github.com/NVIDIA-NeMo/DFM/actions/workflows/cicd-main.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/release/python-3100/)
[![GitHub Stars](https://img.shields.io/github/stars/NVIDIA-NeMo/DFM?style=social)](https://github.com/NVIDIA-NeMo/DFM/stargazers/)

[Documentation](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) | [Supported Models](#supported-models) | [Examples](https://github.com/NVIDIA-NeMo/DFM/tree/main/examples) | [Contributing](https://github.com/NVIDIA-NeMo/DFM/tree/main/CONTRIBUTING.md)

</div>

## Overview

NeMo DFM (Diffusion Foundation Models) is a library under the [NeMo Framework](https://github.com/NVIDIA-NeMo), focused on diffusion models for **Video**, **Image**, and **Text** generation. It unifies cutting-edge diffusion-based architectures and training techniques, prioritizing efficiency and performance from research prototyping to production deployment.

**Dual-Path Architecture**: DFM provides two complementary training paths to maximize flexibility:

- **🌉 Megatron Bridge Path**: Built on [NeMo Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), which leverages [Megatron Core](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core) for maximum scalability with n-D parallelism (TP, PP, CP, EP, VPP, DP)
- **🚀 AutoModel Path**: Built on [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel) for PyTorch DTensor-native SPMD training, offering easy experimentation and Day-0 support for 🤗 Hugging Face models

Choose the path that best fits your workflow, or use both at different stages of development!

<!-- Once we have updated images of how DFM fits into the NeMo journey, put them here. @Eliiot can help. -->
## 🔧 Installation

### 🐳 Build your own Container

#### 1. Build the container
```bash
# Initialize all submodules (Megatron-Bridge, Automodel, and nested Megatron-LM)
git submodule update --init --recursive

# Build the container
docker build -f docker/Dockerfile.ci -t dfm:dev .
```

#### 2. Start the container

```bash
docker run --rm -it --gpus all \
  --entrypoint bash \
  -v $(pwd):/opt/DFM dfm:dev
```
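
Once inside the container, a quick sanity check confirms that the GPUs are visible. This is a minimal snippet, assuming PyTorch is installed in the image (the build above provides it):

```python
# Quick GPU visibility check inside the container (assumes PyTorch is installed).
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
```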

### 📦 Using DFM Docker (Coming Soon)

## ⚡ Quickstart

### Megatron Bridge Path

#### Run a Recipe

You can find all predefined recipes under the [recipes](https://github.com/NVIDIA-NeMo/DFM/tree/main/examples/megatron/recipes) directory.

> **Note:** Use [uv](https://docs.astral.sh/uv/) to run the recipes, passing `--group megatron-bridge`.

```bash
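# Set num_gpus to the number of GPUs on this node (8 is an example value; adjust as needed)
num_gpus=8
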
uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node $num_gpus \
    examples/megatron/recipes/wan/pretrain_wan.py \
    --config-file examples/megatron/recipes/wan/conf/wan_1_3B.yaml \
    --training-mode pretrain \
    --mock
```
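
Here `--mock` appears to run the recipe on mock (synthetic) data, which is useful for verifying the setup end to end before training on a real dataset.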

### AutoModel Path

Train with PyTorch-native DTensor parallelism and direct 🤗 HF integration:

#### Run a Recipe

You can find pre-configured recipes under the [automodel/finetune](https://github.com/NVIDIA-NeMo/DFM/tree/main/examples/automodel/finetune) and [automodel/pretrain](https://github.com/NVIDIA-NeMo/DFM/tree/main/examples/automodel/pretrain) directories.

> **Note:** AutoModel examples live under `dfm/examples/automodel`. Use [uv](https://docs.astral.sh/uv/) with `--group automodel`. Configs are YAML-driven; pass `-c <path>` to override the default.

The fine-tune recipe sets up WAN 2.1 Text-to-Video training with Flow Matching using FSDP2 hybrid sharding.
It parallelizes the heavy transformer blocks while keeping lightweight modules (e.g., the VAE) unsharded for efficiency.
Adjust batch sizes, learning rate, and parallel sizes in `dfm/examples/automodel/finetune/wan2_1_t2v_flow.yaml`.
The generation script demonstrates distributed inference with AutoModel DTensor managers, producing an MP4 on rank 0. You can tweak the frame size, frame count, sampling steps, and CFG scale via command-line flags.

```bash
# Fine-tune WAN 2.1 T2V with FSDP2 (single node, 8 GPUs)
uv run --group automodel torchrun --nproc-per-node=8 \
    dfm/examples/automodel/finetune/finetune.py \
    -c dfm/examples/automodel/finetune/wan2_1_t2v_flow.yaml

# Generate videos with FSDP2 (distributed inference)
uv run --group automodel torchrun --nproc-per-node=8 \
    dfm/examples/automodel/generate/wan_generate.py
```
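
To make the Flow Matching objective above concrete, here is a minimal, generic PyTorch sketch of one training step. It illustrates the technique only and is not DFM's actual API; `model` and the tensor shapes are hypothetical:

```python
# Minimal flow-matching training step (illustrative only, not DFM's API).
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0):
    """One flow-matching step: regress the velocity that carries data to noise."""
    noise = torch.randn_like(x0)                   # x1: pure noise sample
    t = torch.rand(x0.shape[0], device=x0.device)  # random time in [0, 1]
    t_b = t.view(-1, *([1] * (x0.dim() - 1)))      # broadcast t over latent dims
    xt = (1 - t_b) * x0 + t_b * noise              # linear interpolation path
    target_velocity = noise - x0                   # constant velocity along the path
    pred_velocity = model(xt, t)                   # network predicts the velocity
    return F.mse_loss(pred_velocity, target_velocity)
```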

## 🚀 Key Features

### Dual Training Paths

**Megatron Bridge** delivers maximum throughput, with near-linear scaling to thousands of nodes. **AutoModel** provides an easy on-ramp for experimentation and research with PyTorch-native SPMD training.

### Shared Capabilities

- **🎥 Multi-Modal Diffusion**: Support for video, image, and text generation
- **🔬 Advanced Samplers**: EDM, Flow Matching, and custom diffusion schedules
- **🎭 Flexible Architectures**: DiT (Diffusion Transformers), WAN (World Action Networks)
- **📊 Efficient Data Loading**: Data pipelines with sequence packing
- **💾 Distributed Checkpointing**: SafeTensors-based sharded checkpoints (see the sketch after this list)
- **🌟 Memory Optimization**: Gradient checkpointing, mixed precision, efficient attention
- **🤗 HuggingFace Integration**: Seamless integration with the HF ecosystem
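
As an illustration of the SafeTensors-based sharded checkpointing idea, the sketch below writes each rank's slice of a state dict to its own file and merges them back. It is a simplified stand-in for DFM's actual checkpointing; the file layout and the toy key-based sharding scheme are assumptions:

```python
# Simplified sharded checkpoint save/load with safetensors (illustrative only).
from safetensors.torch import save_file, load_file

def save_shard(state_dict, rank, world_size, prefix="checkpoint"):
    """Save every world_size-th tensor on this rank (a toy sharding scheme)."""
    keys = sorted(state_dict.keys())
    shard = {k: state_dict[k].contiguous()
             for i, k in enumerate(keys) if i % world_size == rank}
    save_file(shard, f"{prefix}-rank{rank:05d}-of-{world_size:05d}.safetensors")

def load_shards(world_size, prefix="checkpoint"):
    """Merge all shards back into a single state dict."""
    merged = {}
    for rank in range(world_size):
        merged.update(load_file(f"{prefix}-rank{rank:05d}-of-{world_size:05d}.safetensors"))
    return merged
```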

## Supported Models

DFM provides out-of-the-box support for state-of-the-art diffusion architectures:

| Model | Type | Megatron Bridge | AutoModel | Description |
|-------|------|-----------------|-----------|-------------|
| **DiT** | Image/Video | [pretrain](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/dit/pretrain_dit_model.py), [inference](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/dit/inference_dit_model.py) | 🔜 | Diffusion Transformers with scalable architecture |
| **WAN 2.1** | Video | [inference](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/wan/inference_wan.py), [pretrain, finetune](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/wan/pretrain_wan.py) | [pretrain](https://github.com/NVIDIA-NeMo/DFM/tree/main/examples/automodel/pretrain), [finetune](https://github.com/NVIDIA-NeMo/DFM/tree/main/examples/automodel/finetune), [inference](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/automodel/generate/wan_validate.py) | World Action Networks for video generation |

## Performance Benchmarking

For detailed performance benchmarks, including throughput metrics across different GPU systems and model configurations, see the [Performance Summary](https://github.com/NVIDIA-NeMo/DFM/blob/main/docs/performance-summary.md) in our documentation.

## Project Structure

```
DFM/
├── dfm/
│   └── src/
│       ├── megatron/             # Megatron Bridge path
│       │   ├── base/             # Base utilities for Megatron
│       │   ├── data/             # Data loaders and task encoders
│       │   │   ├── common/       # Shared data utilities
│       │   │   └── <model_name>/ # Model-specific data handling
│       │   ├── model/            # Model implementations
│       │   │   ├── common/       # Shared model components
│       │   │   └── <model_name>/ # Model-specific implementations
│       │   └── recipes/          # Training recipes
│       │       └── <model_name>/ # Model-specific training configs
│       ├── automodel/            # AutoModel path (DTensor-native)
│       │   ├── _diffusers/       # Diffusion pipeline integrations
│       │   ├── datasets/         # Dataset implementations
│       │   ├── distributed/      # Parallelization strategies
│       │   ├── flow_matching/    # Flow matching implementations
│       │   ├── recipes/          # Training scripts
│       │   └── utils/            # Utilities and validation
│       └── common/               # Shared across both paths
│           ├── data/             # Common data utilities
│           └── utils/            # Batch ops, video utils, etc.
├── examples/                     # Example scripts and configs
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](https://github.com/NVIDIA-NeMo/DFM/tree/main/CONTRIBUTING.md) for details on:

- Setting up your development environment
- Code style and testing guidelines
- Submitting pull requests
- Reporting issues

For questions or discussions, please open an issue on GitHub.

## Acknowledgements

NeMo DFM builds upon the excellent work of:

- [Megatron Core](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core) - Advanced model parallelism
- [Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) - HuggingFace ↔ Megatron bridge
- [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel) - PyTorch-native SPMD training
- [PyTorch Distributed](https://pytorch.org/docs/stable/distributed.html) - Foundation for distributed training
- [Diffusers](https://github.com/huggingface/diffusers) - Diffusion model implementations