docs/tutorials/text-to-video-training.md (8 additions, 10 deletions)
@@ -11,7 +11,7 @@ content_type: "tutorial"
# Text-to-Video Training
- Comprehensive guide for training large-scale text-to-video generation models using WAN 2.1 architecture. This approach uses Megatron-Core and Megatron-Bridge for scalable training with advanced parallelism strategies (data, tensor, sequence, and context parallelism) and optimized kernels (e.g., Transformer Engine fused attention).
+ Comprehensive guide for training large-scale text-to-video generation models using WAN 2.1 architecture. This approach uses Megatron-Core and Megatron-Bridge for scalable training with advanced parallelism strategies (data, tensor, sequence, and context parallelism) and optimized kernels (for example, Transformer Engine fused attention).
**Use case**: Train production-scale text-to-video models with full control over distributed training parallelism.
- json: contains useful side-info (text caption, sizes, processing choices, and so on)
- Energon writes a `.nv-meta` directory with dataset info and a `dataset.yaml` you can version-control.
You're ready to launch training. Point the WAN training config (or a CLI override) to the processed data output directory with `dataset.path=${DATASET_PATH}`.
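As a minimal sketch of that override (the data path, launch wrapper, and GPU count below are placeholders, not values taken from the recipe; only the `dataset.path=...` override syntax comes from this tutorial):

```bash
# Placeholder path -- substitute the output directory produced by the Energon preparation step.
export DATASET_PATH=/data/processed/wan_webdataset

# Illustrative launch only: the wrapper and GPU count mirror the DiT recipe docs and may
# differ for WAN; adapt them to your environment.
uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node 8 \
    examples/megatron/recipes/wan/pretrain_wan.py \
    dataset.path=${DATASET_PATH}
```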
@@ -71,9 +71,7 @@ You're ready to launch training. In the training config, we will point the WAN c
## Build Container
- Please follow the instructions in the container section of the main README:
+ Follow the instructions in the [container section](https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container) of the main README.
---
@@ -87,13 +85,13 @@ Multiple parallelism techniques including tensor, sequence, and context parallel
Wan training is driven by `examples/megatron/recipes/wan/pretrain_wan.py`, which supports both a YAML config file and CLI overrides.
- The script exposes a `--training-mode` with `pretrain` and `finetune` presets for flow-matching hyperparameters as a starting point for experiments. This presets specify that pretraining uses noisier, biased sampling (e.g., logit-normal, higher logit_std, lower flow_shift) for stability and broad learning, while finetuning uses uniform, lower-noise settings (e.g., uniform sampling, lower logit_std, higher flow_shift) to refine details and improve quality.
+ The script exposes a `--training-mode` flag with `pretrain` and `finetune` presets for flow-matching hyperparameters as a starting point for experiments. These presets specify that pretraining uses noisier, biased sampling (for example, logit-normal, higher logit_std, lower flow_shift) for stability and broad learning, while finetuning uses uniform, lower-noise settings (for example, uniform sampling, lower logit_std, higher flow_shift) to refine details and improve quality.
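A hedged sketch of switching between the two presets (the `--training-mode` values come from the text above; the launch wrapper and GPU count are placeholders):

```bash
# Placeholder launch wrapper and GPU count; adapt to your environment.

# Pretraining preset: noisier, biased timestep sampling (logit-normal, higher logit_std,
# lower flow_shift) for stability and broad learning.
uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node 8 \
    examples/megatron/recipes/wan/pretrain_wan.py --training-mode pretrain

# Finetuning preset: uniform, lower-noise sampling (lower logit_std, higher flow_shift)
# to refine details and improve quality.
uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node 8 \
    examples/megatron/recipes/wan/pretrain_wan.py --training-mode finetune
```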
**Note**: If you use `logger.wandb_project` and `logger.wandb_exp_name`, export `WANDB_API_KEY`.
### Pretraining Script Example
- We provide example scripts for running 1.3B and 14B model sizes on mock dataset (see `wan_1_3B.yaml` and `wan_14B.yaml` under `examples/megatron/recipes/wan/conf`). From these starting points, users can set their own configuration by copy one of the example override configs and update it with your settings (e.g., with actual processed data path, and specific configurations based on available hardware, etc.). Users can learn more about arguments detail at [Megatron-Bridge docs](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/docs/megatron-lm-to-megatron-bridge.md).
+ We provide example scripts for running the 1.3B and 14B model sizes on a mock dataset (see `wan_1_3B.yaml` and `wan_14B.yaml` under `examples/megatron/recipes/wan/conf`). From these starting points, you can set up your own configuration by copying one of the example override configs and updating it with your settings (for example, the actual processed data path and options specific to your available hardware). You can learn more about the arguments in the [Megatron-Bridge docs](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/docs/megatron-lm-to-megatron-bridge.md).
- You may adjust mock shapes (`F_latents`, `H_latents`, `W_latents`) and packing behavior (`number_packed_samples`) in `WanMockDataModuleConfig` (see `dfm/src/megatron/recipes/wan/wan.py`) to simulate different data scenarios.
+ You can adjust mock shapes (`F_latents`, `H_latents`, `W_latents`) and packing behavior (`number_packed_samples`) in `WanMockDataModuleConfig` (see `dfm/src/megatron/recipes/wan/wan.py`) to simulate different data scenarios.
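A rough sketch of that workflow, starting from one of the provided configs (the config file paths come from the text above; the copied file name is arbitrary, and the editing step is illustrative only):

```bash
# Copy one of the provided override configs as a starting point.
cp examples/megatron/recipes/wan/conf/wan_1_3B.yaml my_wan_1_3B.yaml

# Edit my_wan_1_3B.yaml with your own settings: the actual processed data path
# (dataset.path), parallelism sizes that fit your hardware, and so on.
# For mock-data runs, the latent shapes (F_latents, H_latents, W_latents) and
# number_packed_samples are defined in WanMockDataModuleConfig
# (dfm/src/megatron/recipes/wan/wan.py).
${EDITOR:-vi} my_wan_1_3B.yaml
```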
---
@@ -178,7 +176,7 @@ The table below shows current parallelism support for different model sizes:
## References
- Wan Team. (2025). Wan: Open and advanced large-scale video generative models (WAN 2.1). GitHub. https://github.com/Wan-Video/Wan2.1/
+ Wan Team. (2025). [Wan: Open and advanced large-scale video generative models (WAN 2.1)](https://github.com/Wan-Video/Wan2.1/). GitHub.
docs/tutorials/training-from-scratch.md (4 additions, 4 deletions)
@@ -23,7 +23,7 @@ For a quick start guide, see [Megatron Workflow](../get-started/megatron.md). Th
## Dataset Preparation
- This recipe uses NVIDIA's [Megatron-Energon](https://github.com/NVIDIA/Megatron-Energon) as an efficient multi-modal data loader. Datasets should be in the WebDataset-compatible format (typically sharded `.tar` archives). Energon efficiently supports large-scale distributed loading, sharding, and sampling for multi-modal pairs (e.g., text-image, text-video). Set `dataset.path` to your WebDataset location or shard pattern. See the Megatron-Energon documentation for format details and advanced options.
+ This recipe uses NVIDIA's [Megatron-Energon](https://github.com/NVIDIA/Megatron-Energon) as an efficient multi-modal data loader. Datasets should be in the WebDataset-compatible format (typically sharded `.tar` archives). Energon efficiently supports large-scale distributed loading, sharding, and sampling for multi-modal pairs (for example, text-image, text-video). Set `dataset.path` to your WebDataset location or shard pattern. See the Megatron-Energon documentation for format details and advanced options.
### Dataset Preparation Example
@@ -98,13 +98,13 @@ Done
## Build Container
- Please follow the instructions in the [container](https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container) section of the main README.
+ Follow the instructions in the [container](https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container) section of the main README.
---
## Pretraining
- Once you have the dataset and container ready, you can start training the DiT model on your own dataset. This repository leverages [sequence packing](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/features/optimizations/sequence_packing.html) to maximize training efficiency. Sequence packing stacks multiple samples into a single sequence instead of padding individual samples to a fixed length; therefore, `micro_batch_size` must be set to 1. Additionally, `qkv_format` should be set to `thd` to signal to Transformer Engine that sequence packing is enabled.
+ After you have the dataset and container ready, you can start training the DiT model on your own dataset. This repository leverages [sequence packing](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/features/optimizations/sequence_packing.html) to maximize training efficiency. Sequence packing stacks multiple samples into a single sequence instead of padding individual samples to a fixed length; therefore, `micro_batch_size` must be set to 1. Additionally, `qkv_format` should be set to `thd` to signal to Transformer Engine that sequence packing is enabled.
For data loading, Energon provides two key hyperparameters related to sequence packing: `task_encoder_seq_length` and `packing_buffer_size`. The `task_encoder_seq_length` parameter controls the maximum sequence length passed to the model, while `packing_buffer_size` determines the number of samples processed to create different buckets. You can look at the `select_samples_to_pack` and `pack_selected_samples` methods of [DiffusionTaskEncoderWithSequencePacking](https://github.com/NVIDIA-NeMo/DFM/blob/main/dfm/src/megatron/data/common/diffusion_task_encoder_with_sp.py#L50) to get a better sense of these parameters. For further details, see the [Energon packing](https://nvidia.github.io/Megatron-Energon/advanced/packing.html) documentation.
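A non-authoritative sketch tying these knobs together (the setting names mirror the prose above, but the exact config key paths are assumptions, and the numeric values are examples only; check the DiT recipe config for the authoritative names):

```bash
# Packing-related settings for DiT pretraining, collected as CLI overrides.
PACKING_OVERRIDES=(
  micro_batch_size=1            # required: multiple samples are packed into one sequence, not padded
  qkv_format=thd                # signals Transformer Engine that sequence packing is enabled
  task_encoder_seq_length=8192  # max packed sequence length passed to the model (example value)
  packing_buffer_size=128       # samples buffered when forming packing buckets (example value)
)

# Append "${PACKING_OVERRIDES[@]}" to the pretraining launch command for your setup.
```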
- Once training completes, you can run inference using [inference_dit_model.py](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/dit/inference_dit_model.py). The script requires your trained model checkpoint (`--checkpoint_path`) and a path to save generated videos (`--video_save_path`). You can pass two optional arguments, `--t5_cache_dir` and `--tokenizer_cache_dir`, to avoid re-downloading artifacts if they are already downloaded.
+ After training completes, you can run inference using [inference_dit_model.py](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/dit/inference_dit_model.py). The script requires your trained model checkpoint (`--checkpoint_path`) and a path to save generated videos (`--video_save_path`). You can pass two optional arguments, `--t5_cache_dir` and `--tokenizer_cache_dir`, to avoid re-downloading artifacts if they are already downloaded.
```bash
uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node $num_gpus \
  examples/megatron/recipes/dit/inference_dit_model.py --checkpoint_path <ckpt_dir> --video_save_path <output_dir>
```