
Commit 78456b7

updates
Signed-off-by: Lawrence Lane <[email protected]>
1 parent a16f130

3 files changed: 16 additions & 18 deletions


docs/tutorials/fine-tuning-pretrained-models.md

Lines changed: 4 additions & 4 deletions
@@ -102,7 +102,7 @@ python dfm/src/automodel/utils/data/preprocess_resize.py \
 - `--height/--width`: Target resolution (both must be specified together)
 - `--center-crop`: Crop to exact size after aspect-preserving resize
 - `--device`: Device to use (`cuda` or `cpu`, default: `cuda` if available)
-- `--stochastic`: Use stochastic encoding instead of deterministic (may cause flares)
+- `--stochastic`: Use stochastic encoding instead of deterministic (can cause flares)
 - `--no-memory-optimization`: Disable Wan's built-in memory optimization

 **Output:** Creates `.meta` files containing:
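
As a rough illustration, the options above might be combined as in the following minimal sketch. The resolution values are placeholders, and the input/output dataset arguments of the script are omitted because this hunk does not show them.

```bash
# Sketch only: 480x832 is a placeholder resolution, and the dataset
# input/output arguments of preprocess_resize.py are omitted here.
python dfm/src/automodel/utils/data/preprocess_resize.py \
  --height 480 --width 832 \
  --center-crop \
  --device cuda
# Add --stochastic only if the occasional flare noted above is acceptable.
```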
@@ -199,7 +199,7 @@ flow_matching: # Flow-matching training settings
   timestep_sampling: "uniform" # Strategy for sampling timesteps
   flow_shift: 3.0 # Scalar shift applied to the target flow

-fsdp: # Distributed training (e.g., FSDP) configuration
+fsdp: # Distributed training (for example, FSDP) configuration
   dp_size: 8 # Total data-parallel replicas (single node: 8 GPUs)

 checkpoint: # Checkpointing behavior
@@ -253,8 +253,8 @@ fsdp: # Overrides for multi-node runs

 | Model | Parameters | Parallelization | Status |
 |-------|------------|-----------------|--------|
-| Wan 2.1 T2V 1.3B | 1.3B | FSDP2 via Automodel + DDP | ✅ |
-| Wan 2.1 T2V 14B | 14B | FSDP2 via Automodel + DDP | ✅ |
+| Wan 2.1 T2V 1.3B | 1.3B | FSDP2 using Automodel + DDP | ✅ |
+| Wan 2.1 T2V 14B | 14B | FSDP2 using Automodel + DDP | ✅ |
 | FLUX | TBD | TBD | 🔄 In Progress |

 ---

docs/tutorials/text-to-video-training.md

Lines changed: 8 additions & 10 deletions
@@ -11,7 +11,7 @@ content_type: "tutorial"

 # Text-to-Video Training

-Comprehensive guide for training large-scale text-to-video generation models using WAN 2.1 architecture. This approach uses Megatron-Core and Megatron-Bridge for scalable training with advanced parallelism strategies (data, tensor, sequence, and context parallelism) and optimized kernels (e.g., Transformer Engine fused attention).
+Comprehensive guide for training large-scale text-to-video generation models using WAN 2.1 architecture. This approach uses Megatron-Core and Megatron-Bridge for scalable training with advanced parallelism strategies (data, tensor, sequence, and context parallelism) and optimized kernels (for example, Transformer Engine fused attention).

 **Use case**: Train production-scale text-to-video models with full control over distributed training parallelism.

@@ -54,15 +54,15 @@ uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node
 # 4) Use Energon to process shards and create its metadata/spec
 energon prepare "${DATASET_PATH}"
 # In the interactive prompts:
-# - Enter a train/val/test split, e.g., "8,1,1"
+# - Enter a train/val/test split, for example, "8,1,1"
 # - When asked for the sample type, choose: "Crude sample (plain dict for cooking)"
 ```

 What gets produced:
 - Each shard contains:
   - pth: contain WAN video latents
   - pickle: contain text embeddings
-  - json: contain useful side-info (text caption, sizes, processing choices, etc.)
+  - json: contain useful side-info (text caption, sizes, processing choices, and so on)
 - Energon writes a `.nv-meta` directory with dataset info and a `dataset.yaml` you can version/control.

 You're ready to launch training. In the training config, we will point the WAN config (or CLI overrides) to the processed data output directory as `dataset.path=${DATASET_PATH}`.
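
As a concrete illustration of the steps above, the following minimal sketch assumes `DATASET_PATH` points at the directory of processed WebDataset shards; the path itself is hypothetical, and only `energon prepare`, the `.nv-meta` output, and the `dataset.path` override come from the tutorial text.

```bash
# Hypothetical shard location; substitute your own processed-data directory.
export DATASET_PATH=/data/wan_webdataset

# Build Energon's metadata/spec interactively (split, sample type), as described above.
energon prepare "${DATASET_PATH}"

# Energon writes its dataset info here; keep it under version control if you like.
ls "${DATASET_PATH}/.nv-meta"

# At training time, point the WAN config (or a CLI override) at the same directory:
#   dataset.path=${DATASET_PATH}
```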
@@ -71,9 +71,7 @@ You're ready to launch training. In the training config, we will point the WAN c

 ## Build Container

-Please follow the instructions in the container section of the main README:
-
-- DFM container guide: https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container
+Follow the instructions in the [container section](https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container) of the main README.

 ---

@@ -87,13 +85,13 @@ Multiple parallelism techniques including tensor, sequence, and context parallel

 Wan training is driven by `examples/megatron/recipes/wan/pretrain_wan.py`, which supports both a YAML config file and CLI overrides.

-The script exposes a `--training-mode` with `pretrain` and `finetune` presets for flow-matching hyperparameters as a starting point for experiments. This presets specify that pretraining uses noisier, biased sampling (e.g., logit-normal, higher logit_std, lower flow_shift) for stability and broad learning, while finetuning uses uniform, lower-noise settings (e.g., uniform sampling, lower logit_std, higher flow_shift) to refine details and improve quality.
+The script exposes a `--training-mode` with `pretrain` and `finetune` presets for flow-matching hyperparameters as a starting point for experiments. This presets specify that pretraining uses noisier, biased sampling (for example, logit-normal, higher logit_std, lower flow_shift) for stability and broad learning, while finetuning uses uniform, lower-noise settings (for example, uniform sampling, lower logit_std, higher flow_shift) to refine details and improve quality.

 **Notes**: If you use `logger.wandb_project` and `logger.wandb_exp_name`, export `WANDB_API_KEY`.

 ### Pretraining Script Example

-We provide example scripts for running 1.3B and 14B model sizes on mock dataset (see `wan_1_3B.yaml` and `wan_14B.yaml` under `examples/megatron/recipes/wan/conf`). From these starting points, users can set their own configuration by copy one of the example override configs and update it with your settings (e.g., with actual processed data path, and specific configurations based on available hardware, etc.). Users can learn more about arguments detail at [Megatron-Bridge docs](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/docs/megatron-lm-to-megatron-bridge.md).
+We provide example scripts for running 1.3B and 14B model sizes on mock dataset (see `wan_1_3B.yaml` and `wan_14B.yaml` under `examples/megatron/recipes/wan/conf`). From these starting points, users can set their own configuration by copy one of the example override configs and update it with your settings (for example, with actual processed data path, and specific configurations based on available hardware, and so on). Users can learn more about arguments detail at [Megatron-Bridge docs](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/docs/megatron-lm-to-megatron-bridge.md).

 ```bash
 cp examples/megatron/recipes/wan/conf/wan_1_3B.yaml examples/megatron/recipes/wan/conf/my_wan.yaml
@@ -141,7 +139,7 @@ uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node
   --mock
 ```

-You may adjust mock shapes (`F_latents`, `H_latents`, `W_latents`) and packing behavior (`number_packed_samples`) in `WanMockDataModuleConfig` (see `dfm/src/megatron/recipes/wan/wan.py`) to simulate different data scenarios.
+You can adjust mock shapes (`F_latents`, `H_latents`, `W_latents`) and packing behavior (`number_packed_samples`) in `WanMockDataModuleConfig` (see `dfm/src/megatron/recipes/wan/wan.py`) to simulate different data scenarios.

 ---
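
For context, the pieces shown in the hunks above might be combined into a launch like the following minimal sketch. It assumes a single node with 8 GPUs and uses only flags named in this diff (`--training-mode`, `--mock`); how the copied `my_wan.yaml` is supplied to `pretrain_wan.py` is not shown in these hunks, so it is left out here.

```bash
# Minimal sketch, not the tutorial's exact command. Assumes 8 GPUs on one node;
# only flags named in the surrounding text are used, and the mechanism for
# passing my_wan.yaml to the script is intentionally omitted.
cp examples/megatron/recipes/wan/conf/wan_1_3B.yaml examples/megatron/recipes/wan/conf/my_wan.yaml

uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node 8 \
  examples/megatron/recipes/wan/pretrain_wan.py \
  --training-mode pretrain \
  --mock
```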

@@ -178,7 +176,7 @@ The table below shows current parallelism support for different model sizes:

 ## References

-Wan Team. (2025). Wan: Open and advanced large-scale video generative models (WAN 2.1). GitHub. https://github.com/Wan-Video/Wan2.1/
+Wan Team. (2025). [Wan: Open and advanced large-scale video generative models (WAN 2.1)](https://github.com/Wan-Video/Wan2.1/). GitHub.

 ---

docs/tutorials/training-from-scratch.md

Lines changed: 4 additions & 4 deletions
@@ -23,7 +23,7 @@ For a quick start guide, see [Megatron Workflow](../get-started/megatron.md). Th

 ## Dataset Preparation

-This recipe uses NVIDIA's [Megatron-Energon](https://github.com/NVIDIA/Megatron-Energon) as an efficient multi-modal data loader. Datasets should be in the WebDataset-compatible format (typically sharded `.tar` archives). Energon efficiently supports large-scale distributed loading, sharding, and sampling for multi-modal pairs (e.g., text-image, text-video). Set `dataset.path` to your WebDataset location or shard pattern. See the Megatron-Energon documentation for format details and advanced options.
+This recipe uses NVIDIA's [Megatron-Energon](https://github.com/NVIDIA/Megatron-Energon) as an efficient multi-modal data loader. Datasets should be in the WebDataset-compatible format (typically sharded `.tar` archives). Energon efficiently supports large-scale distributed loading, sharding, and sampling for multi-modal pairs (for example, text-image, text-video). Set `dataset.path` to your WebDataset location or shard pattern. See the Megatron-Energon documentation for format details and advanced options.

 ### Dataset Preparation Example

@@ -98,13 +98,13 @@ Done

 ## Build Container

-Please follow the instructions in the [container](https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container) section of the main README.
+Follow the instructions in the [container](https://github.com/NVIDIA-NeMo/DFM#-built-your-own-container) section of the main README.

 ---

 ## Pretraining

-Once you have the dataset and container ready, you can start training the DiT model on your own dataset. This repository leverages [sequence packing](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/features/optimizations/sequence_packing.html) to maximize training efficiency. Sequence packing stacks multiple samples into a single sequence instead of padding individual samples to a fixed length; therefore, `micro_batch_size` must be set to 1. Additionally, `qkv_format` should be set to `thd` to signal to Transformer Engine that sequence packing is enabled.
+After you have the dataset and container ready, you can start training the DiT model on your own dataset. This repository leverages [sequence packing](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/features/optimizations/sequence_packing.html) to maximize training efficiency. Sequence packing stacks multiple samples into a single sequence instead of padding individual samples to a fixed length; therefore, `micro_batch_size` must be set to 1. Additionally, `qkv_format` should be set to `thd` to signal to Transformer Engine that sequence packing is enabled.

 For data loading, Energon provides two key hyperparameters related to sequence packing: `task_encoder_seq_length` and `packing_buffer_size`. The `task_encoder_seq_length` parameter controls the maximum sequence length passed to the model, while `packing_buffer_size` determines the number of samples processed to create different buckets. You can look at `select_samples_to_pack` and `pack_selected_samples` methods of [DiffusionTaskEncoderWithSequencePacking](https://github.com/NVIDIA-NeMo/DFM/blob/main/dfm/src/megatron/data/common/diffusion_task_encoder_with_sp.py#L50) to get a better sense of these parameters. For further details you can look at [Energon packing](https://nvidia.github.io/Megatron-Energon/advanced/packing.html) documentation.

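To keep the packing-related settings above in one place, here is a minimal sketch. The key names and the two required values (`micro_batch_size=1`, `qkv_format=thd`) come from the text; the `key=value` override style mirrors `dataset.path` elsewhere in these tutorials, and any section prefixes plus the example values for the Energon knobs are assumptions to adjust for your recipe config.

```bash
# Sketch only: key names and required values are from the tutorial text;
# prefixes/grouping and the two example Energon values are assumptions.
PACKING_OVERRIDES=(
  micro_batch_size=1              # required when sequence packing is enabled
  qkv_format=thd                  # tells Transformer Engine that packing is enabled
  task_encoder_seq_length=8192    # max packed sequence length passed to the model (example value)
  packing_buffer_size=100         # samples buffered when forming packs (example value)
)
# Append "${PACKING_OVERRIDES[@]}" to your training launch command's overrides.
```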
@@ -172,7 +172,7 @@ uv run --group megatron-bridge python -m torch.distributed.run \

 ## Inference

-Once training completes, you can run inference using [inference_dit_model.py](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/dit/inference_dit_model.py). The script requires your trained model checkpoint (`--checkpoint_path`) and a path to save generated videos (`--video_save_path`). You can pass two optional arguments, `--t5_cache_dir` and `--tokenizer_cache_dir`, to avoid re-downloading artifacts if they are already downloaded.
+After training completes, you can run inference using [inference_dit_model.py](https://github.com/NVIDIA-NeMo/DFM/blob/main/examples/megatron/recipes/dit/inference_dit_model.py). The script requires your trained model checkpoint (`--checkpoint_path`) and a path to save generated videos (`--video_save_path`). You can pass two optional arguments, `--t5_cache_dir` and `--tokenizer_cache_dir`, to avoid re-downloading artifacts if they are already downloaded.

 ```bash
 uv run --group megatron-bridge python -m torch.distributed.run --nproc-per-node $num_gpus \
