**`docs/megatron/models/dit/README.md`**

### 🐳 Build Container

Please follow the instructions in the [container](https://github.com/NVIDIA-NeMo/DFM#build-your-own-container) section of the main README.

---

The model architecture can be customized through parameters such as `num_layers`.
First, copy the example config file and update it with your own settings:

```bash
cp examples/megatron/recipes/dit/conf/dit_pretrain.yaml examples/megatron/recipes/dit/conf/my_config.yaml
# Edit my_config.yaml to set:
# - model.vae_cache_folder: Path to VAE cache folder
# - dataset.path: Path to your dataset folder
# - checkpoint.save and checkpoint.load: Path to checkpoint folder
# - train.global_batch_size: Set to be divisible by NUM_GPUs
# - logger.wandb_exp_name: Your experiment name
```
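For orientation, an edited `my_config.yaml` might look like the sketch below. The values are placeholders only; the authoritative keys and defaults are those in the copied `dit_pretrain.yaml`.

```yaml
# Illustrative values only -- adjust paths and names for your environment.
model:
  vae_cache_folder: /data/vae_cache      # Path to VAE cache folder
dataset:
  path: /data/butterfly_dataset          # Path to your dataset folder
checkpoint:
  save: /checkpoints/dit_run1            # Where new checkpoints are written
  load: /checkpoints/dit_run1            # Where to resume from, if present
train:
  global_batch_size: 8                   # Must be divisible by NUM_GPUs
logger:
  wandb_exp_name: dit_butterfly_run1     # Your experiment name
```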

**`examples/README.md`**
# DFM Examples

Collection of examples and recipes for training and inference using the Data Foundation Model (DFM) framework.

## Quick Start

If you are new to DFM, start with the **[Automodel Examples](automodel/)** for high-level API usage.

## Examples by Category

| Category | Description | Key Examples |
|----------|-------------|--------------|
| **[Automodel](automodel/)** | High-level API for seamless training and inference | [Wan 2.1 Fine-tuning](automodel/README.md), [Pretraining](automodel/pretrain/), [Generation](automodel/generate/) |
| **[Megatron](megatron/)** | Advanced recipes and configurations using Megatron-Core | [DiT Recipes](megatron/recipes/dit/), [Wan Recipes](megatron/recipes/wan/) |

## Support

For issues or questions, please open a GitHub issue or refer to the main documentation.
**`examples/automodel/README.md`**
# Automodel Examples

High-level API examples for training, fine-tuning, and generating with DFM models.

## Supported Tasks

| Task | Directory | Available Examples |
|------|-----------|-------------------|
| **Fine-tuning** | **[finetune](finetune/)** | • **[Wan 2.1 T2V](finetune/finetune.py)**: Fine-tuning with Flow Matching <br> • **[Multi-node](finetune/wan2_1_t2v_flow_multinode.yaml)**: Distributed training config |
| **Generation** | **[generate](generate/)** | • **[Generate](generate/wan_generate.py)**: Run inference with Wan 2.1 <br> • **[Validate](generate/wan_validate.py)**: Run validation loop |
| **Pre-training** | **[pretrain](pretrain/)** | • **[Wan 2.1 T2V](pretrain/pretrain.py)**: Pre-training from scratch |

---

# Diffusion Model Fine-tuning with Automodel Backend

Train diffusion models with distributed training support using NeMo Automodel and flow matching.
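As background on the flow-matching objective mentioned above, here is an illustrative NumPy sketch (not the DFM or NeMo Automodel implementation): the model is trained to regress the constant velocity that carries a noise sample to a data sample along a straight path.

```python
import numpy as np

def flow_matching_loss(model, x1, rng):
    """Rectified-flow style loss: regress the velocity (x1 - x0) at a
    random point on the straight path from noise x0 to data x1."""
    x0 = rng.standard_normal(x1.shape)       # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1             # point on the straight path
    v_target = x1 - x0                       # constant velocity of that path
    v_pred = model(xt, t)                    # network sees only (xt, t)
    return np.mean((v_pred - v_target) ** 2)
```

In practice `model` is the diffusion transformer and the loss is minimized with a standard optimizer; sampling then integrates the learned velocity field from noise to data.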
**`examples/megatron/README.md`**
# Megatron Examples

Advanced recipes and configuration overrides for training models using the Megatron-Core backend.

## Available Model Recipes

| Recipe | Key Scripts | Description |
|--------|-------------|-------------|
| **[DiT](recipes/dit/README.md)** | • [Pretrain](recipes/dit/pretrain_dit_model.py) <br> • [Inference](recipes/dit/inference_dit_model.py) | Diffusion Transformer (DiT) training on butterfly dataset |
| **[Wan](recipes/wan/README.md)** | • [Pretrain](recipes/wan/pretrain_wan.py) <br> • [Inference](recipes/wan/inference_wan.py) | Wan 2.1 model pre-training and inference |

## Directory Structure

| Directory | Description |
|-----------|-------------|
| **[recipes](recipes/)** | Source code and scripts for the models above |
| **[override_configs](override_configs/)** | Configuration overrides for customizing parallelism (TP/CP/SP) |
**`examples/megatron/override_configs/README.md`**
# Configuration Overrides

Collection of YAML configuration files used to override default settings in Megatron training recipes. These are typically used for specifying parallelization strategies (Tensor Parallelism, Context Parallelism, Sequence Parallelism) or data configurations.

## Files

| File | Description |
|------|-------------|
| `wan_pretrain_sample_data.yaml` | Sample data configuration for Wan pre-training. |

## Usage

These configs can be passed to the training script arguments to override defaults.
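A parallelism override could look like the following sketch. The key names here follow common Megatron-Core conventions and are assumptions; consult the recipe's base config for the actual schema.

```yaml
# Hypothetical parallelism override -- verify key names against the base config.
model:
  tensor_model_parallel_size: 2   # TP: shard weight matrices across 2 GPUs
  context_parallel_size: 2        # CP: shard the sequence/frame dimension
  sequence_parallel: true         # SP: shard activations in norm/dropout layers
```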
**`examples/megatron/recipes/README.md`**
# Model Recipes

Collection of end-to-end training recipes for specific model architectures.

## Available Recipes

| Recipe | Description |
|--------|-------------|
| **[DiT](dit/)** | Diffusion Transformer (DiT) training on butterfly dataset |
| **[Wan](wan/)** | Wan 2.1 model pre-training and inference |
**`examples/megatron/recipes/wan/README.md`**
# Wan Recipes

Recipes for training and running inference with Wan models using Megatron-Core.

## Files

- `pretrain_wan.py`: Main pre-training script.
- `inference_wan.py`: Inference script.
- `prepare_energon_dataset_wan.py`: Dataset preparation utility.

## Performance Testing

See **[Performance Test Guide](README_perf_test.md)** for details on running performance benchmarks on different hardware (H100, GB200, etc.).