diff --git a/docs/automodel/automodel_training_doc.md b/docs/automodel/automodel_training_doc.md
new file mode 100644
index 00000000..ae5f6d9c
--- /dev/null
+++ b/docs/automodel/automodel_training_doc.md
@@ -0,0 +1,288 @@
+# Diffusion Model Fine-tuning with Automodel Backend
+
+Train diffusion models with distributed training support using NeMo Automodel and flow matching.
+
+**Currently Supported:** Wan 2.1 Text-to-Video (1.3B and 14B models)
+
+---
+
+## Quick Start
+
+### 1. Docker Setup
+
+```bash
+# Build image
+docker build -f docker/Dockerfile.ci -t dfm-training .
+
+# Run container
+docker run --gpus all -it \
+  -v $(pwd):/workspace \
+  -v /path/to/data:/data \
+  --ipc=host \
+  --ulimit memlock=-1 \
+  --ulimit stack=67108864 \
+  dfm-training bash
+
+# Inside container: Initialize submodules
+export UV_PROJECT_ENVIRONMENT=
+git submodule update --init --recursive 3rdparty/
+```
+
+### 2. Prepare Data
+
+We provide two ways to prepare your dataset:
+
+- Start with raw videos: Place your `.mp4` files in a folder and use our data-preparation scripts to scan the videos and generate a `meta.json` entry for each sample (which includes `width`, `height`, `start_frame`, `end_frame`, and a caption). If you have captions, you can also include per-video named `