Conversation

Contributor

@sajadn sajadn commented Nov 18, 2025

  • add dit readme.

Signed-off-by: sajadn <[email protected]>

copy-pr-bot bot commented Nov 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@abhinavg4 abhinavg4 left a comment

Left a bunch of comments. Also, please check these comments: https://github.com/NVIDIA-NeMo/DFM/pull/38/files

Signed-off-by: Sajad Norouzi <[email protected]>
@sajadn sajadn requested a review from abhinavg4 November 20, 2025 23:21
abhinavg4
abhinavg4 previously approved these changes Nov 21, 2025

@abhinavg4 abhinavg4 left a comment

Great documentation. Loved it. Left a few minor comments.

The following repositories need to be cloned with specific commit hashes:

#### Megatron-LM
The script below prepares the dataset to be compatible with Energon.
Contributor

Nit: it packs the HF dataset into WebDataset format. "Format compatible with Energon" sounds somewhat mystical.
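To illustrate the suggested wording: the WebDataset format is just a set of tar shards in which files sharing a basename ("key") form one sample, e.g. `000001.jpg` plus `000001.json`. A minimal, hypothetical sketch of such a packing step using only the standard library (the repository's actual packing script may differ):

```python
import io
import json
import tarfile

def pack_samples_to_shard(samples, shard_path):
    """Write (image_bytes, metadata) pairs into one WebDataset-style tar shard.

    In WebDataset, all files sharing a basename ("key") belong to one sample.
    """
    with tarfile.open(shard_path, "w") as tar:
        for idx, (image_bytes, metadata) in enumerate(samples):
            key = f"{idx:06d}"
            for ext, payload in (
                (".jpg", image_bytes),
                (".json", json.dumps(metadata).encode()),
            ):
                info = tarfile.TarInfo(name=key + ext)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

# Hypothetical usage with dummy bytes standing in for real images:
samples = [(b"\xff\xd8fake-jpeg", {"caption": f"sample {i}"}) for i in range(3)]
pack_samples_to_shard(samples, "shard-000000.tar")
```

`energon prepare` can then be pointed at a directory of such shards to generate its dataset metadata.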

```
git checkout dit_debug
cd ..
energon prepare ./
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
```
Contributor

Remove this print; it's not needed.

Contributor Author

Do you want me to get rid of all of the prints from `energon prepare ./`?

Once you have the dataset and container ready, you can start training the DiT model on your own dataset. This repository leverages [sequence packing](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/features/optimizations/sequence_packing.html) to maximize training efficiency. Sequence packing stacks multiple samples into a single sequence instead of padding individual samples to a fixed length; therefore, `micro_batch_size` must be set to 1. Additionally, `qkv_format` should be set to `thd` to signal to Transformer Engine that sequence packing is enabled.
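The two constraints in the paragraph above can be captured in a small sanity check. The dictionary keys mirror the hyperparameter names mentioned in the README; the `validate_packing_config` helper and the concrete values are purely illustrative, not part of DFM:

```python
def validate_packing_config(cfg: dict) -> None:
    """Check the two invariants sequence packing imposes (per the README)."""
    if cfg["micro_batch_size"] != 1:
        raise ValueError("sequence packing requires micro_batch_size == 1")
    if cfg["qkv_format"] != "thd":
        raise ValueError("Transformer Engine needs qkv_format == 'thd' for packing")

# Illustrative values; tune the sequence length and buffer size for your data.
config = {
    "micro_batch_size": 1,            # one packed sequence per micro-batch
    "qkv_format": "thd",              # tells Transformer Engine packing is enabled
    "task_encoder_seq_length": 8192,  # max packed sequence length (example value)
    "packing_buffer_size": 100,       # samples buffered before packing (example value)
}
validate_packing_config(config)
```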
For data loading, Energon provides two key hyperparameters related to sequence packing: `task_encoder_seq_length` and `packing_buffer_size`. The `task_encoder_seq_length` parameter controls the maximum sequence length passed to the model, while `packing_buffer_size` determines the number of samples processed to create different buckets. You can look at `select_samples_to_pack` and `pack_selected_samples` methods of [DiffusionTaskEncoderWithSequencePacking](https://github.com/NVIDIA-NeMo/DFM/blob/main/dfm/src/megatron/data/common/diffusion_task_encoder_with_sp.py#L50) to get a better sense of these parameters.
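To make the role of these parameters concrete, here is a hedged first-fit sketch of what selecting buffered samples into length-bounded packs can look like; the real `select_samples_to_pack` logic in `DiffusionTaskEncoderWithSequencePacking` may differ:

```python
def first_fit_pack(lengths, max_seq_length):
    """Group sample lengths into packs whose totals stay within max_seq_length.

    First-fit: place each sample into the first pack with room; otherwise
    open a new pack. `lengths` plays the role of the packing buffer, and
    `max_seq_length` plays the role of task_encoder_seq_length.
    """
    packs, totals = [], []
    for length in lengths:
        if length > max_seq_length:
            raise ValueError(f"sample of length {length} exceeds max_seq_length")
        for i, total in enumerate(totals):
            if total + length <= max_seq_length:
                packs[i].append(length)
                totals[i] += length
                break
        else:
            packs.append([length])
            totals.append(length)
    return packs

# Example: a buffer of five sample lengths packed to a max sequence length of 10.
print(first_fit_pack([4, 7, 3, 5, 2], max_seq_length=10))  # -> [[4, 3, 2], [7], [5]]
```

A larger buffer gives the packer more candidates to combine, so packs tend to be fuller, at the cost of more memory during data loading.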
Contributor

Great section. Could you also provide a link to the Energon documentation for these params?

@sajadn sajadn enabled auto-merge (squash) November 21, 2025 15:57
Signed-off-by: Sajad Norouzi <[email protected]>
@abhinavg4
Contributor

/ok to test 045f421

@@ -1,77 +1,184 @@
# DiT (Diffusion Transformer) Model Setup
Contributor

Please put this under docs/megatron/models/DiT/dit.md

Signed-off-by: Sajad Norouzi <[email protected]>

@abhinavg4 abhinavg4 left a comment

Looks good

@abhinavg4
Contributor

/ok to test 7a832f6

@sajadn sajadn merged commit 2489a8e into main Dec 1, 2025
15 checks passed
lbliii pushed a commit that referenced this pull request Dec 3, 2025
* Add DiT Readme.

Signed-off-by: sajadn <[email protected]>

* Update DiT readme.

Signed-off-by: Sajad Norouzi <[email protected]>

* Minor wording update.

Signed-off-by: Sajad Norouzi <[email protected]>

---------

Signed-off-by: sajadn <[email protected]>
Signed-off-by: Sajad Norouzi <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>