Add DiT Readme. #61
Conversation
sajadn
commented
Nov 18, 2025
- add dit readme.
Signed-off-by: sajadn <[email protected]>
abhinavg4
left a comment
Left a bunch of comments; also, please check these comments: https://github.com/NVIDIA-NeMo/DFM/pull/38/files
Signed-off-by: Sajad Norouzi <[email protected]>
abhinavg4
left a comment
Great documentation. Loved it. Left a few minor comments.
> The following repositories need to be cloned with specific commit hashes:
>
> #### Megatron-LM

> The script below prepares the dataset to be compatible with Energon.
Nit: say "packs the HF dataset into WebDataset format." "Format compatible with Energon" sounds somewhat mystical.
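For context, a WebDataset shard is an ordinary tar archive in which files sharing a key prefix (e.g. `000000.jpg`, `000000.json`) form one sample. A minimal stdlib-only sketch of that packing idea (the sample data, keys, and shard name here are hypothetical illustrations, not the repo's actual preparation script):

```python
import io
import json
import tarfile

def write_webdataset_shard(samples, shard_path):
    """Pack (key, image_bytes, metadata) tuples into one WebDataset-style tar shard.

    Files that share a key prefix become one logical sample when the
    shard is later read back by a WebDataset-compatible loader.
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, image_bytes, metadata in samples:
            for suffix, payload in (
                (".jpg", image_bytes),
                (".json", json.dumps(metadata).encode()),
            ):
                info = tarfile.TarInfo(name=f"{key}{suffix}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

# Hypothetical rows standing in for a decoded HF dataset.
samples = [
    ("000000", b"\xff\xd8fake-jpeg", {"caption": "a cat"}),
    ("000001", b"\xff\xd8fake-jpeg", {"caption": "a dog"}),
]
write_webdataset_shard(samples, "shard-000000.tar")
```

In practice one would use the `webdataset` library's shard writer and split the dataset across many numbered `.tar` shards, but the on-disk format is just this: tar members grouped by key.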
> git checkout dit_debug
> cd ..
> energon prepare ./
> /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
Remove this print; it's not needed.
Do you want me to get rid of all of the `energon prepare ./` prints?
> Once you have the dataset and container ready, you can start training the DiT model on your own dataset. This repository leverages [sequence packing](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/features/optimizations/sequence_packing.html) to maximize training efficiency. Sequence packing stacks multiple samples into a single sequence instead of padding individual samples to a fixed length; therefore, `micro_batch_size` must be set to 1. Additionally, `qkv_format` should be set to `thd` to signal to Transformer Engine that sequence packing is enabled.
>
> For data loading, Energon provides two key hyperparameters related to sequence packing: `task_encoder_seq_length` and `packing_buffer_size`. The `task_encoder_seq_length` parameter controls the maximum sequence length passed to the model, while `packing_buffer_size` determines the number of samples processed to create different buckets. You can look at the `select_samples_to_pack` and `pack_selected_samples` methods of [DiffusionTaskEncoderWithSequencePacking](https://github.com/NVIDIA-NeMo/DFM/blob/main/dfm/src/megatron/data/common/diffusion_task_encoder_with_sp.py#L50) to get a better sense of these parameters.
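To make the two knobs concrete, here is an illustrative greedy first-fit sketch of the selection step: samples are drawn from a buffer of `packing_buffer_size` candidates and grouped so that no pack exceeds `task_encoder_seq_length`. This is a hypothetical simplification, not the actual `DiffusionTaskEncoderWithSequencePacking` logic:

```python
def select_samples_to_pack(sample_lengths, task_encoder_seq_length):
    """Greedy first-fit: group sample lengths into packs whose totals
    stay within task_encoder_seq_length. Illustrative only."""
    packs = []  # each pack is a list of sample lengths
    for length in sorted(sample_lengths, reverse=True):
        for pack in packs:
            # Place the sample in the first pack that still has room.
            if sum(pack) + length <= task_encoder_seq_length:
                pack.append(length)
                break
        else:
            # No existing pack fits; open a new one.
            packs.append([length])
    return packs

# A buffer of packing_buffer_size=6 samples with varying sequence lengths.
buffer = [700, 300, 512, 200, 900, 100]
packs = select_samples_to_pack(buffer, task_encoder_seq_length=1024)
# Every pack total stays within the 1024-token limit.
```

Because each pack already fills (most of) a full sequence, `micro_batch_size` stays at 1; the effective batch size comes from the number of samples stacked into each pack.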
Great section. Could you also provide a link to the Energon documentation for these params?
Signed-off-by: Sajad Norouzi <[email protected]>
/ok to test 045f421
> @@ -1,77 +1,184 @@
> # DiT (Diffusion Transformer) Model Setup
Please put this under docs/megatron/models/DiT/dit.md
Signed-off-by: Sajad Norouzi <[email protected]>
abhinavg4
left a comment
Looks good
/ok to test 7a832f6
* Add DiT Readme.
  Signed-off-by: sajadn <[email protected]>
* Update DiT readme.
  Signed-off-by: Sajad Norouzi <[email protected]>
* Minor wording update.
  Signed-off-by: Sajad Norouzi <[email protected]>

---------

Signed-off-by: sajadn <[email protected]>
Signed-off-by: Sajad Norouzi <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>