DFM Performance Improvements #45
Merged
Conversation
- Added comments to clarify file purposes in example_commands.sh, inference_wan.py, pretrain_wan.py, wan_provider.py, wan_step.py, and wan.py.
- Introduced EnergonMultiModalDataModule for handling multimodal datasets in nemo_vfm.
- Created SequentialMegatronSampler for efficient sequential sampling in large datasets.
- Added new files for DIT attention and base data handling.

This commit enhances documentation and introduces new functionality for better data management and processing.
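The SequentialMegatronSampler referenced above is not shown in this thread. As a rough, hypothetical illustration of the idea behind a sequential sampler (each data-parallel rank reads a contiguous, in-order slice of a large dataset rather than shuffled indices), a minimal sketch could look like the following; the class name, constructor arguments, and sharding math here are assumptions, not the repository's implementation.

```python
# Hypothetical sketch of sequential sampling: each data-parallel rank walks a
# contiguous slice of the dataset in order, keeping reads sequential for very
# large datasets. Not the actual SequentialMegatronSampler from this PR.
from typing import Iterator


class SequentialShardSampler:
    def __init__(self, dataset_len: int, rank: int, world_size: int, start_index: int = 0):
        self.dataset_len = dataset_len
        self.rank = rank
        self.world_size = world_size
        self.start_index = start_index  # allows resuming mid-slice after a restart

    def __len__(self) -> int:
        # Each rank owns roughly 1/world_size of the samples.
        return self.dataset_len // self.world_size

    def __iter__(self) -> Iterator[int]:
        # Contiguous slice per rank: rank 0 gets [0, n), rank 1 gets [n, 2n), ...
        per_rank = self.dataset_len // self.world_size
        begin = self.rank * per_rank + self.start_index
        end = self.rank * per_rank + per_rank
        return iter(range(begin, end))


# Usage with a standard PyTorch DataLoader (illustrative only):
# sampler = SequentialShardSampler(len(dataset), rank=0, world_size=8)
# loader = torch.utils.data.DataLoader(dataset, sampler=sampler, batch_size=4)
```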
Contributor (Author)
/ok to test 1b055c6
Signed-off-by: Parth Mannan <[email protected]>
Contributor (Author)
/ok to test 04f6c14
Signed-off-by: Parth Mannan <[email protected]>
Contributor (Author)
/ok to test 7247bc7
Signed-off-by: Parth Mannan <[email protected]>
Signed-off-by: Parth Mannan <[email protected]>
Contributor (Author)
/ok to test 7fedda7
abhinavg4 (Contributor) approved these changes on Nov 21, 2025, with the comment:
Looks good. Thanks a ton for your help.
lbliii pushed a commit that referenced this pull request on Dec 3, 2025. Squashed commit message:
* first commit
* workable code
* workable thd
* clean up, remove all CP for sbhd, CP now is only for thd
* run outside of Mbridge
* Update example scripts and add new data module for multimodal datasets:
  - Added comments to clarify file purposes in example_commands.sh, inference_wan.py, pretrain_wan.py, wan_provider.py, wan_step.py, and wan.py.
  - Introduced EnergonMultiModalDataModule for handling multimodal datasets in nemo_vfm.
  - Created SequentialMegatronSampler for efficient sequential sampling in large datasets.
  - Added new files for DIT attention and base data handling.
  This commit enhances documentation and introduces new functionality for better data management and processing.
* workable code before refactoring
* refactor attention submodules + reorder file locations
* update refactor
* update refactor
* reorganize files
* reorganize files
* refactoring code
* add README for perf test
* using vae, t5, scheduler from Diffusers
* update repo, remove Wan's GitHub modules
* fix Ruff
* fix Ruff + copyright
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* merged main + address comments
* remove example_commands.md, Google waits until mid Nov
* refactor inference_configs + mockdatamodule
* add dit_embeddings.py
* fix lint ruff
* add 'average_gradients_across_tp_domain' to torch.nn for when running sequence_parallelism
* add english negative prompt
* fix ruff lint
* Update uv.lock for deps: diffusers==0.35.1, easydict, imageio
* update dfm/src/megatron/data/dit
* change english negative prompt
* seemingly workable seq_packing
* refactor with Sajad's PR - DiT data to common dir
* fix Ruff, lint
* fix Ruff, lint
* fix Ruff, lint
* workable mock datamodule (doesn't need setting path); updated training algo + hyper-parameters aligning with Linnan; tested training with anime dataset finetuning
* bring wan_task encoders features to common, sharing with dit
* lint, ruff
* lint, ruff
* lint, ruff
* fix CP error (input of thd_split_inputs_cp to be cu_seqlens_q_padded instead of cu_seqlens_q)
* update README_perf_test.md
* fix lint, ruff
* update uv.lock, merge main
* uv.lock
* uv.lock
* uv.lock
* update uv.lock [using ci]
* Performance improvements to Wan
* Perf optimizations
* Tiny fix
* Remove CP disable as packed sequences not supported
* Fix comment
* Minor fixes. Revert video_latent comparison
* Fix missed check
* Lint fix
* H100 mock pretraining perf config
* Rename config file
* Lint check (Signed-off-by: Parth Mannan <[email protected]>)
* Adding GB200 perf config (Signed-off-by: Parth Mannan <[email protected]>)
* GB300 perf config (Signed-off-by: Parth Mannan <[email protected]>)
* Refactor Energon data module to return wrapped dataloaders and add EnergonDataloader class for cyclic iteration. Introduce WAN pretrain mock data configuration for testing.
* Enhance DiffusionTaskEncoder to handle None attributes in stacking and concatenation methods. Add WAN pretrain mock data configuration for testing purposes.
* Refactor data processing in dit_data_step to simplify batch retrieval and update WAN pretrain configuration to include train_iters.
* Add op fusions (Signed-off-by: Parth Mannan <[email protected]>)
* Update H100 config (Signed-off-by: Parth Mannan <[email protected]>)
* Fix lint (Signed-off-by: Parth Mannan <[email protected]>)
* Resolve conflict (Signed-off-by: Parth Mannan <[email protected]>)
* Fix for mock dataloader test (Signed-off-by: Parth Mannan <[email protected]>)
* Fix Dummyiter (Signed-off-by: Parth Mannan <[email protected]>)
* Fix test (Signed-off-by: Parth Mannan <[email protected]>)
* Make RoPE test only GPU (Signed-off-by: Parth Mannan <[email protected]>)
* Rope cuda fix (Signed-off-by: Parth Mannan <[email protected]>)

---------
Signed-off-by: Parth Mannan <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Abhinav Garg <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>
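The log above mentions wrapping dataloaders in an EnergonDataloader class for cyclic iteration. The snippet below is only a sketch of that general pattern (restart the underlying iterator when it is exhausted so a step-based training loop never runs dry); the CyclicLoader name and interface are hypothetical, not the code added in this PR.

```python
# Hypothetical sketch of a cyclic dataloader wrapper: when the wrapped loader is
# exhausted, a fresh iterator is created so iteration never stops. Assumes the
# wrapped object is re-iterable (e.g. a PyTorch DataLoader). Not the actual
# EnergonDataloader added in this PR.
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class CyclicLoader(Generic[T]):
    def __init__(self, loader: Iterable[T]):
        self.loader = loader
        self._iter: Iterator[T] = iter(loader)

    def __iter__(self) -> "CyclicLoader[T]":
        return self

    def __next__(self) -> T:
        try:
            return next(self._iter)
        except StopIteration:
            # Restart from the beginning instead of ending the epoch.
            self._iter = iter(self.loader)
            return next(self._iter)


# Usage (illustrative): a step-based trainer just keeps calling next().
# batches = CyclicLoader(train_dataloader)
# for _ in range(train_iters):
#     batch = next(batches)
```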