Supporting Wan model by huvunvidia · Pull Request #21 · NVIDIA-NeMo/DFM

huvunvidia · 2025-10-30T14:33:05Z

No description provided.

copy-pr-bot · 2025-10-30T14:33:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Added comments to clarify file purposes in example_commands.sh, inference_wan.py, pretrain_wan.py, wan_provider.py, wan_step.py, and wan.py.
- Introduced EnergonMultiModalDataModule for handling multimodal datasets in nemo_vfm.
- Created SequentialMegatronSampler for efficient sequential sampling in large datasets.
- Added new files for DIT attention and base data handling.

This commit enhances documentation and introduces new functionalities for better data management and processing.

abhinavg4

Thanks a lot for the PR. I think a number of changes are needed. In particular, please remove all the debugging code and add copyright headers. Thanks a lot.

examples/megatron/recipes/wan/example_commands.md

dfm/src/megatron/data/Dit/prepare_energon_dataset.py

abhinavg4 · 2025-11-04T16:58:02Z

dfm/src/megatron/data/dit/utils.py

+    return cropped_tensor
+
+
+def test_no_cropping_needed():


Can you remove these tests and move them to the tests folder?

I would refrain from modifying the DiT-related data code.
I would leave that to Sajad, so he can keep an overall view of what is edited for DiT.
This file is in this PR only because diffusion_energon_datamodule.py is needed for Wan's Energon data.

Sure. @sajadn, can you please remove this?

Hey, DiT does not use minimal_crop, so this file is not present in my final branch. I think this comment has to be addressed on this branch.

examples/megatron/recipes/wan/prepare_energon_dataset_wan.py

dfm/src/megatron/model/wan/flow_matching/flow_pipeline.py

dfm/src/megatron/model/wan/inference/configs/shared_config.py

dfm/src/megatron/model/wan/inference/configs/wan_i2v_14B.py

dfm/src/megatron/recipes/wan/wan.py

huvunvidia · 2025-11-12T18:16:24Z

/ok to test 13968fc

pablo-garay · 2025-11-12T18:58:59Z

pyproject.toml

    "Topic :: Utilities",
 ]
 dependencies = [
+    "diffusers==0.35.1",


These new dependency lines were already added further down in this pyproject.toml file, so I think we don't need to add them here anymore.

abhinavg4

Looks good. Thanks

huvunvidia · 2025-11-12T19:37:26Z

/ok to test b1c41fc

pablo-garay · 2025-11-12T22:20:03Z

/ok to test d8bcade

pablo-garay · 2025-11-12T22:33:26Z

/ok to test 681145b

* first commit
* workable code
* workable thd
* clean up, remove all CP for sbhd, CP now is only for thd
* run outside of Mbridge
* Update example scripts and add new data module for multimodal datasets
  - Added comments to clarify file purposes in example_commands.sh, inference_wan.py, pretrain_wan.py, wan_provider.py, wan_step.py, and wan.py.
  - Introduced EnergonMultiModalDataModule for handling multimodal datasets in nemo_vfm.
  - Created SequentialMegatronSampler for efficient sequential sampling in large datasets.
  - Added new files for DIT attention and base data handling.

  This commit enhances documentation and introduces new functionalities for better data management and processing.
* workable code before refactoring
* refactor attention submodules + reorder files locations
* update refactor
* update refactor
* reorganize files
* reorganize files
* refactoring code
* add README for perf test
* using vae, t5, scheduler from Diffusers
* update repo, remove Wan's Github moduels
* fix Ruff
* fix ruff + copyright
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* fix Ruff + Lint
* merged main + address comments
* remove example_commands.md, Google waits until mid Nov
* refactor inference_configs + mockdatamodule
* add dit_embeddings.py
* fix lint ruff
* add 'average_gradients_across_tp_domain' to torch.nn for when running sequence_parallelism
* add english negative prompt
* fix ruff lint
* Update uv.lock for deps: diffusers==0.35.1, easydict, imageio
* update dfm/src/megatron/data/dit
* change english negative prompt
* seem to workable seq_packing
* refactor with Sajad's PR - DiT data to common dir
* fix Ruff, lint
* fix Ruff, lint
* fix Ruff, lint
* workable mock datamodule (doesn't need setting path); updated training algo + hyper-parameters aligning with Linnan; tested training with anime dataset finetung
* bring wan_task encoders features to common, sharing with dit
* lint, ruff
* lint, ruff
* lint, ruff
* fix CP error (input of thd_split_inputs_cp to be cu_seqlens_q_padded instead of cu_seqlens_q)
* udpate README_perf_test.md
* fix lint, ruff
* update uv.lock, merge main
* uv.lock
* uv.lock
* uv.lock
* update uv.lock [using ci]

---------

Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Abhinav Garg <abhinavg@stanford.edu>
Co-authored-by: root <root@eos0025.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0558.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
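One of the commits above fixes a context-parallel (CP) error by passing cu_seqlens_q_padded rather than cu_seqlens_q to thd_split_inputs_cp. The sketch below illustrates why the two arrays differ for packed (THD) sequences. It is not the PR's code: the function names are hypothetical, and it assumes the common convention that each packed sequence is padded to a multiple of 2 * cp_size so it can be split evenly across CP ranks.

```python
from itertools import accumulate


def cumulative_lengths(seq_lens):
    """Cumulative offsets of the real (unpadded) sequence lengths."""
    return [0, *accumulate(seq_lens)]


def cumulative_padded_lengths(seq_lens, cp_size):
    """Cumulative offsets after padding each sequence for CP splitting.

    Assumption: every sequence is rounded up to a multiple of 2 * cp_size.
    """
    multiple = 2 * cp_size
    padded = [-(-n // multiple) * multiple for n in seq_lens]  # ceiling division
    return [0, *accumulate(padded)]


seq_lens = [5, 8, 3]  # three sequences packed into one sample
print(cumulative_lengths(seq_lens))                    # [0, 5, 13, 16]
print(cumulative_padded_lengths(seq_lens, cp_size=2))  # [0, 8, 16, 20]
```

If the packed tensor is laid out with the padded offsets, slicing it with the unpadded ones would cut sequences at the wrong boundaries, which is consistent with the fix described in the commit message.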

first commit

75ed5b2

Huy Vu2 and others added 12 commits October 30, 2025 13:28

workable code

2ebfd50

workable thd

7b834f0

clean up, remove all CP for sbhd, CP now is only for thd

2152abd

run outside of Mbridge

389a037

workable code before refactoring

d5d0106

Merge remote-tracking branch 'origin/huvu/mcore_wan' into huvu/mcore_wan

c4f5160

refactor attention submodules + reorder files locations

0430384

update refactor

dfff86b

update refactor

abbaa2a

reorganize files

c59f6a2

reorganize files

0b91a1c

abhinavg4 requested changes Nov 4, 2025

refactoring code

aa20504

huvunvidia requested a review from a team as a code owner November 5, 2025 07:14

Huy Vu2 added 13 commits November 4, 2025 23:52

add README for perf test

d5f58c9

using vae, t5, scheduler from Diffusers

9b8e4fb

update repo, remove Wan's Github moduels

7f414ae

Merge remote-tracking branch 'origin/main' into huvu/mcore_wan

62a518f

fix Ruff

2de5934

fix ruff + copyright

6b46a7f

fix Ruff + Lint

c1d8923

fix Ruff + Lint

e8de1ae

fix Ruff + Lint

287ad34

fix Ruff + Lint

4464fd2

fix Ruff + Lint

547339a

fix Ruff + Lint

9cd082b

fix Ruff + Lint

4514eee

copy-pr-bot bot temporarily deployed to nemo-ci November 12, 2025 16:37 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 12, 2025 16:59 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 12, 2025 17:19 Error

Huy Vu2 added 2 commits November 12, 2025 10:08

update uv.lock, merge main

0b0058f

update uv.lock, merge main

13968fc

copy-pr-bot bot temporarily deployed to test November 12, 2025 18:17 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 12, 2025 18:17 Failure

pablo-garay reviewed Nov 12, 2025

Huy Vu2 added 3 commits November 12, 2025 11:13

uv.lock

46aa6d8

uv.lock

6b553ec

uv.lock

b1c41fc

abhinavg4 previously approved these changes Nov 12, 2025

copy-pr-bot bot temporarily deployed to test November 12, 2025 19:37 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 12, 2025 19:37 Failure

github-actions bot dismissed abhinavg4’s stale review via d8bcade November 12, 2025 21:48

update uv.lock [using ci]

681145b

pablo-garay force-pushed the huvu/mcore_wan branch from d8bcade to 681145b Compare November 12, 2025 22:27

copy-pr-bot bot temporarily deployed to test November 12, 2025 22:33 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 12, 2025 22:33 Inactive

pablo-garay approved these changes Nov 12, 2025

copy-pr-bot bot temporarily deployed to nemo-ci November 12, 2025 23:07 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 12, 2025 23:30 Inactive

huvunvidia merged commit ddd4fe8 into main Nov 13, 2025
15 checks passed

abhinavg4 Nov 4, 2025

huvunvidia Nov 4, 2025

abhinavg4 Nov 7, 2025

sajadn Nov 10, 2025

4 participants