Pull Request Overview
This PR introduces a comprehensive diffusion framework under nemo_vfm, including VAE modules, sampling pipelines (RES, EDM, Cosmos), training entrypoints, and utility support for distributed setups.
- Added VAE components: perceptual+adversarial loss, encoder/decoder blocks, and autoencoder configuration.
- Implemented multiple sampler backends (RES, EDM, Flow Matching, Cosmos) with full pipeline integrations.
- Provided training scripts, parallel initialization utilities, and checkpoint conversion tools.
Reviewed Changes
Copilot reviewed 69 out of 467 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nemo_vfm/diffusion/vae/contperceptual_loss.py | New LPIPS+discriminator loss for VAE training |
| nemo_vfm/diffusion/vae/blocks.py | Core convolutional, ResNet, and attention building blocks |
| nemo_vfm/diffusion/vae/autovae.py | VAE architecture search/generation utilities |
| nemo_vfm/diffusion/vae/autoencoder.py | Full autoencoder implementation with config dataclass |
| nemo_vfm/diffusion/utils/mcore_parallel_utils.py | Megatron model parallel initialization helper |
```python
super().__init__()
assert disc_loss in ["hinge", "vanilla"]
self.kl_weight = kl_weight
self.pixel_weight = pixelloss_weight
```
The `pixelloss_weight` is stored in `self.pixel_weight` but never applied in `forward`. Consider multiplying the L1 reconstruction term by `self.pixel_weight` to respect the configured pixel loss weight.
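A minimal sketch of what that could look like, assuming an L1 reconstruction term; `weighted_l1`, `inputs`, and `reconstructions` are illustrative names, not taken from the PR:

```python
import torch

def weighted_l1(
    inputs: torch.Tensor, reconstructions: torch.Tensor, pixel_weight: float
) -> torch.Tensor:
    # Per-pixel L1 reconstruction term, scaled so the configured
    # pixelloss_weight actually takes effect.
    rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())
    return pixel_weight * rec_loss
```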
```python
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
else:
    nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]
```
The code references `self.last_layer`, which is never defined. This will raise an `AttributeError`. You should either require `last_layer` to always be passed, or initialize `self.last_layer` appropriately.
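One possible shape of the fix, sketched below, is to require `last_layer` explicitly and fail fast when it is absent; the `d_weight` computation follows the common taming-transformers adaptive-weight pattern and is an assumption about the surrounding code:

```python
from typing import Optional

import torch

def calculate_adaptive_weight(
    nll_loss: torch.Tensor,
    g_loss: torch.Tensor,
    last_layer: Optional[torch.Tensor] = None,
    discriminator_weight: float = 1.0,
) -> torch.Tensor:
    # Fail fast instead of silently falling back to an undefined self.last_layer.
    if last_layer is None:
        raise ValueError("last_layer must be provided to compute the adaptive weight")
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
    # Balance generator vs. reconstruction gradient magnitudes.
    d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
    return torch.clamp(d_weight, 0.0, 1e4).detach() * discriminator_weight
```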
```python
        nn.Module: An instance of the requested attention block.
    """
    assert attn_type in ["vanilla", "linear", "none"], f"attn_type {attn_type} unknown"
    print(f"making attention of type '{attn_type}' with {in_channels} in_channels")
```
[nitpick] This debug print in `make_attn` will execute on every block creation and clutter logs. Consider removing it or replacing it with a logger at an appropriate level.
Suggested change:
```diff
-    print(f"making attention of type '{attn_type}' with {in_channels} in_channels")
+    logging.debug(f"Making attention of type '{attn_type}' with {in_channels} in_channels")
```
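Note that the suggestion assumes `logging` is imported in this module. A hedged sketch of the alternative, a module-level logger with lazy %-formatting (the `make_attn` signature is inferred from the snippet, not confirmed by the PR):

```python
import logging

logger = logging.getLogger(__name__)

def make_attn(in_channels: int, attn_type: str = "vanilla"):
    # Lazy %-formatting defers string interpolation until the record is emitted.
    logger.debug("making attention of type '%s' with %d in_channels", attn_type, in_channels)
    ...
```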
```python
search_space_skleton = self._load_base_json_skeleton()
search_space_skleton["down_block_types"] = choice["down_block_types"]
search_space_skleton["up_block_types"] = choice["up_block_types"]
search_space_skleton["block_out_channels"] = choice["block_out_channels"]
search_space_skleton["layers_per_block"] = choice["layers_per_block"]
search_space_skleton["latent_channels"] = choice["latent_channels"]
return search_space_skleton
```
Typo in variable name: `search_space_skleton` should likely be `search_space_skeleton`.
Suggested change:
```diff
-search_space_skleton = self._load_base_json_skeleton()
-search_space_skleton["down_block_types"] = choice["down_block_types"]
-search_space_skleton["up_block_types"] = choice["up_block_types"]
-search_space_skleton["block_out_channels"] = choice["block_out_channels"]
-search_space_skleton["layers_per_block"] = choice["layers_per_block"]
-search_space_skleton["latent_channels"] = choice["latent_channels"]
-return search_space_skleton
+search_space_skeleton = self._load_base_json_skeleton()
+search_space_skeleton["down_block_types"] = choice["down_block_types"]
+search_space_skeleton["up_block_types"] = choice["up_block_types"]
+search_space_skeleton["block_out_channels"] = choice["block_out_channels"]
+search_space_skeleton["layers_per_block"] = choice["layers_per_block"]
+search_space_skeleton["latent_channels"] = choice["latent_channels"]
+return search_space_skeleton
```
```python
state_dict = load_sft(ckpt_path)
missing, unexpected = self.load_state_dict(state_dict)
if len(missing) > 0:
    logger.warning(f"Following keys are missing from checkpoint loaded: {missing}")
```
`logger` is not imported or defined in this module, causing a `NameError`. Import the standard Python `logging` module and define a module-level logger.
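A minimal sketch of the missing setup, assuming no existing logging convention in this module:

```python
import logging

# Module-level logger so the logger.warning(...) call in the snippet resolves.
logger = logging.getLogger(__name__)
```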
The `sparse_attention` directory was incorrectly configured as a submodule without a corresponding `.gitmodules` file, causing CI/CD checkout failures. This commit converts it to a regular directory with tracked files.

Signed-off-by: Ethan He <yihuih@nvidia.com>
add diffusion; physical AI projects