add diffusion;physical ai projects #4
Conversation
Pull Request Overview
This PR introduces a comprehensive diffusion framework under nemo_vfm, including VAE modules, sampling pipelines (RES, EDM, Cosmos), training entrypoints, and utility support for distributed setups.
- Added VAE components: perceptual+adversarial loss, encoder/decoder blocks, and autoencoder configuration.
- Implemented multiple sampler backends (RES, EDM, Flow Matching, Cosmos) with full pipeline integrations.
- Provided training scripts, parallel initialization utilities, and checkpoint conversion tools.
Reviewed Changes
Copilot reviewed 69 out of 467 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nemo_vfm/diffusion/vae/contperceptual_loss.py | New LPIPS+discriminator loss for VAE training |
| nemo_vfm/diffusion/vae/blocks.py | Core convolutional, ResNet, and attention building blocks |
| nemo_vfm/diffusion/vae/autovae.py | VAE architecture search/generation utilities |
| nemo_vfm/diffusion/vae/autoencoder.py | Full autoencoder implementation with config dataclass |
| nemo_vfm/diffusion/utils/mcore_parallel_utils.py | Megatron model parallel initialization helper |
```python
super().__init__()
assert disc_loss in ["hinge", "vanilla"]
self.kl_weight = kl_weight
self.pixel_weight = pixelloss_weight
```
Copilot AI · Jul 11, 2025
`pixelloss_weight` is stored in `self.pixel_weight` but never applied in `forward`. Consider multiplying the L1 reconstruction term by `self.pixel_weight` so the configured pixel loss weight actually takes effect.
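A minimal sketch of the suggested fix, assuming the forward pass computes a pixel-wise L1 reconstruction term roughly like this (the function shape and variable names are illustrative, not taken from the PR):

```python
import torch

def weighted_l1(inputs: torch.Tensor, reconstructions: torch.Tensor,
                pixel_weight: float) -> torch.Tensor:
    # Pixel-wise L1 reconstruction term, scaled by the configured
    # pixelloss_weight so self.pixel_weight no longer sits unused.
    rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())
    return pixel_weight * rec_loss
```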
```python
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
else:
    nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]
```
Copilot AI · Jul 11, 2025
The code references `self.last_layer`, which is never defined, so this branch will raise an `AttributeError`. Either require `last_layer` to always be passed, or initialize `self.last_layer` appropriately in `__init__`.
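One way the fix could look, sketched as a standalone function that makes `last_layer` mandatory; the adaptive-weight arithmetic follows the common LPIPS-plus-discriminator heuristic, and the PR's exact formula may differ:

```python
import torch
from typing import Optional

def calculate_adaptive_weight(
    nll_loss: torch.Tensor,
    g_loss: torch.Tensor,
    last_layer: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    # Require the caller to supply the decoder's final weight explicitly,
    # instead of silently falling back to an undefined self.last_layer.
    if last_layer is None:
        raise ValueError("last_layer must be provided to compute the adaptive weight")
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
    # Balance the GAN term against the reconstruction term.
    d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
    return torch.clamp(d_weight, 0.0, 1e4).detach()
```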
```python
        nn.Module: An instance of the requested attention block.
    """
    assert attn_type in ["vanilla", "linear", "none"], f"attn_type {attn_type} unknown"
    print(f"making attention of type '{attn_type}' with {in_channels} in_channels")
```
Copilot AI · Jul 11, 2025
[nitpick] This debug print in `make_attn` executes on every block creation and clutters logs. Consider removing it or replacing it with a logger call at an appropriate level.
| print(f"making attention of type '{attn_type}' with {in_channels} in_channels") | |
| logging.debug(f"Making attention of type '{attn_type}' with {in_channels} in_channels") |
```python
search_space_skleton = self._load_base_json_skeleton()
search_space_skleton["down_block_types"] = choice["down_block_types"]
search_space_skleton["up_block_types"] = choice["up_block_types"]
search_space_skleton["block_out_channels"] = choice["block_out_channels"]
search_space_skleton["layers_per_block"] = choice["layers_per_block"]
search_space_skleton["latent_channels"] = choice["latent_channels"]
return search_space_skleton
```
Copilot AI · Jul 11, 2025
Typo in variable name: `search_space_skleton` should be `search_space_skeleton`.
Suggested change:
```diff
-search_space_skleton = self._load_base_json_skeleton()
-search_space_skleton["down_block_types"] = choice["down_block_types"]
-search_space_skleton["up_block_types"] = choice["up_block_types"]
-search_space_skleton["block_out_channels"] = choice["block_out_channels"]
-search_space_skleton["layers_per_block"] = choice["layers_per_block"]
-search_space_skleton["latent_channels"] = choice["latent_channels"]
-return search_space_skleton
+search_space_skeleton = self._load_base_json_skeleton()
+search_space_skeleton["down_block_types"] = choice["down_block_types"]
+search_space_skeleton["up_block_types"] = choice["up_block_types"]
+search_space_skeleton["block_out_channels"] = choice["block_out_channels"]
+search_space_skeleton["layers_per_block"] = choice["layers_per_block"]
+search_space_skeleton["latent_channels"] = choice["latent_channels"]
+return search_space_skeleton
```
```python
state_dict = load_sft(ckpt_path)
missing, unexpected = self.load_state_dict(state_dict)
if len(missing) > 0:
    logger.warning(f"Following keys are missing from checkpoint loaded: {missing}")
```
Copilot AI · Jul 11, 2025
`logger` is neither imported nor defined in this module, so this line will raise a `NameError`. Import the standard Python `logging` module and create a module-level logger.
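A minimal fix, assuming the module has no project-specific logging helper to reuse:

```python
import logging

# Module-level logger; it inherits handlers and levels from whatever
# logging configuration the application sets up.
logger = logging.getLogger(__name__)
```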
The sparse_attention directory was incorrectly configured as a submodule without a corresponding .gitmodules file, causing CI/CD checkout failures. This commit converts it to a regular directory with tracked files.

Signed-off-by: Ethan He <[email protected]>