Pull Request Overview
This PR introduces a comprehensive diffusion framework under nemo_vfm, including VAE modules, sampling pipelines (RES, EDM, Cosmos), training entrypoints, and utility support for distributed setups.
- Added VAE components: perceptual+adversarial loss, encoder/decoder blocks, and autoencoder configuration.
- Implemented multiple sampler backends (RES, EDM, Flow Matching, Cosmos) with full pipeline integrations.
- Provided training scripts, parallel initialization utilities, and checkpoint conversion tools.
Reviewed Changes
Copilot reviewed 69 out of 467 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nemo_vfm/diffusion/vae/contperceptual_loss.py | New LPIPS+discriminator loss for VAE training |
| nemo_vfm/diffusion/vae/blocks.py | Core convolutional, ResNet, and attention building blocks |
| nemo_vfm/diffusion/vae/autovae.py | VAE architecture search/generation utilities |
| nemo_vfm/diffusion/vae/autoencoder.py | Full autoencoder implementation with config dataclass |
| nemo_vfm/diffusion/utils/mcore_parallel_utils.py | Megatron model parallel initialization helper |
```python
super().__init__()
assert disc_loss in ["hinge", "vanilla"]
self.kl_weight = kl_weight
self.pixel_weight = pixelloss_weight
```
The `pixelloss_weight` is stored in `self.pixel_weight` but never applied in `forward`. Consider multiplying the L1 reconstruction term by `self.pixel_weight` to respect the configured pixel loss weight.
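A minimal sketch of what that could look like, assuming an L1 reconstruction term; `weighted_l1`, `inputs`, and `reconstructions` are illustrative names, not taken from the PR:

```python
import torch

def weighted_l1(
    inputs: torch.Tensor, reconstructions: torch.Tensor, pixel_weight: float
) -> torch.Tensor:
    # Per-pixel L1 reconstruction term, scaled so the configured
    # pixelloss_weight actually takes effect.
    rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())
    return pixel_weight * rec_loss
```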
```python
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
else:
    nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]
```
The code references `self.last_layer`, which is never defined. This will raise an `AttributeError`. You should either require `last_layer` to always be passed, or initialize `self.last_layer` appropriately.
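One possible shape of the fix, sketched below, is to require `last_layer` explicitly and fail fast when it is absent; the `d_weight` computation follows the common taming-transformers adaptive-weight pattern and is an assumption about the surrounding code:

```python
from typing import Optional

import torch

def calculate_adaptive_weight(
    nll_loss: torch.Tensor,
    g_loss: torch.Tensor,
    last_layer: Optional[torch.Tensor] = None,
    discriminator_weight: float = 1.0,
) -> torch.Tensor:
    # Fail fast instead of silently falling back to an undefined self.last_layer.
    if last_layer is None:
        raise ValueError("last_layer must be provided to compute the adaptive weight")
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
    # Balance generator vs. reconstruction gradient magnitudes.
    d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
    return torch.clamp(d_weight, 0.0, 1e4).detach() * discriminator_weight
```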
```python
        nn.Module: An instance of the requested attention block.
    """
    assert attn_type in ["vanilla", "linear", "none"], f"attn_type {attn_type} unknown"
    print(f"making attention of type '{attn_type}' with {in_channels} in_channels")
```
[nitpick] This debug print in `make_attn` will execute on every block creation and clutter logs. Consider removing it or replacing it with a logger at an appropriate level.
Suggested change:
```diff
-    print(f"making attention of type '{attn_type}' with {in_channels} in_channels")
+    logging.debug(f"Making attention of type '{attn_type}' with {in_channels} in_channels")
```
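Note that the suggestion assumes `logging` is imported in this module. A hedged sketch of the alternative, a module-level logger with lazy %-formatting (the `make_attn` signature is inferred from the snippet, not confirmed by the PR):

```python
import logging

logger = logging.getLogger(__name__)

def make_attn(in_channels: int, attn_type: str = "vanilla"):
    # Lazy %-formatting defers string interpolation until the record is emitted.
    logger.debug("making attention of type '%s' with %d in_channels", attn_type, in_channels)
    ...
```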
```python
search_space_skleton = self._load_base_json_skeleton()
search_space_skleton["down_block_types"] = choice["down_block_types"]
search_space_skleton["up_block_types"] = choice["up_block_types"]
search_space_skleton["block_out_channels"] = choice["block_out_channels"]
search_space_skleton["layers_per_block"] = choice["layers_per_block"]
search_space_skleton["latent_channels"] = choice["latent_channels"]
return search_space_skleton
```
Typo in variable name: `search_space_skleton` should likely be `search_space_skeleton`.
Suggested change:
```diff
-search_space_skleton = self._load_base_json_skeleton()
-search_space_skleton["down_block_types"] = choice["down_block_types"]
-search_space_skleton["up_block_types"] = choice["up_block_types"]
-search_space_skleton["block_out_channels"] = choice["block_out_channels"]
-search_space_skleton["layers_per_block"] = choice["layers_per_block"]
-search_space_skleton["latent_channels"] = choice["latent_channels"]
-return search_space_skleton
+search_space_skeleton = self._load_base_json_skeleton()
+search_space_skeleton["down_block_types"] = choice["down_block_types"]
+search_space_skeleton["up_block_types"] = choice["up_block_types"]
+search_space_skeleton["block_out_channels"] = choice["block_out_channels"]
+search_space_skeleton["layers_per_block"] = choice["layers_per_block"]
+search_space_skeleton["latent_channels"] = choice["latent_channels"]
+return search_space_skeleton
```
```python
state_dict = load_sft(ckpt_path)
missing, unexpected = self.load_state_dict(state_dict)
if len(missing) > 0:
    logger.warning(f"Following keys are missing from checkpoint loaded: {missing}")
```
`logger` is not imported or defined in this module, causing a `NameError`. Import the standard Python `logging` module and define a module-level logger.
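A minimal sketch of the missing setup, assuming no existing logging convention in this module:

```python
import logging

# Module-level logger so the logger.warning(...) call in the snippet resolves.
logger = logging.getLogger(__name__)
```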
The `sparse_attention` directory was incorrectly configured as a submodule without a corresponding `.gitmodules` file, causing CI/CD checkout failures. This commit converts it to a regular directory with tracked files.

Signed-off-by: Ethan He <yihuih@nvidia.com>
add diffusion; physical AI projects