Conversation

@dran-dev dran-dev commented Nov 26, 2025

PhysicsNeMo Pull Request

Description

This PR improves the configuration flexibility for DLWP Healpix and GraphCast training recipes. It allows for separation of checkpoint storage from other output artifacts and standardizes configuration management.

Changes

  • DLWP Healpix: Implemented a configurable checkpoint_dir across configs, trainer, and utility scripts; falls back to the default behavior ({output_dir}/tensorboard/checkpoints) if unspecified.
  • GraphCast: Parameterized Weights & Biases entity and project settings in Hydra config with backward-compatible defaults.
  • General: Updated CHANGELOG.md.
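The checkpoint_dir fallback described above can be sketched as follows. This is an illustrative sketch only: the function name `resolve_checkpoint_dir` and the exact config attribute names are assumptions, not the PR's actual code.

```python
import os

def resolve_checkpoint_dir(cfg) -> str:
    """Return cfg.checkpoint_dir if set, else the legacy default path.

    Hypothetical sketch of the fallback behavior; attribute names may
    differ from the PR's Hydra config schema.
    """
    checkpoint_dir = getattr(cfg, "checkpoint_dir", None)
    if not checkpoint_dir:
        # Legacy default: {output_dir}/tensorboard/checkpoints
        checkpoint_dir = os.path.join(cfg.output_dir, "tensorboard", "checkpoints")
    return checkpoint_dir
```

With this shape, existing configs that never set checkpoint_dir keep resolving to the old tensorboard subdirectory, while new configs can point checkpoints anywhere.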

Testing

  • All commits pass pre-commit hooks (format, lint, interrogate, license).
  • Commits are signed-off (DCO) and GPG signed.
  • Verified backward compatibility:
    • DLWP Healpix defaults to {output_dir}/tensorboard/checkpoints if checkpoint_dir is missing.
    • GraphCast defaults to original W&B project/entity if config keys are missing.
  • Verified that checkpoint_dir logic correctly resolves paths in train.py and trainer.py.
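The GraphCast W&B backward compatibility noted above could look like the sketch below. The key names `wb_project` and `wb_entity` are illustrative, and the assignment of the defaults ("GraphCast" as project, "PhysicsNeMo" as entity) is an assumption, not confirmed by the PR text.

```python
def resolve_wandb_settings(cfg: dict) -> tuple:
    """Return (project, entity) for W&B, with backward-compatible defaults.

    Sketch only: key names and default assignment are assumptions.
    """
    project = cfg.get("wb_project", "GraphCast")
    entity = cfg.get("wb_entity", "PhysicsNeMo")
    return project, entity
```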

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an
indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

…xamples

- Added configurable checkpoint directory to DLWP Healpix config and training script.
- Implemented Trainer logic to use specific checkpoint directory.
- Updated utils.py to respect exact checkpoint path.
- Made Weights & Biases entity and project configurable in GraphCast example.
@dran-dev dran-dev force-pushed the feat/weather-config-enhancements branch from dcadcc6 to fb82a44 Compare November 26, 2025 06:30
@dran-dev dran-dev changed the title Enhance configuration for DLWP Healpix and GraphCast examples Enhance checkpoint configuration for DLWP Healpix and GraphCast Nov 26, 2025
@pzharrington
Collaborator

/blossom-ci

@pzharrington
Collaborator

Hi @dran-dev , thanks for the contribution. I see this is marked as draft still, do you intend to make more changes or is it ready for review?

Removes the deprecated `verbose` parameter from the `CosineAnnealingLR` configuration in DLWP HEALPix, which was causing a TypeError.
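The fix in this commit amounts to dropping one keyword argument. A minimal sketch, assuming a standard PyTorch setup (the model and optimizer below are placeholders, not the DLWP HEALPix code):

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Before (raises TypeError on PyTorch versions where `verbose` has been
# removed from LR scheduler constructors):
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
#     optimizer, T_max=100, verbose=True)

# After: omit the deprecated argument.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```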
@dran-dev dran-dev force-pushed the feat/weather-config-enhancements branch from 7f6a7b5 to 4ed72b1 Compare November 26, 2025 23:56
@dran-dev dran-dev marked this pull request as ready for review November 27, 2025 01:09
@dran-dev
Contributor Author

Hi @dran-dev , thanks for the contribution. I see this is marked as draft still, do you intend to make more changes or is it ready for review?

Ready for review. Tested dlwp coupled model.

@greptile-apps
Contributor

greptile-apps bot commented Nov 27, 2025

Greptile Overview

Greptile Summary

This PR improves configuration flexibility for DLWP Healpix and GraphCast training recipes by separating checkpoint storage from other outputs and parameterizing W&B settings.

Key Changes:

  • DLWP Healpix: Added configurable checkpoint_dir parameter across all configs, trainer, and utility scripts, allowing users to specify custom checkpoint locations while maintaining backward compatibility with default {output_dir}/tensorboard/checkpoints path
  • GraphCast: Parameterized W&B entity and project settings in config with backward-compatible defaults ("PhysicsNeMo" and "GraphCast")
  • Bug fix: Removed deprecated verbose parameter from CosineAnnealingLR configuration

Implementation Quality:
The changes are well-structured with proper fallback logic. The DLWP Healpix implementation correctly handles the checkpoint directory at three levels: main config (with default), train.py (with fallback if unspecified), and trainer.py (with fallback to legacy path). The refactoring in utils.py properly simplifies path handling by removing the hardcoded /checkpoints subdirectory appending logic.

No functional issues or bugs identified. The changes maintain backward compatibility and follow existing code patterns.

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| examples/weather/dlwp_healpix/train.py | 5/5 | Updated to use configurable checkpoint_dir with fallback to default path if unspecified |
| examples/weather/dlwp_healpix/trainer.py | 5/5 | Added checkpoint_dir parameter with fallback logic, passes to checkpoint writer |
| examples/weather/dlwp_healpix/utils.py | 5/5 | Simplified checkpoint path handling; no longer appends /checkpoints subdirectory |
| examples/weather/graphcast/train_graphcast.py | 5/5 | Uses config values for W&B project and entity with backward-compatible defaults |
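The utils.py simplification called out above (no longer appending a hardcoded /checkpoints subdirectory) can be illustrated as below. Function and variable names are hypothetical, not the file's actual code.

```python
import os

def checkpoint_path(checkpoint_dir: str, filename: str) -> str:
    """Join a checkpoint filename onto the configured directory.

    Sketch of the post-PR behavior: the configured directory is used
    verbatim. Previously a hardcoded "checkpoints" subdirectory was
    appended here, i.e. os.path.join(checkpoint_dir, "checkpoints", filename).
    """
    return os.path.join(checkpoint_dir, filename)
```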

@greptile-apps greptile-apps bot left a comment

13 files reviewed, no comments