Enhance checkpoint configuration for DLWP Healpix and GraphCast #1253
Conversation
…xamples

- Added configurable checkpoint directory to DLWP Healpix config and training script.
- Implemented Trainer logic to use the specified checkpoint directory.
- Updated utils.py to respect the exact checkpoint path.
- Made Weights & Biases entity and project configurable in the GraphCast example.
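The Weights & Biases change above can be sketched as a small resolver with backward-compatible defaults. This is a hypothetical helper for illustration only: the config key names and the default project name are assumptions, not values taken from the GraphCast example.

```python
def wandb_settings(cfg: dict):
    # Hypothetical helper: read W&B entity/project from a config mapping,
    # falling back to backward-compatible defaults. The actual Hydra keys
    # and defaults used in the GraphCast example may differ.
    entity = cfg.get("entity")                 # None lets W&B pick the default entity
    project = cfg.get("project", "graphcast")  # assumed default project name
    return entity, project
```

With an empty config this keeps the old behavior; setting the keys overrides it, so existing runs are unaffected.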
dcadcc6 to fb82a44

/blossom-ci
Hi @dran-dev, thanks for the contribution. I see this is still marked as draft; do you intend to make more changes, or is it ready for review?
Removes the deprecated `verbose` parameter from the `CosineAnnealingLR` configuration in DLWP HEALPix, which was causing a TypeError.
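The commit removes `verbose` from the YAML config itself. A defensive alternative (a sketch, not the PR's implementation) is to strip removed keys from the scheduler config before instantiation, so stale config files keep working:

```python
def sanitize_scheduler_cfg(cfg: dict) -> dict:
    # Drop keys that newer PyTorch releases no longer accept, so an old
    # config file doesn't raise TypeError on CosineAnnealingLR(**cfg).
    cfg = dict(cfg)           # copy; don't mutate the caller's config
    cfg.pop("verbose", None)  # deprecated in PyTorch, later removed upstream
    return cfg
```

The sanitized dict can then be splatted into the scheduler constructor as before.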
7f6a7b5 to 4ed72b1
Ready for review. Tested dlwp coupled model.
Greptile Overview

Greptile Summary: This PR improves configuration flexibility for DLWP Healpix and GraphCast training recipes by separating checkpoint storage from other outputs and parameterizing W&B settings.

Key Changes:
Implementation Quality: No functional issues or bugs identified. The changes maintain backward compatibility and follow existing code patterns.

Important Files Changed

File Analysis
13 files reviewed, no comments
PhysicsNeMo Pull Request
Description
This PR improves the configuration flexibility for DLWP Healpix and GraphCast training recipes. It allows for separation of checkpoint storage from other output artifacts and standardizes configuration management.
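The checkpoint-path separation can be sketched as follows. The helper name is hypothetical; per the PR description, the actual resolution logic lives in the recipe's `train.py` and `trainer.py`.

```python
import os

def resolve_checkpoint_dir(output_dir, checkpoint_dir=None):
    # Use the explicit checkpoint_dir when configured; otherwise fall back
    # to the historical default location under the output directory.
    if checkpoint_dir:
        return checkpoint_dir
    return os.path.join(output_dir, "tensorboard", "checkpoints")
```

Because the fallback reproduces the old `{output_dir}/tensorboard/checkpoints` layout, existing configs that omit `checkpoint_dir` behave exactly as before.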
Changes
- Added configurable `checkpoint_dir` across configs, trainer, and utility scripts, with fallback to the default behavior (`{output_dir}/tensorboard/checkpoints`) if unspecified.
- Made Weights & Biases `entity` and `project` settings configurable in the Hydra config with backward-compatible defaults.
- Updated `CHANGELOG.md`.

Testing
- Verified fallback to `{output_dir}/tensorboard/checkpoints` if `checkpoint_dir` is missing.
- Verified that the `checkpoint_dir` logic correctly resolves paths in `train.py` and `trainer.py`.

Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.