Adds CombinedOptimizer
#1241
base: v2.0-refactor
Conversation
This commit introduces the CombinedOptimizer class, which allows users to combine multiple PyTorch optimizers into a single interface. The new class supports dynamic parameter addition, state management, and optional compilation of step functions for performance improvements. It also includes detailed documentation and examples for ease of use.
This commit introduces an `__init__.py` file for the optimizer module, providing a clear entry point for the optimizer utilities in PhysicsNeMo. Additionally, it enhances the `CombinedOptimizer` class by implementing a flag for initialization, allowing the addition of parameter groups during initialization. The `step` method is updated to return the loss value from the last optimizer, improving its usability. Documentation and comments have been added for clarity.
This commit introduces a comprehensive test suite for the CombinedOptimizer class, covering initialization, parameter aggregation, step execution, and state management. The tests ensure that the CombinedOptimizer correctly integrates multiple optimizers, handles closures, and maintains expected behavior during training. Additionally, it verifies the proper functioning of learning rate schedulers with the combined optimizer. This enhances the reliability and robustness of the optimizer's implementation.
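The commit messages above describe combining several optimizers behind one interface. As a rough illustration of that pattern (a minimal stand-in, not the PR's actual `CombinedOptimizer` implementation; the class name `MinimalCombinedOptimizer` is invented here), each inner optimizer owns a disjoint slice of the model's parameters:

```python
import torch
from torch import nn
from torch.optim import SGD, Adam

class MinimalCombinedOptimizer:
    """Toy sketch of the combining idea: fan each call out to every inner optimizer."""

    def __init__(self, optimizers):
        self.optimizers = list(optimizers)

    def zero_grad(self, set_to_none=True):
        for opt in self.optimizers:
            opt.zero_grad(set_to_none=set_to_none)

    def step(self):
        for opt in self.optimizers:
            opt.step()

# Two layers, each driven by a different optimizer over disjoint parameters.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
combined = MinimalCombinedOptimizer([
    SGD(model[0].parameters(), lr=0.01),
    Adam(model[1].parameters(), lr=0.001),
])

before = [p.detach().clone() for p in model.parameters()]
loss = model(torch.randn(16, 4)).pow(2).mean()
combined.zero_grad()
loss.backward()
combined.step()  # both parameter slices receive updates from their own optimizer
```

This is the shape of usage the PR targets; the real class additionally handles state dicts, dynamic parameter addition, and optional compilation.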
3 files reviewed, 2 comments
… that the closure is evaluated multiple times, matching individual optimizer behavior.
coreyjadams left a comment
Overall, looks good. I appreciate the extensive testing.
This adds a new package, physicsnemo.optim, which I think is overdue. As part of the refactor let's make sure to get it in the docs too :).
I left some comments but do have another question. What happens when users instantiate and use this and the parameter groups are not disjoint? Will it cause an error? Silent bugs? Should we include a check that each parameter group is completely disjoint?
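One possible answer to the disjointness question is an explicit check at construction time. This is a hypothetical sketch (the helper name `check_disjoint_param_groups` is invented here, not from the PR), comparing parameters by identity across all optimizers:

```python
import torch
from torch.optim import SGD

def check_disjoint_param_groups(optimizers):
    """Raise if any parameter appears in more than one optimizer's param groups."""
    seen = set()
    for opt in optimizers:
        for group in opt.param_groups:
            for p in group["params"]:
                if id(p) in seen:
                    raise ValueError(
                        "Parameter appears in more than one optimizer; "
                        "a combined step would apply multiple updates to it."
                    )
                seen.add(id(p))

shared = torch.nn.Parameter(torch.randn(3))
opt_a = SGD([shared], lr=0.1)
opt_b = SGD([torch.nn.Parameter(torch.randn(3))], lr=0.1)
check_disjoint_param_groups([opt_a, opt_b])  # disjoint groups: passes silently
```

Without such a check, overlapping groups would not error; each optimizer would silently apply its own update to the shared parameter, which is exactly the "silent bug" case raised above.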
```python
            torch.compile(opt.step, **torch_compile_kwargs) for opt in optimizers
        ]

    def zero_grad(self, *args, **kwargs) -> None:
```
The upstream interface accepts only `set_to_none=True` here, and no other parameters. I know that if we passed extra arguments through, it would succeed, but unless there are optimizers accepting other values for `zero_grad`, I think we should stick to `set_to_none`.
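The suggestion amounts to mirroring the upstream `torch.optim.Optimizer.zero_grad(set_to_none: bool = True)` signature instead of forwarding `*args, **kwargs`. A hypothetical sketch (the class name is invented here for illustration):

```python
import torch
from torch.optim import SGD

class CombinedZeroGradSketch:
    """Sketch of the reviewer's suggestion: expose only `set_to_none`."""

    def __init__(self, optimizers):
        self.optimizers = list(optimizers)

    def zero_grad(self, set_to_none: bool = True) -> None:
        # Matches the upstream signature, so typos like zero_grad(foo=1)
        # fail loudly instead of being silently forwarded.
        for opt in self.optimizers:
            opt.zero_grad(set_to_none=set_to_none)

p = torch.nn.Parameter(torch.randn(2))
p.grad = torch.ones(2)
combined = CombinedZeroGradSketch([SGD([p], lr=0.1)])
combined.zero_grad()  # default set_to_none=True clears the grad to None
```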
```python
    def __init__(
        self,
        optimizers: Sequence[Optimizer],
        torch_compile_kwargs: dict[str, Any] | None = None,
```
This syntax is not upstream, right? I assume we can't put `step` into a compile wrapper, since that would break the closure behavior?
```python
    def step(
        self, closure: Callable[[], torch.Tensor] | None = None
    ) -> torch.Tensor | None:
```
Upstream returns `float | None`, not a tensor, FYI, in both the closure and `step`.
```python
            res = step_fn(closure)
            if res is not None:
                loss = res
```
I don't follow this logic: if there are multiple optimizers and a closure, are they expected to return the same value? Right now we overwrite with the value of the last non-None `res`.
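One way to make the returned loss deterministic, sketched here as an assumption rather than the PR's chosen fix, is to keep the result of the first closure evaluation instead of overwriting with the last non-None one (`step_fns` stands in for the wrapped per-optimizer step functions; the class name is invented for illustration):

```python
class FirstLossStepSketch:
    """Return the first reported loss, so later optimizers can't silently overwrite it."""

    def __init__(self, step_fns):
        self.step_fns = list(step_fns)

    def step(self, closure=None):
        loss = None
        for step_fn in self.step_fns:
            res = step_fn(closure)
            if loss is None and res is not None:
                loss = res  # keep the first non-None result
        return loss

# Two stand-in step functions that "evaluate" the closure and report a loss.
agg = FirstLossStepSketch([lambda c: 1.0, lambda c: 2.0])
first_loss = agg.step()  # 1.0, not 2.0
```

Note the closure is still passed to every optimizer (so line-search optimizers like LBFGS can re-evaluate it); only the reported value becomes unambiguous.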
```python
    return CombinedOptimizer(optimizers)


class TestInitialization:
```
All of our other tests are more or less "functionals" and not class-based tests. I don't personally have any objection to this style, but I don't know what unforeseen consequences we might see with this. Any thoughts?
```python
    def test_state_dict_structure(self, combined_optimizer):
        state = combined_optimizer.state_dict()
        assert "optimizers" in state
        assert len(state["optimizers"]) == 2
        assert isinstance(state["optimizers"][0], dict)

    def test_load_state_dict(self, combined_optimizer, model):
        # Save state
        state = combined_optimizer.state_dict()

        # Create new optimizer
        opt1 = SGD(model.layer1.parameters(), lr=0.01)
        opt2 = Adam(model.layer2.parameters(), lr=0.001)
        new_combined = CombinedOptimizer([opt1, opt2])

        # Load state
        new_combined.load_state_dict(state)

        # Verify equality (basic check)
        # Note: strict equality of state dicts might fail due to weak refs or unrelated keys,
        # so we check structure matches.
        new_state = new_combined.state_dict()
        assert len(new_state["optimizers"]) == len(state["optimizers"])
```
Can we update this test to check save / restore is getting numerics correct too? Loading the wrong optimizer state or getting the numbers incorrect would be a pretty frustrating bug in training codes.
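A numeric round-trip check could look like the following sketch, written here against a plain `torch.optim.Adam` since the PR's class isn't importable in this context; the real test would go through `CombinedOptimizer.state_dict()` / `load_state_dict()`, and the helper name is invented:

```python
import torch
from torch.optim import Adam

def state_dicts_numerically_equal(a, b):
    """Compare every per-parameter state entry (tensors via allclose, scalars via ==)."""
    for key, entry in a["state"].items():
        for name, val in entry.items():
            other = b["state"][key][name]
            if torch.is_tensor(val):
                if not torch.allclose(val, other):
                    return False
            elif val != other:
                return False
    return True

param = torch.nn.Parameter(torch.randn(4))
opt = Adam([param], lr=1e-3)
(param ** 2).sum().backward()
opt.step()  # populate real state (exp_avg, exp_avg_sq, step count)

fresh = Adam([param], lr=1e-3)
fresh.load_state_dict(opt.state_dict())
```

Taking a step before saving matters: a freshly constructed optimizer has empty state, so a structure-only comparison (as in the current test) would pass even if the loaded moments were wrong.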
PhysicsNeMo Pull Request
Adds a new `CombinedOptimizer` utility, which is useful for the increasingly popular "architecture-aware optimizers", such as Muon. This PR targets the v2.0 refactor branch, so it should only be merged after #1235.
Description
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.