
Conversation

@mkhona-nvidia (Contributor) commented Nov 13, 2025

This PR extends Muon to support adaptive learning rates for Muon-like optimizers, namely NorMuon (https://arxiv.org/pdf/2510.05491) and AdaMuon (https://arxiv.org/pdf/2507.11005).

NorMuon was recently incorporated into the modded-nanogpt speedrun record (https://github.com/KellerJordan/modded-nanogpt/blob/master/train_gpt.py#L595).
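For readers unfamiliar with these variants, here is a rough, hedged sketch of the shared idea: orthogonalize the momentum as in Muon, then rescale the result with a second-moment estimate (neuron-wise in NorMuon). This is not the code from this PR; the helper names, hyperparameter defaults, and the final rescaling choice are illustrative assumptions.

```python
# Rough sketch only (not this PR's code): Muon-style orthogonalization
# followed by a second-moment rescaling, in the spirit of NorMuon/AdaMuon.
import torch


def newton_schulz(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D gradient/momentum matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the Muon reference code
    x = g / (g.norm() + 1e-7)
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x


def adaptive_muon_step(param, momentum, second_moment, lr=0.02,
                       beta1=0.95, beta2=0.95, eps=1e-8):
    """One illustrative update for a single 2-D weight.

    `second_moment` has shape (rows, 1): a per-output-neuron statistic,
    loosely following NorMuon's neuron-wise normalization.
    """
    momentum.mul_(beta1).add_(param.grad, alpha=1 - beta1)
    update = newton_schulz(momentum)
    second_moment.mul_(beta2).add_(update.pow(2).mean(dim=1, keepdim=True),
                                   alpha=1 - beta2)
    normalized = update / (second_moment.sqrt() + eps)
    # One possible rescaling choice: keep the norm of the orthogonalized update.
    normalized *= update.norm() / (normalized.norm() + eps)
    param.data.add_(normalized, alpha=-lr)
```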

@mkhona-nvidia requested a review from skyw November 13, 2025 02:20
copy-pr-bot bot commented Nov 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@skyw (Contributor) left a comment

How many optimizers other than OrthogonalizedOptimizer will need this 2nd momentum? It looks like none (or very few), because Adam-based optimizers won't use it. In that case, this should inherit from OrthogonalizedOptimizer and override the step() function.

The choice is between copying code and keeping adding more functionality to one class. In this case, I think copying some code in step() is the better trade-off.

Another thing to consider: if this turns out not to be very useful, how painful would it be to remove the code? That also points toward inheriting and overriding the step() function.
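A minimal sketch of the subclass-and-override approach being suggested. The stand-in base class below exists only so the pattern runs on its own; the repo's real OrthogonalizedOptimizer has a different interface and internals.

```python
import torch


class OrthogonalizedOptimizer(torch.optim.Optimizer):
    """Stand-in: momentum plus an (elided) orthogonalization of the update."""

    def __init__(self, params, lr=0.02, beta=0.95):
        super().__init__(params, dict(lr=lr, beta=beta))

    def orthogonalized_update(self, p, group):
        state = self.state[p]
        buf = state.setdefault("momentum", torch.zeros_like(p))
        buf.mul_(group["beta"]).add_(p.grad, alpha=1 - group["beta"])
        return buf / (buf.norm() + 1e-7)  # placeholder for Newton-Schulz

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(self.orthogonalized_update(p, group), alpha=-group["lr"])


class AdaptiveOrthogonalizedOptimizer(OrthogonalizedOptimizer):
    """Only step() changes; the second momentum never touches the base class,
    so dropping the feature later means deleting just this subclass."""

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                update = self.orthogonalized_update(p, group)
                v = self.state[p].setdefault("second_momentum", torch.zeros_like(p))
                v.mul_(0.95).addcmul_(update, update, value=0.05)
                p.add_(update / (v.sqrt() + 1e-8), alpha=-group["lr"])
```

The duplicated loop body in step() is the "copying some code" trade-off mentioned above; the upside is that the base class stays untouched.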

@skyw (Contributor) left a comment

I'm also debating whether we should define a class for the 2nd momentum and pass an object; that way the argument list can be better managed. We can do it in a later PR once opinions form.
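For illustration only, one possible shape for such a config object; the field names and the constructor keyword are invented here, not taken from the PR.

```python
from dataclasses import dataclass


@dataclass
class SecondMomentumConfig:
    """Groups the 2nd-momentum knobs so the optimizer takes one object
    instead of several loose keyword arguments."""
    enabled: bool = False
    beta2: float = 0.95
    eps: float = 1e-8
    granularity: str = "neuron"  # e.g. "neuron" (NorMuon-like) or "element" (AdaMuon-like)


# Hypothetical usage:
# optimizer = Muon(model.parameters(), lr=0.02,
#                  second_momentum=SecondMomentumConfig(enabled=True))
```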

@mkhona-nvidia self-assigned this Nov 14, 2025
@skyw (Contributor) left a comment

Some minor changes are needed before we can merge; otherwise LGTM.

@mkhona-nvidia force-pushed the adaptive_orthogonalized_optimizer branch from 9b86617 to a9f8d2c on November 18, 2025 01:55
@mkhona-nvidia (Contributor, Author) commented:

/ok to test 75aa10b

skyw previously approved these changes Nov 18, 2025
@mkhona-nvidia (Contributor, Author) commented:

/ok to test c720455

@mkhona-nvidia force-pushed the adaptive_orthogonalized_optimizer branch from d4bc2ec to cc72e34 on November 18, 2025 22:28
@mkhona-nvidia (Contributor, Author) commented:

/ok to test cc72e34

@mkhona-nvidia (Contributor, Author) commented:

/ok to test a677a3a

@skyw merged commit 7d604f1 into NVIDIA-NeMo:main Nov 18, 2025
14 checks passed