Adaptive learning rate for Muon: NorMuon and AdaMuon #76
Conversation
How many optimizers will need this second momentum besides OrthogonalizedOptimizer? It looks like none (or very few), because Adam-based optimizers won't use it. In that case, this should inherit from OrthogonalizedOptimizer and override the step() function.
The choice is between copying code and continuing to add functionality to one class. In this case, I think copying some code in step() is the best trade-off.
Another thing to consider: if this turned out not to be very useful, how painful would it be to remove the code? That concern also points toward inheriting and overriding the step() function.
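A minimal sketch of the subclass-and-override approach being discussed, assuming a torch.optim-style base class; the class names, buffers, and constants below are hypothetical illustrations, not the repository's actual code.

```python
import torch


class OrthogonalizedOptimizer(torch.optim.Optimizer):
    """Stand-in for the real base class (an assumption for this sketch)."""

    def __init__(self, params, lr=1e-3, momentum=0.95):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    def orthogonalize(self, update):
        # Placeholder for the Newton-Schulz orthogonalization step.
        return update


class AdaptiveOrthogonalizedOptimizer(OrthogonalizedOptimizer):
    """Overrides step() and duplicates a little base logic, so the extra
    second-moment buffer never leaks into the base class."""

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["momentum_buf"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)  # the extra 2nd moment
                buf = state["momentum_buf"]
                buf.mul_(group["momentum"]).add_(p.grad)
                update = self.orthogonalize(buf)
                # Adaptive part: only this subclass maintains exp_avg_sq.
                v = state["exp_avg_sq"]
                v.mul_(0.999).addcmul_(update, update, value=0.001)
                p.add_(update / (v.sqrt() + 1e-8), alpha=-group["lr"])
```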
[Resolved review threads on emerging_optimizers/orthogonalized_optimizers/orthogonalized_optimizer.py and emerging_optimizers/orthogonalized_optimizers/adaptive_orthogonalized_optimizer.py]
skyw left a comment:
I'm also debating whether we should define a class for the second momentum and pass an object; that way the argument list can be better managed. We can do it in a follow-up PR once opinions form.
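One possible shape for the "pass an object" idea: a hedged sketch, where the class name and fields are hypothetical rather than settled API.

```python
from dataclasses import dataclass


@dataclass
class SecondMomentConfig:
    """Groups the second-moment knobs so the optimizer takes one object
    instead of several loose keyword arguments (hypothetical sketch)."""

    beta2: float = 0.999
    eps: float = 1e-8
    per_neuron: bool = True  # e.g. NorMuon-style row-wise vs AdaMuon element-wise


# Usage (illustrative):
# opt = AdaptiveOrthogonalizedOptimizer(
#     params, lr=0.02, second_moment=SecondMomentConfig(beta2=0.95))
```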
[Five resolved review threads on emerging_optimizers/orthogonalized_optimizers/adaptive_orthogonalized_optimizer.py]
skyw left a comment:
Some minor changes are needed before we can merge; otherwise LGTM.
Force-pushed from 9b86617 to a9f8d2c.
/ok to test 75aa10b
/ok to test c720455
Force-pushed from d4bc2ec to cc72e34.
/ok to test cc72e34
/ok to test a677a3a
This PR extends Muon to support adaptive learning rates for Muon-like optimizers, namely NorMuon (https://arxiv.org/pdf/2510.05491) and AdaMuon (https://arxiv.org/pdf/2507.11005).
NorMuon was recently incorporated into the modded-nanogpt speedrun record (https://github.com/KellerJordan/modded-nanogpt/blob/master/train_gpt.py#L595).
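For context, here is a rough sketch of where the adaptive scaling fits relative to Muon's orthogonalization, based on a reading of the two linked papers; the function name, constants, and the final rescaling heuristic are illustrative assumptions, not the code added in this PR.

```python
import torch


def adaptive_scale(ortho_update, exp_avg_sq, beta2=0.999, eps=1e-8, per_neuron=True):
    """Scale an already-orthogonalized Muon update by a second-moment estimate.

    per_neuron=True follows the NorMuon idea (row-wise statistics);
    per_neuron=False follows the AdaMuon idea (element-wise, Adam-style).
    The caller must allocate exp_avg_sq with the matching shape.
    """
    if per_neuron:
        # One statistic per output neuron (row of the weight matrix).
        sq = ortho_update.pow(2).mean(dim=1, keepdim=True)
    else:
        # Element-wise second moment, as in Adam.
        sq = ortho_update.pow(2)
    exp_avg_sq.mul_(beta2).add_(sq, alpha=1 - beta2)
    scaled = ortho_update / (exp_avg_sq.sqrt() + eps)
    # Rescale so the adaptive step keeps a Muon-like overall magnitude.
    return scaled * (ortho_update.norm() / scaled.norm().clamp(min=eps))
```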