emerging_optimizers/orthogonalized_optimizers
1 file changed, +3 -2 lines changed
@@ -53,15 +53,16 @@ class Muon(OrthogonalizedOptimizer):
     Warning:
         - This optimizer requires that all parameters passed in are 2D.
         - It should not be used for the embedding layer, the final fully connected layer, or any 1-D
-          parameters; those should all be optimized by a standard method (e.g., AdamW).
+          parameters; those can all be optimized by a standard method (e.g., AdamW).
 
     Args:
         {_args_doc}
         coefficient_type: The type of coefficient set to use for the Newton-Schulz iteration. Can be one of
             ["simple", "quintic", "polar_express"].
         num_ns_steps: The number of iteration steps to use in the Newton-Schulz iteration.
         scale_mode: The type of scale factor to use for the update. Defaults to "spectral" style scaling.
-        extra_scale_factor: The additional scale factor to use for the update.
+        extra_scale_factor: The additional scale factor to use for the update. Setting it to 0.2 can closely match
+            the update RMS norm of AdamW, as suggested by https://arxiv.org/abs/2502.16982.
         use_syrk: Whether to use the Triton kernel for the Newton-Schulz iteration.
     """
 
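The new `extra_scale_factor` note can be checked with a little arithmetic. The sketch below is illustrative only and does not use this repository's code. It assumes two things that the diff does not confirm: that the orthogonalized update is exactly semi-orthogonal (all singular values equal to 1), and that "spectral" scaling multiplies the update by sqrt(max(m, n)). The helper names `semi_orthogonal`, `rms`, and `scaled_update_rms` are hypothetical.

```python
import math
import random


def semi_orthogonal(m, n, seed=0):
    """Return an m x n matrix (m >= n) with orthonormal columns.

    Built from the first n columns of a Householder reflection, as a
    stand-in for an idealized Newton-Schulz output.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    vv = sum(x * x for x in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
        for i in range(m)
    ]


def rms(mat):
    """Root-mean-square of all entries of a matrix."""
    count = len(mat) * len(mat[0])
    return math.sqrt(sum(x * x for row in mat for x in row) / count)


def scaled_update_rms(m, n, extra_scale_factor=0.2):
    """RMS of the update after the assumed sqrt(max(m, n)) scaling."""
    update = semi_orthogonal(m, n)
    # Assumption: "spectral" scaling is sqrt(max(m, n)); not taken from this diff.
    scale = extra_scale_factor * math.sqrt(max(m, n))
    return rms([[scale * x for x in row] for row in update])
```

Under these assumptions a semi-orthogonal m x n matrix has RMS 1 / sqrt(max(m, n)), so the shape-dependent factor cancels and the update RMS equals `extra_scale_factor` for any shape, which is why a single value such as 0.2 can be used across layers.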