Commit 5889b64

update credits for orthogonalized optimizer and scaling in muon (#18)
Signed-off-by: mikail <[email protected]>
1 parent 2fdbc22 commit 5889b64

2 files changed: 4 additions, 2 deletions

emerging_optimizers/orthogonalized_optimizers/muon.py

Lines changed: 2 additions & 2 deletions
@@ -119,10 +119,10 @@ def get_muon_scale_factor(
         # Suggested by Muon (https://kellerjordan.github.io/posts/muon/)
         return extra_scale_factor * max(1, size_out / size_in) ** 0.5
     elif mode == "spectral":
-        # Suggested by Scion (https://arxiv.org/abs/2502.07529) and Kimi (https://arxiv.org/abs/2502.16982)
+        # Suggested by K. Jordan and Kimi (https://arxiv.org/abs/2502.16982)
         return extra_scale_factor * max(size_out, size_in) ** 0.5
     elif mode == "unit_rms_norm":
-        # Suggested by Bernstein et al. (https://jeremybernste.in/writing/deriving-muon)
+        # Suggested by Scion (https://arxiv.org/abs/2502.07529) and Bernstein et al. (https://jeremybernste.in/writing/deriving-muon)
         return extra_scale_factor * (size_out / size_in) ** 0.5
     else:
         raise ValueError(f"Invalid mode for Muon update scale factor: {mode}")
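
To make the three scaling rules in this hunk easier to compare, here is a minimal standalone sketch of the branch logic as it reads after the commit. It is not the library function: the name `muon_scale_factor_sketch`, the argument order, and the default `extra_scale_factor=1.0` are assumptions, and the name of the first mode is not visible in the hunk, so `"shape_scaling"` is a placeholder.

```python
def muon_scale_factor_sketch(size_out, size_in, mode, extra_scale_factor=1.0):
    """Illustrative copy of the three branches shown in the diff.

    Assumptions (not confirmed by the diff): the function name, the
    argument order, the default extra_scale_factor, and the mode name
    "shape_scaling" for the first branch.
    """
    if mode == "shape_scaling":  # placeholder name, see docstring
        # sqrt(max(1, out/in)): never scales the update below 1
        return extra_scale_factor * max(1, size_out / size_in) ** 0.5
    elif mode == "spectral":
        # sqrt(max(out, in)): depends only on the larger dimension
        return extra_scale_factor * max(size_out, size_in) ** 0.5
    elif mode == "unit_rms_norm":
        # sqrt(out/in): can be below 1 when the matrix is wide
        return extra_scale_factor * (size_out / size_in) ** 0.5
    else:
        raise ValueError(f"Invalid mode for Muon update scale factor: {mode}")


# Compare the modes on a tall (16 x 4) and a wide (4 x 16) weight shape.
for size_out, size_in in [(16, 4), (4, 16)]:
    for mode in ("shape_scaling", "spectral", "unit_rms_norm"):
        factor = muon_scale_factor_sketch(size_out, size_in, mode)
        print(f"{size_out}x{size_in} {mode}: {factor}")
```

For the tall 16 x 4 shape the three modes give 2.0, 4.0 and 2.0. For the wide 4 x 16 shape they give 1.0, 4.0 and 0.5, which is where the first and third branches differ.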

emerging_optimizers/orthogonalized_optimizers/orthogonalized_optimizer.py

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ class OrthogonalizedOptimizer(optim.Optimizer):

 - Carlson, D., Cevher, V., and Carin, L. *Stochastic spectral descent for Restricted Boltzmann Machines.*
   In International Conference on Artificial Intelligence and Statistics (2015a).
+- Carlson, D., Hsieh, Y.-P., Collins, E., Carin, L., and Cevher, V. *Stochastic Spectral Descent for Discrete Graphical Models.*
+  In IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 2, pp. 296-311 (2016).
 - Carlson, D., Collins, E., Hsieh, Y.-P., Carin, L., and Cevher, V. *Preconditioned spectral descent for deep learning.*
   In Neural Information Processing Systems (2015b).
 - Flynn, T. *The duality structure gradient descent algorithm: analysis and applications to neural networks.*
