
Commit 3cc549f

Update README.md - Added background and usage example (#32)
* Update README with background, usage examples
Signed-off-by: Santosh Bhavani <[email protected]>
1 parent 06d0bca commit 3cc549f

File tree: 1 file changed

README.md: 33 additions, 2 deletions
@@ -2,10 +2,25 @@
## Overview

- Emerging Optimizers is a research project focused on understanding and optimizing the algorithmic behavior of Shampoo class optimizers (Shampoo, SOAP, Muon, etc.) and their implications to performance of GPU systems in LLM training.
+ Emerging Optimizers is a research project focused on understanding and optimizing the algorithmic behavior of emerging optimizers (including Shampoo, SOAP, Muon, and others) and their implications for the performance of GPU systems in LLM training.

> ⚠️ Note: Emerging-Optimizers is under active development. All APIs are experimental and subject to change. New features, improvements, and documentation updates are released regularly. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.

## Background

### What are Emerging Optimizers?

Emerging optimizers represent a class of novel optimization algorithms that go beyond traditional first-order methods like Adam or SGD. These include optimizers that use matrix-based (non-diagonal) preconditioning, orthogonalization techniques, and other innovative approaches to achieve faster convergence and improved training efficiency.

Examples include Shampoo, which uses Kronecker-factored preconditioning ([arXiv:1802.09568](https://arxiv.org/abs/1802.09568)), and Muon, which uses Newton-Schulz orthogonalization ([arXiv:2502.16982](https://arxiv.org/abs/2502.16982)).

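
To make these ideas concrete, the sketch below implements both primitives in plain PyTorch. It is an illustration only, not this repository's implementation: the function names are ours, the cubic Newton-Schulz coefficients are the simplest possible choice, and the epsilon handling is simplified.

```python
import torch


def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 20) -> torch.Tensor:
    """Approximately orthogonalize a 2D gradient with a Newton-Schulz iteration.

    Uses the simple cubic iteration X <- 1.5*X - 0.5*(X X^T) X; Muon uses a
    tuned quintic variant so that only ~5 steps are needed.
    """
    x = g / (g.norm() + 1e-7)       # scale so all singular values are <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:                  # iterate in the wide orientation (smaller X X^T)
        x = x.T
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def shampoo_preconditioned_grad(g, left_acc, right_acc, eps=1e-6):
    """One Shampoo-style step: accumulate G G^T and G^T G, return L^-1/4 G R^-1/4."""
    left_acc += g @ g.T             # left covariance statistics
    right_acc += g.T @ g            # right covariance statistics

    def inv_fourth_root(mat):
        vals, vecs = torch.linalg.eigh(mat)
        return vecs @ torch.diag(vals.clamp_min(eps).pow(-0.25)) @ vecs.T

    return inv_fourth_root(left_acc) @ g @ inv_fourth_root(right_acc)


if __name__ == "__main__":
    grad = torch.randn(256, 512)

    q = newton_schulz_orthogonalize(grad)
    print("max |Q Q^T - I|:", (q @ q.T - torch.eye(256)).abs().max().item())

    left = torch.zeros(256, 256)
    right = torch.zeros(512, 512)
    update = shampoo_preconditioned_grad(grad, left, right)
    print("preconditioned update shape:", tuple(update.shape))
```

In practice both optimizers apply these transforms to a momentum or averaged-gradient buffer rather than the raw gradient, and amortize the expensive linear algebra (eigendecompositions, iterative orthogonalization) across training steps.
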
### Why They Matter

Emerging optimizers have demonstrated significant practical impact in large-scale language model training. Most notably, **Muon was used to train the Kimi K2 model** ([arXiv:2507.20534](https://arxiv.org/abs/2507.20534)), showcasing the effectiveness of these novel approaches at scale. These optimizers can:

- Achieve faster convergence, reducing the number of training steps required
- Improve final model quality through better conditioning of the optimization landscape
- Enable more efficient hyperparameter tuning due to reduced sensitivity to learning rates

## Installation
1126

@@ -22,6 +37,22 @@

```
cd Emerging-Optimizers
pip install .
```

- ## User guide
+ ## Usage

### Muon Optimizer

Muon (MomentUm Orthogonalized by Newton-schulz) orthogonalizes the momentum-based update for 2D (matrix-shaped) parameters.

For a simple usage example, see [`tests/test_orthogonalized_optimizer.py::MuonTest`](tests/test_orthogonalized_optimizer.py).

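
For orientation only, a hypothetical call pattern is sketched below. The import path, the class name `Muon`, and the constructor arguments are assumptions for illustration, not this repository's confirmed API; the test referenced above is authoritative.

```python
# Hypothetical sketch: the import path, class name, and argument names below are
# assumptions for illustration; see tests/test_orthogonalized_optimizer.py for
# the actual API.
import torch
from emerging_optimizers import Muon  # assumed import location

model = torch.nn.Linear(1024, 1024, bias=False)          # Muon targets 2D weight matrices
opt = Muon(model.parameters(), lr=0.02, momentum=0.95)   # assumed signature

for _ in range(10):
    x = torch.randn(32, 1024)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Non-matrix parameters (embeddings, biases, norm scales) are typically trained with a separate optimizer such as AdamW when Muon is used for the weight matrices.
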
### Integration with Megatron Core

Integration with Megatron Core is in progress. See the [integration PR](https://github.com/NVIDIA/Megatron-LM/pull/1813) that demonstrates usage with Dense and MoE models.

## Benchmarks

Coming soon.

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
