Merged
9 changes: 4 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -11,6 +11,7 @@
[![CICD NeMo](https://github.com/NVIDIA-NeMo/Emerging-Optimizers/actions/workflows/cicd-main.yml/badge.svg?branch=main)](https://github.com/NVIDIA-NeMo/Emerging-Optimizers/actions/workflows/cicd-main.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-3100/)
![GitHub Repo stars](https://img.shields.io/github/stars/NVIDIA-NeMo/Emerging-Optimizers)
[![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html)

</div>

@@ -53,15 +54,13 @@ pip install .

## Usage

### Muon Optimizer
### Example

Muon (MomentUm Orthogonalized by Newton-Schulz) uses orthogonalization for 2D parameters.

For a simple usage example, see [`tests/test_orthogonalized_optimizer.py::MuonTest`](tests/test_orthogonalized_optimizer.py).
Refer to the tests for usage examples of the different optimizers, e.g. [`tests/test_orthogonalized_optimizer.py::MuonTest`](tests/test_orthogonalized_optimizer.py).
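The orthogonalization step at the heart of Muon can be sketched with a quintic Newton-Schulz iteration. The following is an illustrative, standalone sketch in plain PyTorch (the coefficients are those commonly used in Muon implementations), not this library's API:

```python
import torch


def newton_schulz_orthogonalize(grad: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D tensor via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients tuned for fast convergence
    x = grad / (grad.norm() + eps)  # scale so all singular values are <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation so the Gram matrix is small
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x  # pushes singular values toward 1
    return x.T if transposed else x


# Illustrative use inside a momentum-SGD-style step:
# update = newton_schulz_orthogonalize(momentum_buffer)
# param.data.add_(update, alpha=-lr)
```

The iteration drives the singular values of the (normalized) input toward 1 without an explicit SVD, which is why it is GPU-friendly.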

### Integration with Megatron Core

Integration with Megatron Core is in progress. See the [integration PR](https://github.com/NVIDIA/Megatron-LM/pull/1813) that demonstrates usage with Dense and MoE models.
Integration with Megatron Core is available in the **dev** branch; see, e.g., [muon.py](https://github.com/NVIDIA/Megatron-LM/blob/dev/megatron/core/optimizer/muon.py).

## Benchmarks

3 changes: 2 additions & 1 deletion docs/apidocs/index.md
@@ -6,10 +6,11 @@ NeMo Emerging Optimizers API reference provides comprehensive technical document
:caption: API Documentation
:hidden:

utils.md
orthogonalized-optimizers.md
soap.md
riemannian-optimizers.md
psgd.md
scalar-optimizers.md
mixin.md
utils.md
```
12 changes: 12 additions & 0 deletions docs/apidocs/mixin.md
@@ -0,0 +1,12 @@

```{eval-rst}
.. role:: hidden
:class: hidden-section

emerging_optimizers.mixin
==========================

.. automodule:: emerging_optimizers.mixin
:members:
:private-members:
```
7 changes: 7 additions & 0 deletions docs/apidocs/orthogonalized-optimizers.md
@@ -21,6 +21,13 @@ emerging_optimizers.orthogonalized_optimizers
:members:


:hidden:`Scion`
~~~~~~~~~~~~~~~

.. autoclass:: Scion
:members:


:hidden:`Newton-Schulz`
~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: emerging_optimizers.orthogonalized_optimizers.muon_utils
2 changes: 2 additions & 0 deletions docs/apidocs/soap.md
@@ -20,6 +20,8 @@ emerging_optimizers.soap

.. autofunction:: update_kronecker_factors

.. autofunction:: update_kronecker_factors_kl_shampoo

.. autofunction:: update_eigenbasis_and_momentum

emerging_optimizers.soap.soap_utils
4 changes: 2 additions & 2 deletions docs/index.md
@@ -12,7 +12,7 @@ Emerging-Optimizers is under active development. All APIs are experimental and s

### Prerequisites

- Python 3.12 or higher
- Python 3.10 or higher (3.12 recommended)
- PyTorch 2.0 or higher

### Install from Source
@@ -33,8 +33,8 @@ Coming soon.
:caption: 🛠️ Development
:hidden:

documentation.md
apidocs/index.md
documentation.md
```


1 change: 1 addition & 0 deletions emerging_optimizers/mixin.py
@@ -25,6 +25,7 @@ class WeightDecayMixin:
"""Mixin for weight decay

Supports different types of weight decay:

- "decoupled": weight decay is applied directly to params without changing gradients
- "independent": similar as decoupled weight decay, but without tying weight decay and learning rate
- "l2": classic L2 regularization
1 change: 1 addition & 0 deletions emerging_optimizers/orthogonalized_optimizers/__init__.py
@@ -14,4 +14,5 @@
# limitations under the License.
from emerging_optimizers.orthogonalized_optimizers.muon import *
from emerging_optimizers.orthogonalized_optimizers.orthogonalized_optimizer import *
from emerging_optimizers.orthogonalized_optimizers.scion import *
from emerging_optimizers.orthogonalized_optimizers.spectral_clipping_utils import *