Support CANS orthogonalization in Muon. by mihara-bot · Pull Request #140 · NVIDIA-NeMo/Emerging-Optimizers

mihara-bot · 2026-03-20T10:28:08Z

This adds `coefficient_type="cans"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.

CANS (http://arxiv.org/abs/2506.10935) is an algorithm very similar to PolarExpress, which can also accelerate LLM pre-training.

copy-pr-bot · 2026-03-20T10:28:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

mihara-bot

I added the appropriate coefficients after a human check.

greptile-apps · 2026-03-20T10:30:33Z

Greptile Summary

This PR adds coefficient_type="cans" support to the Newton-Schulz orthogonalization utility, enabling Muon (and its variants PolarGrad, Scion) to use CANS-based coefficients from arXiv:2506.10935. The implementation follows the same pattern already established for "polar_express": a 5-entry coefficient table with repeat_last iteration mode so that extra steps beyond the base 5 reuse the stable terminal coefficient.

- muon_utils.py: Adds the 5 CANS coefficient tuples, extends NSCoeffT and the iter_mode selection logic. Both NSCoeffT and _COEFFICIENT_SETS now order "cans" consistently before "aol".
- muon.py / polargrad.py / scion.py: Docstrings updated to advertise "cans" as a valid coefficient_type.
- tests/test_muon_utils.py: Adds a 5-step exact-match test (test_cans_close_to_reference) and a 9-step repeat_last test (test_get_cans_9steps_close_to_reference). The 9-step reference correctly extends the base list by 4 copies of the last tuple (5 + 4 = 9), matching the implementation's repeat_last expansion; the off-by-one issue flagged in earlier review rounds is resolved here.
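The difference between the two iteration modes mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the function name `coefficient_iterator`, its signature, and the `PLACEHOLDER` table are assumptions made for the example, and the placeholder tuples are not the CANS coefficients.

```python
from itertools import chain, cycle, islice, repeat
from typing import Iterator, Sequence, Tuple

Coeff = Tuple[float, float, float]


def coefficient_iterator(
    coeffs: Sequence[Coeff], steps: int, mode: str = "cycle"
) -> Iterator[Coeff]:
    """Yield exactly `steps` coefficient tuples (hypothetical helper)."""
    if not coeffs:
        raise ValueError("coeffs must not be empty")
    if mode == "cycle":
        # Wrap around to the first tuple once the table is exhausted.
        source = cycle(coeffs)
    elif mode == "repeat_last":
        # Walk the table once, then keep yielding its final tuple.
        source = chain(coeffs, repeat(coeffs[-1]))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return islice(source, steps)


# Placeholder 5-entry table; these values are NOT the CANS coefficients.
PLACEHOLDER = [(float(i), 0.0, 0.0) for i in range(1, 6)]

nine_repeat = list(coefficient_iterator(PLACEHOLDER, 9, "repeat_last"))
nine_cycle = list(coefficient_iterator(PLACEHOLDER, 9, "cycle"))
```

With a 5-entry table and 9 steps, `repeat_last` yields the 5 base tuples followed by 4 copies of the last one, which is the shape of reference list the 9-step test needs. `cycle` would instead restart from the first tuple at step 6.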

Confidence Score: 5/5

Safe to merge — core implementation is correct and previously flagged bugs (off-by-one in test, missing function definition) are resolved.

The CANS coefficient table and repeat_last wiring are mathematically correct and follow the established polar_express pattern exactly. Both new tests are logically sound: the 5-step test verifies end-to-end wiring, and the 9-step test now correctly constructs a 9-entry reference list (5 unique + 4 repeats of last). All critical issues raised in prior review rounds have been addressed. Remaining open items (missing "aol" from optimizer docstrings, no SVD quality benchmark for CANS, iter_mode not exposed for "custom") are all non-blocking P2s already tracked in earlier threads.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| emerging_optimizers/orthogonalized_optimizers/muon_utils.py | Adds CANS coefficient set and correctly sets iter_mode="repeat_last" for it; NSCoeffT and _COEFFICIENT_SETS ordering are now consistent (cans before aol in both). |
| tests/test_muon_utils.py | Two new CANS tests added: 5-step correctness check and 9-step repeat_last check; the 9-step reference correctly extends by 4 (5 unique + 4 repeats = 9 total). |
| emerging_optimizers/orthogonalized_optimizers/muon.py | Docstring updated to list "cans" as a valid coefficient_type option. |
| emerging_optimizers/orthogonalized_optimizers/polargrad.py | Docstring updated to list "cans" as a valid coefficient_type option. |
| emerging_optimizers/orthogonalized_optimizers/scion.py | Docstring updated to list "cans" as a valid coefficient_type option. |
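The iter_mode selection described for muon_utils.py could look roughly like the sketch below. The names `CoeffType`, `REPEAT_LAST_TYPES`, and `select_iter_mode` are hypothetical stand-ins, not the repository's identifiers. The set of coefficient types is taken from this review, but only the relative order of "cans" and "aol" is confirmed by it.

```python
from typing import Literal, get_args

# Hypothetical mirror of the coefficient-type alias; the review only
# confirms that "cans" is listed before "aol".
CoeffType = Literal["simple", "quintic", "polar_express", "cans", "aol", "custom"]

# Types whose tables are walked once, after which the last entry is reused.
REPEAT_LAST_TYPES = frozenset({"polar_express", "cans"})


def select_iter_mode(coefficient_type: str) -> str:
    """Return "repeat_last" or "cycle" for a coefficient type (sketch)."""
    if coefficient_type not in get_args(CoeffType):
        raise ValueError(f"invalid coefficient type: {coefficient_type!r}")
    if coefficient_type in REPEAT_LAST_TYPES:
        return "repeat_last"
    return "cycle"
```

Keeping the repeat-last types in one set means a future coefficient family with a stable terminal tuple needs a change in only one place.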

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[newton_schulz called] --> B{coefficient_type?}
    B -->|polar_express| C[Load 8-entry polar_express coeffs]
    B -->|cans| D[Load 5-entry CANS coeffs]
    B -->|quintic / simple / aol| E[Load N-entry coeffs]
    B -->|custom| F[Use custom_coefficient_sets]
    C --> G[iter_mode = repeat_last]
    D --> G
    E --> H[iter_mode = cycle]
    F --> H
    G --> I[get_coefficient_iterator steps, coeffs, repeat_last]
    H --> I
    I --> J[islice chain coeffs + repeat last]
    J --> K[Run NS steps]
    K --> L[Return orthogonalized tensor]
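To make the "Run NS steps" stage concrete, the sketch below assumes the odd-polynomial update commonly used by Muon, X ← aX + b(XXᵀ)X + c(XXᵀ)²X. Under that assumption each singular value s of X is mapped independently to a·s + b·s³ + c·s⁵, so a scalar trace is enough to show how a coefficient schedule drives singular values toward 1. The helper name `ns_singular_value_trace` is hypothetical, and the coefficients are the classic cubic Newton-Schulz values (1.5, -0.5, 0.0), not the CANS table.

```python
from itertools import chain, islice, repeat
from typing import List, Sequence, Tuple

Coeff = Tuple[float, float, float]


def ns_singular_value_trace(
    s0: float, coeffs: Sequence[Coeff], steps: int
) -> List[float]:
    """Track one singular value through `steps` Newton-Schulz updates.

    Uses a repeat-last schedule: the table is consumed once and its
    final tuple is reused for any remaining steps.
    """
    schedule = islice(chain(coeffs, repeat(coeffs[-1])), steps)
    trace = [s0]
    s = s0
    for a, b, c in schedule:
        # Scalar form of X <- a*X + b*(X X^T) X + c*(X X^T)^2 X.
        s = a * s + b * s**3 + c * s**5
        trace.append(s)
    return trace


# Illustrative coefficients only: classic cubic Newton-Schulz, not CANS.
CUBIC = [(1.5, -0.5, 0.0)]
trace = ns_singular_value_trace(0.3, CUBIC, 9)
```

Here the trace starts at 0.3 and approaches 1, which corresponds to the matrix iteration pushing X toward an orthogonal factor. Tuned schedules such as polar_express or cans aim to reach that regime in fewer steps.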

Reviews (9): Last reviewed commit: "loosen strictness in test"

skyw

Some minor changes needed; otherwise LGTM.

Instructions for the DCO sign-off are in CONTRIBUTING.md.

This adds `coefficient_type="cans"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.
Made-with: Cursor
Signed-off-by: mihara-bot <1147220090@qq.com>

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>

Signed-off-by: mihara-bot <1147220090@qq.com>

skyw · 2026-03-26T03:27:47Z

/ok to test 5f405d2

Signed-off-by: mihara-bot <1147220090@qq.com>

skyw · 2026-03-26T15:46:02Z

@mihara-bot did you check that it passes locally?
