
Support CANS orthogonalization in Muon. #140

Open
mihara-bot wants to merge 7 commits into NVIDIA-NeMo:main from mihara-bot:feat/muon-cans


@mihara-bot

This adds coefficient_type="cans" Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.

CANS (http://arxiv.org/abs/2506.10935) is an algorithm closely related to Polar Express, and it can likewise accelerate LLM pre-training.
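For context, Muon-style optimizers approximate the polar factor of the update with a Newton-Schulz iteration, and schemes like CANS and Polar Express differ mainly in which coefficients each step uses. The following is a minimal NumPy sketch of that per-step loop, assuming the quintic form X ← aX + b(XXᵀ)X + c(XXᵀ)²X used by common Muon implementations. The function name `newton_schulz_sketch` is hypothetical, this is not the muon_utils.py implementation, and the actual CANS tuples are not reproduced here; the usage plugs in the classical cubic iteration (1.5, -0.5, 0) only because its convergence to the polar factor is easy to check.

```python
import numpy as np


def newton_schulz_sketch(g, coeffs, eps=1e-7):
    """Approximate the polar factor U V^T of g with per-step quintic Newton-Schulz.

    Each (a, b, c) tuple in `coeffs` drives one step:
        X <- a*X + b*(X X^T) X + c*(X X^T)^2 X
    """
    # Frobenius norm >= spectral norm, so every singular value starts in [0, 1].
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        # Iterate on the wide orientation so the Gram matrix X X^T is the smaller one.
        x = x.T
    for a, b, c in coeffs:
        gram = x @ x.T
        x = a * x + (b * gram + c * (gram @ gram)) @ x
    return x.T if transposed else x


# Usage: 30 steps of the classical cubic iteration (c = 0), compared against the SVD.
rng = np.random.default_rng(0)
g = rng.standard_normal((4, 6))
x = newton_schulz_sketch(g, [(1.5, -0.5, 0.0)] * 30)
u, _, vt = np.linalg.svd(g, full_matrices=False)
print(np.allclose(x, u @ vt, atol=1e-6))
```

Swapping in a per-step table, such as the five CANS tuples this PR adds, would only change the `coeffs` argument.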

@copy-pr-bot

copy-pr-bot bot commented Mar 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Author

@mihara-bot mihara-bot left a comment


I added the appropriate coefficients after a manual check.

@greptile-apps
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR adds coefficient_type="cans" support to the Newton-Schulz orthogonalization utility, enabling Muon (and its variants PolarGrad and Scion) to use CANS-based coefficients from arXiv:2506.10935. The implementation follows the pattern already established for "polar_express": a 5-entry coefficient table with repeat_last iteration mode, so that extra steps beyond the base 5 reuse the stable terminal coefficient.

- muon_utils.py: Adds the 5 CANS coefficient tuples and extends NSCoeffT and the iter_mode selection logic. Both NSCoeffT and _COEFFICIENT_SETS now order "cans" consistently before "aol".
- muon.py / polargrad.py / scion.py: Docstrings updated to advertise "cans" as a valid coefficient_type.
- tests/test_muon_utils.py: Adds a 5-step exact-match test (test_cans_close_to_reference) and a 9-step repeat_last test (test_get_cans_9steps_close_to_reference). The 9-step reference correctly extends the base list by 4 copies of the last tuple (5 + 4 = 9), matching the implementation's repeat_last expansion; the off-by-one issue flagged in earlier review rounds is resolved here.
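The repeat_last behavior described above, and the 5 + 4 = 9 construction used by the 9-step test, can be sketched with itertools. This is a hypothetical helper (`expand_coefficients`) written only to illustrate the two iteration modes; it is not the code in muon_utils.py, and its handling of a `steps` value shorter than the table (plain truncation) is an assumption.

```python
from itertools import chain, cycle, islice, repeat
from typing import List, Sequence, Tuple

Coeff = Tuple[float, float, float]


def expand_coefficients(coeffs: Sequence[Coeff], steps: int, mode: str) -> List[Coeff]:
    """Return exactly `steps` coefficient tuples drawn from `coeffs`.

    mode="repeat_last": walk the table once, then keep reusing its final tuple.
    mode="cycle": wrap around to the start of the table when it runs out.
    """
    if not coeffs:
        raise ValueError("coeffs must be non-empty")
    if mode == "repeat_last":
        source = chain(coeffs, repeat(coeffs[-1]))
    elif mode == "cycle":
        source = cycle(coeffs)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # islice stops after `steps` items; shorter requests simply truncate the table.
    return list(islice(source, steps))


# Usage with five placeholder tuples (NOT the real CANS values).
base = [(float(i), 0.0, 0.0) for i in range(5)]
nine = expand_coefficients(base, 9, "repeat_last")
print(nine == base + [base[-1]] * 4)  # True
```

The summary describes the terminal tuple as stable, which is why this mode reuses it rather than wrapping back to the start of the table as cycle mode does.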

Confidence Score: 5/5

Safe to merge: core implementation is correct and previously flagged bugs (off-by-one in test, missing function definition) are resolved.

The CANS coefficient table and repeat_last wiring are mathematically correct and follow the established polar_express pattern exactly. Both new tests are logically sound: the 5-step test verifies end-to-end wiring, and the 9-step test now correctly constructs a 9-entry reference list (5 unique + 4 repeats of last). All critical issues raised in prior review rounds have been addressed. Remaining open items (missing "aol" from optimizer docstrings, no SVD quality benchmark for CANS, iter_mode not exposed for "custom") are all non-blocking P2s already tracked in earlier threads.

No files require special attention.

Important Files Changed

Filename | Overview
emerging_optimizers/orthogonalized_optimizers/muon_utils.py | Adds CANS coefficient set and correctly sets iter_mode="repeat_last" for it; NSCoeffT and _COEFFICIENT_SETS ordering are now consistent (cans before aol in both).
tests/test_muon_utils.py | Two new CANS tests added: 5-step correctness check and 9-step repeat_last check; the 9-step reference correctly extends by 4 (5 unique + 4 repeats = 9 total).
emerging_optimizers/orthogonalized_optimizers/muon.py | Docstring updated to list "cans" as a valid coefficient_type option.
emerging_optimizers/orthogonalized_optimizers/polargrad.py | Docstring updated to list "cans" as a valid coefficient_type option.
emerging_optimizers/orthogonalized_optimizers/scion.py | Docstring updated to list "cans" as a valid coefficient_type option.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[newton_schulz called] --> B{coefficient_type?}
    B -->|polar_express| C[Load 8-entry polar_express coeffs]
    B -->|cans| D[Load 5-entry CANS coeffs]
    B -->|quintic / simple / aol| E[Load N-entry coeffs]
    B -->|custom| F[Use custom_coefficient_sets]
    C --> G[iter_mode = repeat_last]
    D --> G
    E --> H[iter_mode = cycle]
    F --> H
    G --> I[get_coefficient_iterator steps, coeffs, repeat_last]
    H --> I
    I --> J[islice chain coeffs + repeat last]
    J --> K[Run NS steps]
    K --> L[Return orthogonalized tensor]
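The branch at the top of the flowchart, which picks a coefficient table and an iteration mode from coefficient_type, might look roughly like this. Everything here is an illustrative assumption: `resolve_coefficients`, its mapping argument, and the error handling are hypothetical rather than the muon_utils.py API, and the tables passed in are placeholders, not real coefficients. Only the routing (polar_express and cans to repeat_last; quintic, simple, aol, and custom to cycle) is taken from the flowchart.

```python
from typing import Dict, List, Literal, Optional, Sequence, Tuple

Coeff = Tuple[float, float, float]
IterMode = Literal["cycle", "repeat_last"]

# Per the flowchart, only these tables are extended by repeating their last tuple.
REPEAT_LAST_TYPES = frozenset({"polar_express", "cans"})


def resolve_coefficients(
    coefficient_type: str,
    coefficient_sets: Dict[str, Sequence[Coeff]],
    custom_coefficient_sets: Optional[Sequence[Coeff]] = None,
) -> Tuple[List[Coeff], IterMode]:
    """Return (coefficient table, iteration mode) for a coefficient_type string."""
    if coefficient_type == "custom":
        if not custom_coefficient_sets:
            raise ValueError("coefficient_type='custom' needs custom_coefficient_sets")
        # The flowchart routes custom tables to cycle mode.
        return list(custom_coefficient_sets), "cycle"
    if coefficient_type not in coefficient_sets:
        raise ValueError(f"unknown coefficient_type: {coefficient_type!r}")
    mode: IterMode = "repeat_last" if coefficient_type in REPEAT_LAST_TYPES else "cycle"
    return list(coefficient_sets[coefficient_type]), mode


# Usage with placeholder tables (values are NOT real Newton-Schulz coefficients).
tables = {
    "cans": [(1.0, 0.0, 0.0)] * 5,
    "quintic": [(2.0, 0.0, 0.0)],
}
print(resolve_coefficients("cans", tables)[1])  # -> repeat_last
print(resolve_coefficients("quintic", tables)[1])  # -> cycle
```

Keeping the mode decision in one small set makes adding a new table like CANS a one-line change to the routing, which matches how the PR describes wiring "cans" alongside "polar_express".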

Reviews (9): Last reviewed commit: "loosen strictness in test"

Contributor

@skyw skyw left a comment


Some minor changes needed; otherwise LGTM.

Instructions for the DCO sign-off are in CONTRIBUTING.md.

mihara-bot and others added 5 commits March 25, 2026 14:23
This adds `coefficient_type="cans"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.

Made-with: Cursor
Signed-off-by: mihara-bot <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>
@skyw
Contributor

skyw commented Mar 26, 2026

/ok to test 5f405d2

Signed-off-by: mihara-bot <1147220090@qq.com>
auto-merge was automatically disabled March 26, 2026 09:03

Head branch was pushed to by a user without write access

@skyw
Contributor

skyw commented Mar 26, 2026

@mihara-bot did you check that it passes locally?
