Support CANS orthogonalization in Muon. #140
mihara-bot wants to merge 7 commits into NVIDIA-NeMo:main from
mihara-bot
left a comment
I added the appropriate coefficients after a human check.
Greptile Summary

This PR adds

Confidence Score: 5/5

Safe to merge: the core implementation is correct and previously flagged bugs (off-by-one in test, missing function definition) are resolved.

The CANS coefficient table and repeat_last wiring are mathematically correct and follow the established polar_express pattern exactly. Both new tests are logically sound: the 5-step test verifies end-to-end wiring, and the 9-step test now correctly constructs a 9-entry reference list (5 unique + 4 repeats of last). All critical issues raised in prior review rounds have been addressed. Remaining open items (missing "aol" from optimizer docstrings, no SVD quality benchmark for CANS, iter_mode not exposed for "custom") are all non-blocking P2s already tracked in earlier threads.

No files require special attention.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[newton_schulz called] --> B{coefficient_type?}
B -->|polar_express| C[Load 8-entry polar_express coeffs]
B -->|cans| D[Load 5-entry CANS coeffs]
B -->|quintic / simple / aol| E[Load N-entry coeffs]
B -->|custom| F[Use custom_coefficient_sets]
C --> G[iter_mode = repeat_last]
D --> G
E --> H[iter_mode = cycle]
F --> H
G --> I[get_coefficient_iterator steps, coeffs, repeat_last]
H --> I
I --> J[islice chain coeffs + repeat last]
J --> K[Run NS steps]
K --> L[Return orthogonalized tensor]
```
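The two iteration modes in the flowchart can be illustrated with a minimal standard-library sketch. This is not the repository's implementation: `coefficient_iterator_sketch` is a hypothetical name, its signature is an assumption, and the string labels are placeholders for the real coefficient tuples, which are not reproduced here.

```python
from itertools import chain, cycle, islice, repeat


def coefficient_iterator_sketch(steps, coeffs, mode):
    """Yield exactly `steps` entries drawn from `coeffs` (hypothetical helper).

    mode="cycle":       wrap around to the first entry once the list is exhausted.
    mode="repeat_last": walk the list once, then keep yielding its last entry.
    """
    if not coeffs:
        raise ValueError("coeffs must be non-empty")
    if mode == "cycle":
        source = cycle(coeffs)
    elif mode == "repeat_last":
        source = chain(coeffs, repeat(coeffs[-1]))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # islice truncates the (infinite) source to the requested number of steps.
    return islice(source, steps)


# Placeholder labels stand in for a 5-entry coefficient table.
five_entries = ["c0", "c1", "c2", "c3", "c4"]

# 9 steps in repeat_last mode: 5 unique entries followed by 4 repeats of the last.
nine_steps = list(coefficient_iterator_sketch(9, five_entries, "repeat_last"))

# 7 steps in cycle mode: the list wraps around after the fifth entry.
seven_cycled = list(coefficient_iterator_sketch(7, five_entries, "cycle"))
```

Under these assumptions, a 9-step run over a 5-entry table produces the "5 unique + 4 repeats of last" reference list that the summary describes for the 9-step test, while the same table in cycle mode would restart from the first entry.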
Reviews (9): Last reviewed commit: "loosen strictness in test"
This adds `coefficient_type="cans"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations. Made-with: Cursor Signed-off-by: mihara-bot <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Xinlin Zhuang <1147220090@qq.com> Signed-off-by: mihara-bot <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Xinlin Zhuang <1147220090@qq.com> Signed-off-by: mihara-bot <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Xinlin Zhuang <1147220090@qq.com> Signed-off-by: mihara-bot <1147220090@qq.com>
Signed-off-by: mihara-bot <1147220090@qq.com>
1d9fe18 to e225bb2
Signed-off-by: mihara-bot <1147220090@qq.com>
/ok to test 5f405d2
Signed-off-by: mihara-bot <1147220090@qq.com>
Head branch was pushed to by a user without write access
@mihara-bot did you check that it passes locally?
This adds `coefficient_type="cans"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations. CANS (http://arxiv.org/abs/2506.10935) is an algorithm very similar to PolarExpress, and it can also accelerate LLM pre-training.