Muon: An optimizer for the hidden layers of neural networks
Tentative implementation of NorMuon from https://arxiv.org/abs/2510.05491
Currently implemented for a single device only:
SingleDeviceNorMuonWithAuxAdam(param_groups)
Original Muon implementation by:
@misc{jordan2024muon,
  author = {Keller Jordan and Yuchen Jin and Vlado Boza and You Jiacheng and
            Franz Cesista and Laker Newhouse and Jeremy Bernstein},
  title  = {Muon: An optimizer for hidden layers in neural networks},
  year   = {2024},
  url    = {https://kellerjordan.github.io/posts/muon/}
}