pytorch-optimizer v3.8.0
Change Log
Feature
- Implement `EmoNeco` and `EmoZeal` optimizers. (#407)
- Implement the `Refined Schedule-Free AdamW` optimizer. (#409, #414)
  - Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
  - You can use this variant by setting the `decoupling_c` parameter in the `ScheduleFreeAdamW` optimizer; see the `ScheduleFreeAdamW` sketch after this list.
- Add more built-in optimizers: `NAdam`, `RMSProp`, and `LBFGS`; loading them by name is sketched below. (#415)
- Support the `cautious` variant for the `Muon` optimizer. (#417)
- Separate the distributed functionality from `Muon` into the `DistributedMuon` optimizer. (#418)
- Implement `StochasticAccumulator`, which is a gradient hook. (#418)
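A minimal sketch, not taken from the repository, of how the refined schedule-free variant might be enabled. It assumes `decoupling_c` is a constructor keyword of `ScheduleFreeAdamW`; the value passed below is purely illustrative, so check the documentation for the exact signature and defaults.

```python
import torch
from pytorch_optimizer import ScheduleFreeAdamW

model = torch.nn.Linear(16, 1)

# `decoupling_c` is assumed to toggle the refined behaviour; the value is illustrative only.
optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-3, decoupling_c=1.0)

optimizer.train()  # schedule-free optimizers keep separate train/eval parameter states
for _ in range(10):
    loss = model(torch.randn(8, 16)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
optimizer.eval()   # switch to the evaluation parameters before validation or saving
```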
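And a sketch of loading one of the new built-in optimizers by name via `load_optimizer`. It assumes the new built-ins are registered under their lowercase names; check `get_supported_optimizers()` for the exact keys.

```python
import torch
from pytorch_optimizer import load_optimizer

model = torch.nn.Linear(16, 1)

optimizer_cls = load_optimizer("nadam")  # likewise "rmsprop" or "lbfgs" (LBFGS needs a closure in step())
optimizer = optimizer_cls(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```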
Update
- Re-implement the `Muon` and `AdaMuon` optimizers based on the recent official implementation. (#408, #410)
  - Their definitions have changed from the previous version, so please check out the documentation!
- Add the optimizers that were missing from `__init__.py`. (#415)
- Add the HuggingFace Trainer example; a minimal integration sketch follows this list. (#415)
- Optimize the visualization outputs and change the visualization document to a table layout. (#416)
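Below is a minimal sketch of wiring a pytorch-optimizer optimizer into the HuggingFace Trainer through its `optimizers=(optimizer, lr_scheduler)` argument. The checkpoint name and toy dataset are placeholders, and the example shipped in the repository may be structured differently.

```python
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from pytorch_optimizer import load_optimizer

model_name = "distilbert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tiny toy dataset, only to keep the sketch self-contained.
data = Dataset.from_dict({"text": ["good", "bad"], "label": [1, 0]})
data = data.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=16))

optimizer_cls = load_optimizer("adamp")  # look up any supported optimizer by name
optimizer = optimizer_cls(model.parameters(), lr=5e-5)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=data,
    optimizers=(optimizer, None),  # Trainer builds its default scheduler when None is given
)
trainer.train()
```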
Dependency
- Update `mkdocs` dependencies. (#417)
CI
Contributions
Thanks to @AidinHamedi