@@ -87,14 +87,13 @@ If you want to build the optimizer with parameters & configs, there's `create_op
 Supported Optimizers
 --------------------

-You can check the supported optimizers & lr schedulers.
+You can check the supported optimizers with the code below.

 ::

-    from pytorch_optimizer import get_supported_optimizers, get_supported_lr_schedulers
+    from pytorch_optimizer import get_supported_optimizers

     supported_optimizers = get_supported_optimizers()
-    supported_lr_schedulers = get_supported_lr_schedulers()

 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
 | Optimizer | Description | Official Code | Paper | Citation |
@@ -201,14 +200,10 @@ You can check the supported optimizers & lr schedulers.
 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
 | Softplus T | *Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM* | | `https://arxiv.org/abs/1908.00700 <https://arxiv.org/abs/1908.00700>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2019arXiv190800700T/exportcitation>`__ |
 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
-| EE LRS | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | `https://arxiv.org/abs/2003.03977 <https://arxiv.org/abs/2003.03977>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation>`__ |
+| Un-tuned w/u | *On the adequacy of untuned warmup for adaptive optimization* | | `https://arxiv.org/abs/1910.04209 <https://arxiv.org/abs/1910.04209>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2019arXiv191004209M/exportcitation>`__ |
 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
 | Norm Loss | *An efficient yet effective regularization method for deep neural networks* | | `https://arxiv.org/abs/2103.06583 <https://arxiv.org/abs/2103.06583>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2021arXiv210306583G/exportcitation>`__ |
 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
-| Chebyshev LR | *Acceleration via Fractal Learning Rate Schedules* | | `https://arxiv.org/abs/2103.01338 <https://arxiv.org/abs/2103.01338>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation>`__ |
-+--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
-| Un-tuned WU | *On the adequacy of untuned warmup for adaptive optimization* | | `https://arxiv.org/abs/1910.04209 <https://arxiv.org/abs/1910.04209>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2019arXiv191004209M/exportcitation>`__ |
-+--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
 | AdaShift | *Decorrelation and Convergence of Adaptive Learning Rate Methods* | `github <https://github.com/MichaelKonobeev/adashift>`__ | `https://arxiv.org/abs/1810.00143v4 <https://arxiv.org/abs/1810.00143v4>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2018arXiv181000143Z/exportcitation>`__ |
 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
 | AdaDelta | *An Adaptive Learning Rate Method* | | `https://arxiv.org/abs/1212.5701v1 <https://arxiv.org/abs/1212.5701v1>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2012arXiv1212.5701Z/exportcitation>`__ |
@@ -222,6 +217,25 @@ You can check the supported optimizers & lr schedulers.
 | Sophia | *A Scalable Stochastic Second-order Optimizer for Language Model Pre-training* | `github <https://github.com/Liuhong99/Sophia>`__ | `https://arxiv.org/abs/2305.14342 <https://arxiv.org/abs/2305.14342>`__ | `cite <https://github.com/Liuhong99/Sophia>`__ |
 +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+

+Supported LR Schedulers
+-----------------------
+
+You can check the supported learning rate schedulers with the code below.
+
+::
+
+    from pytorch_optimizer import get_supported_lr_schedulers
+
+    supported_lr_schedulers = get_supported_lr_schedulers()
+
++------------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
+| LR Scheduler | Description | Official Code | Paper | Citation |
++==================+===================================================================================================+===================================================================================+===============================================================================================+======================================================================================================================+
+| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | `https://arxiv.org/abs/2003.03977 <https://arxiv.org/abs/2003.03977>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation>`__ |
++------------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
+| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | `https://arxiv.org/abs/2103.01338 <https://arxiv.org/abs/2103.01338>`__ | `cite <https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation>`__ |
++------------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
+
 Useful Resources
 ----------------
 
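For reference, a minimal sketch of how the two helpers shown above (``get_supported_optimizers`` and ``get_supported_lr_schedulers``) might be used together. This assumes both return list-like collections whose entries are either classes or plain names, which may vary by library version.

::

    from pytorch_optimizer import get_supported_lr_schedulers, get_supported_optimizers

    # Collect everything the installed version reports as supported.
    optimizers = list(get_supported_optimizers())
    lr_schedulers = list(get_supported_lr_schedulers())

    print(f'{len(optimizers)} optimizers, {len(lr_schedulers)} lr schedulers')

    # Entries may be classes or plain strings; fall back to the object itself
    # when there is no __name__ attribute.
    for entry in optimizers:
        print(getattr(entry, '__name__', entry))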