Releases: kozistr/pytorch_optimizer
pytorch-optimizer v3.4.0
Change Log
Feature
- Implement `FOCUS` optimizer. (#330, #331)
- Implement `PSGD Kron` optimizer. (#336, #337)
- Implement `EXAdam` optimizer. (#338, #339)
Update
- Support the `OrthoGrad` variant for `Ranger25`. (#332) (see the sketch below)
  - `Ranger25` is my experimental, hand-crafted optimizer, which mixes several optimizer variants: `ADOPT` + `AdEMAMix` + `Cautious` + `StableAdamW` + `Adam-Atan2` + `OrthoGrad`.
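A minimal usage sketch of the item above. The `orthograd` keyword name is an assumption (check the `Ranger25` signature in your installed pytorch-optimizer version); the tiny model and learning rate are placeholders.

```python
import torch
from pytorch_optimizer import Ranger25

model = torch.nn.Linear(10, 2)

# Ranger25 with the OrthoGrad variant enabled.
# NOTE: the `orthograd` keyword name is an assumption; check the Ranger25
# signature in your installed pytorch-optimizer version.
optimizer = Ranger25(model.parameters(), lr=1e-3, orthograd=True)
```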
Fix
- Add the missing `state` property in the `OrthoGrad` optimizer. (#326, #327)
- Add the missing `state_dict` and `load_state_dict` methods to the `TRAC` and `OrthoGrad` optimizers. (#332)
- Skip sparse gradients in the `OrthoGrad` optimizer. (#332)
- Support alternative-precision training in the `SOAP` optimizer. (#333)
- Store `SOAP` condition matrices in the dtype of their parameters. (#335)
Contributions
thanks to @Vectorrent, @kylevedder
pytorch-optimizer v3.3.4
Change Log
Feature
- Support the `OrthoGrad` feature for `create_optimizer()`. (#324)
- Enhance flexibility for the `optimizer` parameter in the `Lookahead`, `TRAC`, and `OrthoGrad` optimizers. (#324)
  - They now accept both `torch.optim.Optimizer` instances and classes.
  - You can now use the `Lookahead` optimizer in two ways (see the sketch after this list):
    - `Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)`
    - `Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())`
- Implement `SPAM` optimizer. (#324)
- Implement `TAM` and `AdaTAM` optimizers. (#325)
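A minimal sketch of the two construction styles listed above. The calls are taken from the examples in the note; `AdamW` is `torch.optim.AdamW`, and the tiny model is only for illustration.

```python
import torch
from torch.optim import AdamW
from pytorch_optimizer import Lookahead

model = torch.nn.Linear(10, 2)

# 1) wrap an already-built optimizer instance (previous behavior).
optimizer = Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)

# 2) pass the optimizer class plus `params` and let Lookahead build it (new in v3.3.4).
optimizer = Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())
```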
pytorch-optimizer v3.3.3
Change Log
Feature
- Implement `Grams` optimizer. (#317, #318)
- Support the `stable_adamw` variant for the `ADOPT` and `AdEMAMix` optimizers. (#321)
  - `optimizer = ADOPT(model.parameters(), ..., stable_adamw=True)` (see the sketch after this list)
- Implement an experimental optimizer, `Ranger25` (not tested). (#321)
  - It mixes the `ADOPT + AdEMAMix + StableAdamW + Cautious + RAdam` optimizers.
- Implement `OrthoGrad` optimizer. (#321)
- Support the `Adam-Atan2` feature for the `Prodigy` optimizer when `eps` is None. (#321)
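A minimal sketch of the two usage patterns above. The `stable_adamw=True` call comes from the note itself; passing `eps=None` to `Prodigy` to enable the `Adam-Atan2` update is my reading of the last item, so treat the details as an assumption and check the docstrings.

```python
import torch
from pytorch_optimizer import ADOPT, Prodigy

model = torch.nn.Linear(10, 2)

# ADOPT with the StableAdamW-style update (taken from the example above).
optimizer = ADOPT(model.parameters(), lr=1e-3, stable_adamw=True)

# Prodigy with eps=None, which switches it to the Adam-Atan2 update
# (my reading of the note above; check the Prodigy docstring).
optimizer = Prodigy(model.parameters(), lr=1.0, eps=None)
```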
pytorch-optimizer v3.3.2
pytorch-optimizer v3.3.1
Change Log
Feature
- Support the `Cautious` variant for the `AdaShift` optimizer. (#310)
- Save the state of the `Lookahead` optimizer too. (#310)
- Implement `APOLLO` optimizer. (#311, #312)
- Rename the `Apollo` optimizer (An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization) to `ApolloDQN` so it does not overlap with the new optimizer name `APOLLO`. (#312)
- Implement `MARS` optimizer. (#313, #314)
- Support the `Cautious` variant for the `MARS` optimizer. (#314) (see the sketch after this list)
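A minimal sketch of enabling the Cautious variant on `MARS`. The `cautious=True` flag name is borrowed from the v3.3.0 notes below and is assumed to apply here as well; the learning rate is a placeholder.

```python
import torch
from pytorch_optimizer import MARS

model = torch.nn.Linear(10, 2)

# MARS with the Cautious update rule enabled.
# NOTE: the `cautious` keyword follows the pattern from the v3.3.0 notes below;
# treat the exact name as an assumption.
optimizer = MARS(model.parameters(), lr=3e-3, cautious=True)
```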
Bug
- Fix `bias_correction` in the `AdamG` optimizer. (#305, #308)
- Fix a potential bug when loading the state of the `Lookahead` optimizer. (#306, #310)
Docs
Contributions
thanks to @Vectorrent
pytorch-optimizer v3.3.0
Change Log
Feature
- Support the `PaLM` variant for the `ScheduleFreeAdamW` optimizer. (#286, #288)
  - You can use this feature by setting `use_palm` to `True`.
- Implement `ADOPT` optimizer. (#289, #290)
- Implement `FTRL` optimizer. (#291)
- Implement the `Cautious optimizer` feature. (#294)
  - Improving Training with One Line of Code.
  - You can use it by setting `cautious=True` for the `Lion`, `AdaFactor`, and `AdEMAMix` optimizers (see the sketch after this list).
- Improve the stability of the `ADOPT` optimizer. (#294)
- Support a new projection type, `random`, for `GaLoreProjector`. (#294)
- Implement `DeMo` optimizer. (#300, #301)
- Implement `Muon` optimizer. (#302)
- Implement `ScheduleFreeRAdam` optimizer. (#304)
- Implement `LaProp` optimizer. (#304)
- Support the `Cautious` variant for the `LaProp`, `AdamP`, and `Adopt` optimizers. (#304)
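A minimal sketch combining the two how-to items above. The `use_palm` and `cautious` flag names come straight from the notes; the tiny model and learning rates are placeholders.

```python
import torch
from pytorch_optimizer import Lion, ScheduleFreeAdamW

model = torch.nn.Linear(10, 2)

# ScheduleFreeAdamW with the PaLM variant enabled via `use_palm`.
optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-3, use_palm=True)

# Lion with the Cautious update rule enabled via `cautious`.
optimizer = Lion(model.parameters(), lr=1e-4, cautious=True)
```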
Refactor
- Big refactoring: remove direct imports from `pytorch_optimizer.*`. (see the import sketch after this list)
  - I removed several helpers from the top-level `pytorch_optimizer.*` namespace because they are probably not used frequently and are not optimizers themselves, but utilities used only by specific optimizers.
  - `pytorch_optimizer.[Shampoo stuff]` -> `pytorch_optimizer.optimizers.shampoo_utils.[Shampoo stuff]`
    - `shampoo_utils` such as `Graft`, `BlockPartitioner`, `PreConditioner`, etc. You can check the details here.
  - `pytorch_optimizer.GaLoreProjector` -> `pytorch_optimizer.optimizers.galore.GaLoreProjector`
  - `pytorch_optimizer.gradfilter_ema` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ema`
  - `pytorch_optimizer.gradfilter_ma` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ma`
  - `pytorch_optimizer.l2_projection` -> `pytorch_optimizer.optimizers.alig.l2_projection`
  - `pytorch_optimizer.flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.flatten_grad`
  - `pytorch_optimizer.un_flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.un_flatten_grad`
  - `pytorch_optimizer.reduce_max_except_dim` -> `pytorch_optimizer.optimizers.sm3.reduce_max_except_dim`
  - `pytorch_optimizer.neuron_norm` -> `pytorch_optimizer.optimizers.nero.neuron_norm`
  - `pytorch_optimizer.neuron_mean` -> `pytorch_optimizer.optimizers.nero.neuron_mean`
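A before/after import sketch for the moves listed above; the new module paths are taken verbatim from this list, and the commented-out lines show the old top-level imports they replace.

```python
# Before v3.3.0: these utilities were importable from the package root, e.g.
# from pytorch_optimizer import GaLoreProjector, gradfilter_ema

# From v3.3.0 on: import them from their optimizer-specific modules instead.
from pytorch_optimizer.optimizers.galore import GaLoreProjector
from pytorch_optimizer.optimizers.grokfast import gradfilter_ema
```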
Docs
- Add more visualizations. (#297)
Bug
- Add the `optimizer` parameter to the `PolyScheduler` constructor. (#295)
Contributions
thanks to @tanganke
pytorch-optimizer v3.2.0
Change Log
Feature
- Implement `SOAP` optimizer. (#275)
- Support `AdEMAMix` variants. (#276)
  - `bnb_ademamix8bit`, `bnb_ademamix32bit`, `bnb_paged_ademamix8bit`, `bnb_paged_ademamix32bit`
- Support 8/4-bit and fp8 optimizers. (#208, #281)
  - `torchao_adamw8bit`, `torchao_adamw4bit`, `torchao_adamwfp8`
- Support a module-name-level (e.g. `LayerNorm`) weight decay exclusion for `get_optimizer_parameters`. (#282, #283) (see the sketch after this list)
- Implement `CPUOffloadOptimizer`, which offloads the optimizer to the CPU for single-GPU training. (#284)
- Support a regex-based filter for searching the names of optimizers, lr schedulers, and loss functions.
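A hedged sketch of loading one of the wrapped 8-bit optimizers by name and of the module-name-level weight-decay exclusion referenced above. `load_optimizer` is the registry helper mentioned in the v3.1.0 notes and is assumed to return the optimizer class; the top-level import locations and the `weight_decay` / `wd_ban_list` parameter names are assumptions, so check the function docstrings.

```python
import torch
from pytorch_optimizer import get_optimizer_parameters, load_optimizer

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.LayerNorm(10))

# Look up a wrapped bitsandbytes AdEMAMix variant by its registered name
# (load_optimizer is assumed to return the optimizer class, not an instance).
optimizer_class = load_optimizer('bnb_ademamix8bit')

# Exclude LayerNorm modules and biases from weight decay.
# NOTE: the `weight_decay` / `wd_ban_list` parameter names are assumptions.
parameters = get_optimizer_parameters(model, weight_decay=1e-2, wd_ban_list=('bias', 'LayerNorm'))

optimizer = optimizer_class(parameters, lr=1e-3)
```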
Bug
Contributions
thanks to @Vectorrent
pytorch-optimizer v3.1.2
pytorch-optimizer v3.1.1
pytorch-optimizer v3.1.0
Change Log
Feature
- Implement `AdaLomo` optimizer. (#258)
- Support `Q-GaLore` optimizer. (#258)
  - Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
  - You can use it via `optimizer = load_optimizer('q_galore_adamw8bit')` (see the sketch after this list).
- Support more bnb optimizers. (#258)
  - `bnb_paged_adam8bit`, `bnb_paged_adamw8bit`, `bnb_*_*32bit`
- Improve `power_iteration()` speed by up to 40%. (#259)
- Improve `reg_noise()` (E-MCMC) speed by up to 120%. (#260)
- Support the `disable_lr_scheduler` parameter for the `Ranger21` optimizer to disable its built-in learning rate scheduler. (#261)
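A minimal sketch of the `load_optimizer` usage from the Q-GaLore item and of the new `Ranger21` flag. The registered name and the `disable_lr_scheduler` keyword are taken from the notes; I assume `load_optimizer` returns the optimizer class and that `Ranger21` requires `num_iterations`, so verify both against your installed version.

```python
import torch
from pytorch_optimizer import Ranger21, load_optimizer

model = torch.nn.Linear(10, 2)

# Look up the Q-GaLore AdamW 8-bit optimizer by its registered name
# (load_optimizer is assumed to return the optimizer class, not an instance).
optimizer_class = load_optimizer('q_galore_adamw8bit')
optimizer = optimizer_class(model.parameters(), lr=1e-3)

# Ranger21 with its built-in learning rate scheduler disabled.
# NOTE: `num_iterations` is assumed to be a required argument; check the signature.
optimizer = Ranger21(model.parameters(), lr=1e-3, num_iterations=1000, disable_lr_scheduler=True)
```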
Refactor
- Refactor `AdamMini` optimizer. (#258)
- Deprecate the optional dependency `bitsandbytes`. (#258)
- Move the `get_rms` and `approximate_sq_grad` functions to `BaseOptimizer` for reusability. (#258)
- Refactor `shampoo_utils.py`. (#259)
- Add `debias` and `debias_adam` methods to `BaseOptimizer`. (#261)
- Refactor to use `BaseOptimizer` only, instead of inheriting from multiple classes. (#261)
Bug
- Fix several bugs in the `AdamMini` optimizer. (#257)
Contributions
thanks to @sdbds