Releases: kozistr/pytorch_optimizer

pytorch-optimizer v3.4.0

02 Feb 05:07
8da7b49

Change Log

Feature

Update

  • Support OrthoGrad variant to Ranger25. (#332)
    • Ranger25 is my experimental optimizer, which mixes several optimizer variants: ADOPT + AdEMAMix + Cautious + StableAdamW + Adam-Atan2 + OrthoGrad.
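
A minimal usage sketch, assuming Ranger25 is exported from the package top level and that the OrthoGrad behaviour is toggled by a constructor flag; the flag name below is an assumption, not something this note specifies.

    import torch
    from torch import nn
    from pytorch_optimizer import Ranger25

    model = nn.Linear(10, 2)
    # `orthograd=True` is a hypothetical flag name; check the Ranger25 signature
    # in your installed version before relying on it.
    optimizer = Ranger25(model.parameters(), lr=1e-3, orthograd=True)

    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()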

Fix

  • Add the missing state property to the OrthoGrad optimizer. (#326, #327)
  • Add the missing state_dict and load_state_dict methods to the TRAC and OrthoGrad optimizers; see the checkpoint sketch after this list. (#332)
  • Skip sparse gradients in the OrthoGrad optimizer. (#332)
  • Support alternative precision training in SOAP optimizer. (#333)
  • Store SOAP condition matrices as the dtype of their parameters. (#335)
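
A minimal checkpoint round-trip that the state_dict/load_state_dict fix enables, assuming OrthoGrad wraps a base optimizer instance as described in the v3.3.4 notes below; only standard torch optimizer API is used.

    import torch
    from torch import nn
    from torch.optim import AdamW
    from pytorch_optimizer import OrthoGrad

    model = nn.Linear(10, 2)
    optimizer = OrthoGrad(AdamW(model.parameters(), lr=1e-3))

    # the wrapper now exposes state_dict()/load_state_dict(), so it can be
    # saved and restored like any other torch optimizer
    torch.save(optimizer.state_dict(), 'optimizer.pt')
    optimizer.load_state_dict(torch.load('optimizer.pt'))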

Contributions

thanks to @Vectorrent, @kylevedder

pytorch-optimizer v3.3.4

19 Jan 06:31
55c3553

Change Log

Feature

  • Support OrthoGrad feature for create_optimizer(). (#324)
  • Enhanced flexibility for the optimizer parameter in Lookahead, TRAC, and OrthoGrad optimizers. (#324)
    • Now supports both torch.optim.Optimizer instances and classes
    • You can now use the Lookahead optimizer in two ways (a runnable sketch follows this list).
      • Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)
      • Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())
  • Implement SPAM optimizer. (#324)
  • Implement the TAM and AdaTAM optimizers. (#325)
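
Both construction styles from the note above, as a short sketch; model here is just a placeholder torch.nn.Module.

    from torch import nn
    from torch.optim import AdamW
    from pytorch_optimizer import Lookahead

    model = nn.Linear(10, 2)

    # 1) wrap an already-built optimizer instance
    optimizer = Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)

    # 2) pass the optimizer class and let Lookahead construct it from `params`
    optimizer = Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())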

pytorch-optimizer v3.3.3

13 Jan 16:07
5baa713

Change Log

Feature

pytorch-optimizer v3.3.2

21 Dec 10:38
8f538d4

Change Log

Feature

Bug

  • Clone exp_avg before calling apply_cautious so that exp_avg is not masked. (#316)

pytorch-optimizer v3.3.1

21 Dec 07:20
d16a368

Change Log

Feature

Bug

  • Fix bias_correction in AdamG optimizer. (#305, #308)
  • Fix a potential bug when loading the state for Lookahead optimizer. (#306, #310)

Docs

Contributions

thanks to @Vectorrent

pytorch-optimizer v3.3.0

06 Dec 14:44
5def5d7

Change Log

Feature

Refactor

  • Big refactoring: remove direct imports from pytorch_optimizer.*.
    • I removed some methods from the top-level pytorch_optimizer.* namespace because they're probably not used frequently and are not optimizers themselves, but utilities used only by specific optimizers (an import example follows this list).
    • pytorch_optimizer.[Shampoo stuff] -> pytorch_optimizer.optimizers.shampoo_utils.[Shampoo stuff].
      • shampoo_utils includes Graft, BlockPartitioner, PreConditioner, etc.; see that module for details.
    • pytorch_optimizer.GaLoreProjector -> pytorch_optimizer.optimizers.galore.GaLoreProjector.
    • pytorch_optimizer.gradfilter_ema -> pytorch_optimizer.optimizers.grokfast.gradfilter_ema.
    • pytorch_optimizer.gradfilter_ma -> pytorch_optimizer.optimizers.grokfast.gradfilter_ma.
    • pytorch_optimizer.l2_projection -> pytorch_optimizer.optimizers.alig.l2_projection.
    • pytorch_optimizer.flatten_grad -> pytorch_optimizer.optimizers.pcgrad.flatten_grad.
    • pytorch_optimizer.un_flatten_grad -> pytorch_optimizer.optimizers.pcgrad.un_flatten_grad.
    • pytorch_optimizer.reduce_max_except_dim -> pytorch_optimizer.optimizers.sm3.reduce_max_except_dim.
    • pytorch_optimizer.neuron_norm -> pytorch_optimizer.optimizers.nero.neuron_norm.
    • pytorch_optimizer.neuron_mean -> pytorch_optimizer.optimizers.nero.neuron_mean.
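
For example, imports that used to come from the package top level now come from the specific optimizer module (paths taken from the mapping above).

    # before v3.3.0
    from pytorch_optimizer import GaLoreProjector, gradfilter_ema

    # from v3.3.0 on
    from pytorch_optimizer.optimizers.galore import GaLoreProjector
    from pytorch_optimizer.optimizers.grokfast import gradfilter_ema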

Docs

  • Add more visualizations. (#297)

Bug

  • Add optimizer parameter to PolyScheduler constructor. (#295)

Contributions

thanks to @tanganke

pytorch-optimizer v3.2.0

28 Oct 23:30
a59f2e1

Change Log

Feature

  • Implement SOAP optimizer. (#275)
  • Support AdEMAMix variants. (#276)
    • bnb_ademamix8bit, bnb_ademamix32bit, bnb_paged_ademamix8bit, bnb_paged_ademamix32bit
  • Support 8-bit, 4-bit, and fp8 optimizers. (#208, #281)
    • torchao_adamw8bit, torchao_adamw4bit, torchao_adamwfp8.
  • Support a module-name-level (e.g. LayerNorm) weight decay exclusion for get_optimizer_parameters (see the sketch after this list). (#282, #283)
  • Implement CPUOffloadOptimizer, which offloads optimizer to CPU for single-GPU training. (#284)
  • Support a regex-based filter for searching names of optimizers, lr schedulers, and loss functions.
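
A sketch of the module-name-level weight decay exclusion, assuming get_optimizer_parameters takes the model, the weight decay value, and a ban list; the keyword name wd_ban_list is an assumption here.

    from torch import nn
    from pytorch_optimizer import get_optimizer_parameters

    model = nn.Sequential(nn.Linear(10, 10), nn.LayerNorm(10), nn.Linear(10, 2))

    # exclude parameters from weight decay by module name (e.g. LayerNorm)
    param_groups = get_optimizer_parameters(
        model,
        weight_decay=1e-2,
        wd_ban_list=['bias', 'LayerNorm'],  # assumed keyword name
    )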

Bug

  • Fix the should_grokfast condition at initialization. (#279, #280)

Contributions

thanks to @Vectorrent

pytorch-optimizer v3.1.2

10 Sep 10:58
9d5e181

Change Log

Feature

Bug

  • Add **kwargs to the constructor parameters as a dummy placeholder. (#270, #271)

pytorch-optimizer v3.1.1

14 Aug 09:47
a8eb19c

Change Log

Feature

Bug

  • Handle optimizers that take the model instead of the parameters in create_optimizer() (see the sketch after this list). (#263)
  • Move the variable to the same device as the parameter. (#266, #267)
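
A hedged sketch of the create_optimizer() path this fix covers: the function receives the model itself, so optimizers that need the full module rather than just the parameters can be constructed the same way. The optimizer key string and keyword names below are illustrative assumptions.

    from torch import nn
    from pytorch_optimizer import create_optimizer

    model = nn.Linear(10, 2)

    # 'adamw', lr, and weight_decay are illustrative; check create_optimizer's
    # signature in your installed version.
    optimizer = create_optimizer(model, 'adamw', lr=1e-3, weight_decay=1e-2)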

pytorch-optimizer v3.1.0

21 Jul 11:54
d00136f

Change Log

Feature

Refactor

  • Refactor AdamMini optimizer. (#258)
  • Deprecate optional dependency, bitsandbytes. (#258)
  • Move get_rms, approximate_sq_grad functions to BaseOptimizer for reusability. (#258)
  • Refactor shampoo_utils.py. (#259)
  • Add debias, debias_adam methods in BaseOptimizer. (#261)
  • Refactor to inherit from BaseOptimizer only, instead of multiple classes. (#261)

Bug

  • Fix several bugs in AdamMini optimizer. (#257)

Contributions

thanks to @sdbds