`lion-pytorch/lion_pytorch/lion_pytorch.py`, line 79 in 6a74fdc:

```python
wd /= init_lr
```
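Here is a minimal sketch of what this line does to the per-step decay. It rests on an assumption that this issue does not show: that the optimizer later shrinks each parameter by a factor `1 - lr * wd`. The helpers `shrink_factor` and `linear_schedule`, and the schedule itself, are hypothetical. The sketch runs two peak learning rates through the same schedule shape. After `wd /= init_lr`, the shrink factor depends only on the ratio `lr / init_lr`, not on the size of the learning rate.

```python
def shrink_factor(lr, wd, init_lr=None):
    """Hypothetical per-step multiplicative decay, assuming the update does p *= 1 - lr * wd."""
    if init_lr is not None:
        wd = wd / init_lr  # mirrors the `wd /= init_lr` line
    return 1.0 - lr * wd


def linear_schedule(init_lr, steps=10):
    """Hypothetical schedule: decays linearly from init_lr to init_lr / 2."""
    return [init_lr * (1 - 0.5 * t / steps) for t in range(steps + 1)]


wd = 1e-3
results = {}
for init_lr in (1e-4, 3e-4):
    lrs = linear_schedule(init_lr)
    results[init_lr] = {
        # Decay coupled to the raw learning rate: factor = 1 - lr * wd.
        "coupled": [shrink_factor(lr, wd) for lr in lrs],
        # Decay after dividing wd by init_lr: factor = 1 - (lr / init_lr) * wd.
        "rescaled": [shrink_factor(lr, wd, init_lr) for lr in lrs],
    }

for init_lr, r in results.items():
    print(
        f"init_lr={init_lr:g}: "
        f"coupled first={r['coupled'][0]:.8f}, "
        f"rescaled first={r['rescaled'][0]:.6f}, last={r['rescaled'][-1]:.6f}"
    )
```

With the division, both runs shrink by `1 - wd` at the first step and by `1 - wd / 2` at the last, whatever `init_lr` is. Without it, the shrink scales with `init_lr`.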
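Decoupled weight decay means isolating the weight decay from the "gradient". The usual way to apply weight decay is to add an L2 regularization term $\frac{\lambda}{2}\lVert\theta\rVert^2$ to the loss function. For SGD, this is equivalent to applying weight decay directly to the parameters, i.e.,

$$
\theta_t = \theta_{t-1} - \eta_t\left(\nabla L(\theta_{t-1}) + \lambda\,\theta_{t-1}\right) = (1 - \eta_t\lambda)\,\theta_{t-1} - \eta_t\,\nabla L(\theta_{t-1}).
$$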
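The SGD equivalence between L2 regularization and direct decay can be checked numerically. This is a minimal sketch in plain Python. The parameter is a scalar, fixed gradient values stand in for the loss gradient, and the function names and learning-rate schedule are hypothetical.

```python
def sgd_step_l2(theta, grad, lr, wd):
    """SGD step on loss + (wd / 2) * theta**2: the L2 term adds wd * theta to the gradient."""
    return theta - lr * (grad + wd * theta)


def sgd_step_direct_decay(theta, grad, lr, wd):
    """SGD step that shrinks the parameter directly, then applies the plain gradient."""
    return (1.0 - lr * wd) * theta - lr * grad


theta_a = theta_b = 2.0
for step, grad in enumerate([0.5, -1.0, 0.25, 0.75]):
    lr = 0.1 / (1 + step)  # arbitrary decaying schedule, for illustration only
    theta_a = sgd_step_l2(theta_a, grad, lr, wd=0.01)
    theta_b = sgd_step_direct_decay(theta_b, grad, lr, wd=0.01)

print(abs(theta_a - theta_b) < 1e-12)  # True: the two updates agree up to rounding
```

The equivalence depends on the update being linear in the gradient. An optimizer that transforms the gradient also transforms the added `wd * theta` term. Lion, for example, takes the sign of a momentum/gradient interpolation.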
Here, what you have implemented is the fixed decay, i.e.,