See src/nn_lib/optim/lr_finder.py. The concept is good; we improve on the Lightning LR Finder by using isotonic regression to smooth out the losses (a sketch of the smoothing step follows the list below). But...
- empirically, it sometimes seems to err on the side of a too-small LR, which is problematic e.g. when stitching and training 'to convergence': a too-small LR up front may simply result in the system not learning at all.
- we haven't thoroughly tested the behavior of the LR Finder when the optimizer is something other than SGD. In other projects we went ahead and used the LR Finder with the Adam optimizer, and it seemed to work most of the time. However, any adaptive optimizer (such as Adam) has the strange property that steps taken at larger LRs are influenced by the history of gradients/updates accumulated at the smaller LRs, and it is unclear how or whether this affects the suggested LR. Most likely the best fix is to have the LRFinder instantiate its own momentum-less SGD optimizer rather than have the user pass one in (see the second sketch below).
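For reference, here is a minimal sketch of the isotonic-smoothing idea, assuming scikit-learn's `IsotonicRegression` and the usual steepest-descent LR-finder heuristic. The function name and the suggestion rule are hypothetical; the actual code in src/nn_lib/optim/lr_finder.py may differ.

```python
# Hypothetical sketch, not the exact code in src/nn_lib/optim/lr_finder.py.
# Assumes (lr, loss) pairs were recorded during an exponential LR sweep.
import numpy as np
from sklearn.isotonic import IsotonicRegression


def smooth_and_suggest(lrs, losses):
    log_lrs = np.log10(np.asarray(lrs))
    losses = np.asarray(losses)
    # Raw losses are noisy, but up to the divergence point the trend should be
    # non-increasing, so fit a decreasing isotonic regression to denoise them.
    iso = IsotonicRegression(increasing=False)
    smoothed = iso.fit_transform(log_lrs, losses)
    # Suggest the LR at the steepest descent of the smoothed curve (the
    # standard LR-finder heuristic; the repo's rule may be different).
    slopes = np.gradient(smoothed, log_lrs)
    return 10 ** log_lrs[np.argmin(slopes)]
```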
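And a sketch of the proposed fix from the second bullet: the finder builds its own plain SGD over the model's parameters, so adaptive-optimizer state accumulated at small LRs cannot leak into the large-LR steps of the sweep. All names here (`lr_sweep`, the defaults) are hypothetical, not the current LRFinder API.

```python
# Hypothetical sketch of the proposed internal momentum-less SGD sweep.
import torch


def lr_sweep(model, loss_fn, data_iter, lr_min=1e-7, lr_max=1.0, steps=100):
    # Momentum-less SGD: each step depends only on the current gradient,
    # avoiding the cross-LR history effects described above.
    opt = torch.optim.SGD(model.parameters(), lr=lr_min, momentum=0.0)
    gamma = (lr_max / lr_min) ** (1 / (steps - 1))
    lrs, losses = [], []
    for i, (x, y) in zip(range(steps), data_iter):
        lr = lr_min * gamma**i  # exponential LR schedule across the sweep
        for group in opt.param_groups:
            group["lr"] = lr
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        lrs.append(lr)
        losses.append(loss.item())
    return lrs, losses
```

Note that the sweep mutates the model's weights; a real implementation would also snapshot and restore the model state afterwards, as the Lightning LR Finder does.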