Your learning rate is now too low. For your convenience, here are a few things to keep in mind that additionally affect the learning rate's impact:

  1. Network Rank - Larger networks, all else equal, need a lower learning rate to be stable. This relationship seems to hold at scale, e.g. LoRAs usually need learning rates ~10x higher than the original model.
  2. Network Alpha - This is effectively just a scalar on the learning rate, which means any suggested learning rate from anyone else is meaningless unless they also state their alpha and rank. Your chosen learning rate is effectively multiplied by (alpha/rank) to get your "real" learning rate.
  3. Optimizer - Valid learning…

Answer selected by O-J1
mx (Collaborator) · Jul 19, 2024
O-J1 (Collaborator) · Sep 24, 2024
Category: Q&A
7 participants