In the latest deepmd-kit, one does not need to set the decay rate; instead, set stop_lr.
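A minimal sketch of how specifying stop_lr can stand in for an explicit decay rate, assuming an exponential schedule of the form lr(t) = start_lr * decay_rate ** (t / decay_steps) and that decay_rate is chosen so the learning rate reaches stop_lr at the final training step. The function name and the numeric values are illustrative, not deepmd-kit's actual API:

```python
def decay_rate_from_stop_lr(start_lr, stop_lr, stop_steps, decay_steps):
    """Derive the implicit decay rate from stop_lr.

    With lr(t) = start_lr * decay_rate ** (t / decay_steps),
    requiring lr(stop_steps) == stop_lr and solving for decay_rate gives
    decay_rate = (stop_lr / start_lr) ** (decay_steps / stop_steps).
    """
    return (stop_lr / start_lr) ** (decay_steps / stop_steps)

# Illustrative numbers: decay from 1e-3 to 1e-8 over 1,000,000 steps,
# with the rate applied per 5000-step block.
rate = decay_rate_from_stop_lr(1e-3, 1e-8, 1_000_000, 5_000)
```

Under this assumption the user only chooses the endpoints (start_lr, stop_lr) and the step counts; the decay rate falls out of them.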
With Horovod, the effective batch size is 8 when batch_size is set to 2 in the input file and we launch 4 workers. According to the Linear Scaling Rule in the attached paper Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour: when the minibatch size is multiplied by k, multiply the learning rate by k.
How, then, should we manually change the value of decay_steps?
Why should decay_steps be reduced to 1/2 of its original value in the above case?
Given the relationship lr(t) = start_lr * decay_rate ^ (t / decay_steps), if t / decay_steps always equals 200, then we could not possibly multiply the learning rate by k without changing decay_rate...
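To make the question concrete, here is a small sketch of the schedule above together with one common (but not authoritative) convention for adjusting it under data parallelism: scale start_lr by the worker count k per the Linear Scaling Rule, and shrink decay_steps by the same factor so the decay stays aligned with the (now k-times fewer) optimizer steps per epoch. All names and numbers here are made up for illustration:

```python
def lr(t, start_lr, decay_rate, decay_steps):
    # Exponential decay schedule from the discussion:
    # lr(t) = start_lr * decay_rate ** (t / decay_steps)
    return start_lr * decay_rate ** (t / decay_steps)

# Illustrative single-worker settings.
start_lr, decay_rate, decay_steps = 1e-3, 0.95, 5_000

# With k workers the effective batch size is k times larger,
# so the Linear Scaling Rule suggests scaling the learning rate by k.
k = 4
scaled_start_lr = k * start_lr

# Each worker now performs 1/k as many optimizer steps per epoch,
# so one convention is to shrink decay_steps by the same factor,
# keeping the decay tied to epochs rather than raw steps.
scaled_decay_steps = decay_steps // k
```

Note this does not resolve the poster's concern: if t / decay_steps is held fixed, the schedule's shape is unchanged, and only rescaling start_lr (or decay_rate) actually changes the learning rate at a given point in training.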