[Training Log 2022-10-23] We modified the computation of attention scores to stabilize training. #213
zh-zheng announced in Training Logs
CPM-Live Training Log (October 23)
Time: October 23, 2022, 19:00
Recorder: @zh-zheng
[Figures not preserved: Loss, Completed Data, Average Grad Norm, Progress]
Comment
This morning we observed that the training loss had become NaN. We stabilized training by modifying how the attention scores are computed, and the fix appears to be working so far.
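The log does not say exactly how the attention-score computation was changed, so the following is only an illustration of two generic, widely used safeguards against NaN in attention, not necessarily the CPM-Live change. It assumes (1) applying the 1/sqrt(d) factor to the query before the dot product, so the largest intermediate value is the already-scaled score, and (2) subtracting the row maximum before exponentiating in softmax, so exp() never overflows. In half precision (largest finite value 65504), an overflow to inf followed by inf / inf or inf - inf is a typical way a loss turns into NaN. The function names below are hypothetical, and plain Python floats stand in for tensors.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def stable_softmax(scores: Vector) -> Vector:
    """Softmax with the row maximum subtracted first.

    Subtracting max(scores) leaves the result mathematically unchanged,
    but every exponent becomes <= 0, so exp() cannot overflow
    (an overflow to inf followed by inf / inf is what produces NaN).
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)  # >= 1.0, because the max entry contributes exp(0)
    return [e / total for e in exps]


def attention_weights(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention weights for a single query (hypothetical helper).

    The 1/sqrt(d) factor is applied to the query *before* the dot product,
    so no intermediate value exceeds the final scaled score; forming the
    raw product first and scaling afterwards can overflow in low precision.
    """
    d = len(query)
    scale = 1.0 / math.sqrt(d)
    q_scaled = [x * scale for x in query]
    scores = [dot(q_scaled, k) for k in keys]
    return stable_softmax(scores)


# Very large logits: a naive math.exp(20000.0) would raise OverflowError here,
# but the max-subtracted softmax stays finite.
weights = attention_weights([100.0] * 4, [[100.0] * 4, [0.0] * 4, [-100.0] * 4])
print(weights)  # [1.0, 0.0, 0.0]
```

Both changes leave the mathematical result unchanged and only reorder the arithmetic to keep intermediate values in range, which is why tricks of this kind can fix NaN losses without altering the model.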