Set GradScaler's init_scale #13061
Hi all, I'm using Lightning to train a model that encounters large gradient updates early in training. The default init_scale of 2**16 causes the gradients in certain layers to overflow to inf, which leads to NaNs and various kinds of suboptimal behavior downstream. But I'd still like to use FP16 for the larger batch sizes. Writing out the training loop by hand and passing GradScaler a smaller init_scale avoids this problem. Is there a way to pass this value through the Trainer class? The documentation doesn't mention a way to customize the scaler.
Replies: 1 comment
I think I figured this out: you need to pass a plugin to the Trainer class that implements mixed precision, and give that plugin your preferred scaler.
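A minimal sketch of that approach, assuming the PyTorch Lightning 1.6-era `NativeMixedPrecisionPlugin` API and `torch.cuda.amp.GradScaler` (the exact plugin name and argument order vary between Lightning versions, so check the docs for yours):

```python
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import NativeMixedPrecisionPlugin

# Build a GradScaler with a smaller starting scale than the default 2**16.
# 2**10 is an illustrative choice; tune it for your model.
scaler = torch.cuda.amp.GradScaler(init_scale=2**10)

# Hand the scaler to the native mixed-precision plugin, then pass the
# plugin to the Trainer instead of setting precision=16 directly.
precision_plugin = NativeMixedPrecisionPlugin(16, "cuda", scaler=scaler)

trainer = Trainer(accelerator="gpu", devices=1, plugins=[precision_plugin])
```

Since the plugin already carries the precision setting, don't also pass `precision=16` to the Trainer; configuring it in both places can conflict.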