Set GradScaler's init_scale #13061
Hi all, I'm using Lightning to train a model that encounters large gradient updates early in training. The default init_scale of 2**16 causes the gradients in certain layers to overflow to inf, which leads to NaNs and various kinds of suboptimal behavior downstream. But I'd still like to use FP16 for the larger batch sizes. Writing out the training loop by hand and passing GradScaler a smaller init_scale avoids this problem. Is there a way to pass this value through the Trainer class? The documentation doesn't mention a way to customize the scaler.
Replies: 1 comment
I think I figured this out: you need to pass a plugin to the Trainer class that implements mixed precision, and give that plugin your preferred scaler.
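A minimal sketch of that approach, assuming the PyTorch Lightning 1.6-era `NativeMixedPrecisionPlugin` API and `torch.cuda.amp.GradScaler` (the exact plugin name and argument order vary between Lightning versions, so check the docs for yours):

```python
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import NativeMixedPrecisionPlugin

# Build a GradScaler with a smaller starting scale than the default 2**16.
# 2**10 is an illustrative choice; tune it for your model.
scaler = torch.cuda.amp.GradScaler(init_scale=2**10)

# Hand the scaler to the native mixed-precision plugin, then pass the
# plugin to the Trainer instead of setting precision=16 directly.
precision_plugin = NativeMixedPrecisionPlugin(16, "cuda", scaler=scaler)

trainer = Trainer(accelerator="gpu", devices=1, plugins=[precision_plugin])
```

Since the plugin already carries the precision setting, don't also pass `precision=16` to the Trainer; configuring it in both places can conflict.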