Gradient Clipping with mixed precision in case of NaN loss #11413

But the docs say gradient clipping should not be used with mixed precision.

You totally can. That note is saying that any scaling applied by 16-bit precision training is undone before the gradients are clipped.

This means you do not need to change the gradient clipping value when switching between precision=16 and full precision.
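A minimal sketch of why the clipping threshold carries over, written in plain PyTorch rather than through Lightning. It mimics loss scaling by hand (the toy model, data, and the helper name `pre_clip_norm` are hypothetical) and divides the gradients by the same factor before calling `torch.nn.utils.clip_grad_norm_`. Because the scale is removed first, the norm the clipper compares against `max_norm` is the same whether or not scaling was applied.

```python
import torch


def pre_clip_norm(scale: float, max_norm: float = 1.0) -> float:
    """Return the gradient norm seen by the clipper after a loss scale is undone (toy example)."""
    torch.manual_seed(0)  # same (hypothetical) model and data on every call
    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)

    # Mimic AMP-style loss scaling: backprop a scaled loss so small fp16 grads don't underflow.
    (loss * scale).backward()

    # Undo the scaling *before* clipping, so max_norm applies to the true gradients.
    for p in model.parameters():
        p.grad.div_(scale)

    # clip_grad_norm_ returns the total gradient norm computed before clipping.
    return torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm).item()


unscaled = pre_clip_norm(scale=1.0)
scaled = pre_clip_norm(scale=2.0**16)
print(unscaled, scaled)  # the two norms agree
```

Since the clipper only ever sees unscaled gradients, a `max_norm` tuned in full precision means the same thing under precision=16.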

I don't know what gradient clipping value I should use.

Nobody does :P
Try some experiments and find out!
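One way to run those experiments, sketched on a hypothetical toy regression problem (the model, learning rate, step count, and candidate `max_norm` values are all arbitrary assumptions): train once per candidate threshold and record the final loss alongside the largest pre-clip gradient norm observed. A threshold far below the typical norms clips almost every step, while one above all observed norms never clips at all.

```python
import torch


def run(max_norm: float, steps: int = 50) -> tuple[float, float]:
    """Train a toy linear model with one clip threshold; return (final loss, peak pre-clip norm)."""
    torch.manual_seed(0)  # identical data and init for every candidate
    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(64, 4)
    y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])  # hypothetical target weights

    peak_norm = 0.0
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        # The returned value is the total gradient norm *before* clipping.
        norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm).item()
        peak_norm = max(peak_norm, norm)
        opt.step()
    return loss.item(), peak_norm


results = {clip: run(clip) for clip in (0.1, 1.0, 10.0)}
for clip, (loss, peak) in results.items():
    print(f"max_norm={clip:>5}: final loss={loss:.4f}, peak pre-clip grad norm={peak:.2f}")
```

On a real model you would compare validation metrics (and whether the loss ever goes NaN) instead of the final training loss of a toy problem.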

'If using Automatic Mixed Precision (AMP), the gradients will be unscaled before logging them'

Same thing as I explained above. It's just a technical detail; you do not need to worry about it.
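For reference, this is roughly what "unscaled before clipping and logging" looks like in plain PyTorch AMP, following the `GradScaler` pattern from PyTorch's AMP examples: `scaler.unscale_(optimizer)` is called before `clip_grad_norm_`, so both the clip threshold and the logged norm refer to the true gradients. Lightning handles the equivalent step for you when you set precision=16 together with `gradient_clip_val`. The toy model, data, and `max_norm` value are assumptions, and the scaler and autocast are simply disabled when no GPU is available so the sketch still runs on CPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # in this sketch, autocast + loss scaling only run on GPU

model = torch.nn.Linear(4, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Newer PyTorch releases prefer torch.amp.GradScaler("cuda", ...); the older spelling still works.
# With enabled=False, every scaler call below becomes a pass-through.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
max_norm = 1.0  # hypothetical clip threshold

x = torch.randn(16, 4, device=device)
y = torch.randn(16, 1, device=device)

logged_norms = []
for _ in range(3):
    opt.zero_grad()
    # autocast's default dtype on CUDA is float16
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # gradients carry the current loss scale
    scaler.unscale_(opt)           # divide them back down in place
    norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # clip the true gradients
    logged_norms.append(norm.item())  # so the logged norm is the unscaled one too
    scaler.step(opt)               # when enabled, skips the step if grads contain inf/NaN
    scaler.update()
```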

Answer selected by carmocca