Unclear why gradients are NaN when including a sigmoid #15617
I have a function that applies a sigmoid to Gumbel samples, and I want to differentiate it with respect to the parameter `lamb`. But when I apply `jax.grad`, the gradients come back as nan, and for these values the magic value of `lamb` at which this happens is 0.5. If I remove the sigmoid from the function, then differentiating is completely fine at any value of `lamb`.
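Here is a stripped-down sketch of the shape of the problem (the function name, the Gumbel-sigmoid form, and the input values are illustrative stand-ins, not my actual code):

```python
import jax
import jax.numpy as jnp

def relaxed_sample(lamb, logits, gumbel_noise):
    # Temperature-scaled Gumbel sample pushed through a hand-written sigmoid.
    gumbel_samp = (logits + gumbel_noise) / lamb
    return 1.0 / (1.0 + jnp.exp(-gumbel_samp))

logits = jnp.float32(-50.0)        # illustrative values only
gumbel_noise = jnp.float32(-2.15)

grad_fn = jax.grad(lambda lamb: relaxed_sample(lamb, logits, gumbel_noise))
print(grad_fn(jnp.float32(1.0)))   # finite
print(grad_fn(jnp.float32(0.5)))   # nan
```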
Thanks for the question!
The issue is a numerical stability one in the expression `1/(1 + jnp.exp(-gumbel_samp))`: setting `lamb=0.5`, we get values of `gumbel_samp` down to `-104.296745`, which in turn means we're evaluating something like `jnp.exp(104.296745)`, which gives us floating point's infinity. That inf appears in a Jacobian coefficient, and so leads to the nan when we multiply it by a zero cotangent.
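Here's a minimal sketch of that failure mode in isolation, with JAX's default float32 (the hand-written `naive_sigmoid` below is a stand-in for the expression above):

```python
import jax
import jax.numpy as jnp

def naive_sigmoid(x):
    # jnp.exp(-x) overflows float32 to inf once -x exceeds roughly 88.7.
    return 1.0 / (1.0 + jnp.exp(-x))

x = jnp.float32(-104.296745)
print(naive_sigmoid(x))            # 0.0: the forward pass survives (1/inf == 0)
print(jax.grad(naive_sigmoid)(x))  # nan: the backward pass multiplies inf by 0
```

The forward value is fine because `1/inf` rounds cleanly to zero; it's only the Jacobian coefficient that picks up the inf.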
(Tangentially: I found the culprit by using the `with jax.debug_nans(True):` context manager (and also `with jax.debug_infs(True):`), which raised an exception as soon as an operation produced a nan (or inf) and dropped me into a post-mortem debugger. That showed me exactly where things were going wrong, but I just realized…)

Back to how to fix this particular issue: basically, use `jax.nn.sigmoid` instead of writing the sigmoid out by hand; it's implemented with numerical stability in mind. See this section of the docs for more. Using `jax.nn.sigmoid`, the gradient comes out finite (see the sketch below). What do you think?
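A quick sketch putting both pieces together, reusing the stand-in `naive_sigmoid` from above:

```python
import jax
import jax.numpy as jnp

def naive_sigmoid(x):
    return 1.0 / (1.0 + jnp.exp(-x))

x = jnp.float32(-104.296745)

# Locate the nan: inside this context, JAX raises a FloatingPointError at
# the first operation whose output contains a nan.
try:
    with jax.debug_nans(True):
        jax.grad(naive_sigmoid)(x)
except FloatingPointError as e:
    print("caught:", e)

# The fix: jax.nn.sigmoid is written to be numerically stable, so its
# gradient stays finite even at extreme inputs.
print(jax.nn.sigmoid(x))            # 0.0
print(jax.grad(jax.nn.sigmoid)(x))  # 0.0: finite, no nan
```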