Get nan when calculating partial derivatives of a smooth function using jax.jacfwd #7313

smao-astro · 2021-07-17T08:01:08Z

Hi,

As the title says, I get nan when calculating the partial derivative of the smooth 1D function below.

Code to reproduce

import jax
import jax.numpy as jnp

ring_center = 1.2
ring_width = 0.1
nu = 1

def f(r):
    part = ring_width**2*(jnp.sqrt(2.)+2.*jnp.exp((r - ring_center)**2/(2*ring_width**2))*jnp.sqrt(jnp.pi)*ring_width)
    return 3*nu*(2.*jnp.sqrt(2.)*r**2-2*jnp.sqrt(2.)*r*ring_center-part)/(2.*r*part)

key = jax.random.PRNGKey(123)
inputs = jax.random.uniform(key, shape=(100, 1), minval=jnp.asarray([0.4]), maxval=jnp.asarray([2.5]))

nan_index = jnp.isnan(jax.vmap(jax.jacfwd(f))(inputs)).flatten()
print(nan_index)

print(inputs[nan_index])
print(f(inputs[nan_index]))

I found two changes that each avoid the nan:

Changing from jax.jacfwd to jax.jacrev.
Or
Adding

# Set the flag through jax.config; the separate jax.config submodule import is deprecated in newer JAX releases.
jax.config.update("jax_enable_x64", True)

I would like to know if this is the expected behavior, and if yes, why.

Thanks!

Answered by lamflokas

lamflokas · 2021-07-18T14:02:56Z

This is expected behavior because the gradient calculation here can be numerically problematic. Recall the quotient rule: (f/g)' = (f'g - g'f)/g^2. Here g^2 grows very large when the input is close to 2.2, and in standard 32-bit floats it overflows to inf. The numerator of the derivative has problems of its own, since we end up having to evaluate inf - inf, which gives nan. Unless we are very careful about how we apply the chain rule, we can end up with a nan.
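To make the overflow concrete, here is a minimal sketch that evaluates the denominator g(r) = 2*r*part from the question at r = 2.2 in float32 and then squares it, as the textbook quotient rule would. The helper name `denominator` is introduced here for illustration only, and the exact sequence of operations JAX performs internally may differ from this hand-written version.

```python
import jax.numpy as jnp

ring_center = 1.2
ring_width = 0.1

def denominator(r):
    # g(r) = 2 * r * part, the denominator of f in the question.
    # `denominator` is a hypothetical helper, not part of the original code.
    part = ring_width**2 * (
        jnp.sqrt(2.0)
        + 2.0
        * jnp.exp((r - ring_center) ** 2 / (2 * ring_width**2))
        * jnp.sqrt(jnp.pi)
        * ring_width
    )
    return 2.0 * r * part

r = jnp.float32(2.2)
g = denominator(r)   # large, but still representable in float32
g_squared = g * g    # exceeds the float32 range and becomes inf
inf_minus_inf = jnp.float32(jnp.inf) - jnp.float32(jnp.inf)

print("g         =", g, g.dtype)
print("g squared =", g_squared)
print("inf - inf =", inf_minus_inf)
```

The point of the sketch is only that g itself is finite while g^2 is not, so any formula that forms g^2 explicitly is at risk in float32.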

Increasing the precision to 64 bits is enough to sidestep the problem for the given range, because 64-bit floats can represent these values (the problem reemerges if we extend the input range to 4).
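A rough back-of-the-envelope check of these ranges can be done with logarithms, so that the estimate itself never overflows. The sketch below assumes the exponential term dominates part far from the ring center, which gives g(r) ≈ 4*sqrt(pi)*ring_width^3*r*exp((r - ring_center)^2/(2*ring_width^2)); the function name `log10_g_squared` is hypothetical.

```python
import math
import numpy as np

RING_CENTER = 1.2
RING_WIDTH = 0.1

def log10_g_squared(r):
    """Approximate log10 of g(r)**2, where g(r) = 2 * r * part.

    Assumes the exponential term dominates part, which holds far from
    the ring center. This is an estimate, not an exact evaluation.
    """
    log_g = math.log(4.0 * math.sqrt(math.pi) * RING_WIDTH**3 * r) + (
        (r - RING_CENTER) ** 2 / (2.0 * RING_WIDTH**2)
    )
    return 2.0 * log_g / math.log(10.0)

# Largest finite values of each float type, on a log10 scale.
log10_max_f32 = math.log10(float(np.finfo(np.float32).max))  # about 38.5
log10_max_f64 = math.log10(float(np.finfo(np.float64).max))  # about 308.3

for r in (2.2, 2.5, 4.0):
    size = log10_g_squared(r)
    print(
        f"r={r}: log10(g^2) ~ {size:.1f}, "
        f"fits float32: {size < log10_max_f32}, "
        f"fits float64: {size < log10_max_f64}"
    )
```

Under this estimate g^2 already exceeds the float32 range at r = 2.2, fits comfortably in float64 up to r = 2.5, and exceeds the float64 range at r = 4.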

You also noted that jacrev does not suffer from the inf problem. Although jacfwd and jacrev yield the same results under infinite-precision arithmetic, this no longer holds under finite precision. jacrev and jacfwd order operations differently, so one algorithm may avoid the inf operations that the other hits. Even though jacrev does not output nan, it is not numerically accurate either: it predicts a derivative of -70 around 2.2, while the function is close to constant there.
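One way to see this for yourself is to evaluate both modes at a well-conditioned point and at a point far from the ring center. The sketch below reuses the function from the question unchanged; the sample points 1.3 and 2.2 are chosen here for illustration, and the values printed at 2.2 may vary with the JAX version and backend, so no expected output is shown.

```python
import jax
import jax.numpy as jnp

ring_center = 1.2
ring_width = 0.1
nu = 1

def f(r):
    # Original function from the question.
    part = ring_width**2*(jnp.sqrt(2.)+2.*jnp.exp((r - ring_center)**2/(2*ring_width**2))*jnp.sqrt(jnp.pi)*ring_width)
    return 3*nu*(2.*jnp.sqrt(2.)*r**2-2*jnp.sqrt(2.)*r*ring_center-part)/(2.*r*part)

d_fwd = jax.jacfwd(f)  # forward-mode derivative
d_rev = jax.jacrev(f)  # reverse-mode derivative

r_mild = jnp.float32(1.3)  # close to the ring center, exponential is small
r_far = jnp.float32(2.2)   # far from the ring center, exponential is huge

mild_fwd, mild_rev = d_fwd(r_mild), d_rev(r_mild)
far_fwd, far_rev = d_fwd(r_far), d_rev(r_far)

print("r=1.3  jacfwd:", mild_fwd, " jacrev:", mild_rev)
print("r=2.2  jacfwd:", far_fwd, " jacrev:", far_rev)
```

At the mild point the two modes should agree to float32 accuracy; any disagreement at the far point comes from rounding and overflow, not from the mathematics.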

These numerical instability problems are known. The typical way to solve them is to rewrite the function in an equivalent form that removes the instability. For example, you could replace your f with

def f(r):
    part = ring_width**2*(jnp.sqrt(2.)+2.*jnp.exp((r - ring_center)**2/(2*ring_width**2))*jnp.sqrt(jnp.pi)*ring_width)
    return 3*nu*(2.*jnp.sqrt(2.)*r**2-2*jnp.sqrt(2.)*r*ring_center)/(2.*r*part) - 3*nu/(2.*r)
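The rewrite splits the fraction so that part no longer appears in the numerator: (N - part)/(2*r*part) = N/(2*r*part) - 1/(2*r). As a sanity check, the sketch below compares the two forms at a few moderate inputs; the names `f_original` and `f_rewritten` and the sample points are chosen here for illustration only.

```python
import jax
import jax.numpy as jnp

ring_center = 1.2
ring_width = 0.1
nu = 1

def _part(r):
    # Shared sub-expression of both forms.
    return ring_width**2*(jnp.sqrt(2.)+2.*jnp.exp((r - ring_center)**2/(2*ring_width**2))*jnp.sqrt(jnp.pi)*ring_width)

def f_original(r):
    part = _part(r)
    return 3*nu*(2.*jnp.sqrt(2.)*r**2-2*jnp.sqrt(2.)*r*ring_center-part)/(2.*r*part)

def f_rewritten(r):
    part = _part(r)
    return 3*nu*(2.*jnp.sqrt(2.)*r**2-2*jnp.sqrt(2.)*r*ring_center)/(2.*r*part) - 3*nu/(2.*r)

# Moderate inputs where both forms are well behaved in float32.
rs = jnp.array([0.5, 1.0, 1.3, 1.6])
values_original = f_original(rs)
values_rewritten = f_rewritten(rs)
print(values_original)
print(values_rewritten)

# Forward-mode derivative of the rewritten form far from the ring center.
print(jax.jacfwd(f_rewritten)(jnp.float32(2.2)))
```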

smao-astro Jul 19, 2021
Author

Hi @lamflokas ,

Thank you very much for your detailed explanation!

You are right, this is a numerical issue, and we can reformulate the equation to avoid it.

I adopted your suggestion and reformulated the function as

def ic_fn(r):
    inverse_exp_part = jnp.exp(-((r - ring_center) ** 2) / (2 * ring_width ** 2))
    part = ring_width ** 2 * (
        jnp.sqrt(2.0) * inverse_exp_part + 2.0 * jnp.sqrt(jnp.pi) * ring_width
    )
    return (
        3
        * nu
        * (
            (2.0 * jnp.sqrt(2.0) * r ** 2 - 2 * jnp.sqrt(2.0) * r * ring_center)
            * inverse_exp_part
            - part
        )
        / (2.0 * r * part)
    )

which gets rid of the numerical issue when calculating first-order and second-order derivatives.
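This version multiplies numerator and denominator by exp(-(r - ring_center)^2/(2*ring_width^2)), so the only exponential left decays instead of growing and the denominator stays bounded away from zero on the sampled range. A minimal check of the first and second forward-mode derivatives over [0.4, 2.5] might look like the sketch below; the evenly spaced grid is an assumption made here in place of the random inputs from the question.

```python
import jax
import jax.numpy as jnp

ring_center = 1.2
ring_width = 0.1
nu = 1

def ic_fn(r):
    # Decaying exponential: never overflows for real r.
    inverse_exp_part = jnp.exp(-((r - ring_center) ** 2) / (2 * ring_width ** 2))
    part = ring_width ** 2 * (
        jnp.sqrt(2.0) * inverse_exp_part + 2.0 * jnp.sqrt(jnp.pi) * ring_width
    )
    return (
        3
        * nu
        * (
            (2.0 * jnp.sqrt(2.0) * r ** 2 - 2 * jnp.sqrt(2.0) * r * ring_center)
            * inverse_exp_part
            - part
        )
        / (2.0 * r * part)
    )

# Evenly spaced sample of the input range used in the question.
rs = jnp.linspace(0.4, 2.5, 100)

first = jax.vmap(jax.jacfwd(ic_fn))(rs)               # first derivative
second = jax.vmap(jax.jacfwd(jax.jacfwd(ic_fn)))(rs)  # second derivative

print("first derivative finite everywhere: ", bool(jnp.all(jnp.isfinite(first))))
print("second derivative finite everywhere:", bool(jnp.all(jnp.isfinite(second))))
```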

Thank you very much again!
