The forward derivative of tanh #19052
Using jaxlib 0.4.23+cuda12.cudnn89 I tried computing the derivative of `jnp.tanh` like this:

```python
import jax
import jax.numpy as jnp

x = 2.34567
f = jnp.tanh

f_x, df_dx = jax.value_and_grad(f)(x)
print(df_dx, 1 - f_x**2)
print("Difference: ", df_dx - (1 - f_x**2))
```

The output shows a small but nonzero difference between `df_dx` and `1 - f_x**2`. Why are the exact derivatives not matching?
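To put the size of that mismatch in perspective, you can compare the two values against single-precision machine epsilon. This is a rough sketch using standard `jax.numpy` utilities (not part of the snippet above), and the exact numbers may vary by platform:

```python
import jax
import jax.numpy as jnp

x = 2.34567
f_x, df_dx = jax.value_and_grad(jnp.tanh)(x)
manual = 1 - f_x**2

# Relative difference between the two derivative values, compared to float32 epsilon.
rel_diff = jnp.abs(df_dx - manual) / jnp.abs(manual)
print(rel_diff)                    # expected to be tiny (on the order of 1e-7 or less)
print(jnp.finfo(jnp.float32).eps)  # ~1.19e-07
```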
Replies: 1 comment
If you're ever curious about the exact sequence of operations that are used to compute an automatic gradient (or any other operation), you can see them using `make_jaxpr`:

```python
import jax
import jax.numpy as jnp

x = 2.34567
f = jnp.tanh

def df1(x):
    return jax.grad(f)(x)

def df2(x):
    return 1 - f(x)**2

print(jax.make_jaxpr(df1)(x))
# { lambda ; a:f32[]. let
#     b:f32[] = tanh a
#     c:f32[] = sub 1.0 b
#     d:f32[] = mul 1.0 c
#     e:f32[] = mul d b
#     f:f32[] = add_any d e
#   in (f,) }

print(jax.make_jaxpr(df2)(x))
# { lambda ; a:f32[]. let
#     b:f32[] = tanh a
#     c:f32[] = integer_pow[y=2] b
#     d:f32[] = sub 1.0 c
#   in (d,) }
```

These are two different ways of computing what would be the same value in real-valued arithmetic, but in floating point the errors accumulate differently. The results you're seeing differ only at the level of float32 rounding error.

If you want to see how this autodiff rule is defined in the code, you can find it here: https://github.com/google/jax/blob/c172be137911ee77b8f1327b98d2b0c0f8b459ea/jax/_src/lax/lax.py#L1800-L1801
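For a concrete feel for how the two operation orders can round differently, here is a minimal NumPy sketch of the same two float32 sequences. It won't be bit-identical to what XLA computes (NumPy's `tanh` may round differently), but it illustrates the effect:

```python
import numpy as np

x = np.float32(2.34567)
t = np.tanh(x)                  # float32 tanh, as in both jaxprs above

# Operation order from jax.grad(f): (1 - t) + (1 - t) * t  ==  (1 - t) * (1 + t)
d = np.float32(1) - t
via_grad = d + d * t

# Operation order of the hand-written rule: 1 - t**2
via_formula = np.float32(1) - t * t

print(via_grad, via_formula)
print("Difference:", via_grad - via_formula)          # zero or tiny: float32 rounding
print("float64 reference:", 1 - np.tanh(2.34567)**2)  # higher-precision comparison
```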
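If you want to experiment with that factored form of the rule at the user level, a sketch using `jax.custom_jvp` (my own illustration here, not JAX's internal primitive rule, which lives at the link above) could look like this:

```python
import jax
import jax.numpy as jnp

@jax.custom_jvp
def my_tanh(x):
    return jnp.tanh(x)

@my_tanh.defjvp
def my_tanh_jvp(primals, tangents):
    x, = primals
    x_dot, = tangents
    ans = my_tanh(x)
    # Factored form (1 - ans) * (1 + ans): algebraically equal to 1 - ans**2,
    # but a different floating-point operation sequence.
    return ans, x_dot * (1 - ans) * (1 + ans)

print(jax.grad(my_tanh)(2.34567))   # uses the factored rule above
print(jax.grad(jnp.tanh)(2.34567))  # uses JAX's built-in rule
```

Swapping the tangent expression for `x_dot * (1 - ans**2)` would reproduce the hand-written formula instead, and the two variants may again differ in the last few bits.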