Replies: 1 comment
-
Inlining the expressions by hand:

```python
hess_vp_1 = jvp(lambda x: vjp(func, alpha)[1](x)[0], (alpha,), (test,))[1]
hess_vp_2 = jvp(grad(func), (alpha,), (test,))[1]
```

But `grad(f)(z) == vjp(f, z)[1](1.)[0]`. In words: the gradient at `z` is the VJP taken at `z` applied to a cotangent of `1.`, so it is the linearization point that varies, not the cotangent. In `hess_vp_1` the lambda keeps the point fixed at `alpha` and instead varies the cotangent `x`, which is not the same function as `grad(func)`.

(This is only a guess, based on a quick glance.)
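Applying that identity to the first expression (a sketch, assuming `func` is a scalar-valued function of `alpha`; `hvp_inlined` is just an illustrative name, and `func`, `alpha`, `test` are as in the snippet above), the inlined forward-over-reverse HVP would look like:

```python
from jax import jvp, vjp

# Differentiate with respect to the point z at which the VJP is taken,
# keeping the output cotangent fixed at 1., instead of differentiating
# with respect to the cotangent as in the original hess_vp_1.
hvp_inlined = jvp(lambda z: vjp(func, z)[1](1.)[0], (alpha,), (test,))[1]
```

For a scalar `func`, `lambda z: vjp(func, z)[1](1.)[0]` is exactly `grad(func)`, so this matches `hess_vp_2`.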
-
Hi,
I've implemented some custom code to perform trust-region conjugate gradient optimization using JAX/LAX primitives with some success; however, I've based it off analytic gradients and Hessians for my specific problem. I'd love to start porting this to a general interface that more closely mimics the `minimize_*` API within JAX, but I have a question regarding the efficient HVP example in the tutorial.

Namely, I'd like to have a function `value_and_grad_and_hvp` that behaves similarly to `value_and_grad`, but also returns the Hessian-vector product function `hvp`. The reason for this is that, while the tutorial provides an excellent example of how to compute the `hvp` function, it would require multiple calls to `grad`, which internally re-calls `value_and_grad`. This likely isn't a huge bottleneck, but for my particular application I am trying to squeeze as much as I can out of the implementation, due to the massive number of optimizations that need to be performed.

I've tried to wrap my own function that computes the value, gradient, and HVP directly from `jvp` and `vjp` calls, but I'm having trouble figuring out 1) why the dimensionality is not lining up and 2) why the HVP values are incorrect (likely a consequence of 1). I've coded up a toy example to illustrate what I'm trying to accomplish and what the contrast is (a simplified sketch follows below). Any help or suggestions would be greatly appreciated.
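Roughly, the setup and the contrast look like this (a simplified sketch rather than my actual toy example; `func`, `x`, and `v` are placeholder names):

```python
import jax
import jax.numpy as jnp
from jax import grad, jvp

def func(x):
    # toy scalar objective
    return jnp.sum(jnp.sin(x) ** 2)

# Tutorial-style HVP (forward-over-reverse); every call re-enters grad.
def hvp(f, x, v):
    return jvp(grad(f), (x,), (v,))[1]

# The interface I'm after: value, gradient, and an hvp closure from one
# wrapper. The naive version below still calls grad separately, which is
# the redundancy I'd like to avoid by building it from jvp/vjp directly.
def value_and_grad_and_hvp(f, x):
    value, g = jax.value_and_grad(f)(x)
    return value, g, lambda v: hvp(f, x, v)

x = jnp.arange(3.0)
v = jnp.ones_like(x)
value, g, hvp_fn = value_and_grad_and_hvp(func, x)
print(value, g, hvp_fn(v))
```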
thanks!