Hey everyone,

This question is actually twofold. Generally, I want to implement a quantization scheme that utilizes Hessian information (e.g. https://arxiv.org/pdf/1911.03852.pdf or https://arxiv.org/pdf/2104.00903.pdf), and I am facing two problems right now:
1. For the rounding function used in quantization I use a straight-through estimator (see code below). However, it causes an error when computing the diagonal of the Hessian ("can't apply forward-mode autodiff (jvp) to a custom_vjp function."). Is there any way around that in JAX so that I can keep the straight-through behaviour? (See the sketch right after this list.)
2. I want Hessian diagonals not only for the weights/params but also for intermediate activations (e.g. x1 and x2 in the code below). In PyTorch this can be done roughly like this: https://github.com/cvlab-yonsei/EWGS/blob/56c654cb893d53563eb352dd591d1450c34bdd15/ImageNet/utils.py#L161 (the graph is retained, so grad can then be called on arbitrary tensors in it). Any idea how that would be realizable in JAX? (A sketch follows the code at the bottom.)
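For problem 1, one workaround I have been experimenting with (just a sketch, I am not sure it is the intended way) is to define the straight-through estimator with `jax.custom_jvp` instead of `jax.custom_vjp`: the tangent simply passes through, JAX can derive reverse mode from the JVP rule by transposition, and forward mode (and thus forward-over-reverse Hessian computations) no longer errors:

```python
import jax
import jax.numpy as jnp

# Straight-through estimator via custom_jvp: round in the primal,
# pass the tangent through unchanged. Reverse mode is derived
# automatically by transposing the (linear) JVP rule.
@jax.custom_jvp
def roundpass_ste(x):
    return jnp.round(x)

@roundpass_ste.defjvp
def roundpass_ste_jvp(primals, tangents):
    x, = primals
    x_dot, = tangents
    return roundpass_ste(x), x_dot
```

With this, the Hessian-diagonal code at the bottom runs, but I am unsure whether I lose anything that custom_vjp gives me.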
At the bottom of this post you can find some code illustrating the challenges I am facing. Any help or hint is truly appreciated. JAX has made my life a lot easier so far, and I feel these problems should be easily solvable, but I somehow haven't found a way to do it yet.
Many thanks and go JAX!
Clemens
```python
import jax
import jax.numpy as jnp
from typing import Any, Callable
from jax.flatten_util import ravel_pytree

Array = jnp.ndarray

# Rounding with straight-through estimator
@jax.custom_vjp
def roundpass(x):
    return jnp.round(x)

def roundpass_fwd(x):
    return roundpass(x), (None,)

def roundpass_bwd(res, g):
    return (g,)

roundpass.defvjp(roundpass_fwd, roundpass_bwd)

# simple NN with two layers
rng = jax.random.PRNGKey(0)
rng_p1, rng_p2, rng_p3, rng_p4 = jax.random.split(rng, 4)
inputs = jax.random.normal(rng_p1, (10, 8))
params = [
    jax.random.normal(rng_p2, (8, 9)),
    jax.random.normal(rng_p3, (9, 11)),
]
targets = jax.random.normal(rng_p4, (10, 11))

def loss_fn(params, x):
    x1 = jnp.dot(x, params[0])
    xi = roundpass(x1)  # replace this line with xi = x1 for functioning code
    x2 = jnp.dot(xi, params[1])
    return jnp.sum((x2 - targets) ** 2)

def loss_wrt_params(p):
    return loss_fn(p, inputs)

# compute diagonal of the Hessian, based on
# https://github.com/deepmind/optax/blob/master/optax/_src/second_order.py
def ravel(p: Any) -> Array:
    return ravel_pytree(p)[0]

_, unravel_fn = ravel_pytree(params)
vs = jnp.eye(ravel(params).size)

def comp(v):
    return jnp.vdot(v, ravel(jax.jvp(jax.grad(loss_wrt_params), [params], [unravel_fn(v)])[1]))

hess_diag_wrt_weights = jax.vmap(comp)(vs)
```
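For problem 2, the closest thing I have come up with (again only a sketch; names like `loss_wrt_x1_probe` are mine) is to expose the intermediate activation as an explicit argument: add a zero-valued probe to x1 and differentiate with respect to the probe, which at probe = 0 coincides with differentiating with respect to x1 itself:

```python
# Sketch: Hessian diagonal w.r.t. the intermediate activation x1,
# reusing inputs/params/targets and ravel_pytree from above.
def loss_wrt_x1_probe(probe):
    x1 = jnp.dot(inputs, params[0]) + probe  # probe == 0 leaves x1 unchanged
    x2 = jnp.dot(x1, params[1])  # roundpass left out here (or use a custom_jvp version)
    return jnp.sum((x2 - targets) ** 2)

probe0 = jnp.zeros((10, 9))  # same shape as x1
flat_probe, unravel_probe = ravel_pytree(probe0)

def comp_act(v):
    # v^T H v with a basis vector v picks out one diagonal entry of H
    _, hv = jax.jvp(jax.grad(loss_wrt_x1_probe), (probe0,), (unravel_probe(v),))
    return jnp.vdot(v, ravel_pytree(hv)[0])

hess_diag_wrt_x1 = jax.vmap(comp_act)(jnp.eye(flat_probe.size))
```

This avoids retaining a graph like in PyTorch, but I do not know whether it is the idiomatic JAX way of doing it.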