Calculating the diagonal of Hessian efficiently #15450

nalzok · 2023-04-07T01:27:24Z

nalzok
Apr 7, 2023

To make this question more concrete, let's consider an example neural network in Flax

from typing import Sequence

import jax
import jax.numpy as jnp
import flax.linen as nn


class MLP(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            x = nn.relu(nn.Dense(feat)(x))
        x = nn.Dense(self.features[-1])(x)
        return x


def compute_loss(variables, batch, label):
    output = model.apply(variables, batch)
    loss = jnp.sum((output - label)**2)
    return loss


model = MLP([8, 1])

Suppose we are training the neural network on some data. For the purpose of this question, batch and label are considered constant, so let's define a helper function f. We calculate the loss as follows

batch = jnp.ones((32, 10))
label = jnp.ones((32))
variables = model.init(jax.random.PRNGKey(0), batch)
f = lambda param: compute_loss(param, batch, label)
loss = f(variables)

We can calculate the gradient with jax.grad.

>>> gradient = jax.grad(f)(variables)
>>> print(jax.tree_map(jnp.shape, gradient))
FrozenDict({
    params: {
        Dense_0: {
            bias: (8,),
            kernel: (10, 8),
        },
        Dense_1: {
            bias: (1,),
            kernel: (8, 1),
        },
    },
})

Similarly, we can calculate the Hessian with jax.hessian.

>>> hessian = jax.hessian(f)(variables)
>>> print(jax.tree_map(jnp.shape, hessian))
FrozenDict({
    params: {
        Dense_0: {
            bias: FrozenDict({
                params: {
                    Dense_0: {
                        bias: (8, 8),
                        kernel: (8, 10, 8),
                    },
                    Dense_1: {
                        bias: (8, 1),
                        kernel: (8, 8, 1),
                    },
                },
            }),
            kernel: FrozenDict({
                params: {
                    Dense_0: {
                        bias: (10, 8, 8),
                        kernel: (10, 8, 10, 8),
                    },
                    Dense_1: {
                        bias: (10, 8, 1),
                        kernel: (10, 8, 8, 1),
                    },
                },
            }),
        },
        Dense_1: {
            bias: FrozenDict({
                params: {
                    Dense_0: {
                        bias: (1, 8),
                        kernel: (1, 10, 8),
                    },
                    Dense_1: {
                        bias: (1, 1),
                        kernel: (1, 8, 1),
                    },
                },
            }),
            kernel: FrozenDict({
                params: {
                    Dense_0: {
                        bias: (8, 1, 8),
                        kernel: (8, 1, 10, 8),
                    },
                    Dense_1: {
                        bias: (8, 1, 1),
                        kernel: (8, 1, 8, 1),
                    },
                },
            }),
        },
    },
})

The question is how to calculate the diagonal of Hessian efficiently. In particular, if we concatenate all parameters into a huge vector of size $N$, then the Hessian will be a matrix of size $N \times N$, whose diagonal is a vector of size $N$, and this is the desired "Hessian diagonal". Essentially, for each scalar component in the parameter vector, I want to fix all other parameters and calculate the second-order derivative of the loss function with respect to that particular component. In addition, I want the Hessian diagonal to have the same shape as the parameters and gradient, i.e.

jax.tree_map(jnp.shape, gradient) == jax.tree_map(jnp.shape, hessian_diagonal)

Technically I can form the full Hessian and extract its diagonal, but that is going to have quadratic time complexity with respect to the number of parameters. Is there a more efficient solution in linear time?

jakevdp · 2023-04-07T01:35:50Z

jakevdp
Apr 7, 2023
Maintainer

There's not any easy way to do this in general, but it may be possible for special cases; see the previous discussion here: #3801

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Calculating the diagonal of Hessian efficiently #15450

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Calculating the diagonal of Hessian efficiently #15450

Uh oh!

nalzok Apr 7, 2023

Replies: 1 comment

Uh oh!

jakevdp Apr 7, 2023 Maintainer

nalzok
Apr 7, 2023

jakevdp
Apr 7, 2023
Maintainer