Slow gradient calculation for convnets when loss includes both jvp and vjp #15584
Unanswered
sorenhauberg asked this question in Q&A
We are working on some code that requires us to compute the gradient of a loss, including $J J^T \epsilon$, where $J$ is the Jacobian of a neural network, and $\epsilon$ can be any vector. The actual loss is a bit more complex, but the above demonstrates the issue we face.
JAX makes it wonderfully easy to evaluate this loss:
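Roughly, the pattern looks like this (a simplified sketch, not the exact code in the Colab MWE; `apply_fn`, the toy parameters, and the scalar reduction are placeholders, and the Jacobian is taken with respect to the inputs here):

```python
import jax
import jax.numpy as jnp


def apply_fn(params, x):
    # Toy stand-in for the real network (a single dense layer).
    w, b = params
    return jnp.tanh(x @ w + b)


def loss_fn(params, x, eps):
    # View the network as a function of its input, so that J = df/dx.
    f = lambda xx: apply_fn(params, xx)
    _, vjp_fn = jax.vjp(f, x)                  # gives v -> J^T v
    (jt_eps,) = vjp_fn(eps)                    # J^T eps
    _, jjt_eps = jax.jvp(f, (x,), (jt_eps,))   # J (J^T eps)
    return jnp.sum(jjt_eps * eps)              # eps^T J J^T eps (placeholder reduction)


key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (8, 4))
params = (jax.random.normal(k2, (4, 3)), jnp.zeros(3))
eps = jax.random.normal(k3, (8, 3))

loss = jax.jit(loss_fn)(params, x, eps)              # evaluating the loss is fast for all models
grads = jax.jit(jax.grad(loss_fn))(params, x, eps)   # this is what slows down for ConvNets
```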
For neural networks with linear layers, it is quite fast to evaluate both this loss and its gradient. However, when using convolutional networks, we see drastic slowdowns. In particular, I get times like the following, where the three ConvNets have 1, 2, and 3 convolution layers, respectively. Note that if I only want to evaluate the loss (not its gradient), then all models are fast.
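For reference, a convolutional stand-in for `apply_fn` in the sketch above would look roughly like this (hypothetical; the actual ConvNets and timings are in the linked Colab):

```python
import jax
import jax.numpy as jnp


def conv_apply_fn(params, x):
    # x: (batch, channels, height, width); each kernel in OIHW layout.
    for kernel in params:
        x = jax.lax.conv_general_dilated(
            x, kernel, window_strides=(1, 1), padding="SAME")
        x = jnp.tanh(x)
    return x


# e.g. a 2-layer ConvNet: stack two kernels (channel counts are illustrative)
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = [jax.random.normal(k1, (8, 1, 3, 3)),   # 1 -> 8 channels
          jax.random.normal(k2, (8, 8, 3, 3))]   # 8 -> 8 channels
```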
I'm a bit stuck on how to proceed and would appreciate any pointers on getting around this slowdown. Is this a fundamental issue with convolutions, a matter of my implementation, or a bug in JAX that produces a slow code path? Any suggestions for making the code faster are welcome.
A complete MWE is available on Google Colab here.