Integration between a JAX library and a foreign model #9012
gianlucadetommaso asked this question in Q&A (unanswered)
Suppose that we want to train, in JAX, an ML model $F(\theta, x)$ written in another framework, say PyTorch. Given a loss function $\sum_i L(\theta, D_i)$ over a jax.numpy array of parameters $\theta$ and data $D_i = (x_i, y_i)$, one could write

$$\nabla_\theta \sum_i L(\theta, D_i) = \sum_i \nabla_F L(\theta, D_i)\, \nabla_\theta F(\theta, x_i),$$

where the Jacobian $\nabla_\theta F(\theta, x_i)$ is computed in PyTorch and passed as a numpy array, while $\nabla_F L(\theta, D_i)$ is computed directly via `jax.grad`. More generally, this is useful not only for training but for many cases of integration between a JAX library and a foreign model, where we do not want to force the user to translate their model into JAX.
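As a minimal sketch of this integration pattern, assuming a hypothetical `foreign_jacobian` helper on the PyTorch side (faked here in NumPy for the toy model $F(\theta, x) = \tanh(\theta x)$; a real version might call `torch.autograd.functional.jacobian` instead), the foreign Jacobian can be brought into a JAX computation through `jax.pure_callback`:

```python
import numpy as np
import jax
import jax.numpy as jnp

# Hypothetical foreign-side helper: a real version would run the PyTorch
# model and return its Jacobian as a NumPy array. Faked here for the toy
# model F(theta, x) = tanh(theta @ x), with theta of shape (k, d).
def foreign_jacobian(theta, x):
    s = 1.0 - np.tanh(theta @ x) ** 2  # sech^2(theta @ x), shape (k,)
    # dF_a / dtheta_{bc} = sech^2((theta @ x)_a) * delta_{ab} * x_c
    return np.einsum('a,ab,c->abc', s, np.eye(theta.shape[0], dtype=theta.dtype), x)

def jacobian_in_jax(theta, x):
    # Declare the output shape/dtype so JAX can call out to the foreign code.
    k, d = theta.shape
    out = jax.ShapeDtypeStruct((k, k, d), theta.dtype)
    return jax.pure_callback(foreign_jacobian, out, theta, x)

theta = jnp.ones((3, 4))
x = jnp.arange(4.0)
print(jacobian_in_jax(theta, x).shape)  # (3, 3, 4)
```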
The problem is that computing the Jacobian on the right-hand side of the equation above is extremely expensive compared to directly computing the left-hand side via standard `jax.grad`. The following code is a proof-of-concept for a classification model, where I compute the gradient of the loss function first with the left-hand-side method and then with the right-hand-side one. On my laptop, the left-hand-side method takes around 0.15 seconds, while the right-hand-side one takes around 360 seconds.

Why this huge difference? Is there any way to accelerate the right-hand-side computation so that it becomes comparable to the direct gradient computation on the left?
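A minimal sketch of the two methods, assuming a toy linear classifier in place of the PyTorch model and with `jax.jacrev` standing in for the foreign per-example Jacobian:

```python
import time
import jax
import jax.numpy as jnp

# Toy linear classifier standing in for the foreign model F(theta, x).
def F(theta, x):
    return x @ theta  # logits, shape (k,)

def per_example_loss(theta, x, y):
    return -jax.nn.log_softmax(F(theta, x))[y]

def loss(theta, X, Y):
    return jnp.sum(jax.vmap(per_example_loss, (None, 0, 0))(theta, X, Y))

d, k, n = 64, 10, 512
theta = jax.random.normal(jax.random.PRNGKey(0), (d, k))
X = jax.random.normal(jax.random.PRNGKey(1), (n, d))
Y = jax.random.randint(jax.random.PRNGKey(2), (n,), 0, k)

# Left-hand side: direct gradient through the whole loss.
t0 = time.perf_counter()
g_direct = jax.grad(loss)(theta, X, Y).block_until_ready()
print("direct:", time.perf_counter() - t0)

# Right-hand side: materialize the per-example Jacobian dF/dtheta and
# contract it with dL/dF, as in the equation above.
def grad_via_jacobian(theta, X, Y):
    def one_example(x, y):
        J = jax.jacrev(F)(theta, x)  # shape (k, d, k)
        gF = jax.grad(lambda f: -jax.nn.log_softmax(f)[y])(F(theta, x))  # (k,)
        return jnp.einsum('a,abc->bc', gF, J)  # (d, k)
    return jnp.sum(jax.vmap(one_example)(X, Y), axis=0)

t0 = time.perf_counter()
g_chain = grad_via_jacobian(theta, X, Y).block_until_ready()
print("via Jacobian:", time.perf_counter() - t0)

print("max abs diff:", jnp.max(jnp.abs(g_direct - g_chain)))  # should be ~0
```

In this sketch the right-hand-side path materializes a full $(k, d, k)$ Jacobian per example before contracting it down to a $(d, k)$ gradient, whereas `jax.grad` only ever propagates a cotangent the size of the model output; that extra materialization is the kind of overhead the timings above reflect.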