Efficient gradient computation with masks #18490

Unanswered

mohamad-amin asked this question in Q&A

Hey,

Consider I have the following script, which computes the loss and its gradient of an arbitrary model only on the portion of the data identified by the `mask` matrix, whose entries are 1 or 0. In this case, the entries `(i, j)` where `mask[i, j] == 0` don't contribute anything to the gradient. I'm wondering if there's an efficient way of avoiding the gradient computation for these entries. For instance, if `mask` is highly sparse (`mask.sum() << np.prod(mask.shape)`), the gradient computation could be much more efficient if we ignored the zero gradients.

Any help or feedback would be appreciated. Thanks!
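(The script referenced above isn't included in this extract. As a stand-in, here is a minimal hypothetical sketch of the kind of setup described; `model`, the masked-sum loss, and the random data are placeholder assumptions, not the original code.)

```python
# Hypothetical stand-in for the script described above (not the original code).
import jax
from jax import numpy as np

def model(x):
    return x ** 2  # placeholder for an arbitrary model

def loss(x, mask):
    # Entries where mask[i, j] == 0 contribute nothing to the loss,
    # and therefore receive zero gradient.
    return np.sum(mask * model(x))

key = jax.random.key(0)
k1, k2 = jax.random.split(key)
x = jax.random.normal(k1, (8, 8))
# Highly sparse 0/1 mask: mask.sum() << np.prod(mask.shape)
mask = (jax.random.uniform(k2, x.shape) < 0.1).astype(x.dtype)

value, grad = jax.value_and_grad(loss)(x, mask)  # grad is 0 wherever mask == 0
```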
Replies: 1 comment

Here's an example where I attempt to do that. I don't know how well this will end up being optimized, but it does what you want by stopping gradient flow internally:

```python
import jax
from jax import numpy as np
from jaxtyping import Array, Scalar

key = jax.random.key(0)
x = np.linspace(1, 3, 5)
(indices,) = np.indices(x.shape)
# Indices whose gradient should be stopped (here: 4 of the 5 entries).
mask = jax.random.choice(key, indices, shape=(4,), replace=False)

def model(x: Array) -> Array:
    return x**2

def cost(x: Array) -> Scalar:
    return np.sum(x**2)

def f(x: Array, mask: Array | None = None) -> Scalar:
    if mask is not None:
        # Re-insert the masked entries through stop_gradient so that
        # no cotangent flows back to them.
        x = x.at[mask].set(jax.lax.stop_gradient(x[mask]))
    return cost(model(x))

print(jax.value_and_grad(f)(x))
print(jax.value_and_grad(f)(x, mask))
```

Output:

```
(Array(142.125, dtype=float32), Array([  4. ,  13.5,  32. ,  62.5, 108. ], dtype=float32))
(Array(142.125, dtype=float32), Array([ 0.,  0., 32.,  0.,  0.], dtype=float32))
```
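A small caveat, added here as my own note rather than something from the thread: `mask` in this reply is an array of indices to *freeze*, whereas the question describes a 0/1 matrix in which zeros mark the entries that shouldn't contribute. A hypothetical bridge between the two conventions:

```python
# Hypothetical bridge between the two conventions (names are mine, not from
# the thread): given a 0/1 mask where 0 means "no gradient", freeze exactly
# the zero entries.
mask01 = np.array([0, 0, 1, 0, 0])       # question's convention: 1 = keep gradient
frozen = np.flatnonzero(mask01 == 0)     # reply's convention: indices to stop
print(jax.value_and_grad(f)(x, frozen))  # nonzero gradient only where mask01 == 1
```

Also note that `stop_gradient` only zeroes the cotangents flowing back through the masked entries; it doesn't by itself skip any work, so whether the backward pass actually gets cheaper depends on how much of the dead computation XLA can prune away, which is the optimization question the reply leaves open.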