Efficiently differentiate through lax.scatter ops? #13051

stefanozampini · 2022-10-31T09:00:52Z

Consider this code, which computes the gradient with respect to a subset of the parameters.

import jax
import jax.numpy as jnp
import jax.flatten_util

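def full(p):
    return p['a']**2 + p['b']**3

def subd(sp, fp):  # subfunction using dictionaries
    # Only 'b' depends on sp; 'a' is passed through unchanged.
    nfp = {'a': fp['a'], 'b': fp['b'] - sp['b']}
    return full(nfp)

def subi(sp, fp):  # subfunction using indices
    # fv is the flattened parameter vector, ff maps it back to the dict.
    fv, ff = jax.flatten_util.ravel_pytree(fp)
    # Index 1 is 'b': dict leaves are flattened in sorted-key order.
    fv = fv.at[1].set(fv[1] - sp['b'])
    return full(ff(fv))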

fp = { 'a' : 3., 'b' : 2. }
sp = {'b' : 1.}
print(jax.make_jaxpr(jax.grad(full))(fp))
print(jax.make_jaxpr(jax.grad(subd))(sp,fp))
print(jax.make_jaxpr(jax.grad(subi))(sp,fp))

Inspecting the output shows that the gradient computation of subd actually skips some computations, great!
However, with the index formulation, the full gradient is computed in between scatter/gather ops. Is there a way to achieve the same goal (i.e., eliminating unneeded computations in the gradients of subfunctions) using indices?
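
To make the comparison between the two formulations concrete, here is a minimal sketch that checks they return the same gradient and lists which primitives appear only in the index-based jaxpr. The helper `primitive_names` is hypothetical (it is not part of JAX), it only descends into parameters that are themselves a single jaxpr, and the exact primitive names printed depend on the JAX version.

```python
import jax
import jax.flatten_util

def full(p):
    return p['a']**2 + p['b']**3

def subd(sp, fp):  # dictionary formulation from the question
    return full({'a': fp['a'], 'b': fp['b'] - sp['b']})

def subi(sp, fp):  # index formulation from the question
    fv, ff = jax.flatten_util.ravel_pytree(fp)
    fv = fv.at[1].set(fv[1] - sp['b'])
    return full(ff(fv))

def primitive_names(jaxpr):
    """Hypothetical helper: collect the primitive names used in a jaxpr,
    descending into nested jaxprs stored directly in equation params."""
    names = set()
    for eqn in jaxpr.eqns:
        names.add(eqn.primitive.name)
        for value in eqn.params.values():
            inner = getattr(value, 'jaxpr', value)  # unwrap a closed jaxpr
            if hasattr(inner, 'eqns'):
                names |= primitive_names(inner)
    return names

fp = {'a': 3., 'b': 2.}
sp = {'b': 1.}

# Both formulations differentiate with respect to sp (the first argument).
gd = jax.grad(subd)(sp, fp)
gi = jax.grad(subi)(sp, fp)
print(gd, gi)

prims_d = primitive_names(jax.make_jaxpr(jax.grad(subd))(sp, fp).jaxpr)
prims_i = primitive_names(jax.make_jaxpr(jax.grad(subi))(sp, fp).jaxpr)

# Primitives that only the index formulation needs.
print(sorted(prims_i - prims_d))
```

The two gradients should agree numerically; only the traced programs differ, which is what the set difference is meant to surface.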

Replies: 0 comments