-
Hello, JAX has the fantastic feature of transparently taking gradients through Python dicts:

def f(inputs: dict[str, jax.Array]):
    x, y = inputs['x'], inputs['y']
    return x*x + jnp.sin(y)

features = dict(x=jnp.array(1.), y=jnp.array(2.))
jax.grad(f)(features) returns a dictionary with gradient components named after the variables, brilliant:
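For concreteness, the returned gradient pytree looks roughly like this (illustrative output added here, values approximate, not copied from an actual run):

print(jax.grad(f)(features))
# roughly: {'x': Array(2., dtype=float32, ...), 'y': Array(-0.41614684, dtype=float32, ...)}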
Question: Given a function like the one above, is there a built-in way to take the gradient with respect to only a subset of the entries in the dictionary? If not, how would you implement it? Thanks in advance!
-
In the spirit of Cunningham's law, here's what I cobbled together to solve the above problem:

import jax
import jax.numpy as jnp
from typing import Callable

def grad_wrt_key(f: Callable, wrt: list[str]):
    """Given a callable f that takes a dictionary of jax arrays as input,
    return a callable that evaluates the gradient of f w.r.t. one or more of
    the arrays in the dictionary (specified by key).
    """
    def grad_wrt_impl(inputs: dict[str, jax.Array]):
        in_vars = list(inputs.keys())
        argnums = [in_vars.index(wrt_var) for wrt_var in wrt]
        def f_with_positionals(*args):
            args_as_dict = dict(zip(in_vars, args))
            return f(args_as_dict)
        grads = jax.grad(f_with_positionals, argnums)(*inputs.values())
        return dict(zip(wrt, grads))
    return grad_wrt_impl

Usable as:

def f(inputs: dict[str, jax.Array]):
    x, y = inputs["x"], inputs["y"]
    return x * x + jnp.sin(y)

df = grad_wrt_key(f, wrt=["x"])
features = {'x': jnp.array(1.0), 'y': jnp.array(2.0)}
print(df(features))
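As a small illustration added here (not part of the original snippet), differentiating with respect to several keys just means listing them in wrt, since each key is mapped to a positional argnum:

# Hypothetical extension of the usage above; output values approximate.
df_xy = grad_wrt_key(f, wrt=["x", "y"])
print(df_xy(features))
# roughly: {'x': Array(2., dtype=float32, ...), 'y': Array(-0.41614684, dtype=float32, ...)}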
Besides the general clunkiness of ...
-
I wrote a slightly more general version of this but I think it's on a hard drive that currently doesn't have a home. It does (almost) the same thing but for general pytrees.

import jax
import jax.numpy as jnp

def masked_grad(f, mask):
    flat_mask, mask_structure = jax.tree_util.tree_flatten(mask)

    def flat_f(diff_args, nondiff_args):
        diff_iter = iter(diff_args)
        nondiff_iter = iter(nondiff_args)
        combined_args = [
            next(diff_iter if m else nondiff_iter)
            for m in flat_mask
        ]
        unflattened_args = mask_structure.unflatten(combined_args)
        return f(*unflattened_args)

    flat_grad = jax.grad(flat_f)

    def grad_fn(*args):
        flat_args, arg_structure = jax.tree_util.tree_flatten(args)
        assert arg_structure == mask_structure
        diff_args = []
        nondiff_args = []
        arg_it = iter(flat_args)
        for m in flat_mask:
            if m:
                diff_args.append(next(arg_it))
            else:
                nondiff_args.append(next(arg_it))
        grads = iter(flat_grad(diff_args, nondiff_args))
        # What to return here probably depends on what you want to do with the grads.
        # Using None won't play nice with a lot of stuff that uses pytrees.
        # Could use float0, but optax doesn't like those.
        placeholder = jnp.zeros(1)
        flat_grads_mixed = [next(grads) if m else placeholder for m in flat_mask]
        return arg_structure.unflatten(flat_grads_mixed)

    return grad_fn

def f(args):
    return args['x'] * args['y']

mask = ({'x': True, 'y': False},)  # This has to be a tuple
grad_fn = masked_grad(f, mask)
print(grad_fn({'x': 1., 'y': 2.}))
# Output:
# ({'x': Array(2., dtype=float32, weak_type=True), 'y': Array([0.], dtype=float32)},)
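As a rough sketch of the "general pytrees" part (this example is an addition, not from the original post), the same helper also handles functions of several arguments, as long as the mask has one boolean per leaf of the flattened argument tuple:

# Hypothetical two-argument example: differentiate w.r.t. params['w'] only.
def loss(params, data):
    return jnp.sum((params['w'] * data - params['b']) ** 2)

loss_mask = ({'w': True, 'b': False}, False)  # one bool per leaf of (params, data)
grad_loss = masked_grad(loss, loss_mask)
print(grad_loss({'w': jnp.array(3.0), 'b': jnp.array(1.0)}, jnp.array(2.0)))
# roughly: ({'b': Array([0.], ...), 'w': Array(20., ...)}, Array([0.], ...))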
Drawbacks (fixable): ...
-
There is no easy built-in way to do this, but generalizing argnums to handle arbitrary pytrees is something that's been frequently discussed. See #3875, #10614, and references within. I think the solutions suggested in the other answers here are probably the best option in the current version of JAX.
-
How about using Equinox's filter system? https://docs.kidger.site/equinox/api/filtering/partition-combine/
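A rough sketch of what that could look like (this illustration is an addition, assuming Equinox is installed and using eqx.partition / eqx.combine as documented at the link above):

import equinox as eqx
import jax
import jax.numpy as jnp

def f(inputs):
    x, y = inputs['x'], inputs['y']
    return x * x + jnp.sin(y)

features = {'x': jnp.array(1.0), 'y': jnp.array(2.0)}

# Boolean filter spec: differentiate w.r.t. 'x' only.
filter_spec = {'x': True, 'y': False}

# Split the inputs into a differentiable part and a static part.
diff, nondiff = eqx.partition(features, filter_spec)

def f_diff(diff_part):
    # Recombine before calling the original function.
    return f(eqx.combine(diff_part, nondiff))

print(jax.grad(f_diff)(diff))
# roughly: {'x': Array(2., dtype=float32, ...), 'y': None}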