Evaluating the full Jacobian of a function, when its custom derivative rule is a jax primitive, involves batching? #19973

Hs293Go · 2024-02-26T07:18:36Z

Hs293Go
Feb 26, 2024

Edit so that the bottom line is up front: The question is:

How to make jax.jacobian and jax.jac(fwd|rev) work (not just jax.grad) when derivative rule is a primitive
What is the role of batching in all this

We have extensive docs on defining custom derivative rules and a helpful discussion #12730 (comment) on defining a derivative rules that is a primitive itself.

All of these resources test their derivative rules with jax.grad, instead of jax.jacobian. Luckily, jax.jacobian just works when applied to the @custom_vjp decorated pure-python functions in the custom derivative rules doc.

The same could not be said for the code in (#12730 (comment)). Replacing its last line print(jax.grad(foo)(3.0)) with print(jax.jacobian(foo)(3.0)) and running will throw

  File "/home/hs293go/python-scripts/extending-jax/wtf2.py", line 50, in foo_jvp_transpose
    x_bar = foo_vjp_p.bind(x, y_bar)  # y_bar aka y_grad
            ^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Batching rule for 'foo_vjp' not implemented

This is quite astonishing, since batching rules and derivative rules seem to be orthogonal to my best knowledge; They are never mentioned together in the same context by the docs. Whereas the primary source introducing batching rules How JAX primitives work demonstrates batching a function itself, not its jvp/vjp rule.

On a side note, in the extending-jax project, applying jax.jacobian to kepler_jax.kepler just works, since the function itself is a C++ primitive but the derivative rule is pure python.

I'm including a brief demo of what is working and what's not based on the code in #12730 (comment) that ~~adopts the newer custom_(jvp|vjp) syntax and~~ implements a inherently vector-valued function

EDIT 2: Per #19973, remove all uses of custom_jvp but instead define primitive_jvps

import jax
import jax.numpy as jnp
import numdifftools
import numpy as onp
from jax import core
from jax.interpreters import ad

# == SETUP PROBLEM: ADAPTED FROM 'Question about defining new JAX primitives #12730' ===

# Make a Primitive
lorenz_p = core.Primitive("lorenz")

# EDIT 2: Do NOT use @jax.custom_jvp
def lorenz(x):
    return lorenz_p.bind(x)

# hardcode lorenz system parameters for simplicity
SIGMA, RHO, BETA = 28.0, 10.0, 8 / 3

@lorenz_p.def_impl
def lorenz_impl(x):
    """ Lorenz system; Dynamical system is inherently vector-valued, 
    instead of vector-valued through vectorization
    """
    return onp.array(
        [SIGMA * (x[1] - x[0]), x[0] * (RHO - x[2]) - x[1], x[0] * x[1] - BETA * x[2]]
    )


# EDIT 2: Do NOT use @lorenz.defjvp
def lorenz_jvp(primals, tangents):
    (x,), (xdot,) = primals, tangents
    y = lorenz(x)
    y_dot = lorenz_jvp_p.bind(x, xdot)
    return y, y_dot

lorenz_jvp_p = core.Primitive("lorenz_jvp")

#EDIT 2: Do use
ad.primitive_jvps[lorenz_p] = lorenz_jvp

@lorenz_jvp_p.def_impl
def lorenz_jvp_impl(x, x_dot):
    return onp.array(
        [
            x_dot[1] * SIGMA - x_dot[0] * SIGMA,
            -x[2] * x_dot[0] - x[0] * x_dot[2] - x_dot[1] + x_dot[0] * RHO,
            x[1] * x_dot[0] + x[0] * x_dot[1] - x_dot[2] * BETA,
        ]
    )

@lorenz_jvp_p.def_abstract_eval
def lorenz_jvp_abstract_eval(_, x_dot_aval):
    y_dot_aval = core.ShapedArray(x_dot_aval.shape, x_dot_aval.dtype)
    return y_dot_aval

def lorenz_jvp_transpose(y_bar, x, x_dot_dummy):
    assert ad.is_undefined_primal(x_dot_dummy)  # just a dummy input
    x_bar = lorenz_vjp_p.bind(x, y_bar)  # y_bar aka y_grad
    return None, x_bar  # None for nonlinear primal input x

ad.primitive_transposes[lorenz_jvp_p] = lorenz_jvp_transpose

# Finally, let's write the vjp rule as a primitive.
lorenz_vjp_p = core.Primitive("lorenz_vjp")

@lorenz_vjp_p.def_impl
def lorenz_vjp_impl(x, v):
    return onp.array(
        [
            v[2] * x[1] + v[1] * (RHO - x[2]) - v[0] * SIGMA,
            v[2] * x[0] - v[1] + v[0] * SIGMA,
            -v[2] * BETA - v[1] * x[0],
        ]
    )

test_primal = jnp.array([1.0, 0.0, 0.0])
test_tangent = jnp.array([0.0, 1.0, 0.01])

# ============================= BASIC BEHAVIOR IS CORRECT ==============================
_, result = jax.jvp(lorenz, [test_primal], [test_tangent])  # Step 1: JVP
expected = numdifftools.Jacobian(lorenz)(test_primal) @ test_tangent
assert jnp.allclose(result, expected)

_, f_vjp = jax.vjp(lorenz, test_primal)  # Step 2: VJP
(result,) = f_vjp(test_tangent)
expected = test_tangent @ numdifftools.Jacobian(lorenz)(test_primal)
assert jnp.allclose(result, expected)

def scalar_val_test_fn(x):
    return jnp.hypot(*lorenz(x + onp.array([1.0, 2.0, 0.0]))[1::-1])

# Step 3: Autodiff a function that nests 'lorenz'
result = jax.grad(scalar_val_test_fn)(test_primal)
expected = numdifftools.Gradient(scalar_val_test_fn)(test_primal)
assert jnp.allclose(result, expected, rtol=1e-4)  # type: ignore

# ========================= CAN TAKE NEITHER GRAD NOR JACOBIAN =========================
try:
    _ = jax.grad(lorenz)(test_primal)  # Cannot take grad
except TypeError as e:
    print(e)  # Gradient only defined for scalar-output functions.

try:
    _ = jax.jacfwd(lorenz)(test_primal)  # Cannot take jacobian
except NotImplementedError as e:
    print(e)  # Batching rule for 'lorenz_jvp' not implemented

try:
    _ = jax.jacrev(lorenz)(test_primal)  # Cannot take jacobian
except NotImplementedError as e:
    print(e)  # Batching rule for 'lorenz_vjp' not implemented

Answered by jakevdp

Feb 27, 2024

The batching rule for a primitive takes a tuple of arguments, and a tuple of batch dims, and evaluates the batched version of the primitive. Since your primitive is implemented via normal JAX operations, you can implement the batching rule via a call to vmap. For example:

lorenz_jvp_p = core.Primitive("lorenz_jvp")

@lorenz_jvp_p.def_impl
def lorenz_jvp_impl(x, x_dot):
    return jnp.array(
        [
            x_dot[1] * SIGMA - x_dot[0] * SIGMA,
            -x[2] * x_dot[0] - x[0] * x_dot[2] - x_dot[1] + x_dot[0] * RHO,
            x[1] * x_dot[0] + x[0] * x_dot[1] - x_dot[2] * BETA,
        ]
    )

def lorenz_jvp_batching_rule(batched_args, batch_dims):
  x, x_dot = batched_args
  ba…

View full answer

jakevdp · 2024-02-26T23:19:40Z

jakevdp
Feb 26, 2024
Maintainer

Hi - as a brief answer to your question:

How to make jax.jacobian and jax.jac(fwd|rev) work (not just jax.grad) when derivative rule is a primitive

You'll need to define the batching rule for this jvp rule in order to use jacfwd or jacrev

What is the role of batching in all this

jacfwd is essentially vmap of jvp, and vmap requires the batching rule to be defined.

Stepping back, though, you'll need to start over on your implementation. You're confusing two different concepts: primitive jvp rules and custom jvp rules. The custom_jvp framework is designed for cases when you are not defining a new primtive. If you are defining a new primitive, then you should not use custom_jvp, rather you should register jvp and transpose rules for your primitive directly.

Here it's not clear what you're gaining from defining a custom primitive. A good way forward might be to define your functions as normal Python functions, wrapped in custom_jvp, in which case you'll get batching rules automatically. What do you think?

5 replies

Hs293Go Feb 27, 2024
Author

Thank you so much for the clarification. I'm only interested in primitive jvp rules, so hopefully this narrows down the problem. I've amended the code example, getting rid of custom_jvp and using ad.primitive_(jvps|transposes) following #12730 (comment) more faithfully.

At this stage the code still doesn't produce the Jacobians, so I suppose my problem boils down to: How to define the batching rule for jvp rules. Are there any idioms or patterns? Does batching a jvp, a primitive rule at that, complicate things relative to batching a plain function?

For context, I am interested in primitive jvp rules because I plan to eventually turn some existing C++ implementations of dynamical system models AND their derivatives into jax primitives. My pure-python code example is a mock of the boilerplate to setup primitives and their derivative/batching rules; I'm afraid turning to custom_jvp would not help me.

Thanks for taking the time to answer!

jakevdp Feb 27, 2024
Maintainer

The batching rule for a primitive takes a tuple of arguments, and a tuple of batch dims, and evaluates the batched version of the primitive. Since your primitive is implemented via normal JAX operations, you can implement the batching rule via a call to vmap. For example:

lorenz_jvp_p = core.Primitive("lorenz_jvp")

@lorenz_jvp_p.def_impl
def lorenz_jvp_impl(x, x_dot):
    return jnp.array(
        [
            x_dot[1] * SIGMA - x_dot[0] * SIGMA,
            -x[2] * x_dot[0] - x[0] * x_dot[2] - x_dot[1] + x_dot[0] * RHO,
            x[1] * x_dot[0] + x[0] * x_dot[1] - x_dot[2] * BETA,
        ]
    )

def lorenz_jvp_batching_rule(batched_args, batch_dims):
  x, x_dot = batched_args
  batch_dim_out = 0
  result = jax.vmap(lorenz_jvp_impl, in_axes=batch_dims, out_axes=batch_dim_out)(x, x_dot)
  return result, batch_dim_out

from jax.interpreters import batching
batching.primitive_batchers[lorenz_jvp_p] = lorenz_jvp_batching_rule

(note I had to change the definition of lorenz_jvp_impl because it had been returning a numpy array rather than a JAX array, which is not compatible with vmap)

In the real case, when your primitive is calling out to some C++ function, you won't be able to use vmap. The best case is that your C++ routine natively supports batched computation, in which case you can implement the batching rule by calling the primitive directly. If your C++ routine doesn't support batched input, then you're somewhat stuck: you'd have to mimic batching by some kind of looping, which may not be efficient.

Hope that helps!

Answer selected by Hs293Go

Hs293Go Feb 28, 2024
Author

Thank you so much! That solved it. Showing that I have to vmap the jvp rule, such that during Jacobian evaluation the
primal is perfectly forwarded but the tangent is varied over the standard bases, is the key.

This is not immediately obvious to us end users, when all examples of batching in the How Jax primitives work doc just forward the batching args to the original, inherently vectorized function.

In the real case, my C++-based primitive launches a CUDA kernel to evaluate the function for each input in a batch. Essentially, I do mimic batching, but in C++ instead of python.

The end goal of all this is to squeeze out some extra performance by letting both a function and its derivative rule be primitives that call down to tightly optimized C++. To my best knowledge this hasn't been done before, so I couldn't measure the performance gain from using a C++ primitive derivative rule --- until now.

jakevdp Feb 28, 2024
Maintainer

Glad it helped. Regarding docs: custom primitives are not really meant as a public API, so we haven’t put effort into fully documenting them.

Hs293Go Feb 28, 2024
Author

In that case, I hope when jax.extend lands the API will be similar to what we have right now. Cheers!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Evaluating the full Jacobian of a function, when its custom derivative rule is a jax primitive, involves batching? #19973

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment 5 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Evaluating the full Jacobian of a function, when its custom derivative rule is a jax primitive, involves batching? #19973

Uh oh!

Uh oh!

Hs293Go Feb 26, 2024

Replies: 1 comment · 5 replies

Uh oh!

Uh oh!

jakevdp Feb 26, 2024 Maintainer

Uh oh!

Hs293Go Feb 27, 2024 Author

Uh oh!

jakevdp Feb 27, 2024 Maintainer

Uh oh!

Hs293Go Feb 28, 2024 Author

Uh oh!

jakevdp Feb 28, 2024 Maintainer

Uh oh!

Hs293Go Feb 28, 2024 Author

Hs293Go
Feb 26, 2024

Replies: 1 comment 5 replies

jakevdp
Feb 26, 2024
Maintainer

Hs293Go Feb 27, 2024
Author

jakevdp Feb 27, 2024
Maintainer

Hs293Go Feb 28, 2024
Author

jakevdp Feb 28, 2024
Maintainer

Hs293Go Feb 28, 2024
Author