-
I wonder if there is an efficient method to compute the Jacobian determinant for a function `f: R^d -> R^d`, i.e. something cheaper than materializing the full Jacobian with `jax.jacfwd` and calling `jnp.linalg.det` on it.
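Concretely, the baseline I have in mind is the obvious one below (a minimal sketch; the map `f` and the dimension are just stand-ins), which materializes the whole d×d Jacobian before taking its determinant:

```python
import jax
import jax.numpy as jnp

def jacobian_det(f, x):
    # Materialize the full (d, d) Jacobian, then take its determinant.
    # This costs O(d^2) memory plus O(d^3) time for the determinant.
    J = jax.jacfwd(f)(x)
    return jnp.linalg.det(J)

# Stand-in nonlinear map R^3 -> R^3 for illustration.
f = lambda x: jnp.tanh(x) + 0.1 * jnp.roll(x, 1)
x = jnp.array([0.5, -1.0, 2.0])
print(jacobian_det(f, x))
```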
-
Thanks for the question!

My guess is it's not possible in general to get better asymptotic efficiency. Consider the case where `f` is just a general linear transformation, i.e. `f = lambda x: jnp.dot(A, x)` for a given dense/unstructured `A`. Computing the Jacobian determinant of this function is exactly computing the determinant of `A`, and we need to pay O(d^3) for that.

Intuitively, evaluating the determinant of the dense Jacobian is doing the right thing for the general case: we want to know the volume of the parallelepiped which is the image of an axis-aligned unit cube under the locally-linearized function. Finding the image of each standard basis vector is exactly what `jacfwd` does, as efficiently as possible.

But clearly we're leaving something on the table for structured functions. For example, what if our function is applied elementwise, say `f = jnp.tanh`? Then the Jacobian is diagonal, and its determinant is just the product of the d diagonal entries, which costs O(d) rather than O(d^3).

One way to think about taking advantage of structure is to break the function down into a composition of primitive functions. So long as we have a formula for computing the Jacobian determinant of each primitive, the chain rule gives us the determinant of the whole composition, since `det(J_{f∘g}(x)) = det(J_f(g(x))) * det(J_g(x))`.

So you could imagine writing a custom jaxpr interpreter along with a table of Jacobian determinant rules to do this. That interpreter would be able to exploit structure in the function `f` to get better asymptotic efficiency than the det-of-jacfwd approach in some cases where it's possible.

Actually, it turns out that @sharadmv wrote that custom jaxpr interpreter tutorial as he was learning about jaxprs, and he was particularly interested in the function-inverse case because he cared about something along these lines. He was working on probabilistic programming, and automatically computing change-of-volume quantities is helpful in computing reparameterized densities for MCMC methods. For that reason he was interested in inverse-log-det-Jacobians. Ultimately he went on to build Oryx, and Oryx likely has tools to compute Jacobian determinants in the structure-exploiting compositional way described here. If you're looking for a library to do these kinds of computations, check it out!

WDYT?
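To make the compositional idea above concrete, here is a minimal sketch, assuming hypothetical per-layer helpers (these aren't a real JAX or Oryx API): each layer returns its output together with its log|det J| contribution, and composing layers just sums the contributions.

```python
import jax.numpy as jnp

def tanh_layer(x):
    # Elementwise primitive: the Jacobian is diagonal with entries
    # 1 - tanh(x)^2, so log|det J| is a sum of d terms -- O(d) work,
    # and the dense Jacobian is never formed.
    y = jnp.tanh(x)
    return y, jnp.sum(jnp.log1p(-y ** 2))

def scale_layer(x, s):
    # Diagonal linear primitive: the Jacobian is diag(s).
    return s * x, jnp.sum(jnp.log(jnp.abs(s)))

def composed_logdet(x, s):
    # Chain rule for determinants:
    # log|det J_{f∘g}(x)| = log|det J_f(g(x))| + log|det J_g(x)|.
    y, ld1 = scale_layer(x, s)
    z, ld2 = tanh_layer(y)
    return z, ld1 + ld2
```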
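And here is a hedged sketch of what such a custom jaxpr interpreter could look like, with a one-entry rule table (the `LOGDET_RULES` name and its contents are invented for illustration; a real version would need rules for many more primitives, plus shape and arity checks):

```python
import jax
import jax.numpy as jnp

# Hypothetical table mapping primitive names to log|det J| rules for
# unary, shape-preserving primitives.
LOGDET_RULES = {
    "tanh": lambda x: jnp.sum(jnp.log1p(-jnp.tanh(x) ** 2)),
}

def logdet_via_jaxpr(f, x):
    closed_jaxpr = jax.make_jaxpr(f)(x)
    env = {closed_jaxpr.jaxpr.invars[0]: x}
    total = 0.0
    for eqn in closed_jaxpr.jaxpr.eqns:
        (invar,) = eqn.invars
        val = env[invar]
        # Accumulate this primitive's log-det contribution...
        total = total + LOGDET_RULES[eqn.primitive.name](val)
        # ...and evaluate the primitive to propagate values forward.
        env[eqn.outvars[0]] = eqn.primitive.bind(val, **eqn.params)
    return total

# For f = tanh ∘ tanh, this pays O(d) per layer instead of O(d^3).
print(logdet_via_jaxpr(lambda v: jnp.tanh(jnp.tanh(v)), jnp.ones(4)))
```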
-
Hi @nalzok, I'm facing a similar issue (running OOM when trying to compute the dense Jacobian with `jax.jacfwd`).
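One memory-trading sketch I'm experimenting with (untested at scale, and it still materializes the final d×d matrix for the determinant): build the Jacobian one column at a time with `jax.jvp` under `lax.map`, so only one tangent computation is live at once rather than all d at once as in `jacfwd`.

```python
import jax
import jax.numpy as jnp

def logabsdet_jacobian(f, x):
    d = x.shape[0]

    def jvp_column(v):
        # Forward-mode product J @ v for a single basis vector v.
        _, col = jax.jvp(f, (x,), (v,))
        return col

    # lax.map runs the JVPs sequentially, so peak memory holds one
    # column's intermediates instead of d batched tangents.
    cols = jax.lax.map(jvp_column, jnp.eye(d))  # row i is J @ e_i
    J = cols.T
    # slogdet avoids overflow/underflow in the determinant itself.
    sign, logdet = jnp.linalg.slogdet(J)
    return sign, logdet
```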