Way around BCOO batch dimension contraction in example? #8454
-
Hi, I'm working on writing some code using the BCOO type and am running into an issue I'm not sure how to get around. Mathematically, what I'm trying to do is write a function that computes the following:
For …

When trying to use …

This however doesn't work unless I set …

(summing all elements as opposed to only over the 0 axis to get a scalar output) I can't take the …

Is there a different way I can write this so that I can also grad through this computation?
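For context, here is a minimal sketch of building a `BCOO` array with a leading batch dimension, which is the kind of object involved here. This is not taken from the original question; the array shape and the `n_batch=1` choice are illustrative assumptions only:

```python
import jax.numpy as jnp
from jax.experimental import sparse

dense = jnp.arange(24.).reshape(2, 3, 4)           # leading axis of size 2
batched = sparse.BCOO.fromdense(dense, n_batch=1)  # treat that leading axis as a batch dimension
print(batched.shape, batched.n_batch)              # (2, 3, 4) 1
```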
-
Thanks for the question. First of all, in trying to answer your question I realized there is a bug in how `A @ B` is implemented for sparse matrices. You should be able to just write `A @ X @ B` and have it dispatch to the proper operation; I'm planning to fix the bug in #8455. That said, it works correctly right now within `sparsify`, so you can do this:

```python
import jax.numpy as jnp
from jax.experimental import sparse

A = jnp.arange(24.).reshape(2, 3, 4)
X = jnp.arange(20.).reshape(4, 5)
B = jnp.arange(60.).reshape(2, 5, 6)

@sparse.sparsify
def f(A, X, B):
    return A @ X @ B

Asp = sparse.BCOO.fromdense(A, n_batch=1)
Bsp = sparse.BCOO.fromdense(B, n_batch=1)

print(f(A, X, B).shape)
# (2, 3, 6)
print(f(Asp, X, Bsp).shape)
# (2, 3, 6)
```

As for the gradient: unfortunately, when you run something like

```python
from jax import grad

gsp = grad(lambda X: f(Asp, X, B).sum())
print(gsp(X).shape)
```

the reverse-mode autodiff requires doing a sparse matmul where the contraction dimension is over the array's batch dimension, which is not yet implemented – there's no deep reason why this shouldn't be possible, it's just not something we've written the code for yet (it's on the list).

Fortunately, forward-mode autodiff does not require such a sparse-sparse matrix product, so if you can re-express your gradient computation in terms of forward-mode autodiff, then you will be able to compute the result. For example:

```python
from jax import grad, jacfwd
g = grad(lambda X: f(A, X, B).sum())
print(g(X))
# [[ 9540. 11700. 13860. 16020. 18180.]
# [10170. 12546. 14922. 17298. 19674.]
# [10800. 13392. 15984. 18576. 21168.]
# [11430. 14238. 17046. 19854. 22662.]]
gsp = jacfwd(lambda X: f(Asp, X, B).sum())
print(gsp(X))
# [[ 9540. 11700. 13860. 16020. 18180.]
# [10170. 12546. 14922. 17298. 19674.]
# [10800. 13392. 15984. 18576. 21168.]
# [11430. 14238. 17046. 19854. 22662.]]
```

(Note that for a scalar-valued function like this one, `jacfwd` computes the same values as `grad`; it just does so with forward-mode rather than reverse-mode autodiff.) Hopefully we will be able to get those unimplemented matmul modes implemented very soon, so you won't need this workaround any more.
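If you only need directional derivatives rather than the full gradient array, another forward-mode option is `jax.jvp`. Here is a minimal sketch, assuming the same `f`, `Asp`, `X`, and `B` defined above; the direction `v` chosen below is just an illustrative value:

```python
import jax
import jax.numpy as jnp

# Scalar loss as in the examples above, using the sparse batched operand Asp.
loss = lambda X_: f(Asp, X_, B).sum()

# Forward-mode directional derivative <grad(loss), v> via jax.jvp, which also
# avoids the reverse-mode sparse matmul that is not yet implemented.
v = jnp.ones_like(X)
_, dloss_v = jax.jvp(loss, (X,), (v,))
print(dloss_v)
```

With `v = jnp.ones_like(X)` this returns the sum of the gradient's entries, so it is only a substitute for `grad` when a directional derivative (or a handful of them) is all you need.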