Sparse extraction - memory usage issue #16291

adambomandel · 2023-06-07T09:23:27Z


I am having a bit of trouble with a jax.experimental.sparse.BCOO matrix when computing the gradient of a function that uses a sparse solver. I am currently using klusolve from KLUJAX, as it provides a direct sparse solver. The system I need to solve is a slightly smaller block, K_free, of the original matrix K, restricted to the (dense) index array free. I am currently using the following inherited method:

```python
import jax.experimental.sparse as jsparse

K = jsparse.BCOO((sK, ijK), shape=(ndof, ndof))
K_free = K[free, :][:, free]  # row selection, then column selection
```

It works, but when tracking memory usage, the K_free operation is an absolute killer: it increases memory usage almost tenfold even for a small system, rendering the algorithm useless. My best guess is that, the way I have written it, JAX converts to dense, extracts the specified values, and then converts back to sparse. I have checked that K_free is indeed returned as a BCOO.

Is there another better way of extracting the values I need in a more efficient manner?

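The function I am calculating the gradient of, with respect to x:

```python
import numpy as np
import jax.numpy as jnp
import jax.experimental.sparse as jsparse
from jax import value_and_grad
from klujax import solve as klusolve  # assumption: "klusolve" is klujax.solve(Ai, Aj, Ax, b)

# lk() (element stiffness matrix) and the filter matrices H, Hs are defined elsewhere
def obj_compliance(x, nelx, nely, free, penal, Emax, Emin, f, H, Hs, ijK):
    ndof = 2 * (nelx + 1) * (nely + 1)
    u = jnp.zeros(ndof)
    xPhys = H.T @ (x.T / Hs)  # filtered densities
    KE = lk()
    # density-penalized element stiffness values, one column per element
    sK = ((KE.flatten()[np.newaxis]).T * (Emin + xPhys**penal * (Emax - Emin))).flatten(order='F')
    K = jsparse.BCOO((sK, ijK), shape=(ndof, ndof))
    K_free = K[free, :][:, free]  # the memory-hungry extraction
    u_solve = klusolve(K_free.indices[:, 0], K_free.indices[:, 1], K_free.data, f[free, 0])
    compliance = f.T @ u.at[free].add(u_solve)
    return compliance.sum()

(compliance, dc) = value_and_grad(obj_compliance, argnums=0)(
    x, nelx, nely, free, penal, Emax, Emin, f, H, Hs, ijK)
```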

Thanks!
-Adam

Answered by jakevdp


jakevdp · 2023-06-08T09:04:14Z · Maintainer

The issue is that "extracting values from sparse arrays" is fundamentally a set-join operation, and XLA has no primitives for set arithmetic. As a result, BCOO has to construct the operation from the primitives that are available. For lack of a better option, it essentially does `jnp.any(query[:, None] == indices[None, :], -1)`, which has very poor memory scaling as the query and index arrays grow.
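To make that scaling concrete, here is a minimal sketch of the broadcasting membership test on 1-D keys. It assumes each stored (row, col) pair has been flattened into a single integer key `row * ndof + col`; that flattening and the helper name `membership` are illustrative only, not necessarily how BCOO represents indices internally.

```python
import jax.numpy as jnp

def membership(query, stored):
    # Broadcast every query key against every stored key: the comparison has
    # shape (len(query), len(stored)) before `any` reduces it to (len(query),).
    return jnp.any(query[:, None] == stored[None, :], -1)

ndof = 6
ij = jnp.array([[0, 0], [0, 1], [2, 3], [5, 5]])          # stored (row, col) pairs
stored = ij[:, 0] * ndof + ij[:, 1]                        # hypothetical flattening to scalar keys
q = jnp.array([0 * ndof + 1, 4 * ndof + 4, 5 * ndof + 5])  # look up (0,1), (4,4), (5,5)
print(membership(q, stored))  # (0,1) and (5,5) are stored, (4,4) is not
```

If the (len(query), len(stored)) intermediate is materialized, 10^5 queries against 10^5 stored entries need 10^10 booleans, roughly 10 GB at one byte each, even though the result has only 10^5 entries.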

I don't really have any better suggestions (if I did, I would have put them in the BCOO implementation!) but you may be able to do better for your specific application by writing the low-level index manipulations directly. For example, it looks like you're only concerned with extracting values along the diagonal, which you may be able to exploit to do so more efficiently.
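One way to do that for this particular case, sketched under the assumption that the sparsity pattern `ijK` and the index array `free` are concrete NumPy arrays that do not depend on `x` (as in `obj_compliance` above, where only `sK` does): work out once, in NumPy, which stored entries survive `K[free, :][:, free]` and renumber them, so that only a gather on the values remains inside the differentiated function. The helper names `free_submatrix_pattern` and `extract_free` are hypothetical.

```python
import numpy as np
import jax.numpy as jnp
from jax.experimental import sparse as jsparse

def free_submatrix_pattern(ijK, free, ndof):
    """Hypothetical helper, run once outside the differentiated function.

    Returns the positions of the COO entries that survive K[free][:, free]
    and their (row, col) indices renumbered into the free-DOF numbering.
    """
    ijK = np.asarray(ijK)
    free = np.asarray(free)
    # new_pos[i] = position of DOF i within `free`, or -1 if DOF i is not free
    new_pos = np.full(ndof, -1, dtype=np.int64)
    new_pos[free] = np.arange(free.size)
    rows = new_pos[ijK[:, 0]]
    cols = new_pos[ijK[:, 1]]
    keep = np.nonzero((rows >= 0) & (cols >= 0))[0]
    ij_free = np.stack([rows[keep], cols[keep]], axis=1)
    return keep, ij_free

def extract_free(sK, keep, ij_free, n_free):
    # Only a gather on the values is traced, so gradients flow to sK and no
    # pairwise comparison between requested and stored indices is built.
    return jsparse.BCOO((sK[keep], jnp.asarray(ij_free)), shape=(n_free, n_free))
```

In `obj_compliance`, `keep, ij_free = free_submatrix_pattern(ijK, free, ndof)` would be computed once before calling `value_and_grad`, and `extract_free(sK, keep, ij_free, len(free))` would take the place of `K[free, :][:, free]`. Duplicate (row, col) entries in `ijK` are carried through unchanged.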

For what it's worth, this is one of the (many) reasons why JAX sparse is still under jax.experimental, and hasn't graduated to a fully-supported API.

5 replies

adambomandel Jun 9, 2023
Author

Thanks Jake! Will fix it by low-level indexing directly.

adambomandel Jun 14, 2023
Author

Hello again Jake, I have revisited this issue and found a good way of doing some 'low-level indexing' with a few matrix operations, but I am running into trouble multiplying two sparse matrices in JAX. The method uses a sparse diagonal matrix 'null' (1 on free DOFs, 0 on fixed DOFs) with the same shape as $K$; $K_{free}$ is then found by:

$$K_{free} = \mathrm{null}^T \, K \, \mathrm{null} + (\mathrm{null} - I)$$

This works great in scipy:

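```python
import numpy as np
from scipy.sparse import coo_matrix, diags, eye

# N[i] = 1 for free DOFs, 0 for fixed DOFs
N = np.ones((ndof, 1))
N[fixed] = 0
null = diags(N[:, 0]).tocsc()  # sparse diagonal selector, no dense ndof x ndof array

K_scipy = coo_matrix((sK_scipy, (ijK[:, 0], ijK[:, 1])), shape=(ndof, ndof)).tocsc()
K_free = null.T @ K_scipy @ null + (null - eye(ndof, ndof))
```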

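But when implementing it in JAX:

```python
import numpy as np
import jax.experimental.sparse as jsparse
from jax.experimental.sparse import bcoo_multiply_sparse

null = np.diag(N[:, 0]).astype(int)
null = jsparse.BCOO.fromdense(null, index_dtype='int64')

K = jsparse.BCOO((sK, ijK), shape=(ndof, ndof))
# NOTE: bcoo_multiply_sparse is element-wise; `*` on the SciPy sparse matrices above is a matrix product
K_free = (bcoo_multiply_sparse(bcoo_multiply_sparse(null.T, K), null)
          + (null - jsparse.eye(ndof, dtype='int64', index_dtype='int64')))  # replaces: K_free = K[free, :][:, free]
```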

$K_{free}$ stays the right size, but it contains a MASSIVE number of duplicates of the last index. I have tried to sum them with bcoo_sum_duplicates, which, as you well know, is only compatible with jacfwd, and which seems to be very heavy.

I've tried splitting the operation up, and have found that the number of duplicates increases massively with each step:

```python
from jax.experimental.sparse import bcoo_multiply_sparse, bcoo_sum_duplicates

N1 = null - jsparse.eye(ndof, dtype='int64', index_dtype='int64')
K2 = bcoo_sum_duplicates(bcoo_multiply_sparse(null.T, K))
K3 = bcoo_sum_duplicates(bcoo_multiply_sparse(K2, null))
K_free = K3 + N1
```

Would love to hear if I'm overlooking another method for multiplying two sparse matrices.
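Because `null` is diagonal with entries $n_i \in \{0, 1\}$, the triple product in the formula above reduces entrywise to $(\mathrm{null}^T K \, \mathrm{null})_{ij} = n_i K_{ij} n_j$. A minimal sketch that uses this to avoid any sparse × sparse product: scale the COO values of $K$ directly and append the $-1$ diagonal entries contributed by $(\mathrm{null} - I)$. The helper name `masked_free_matrix` is hypothetical, and it assumes `fixed` holds unique DOF indices and `ijK` is an integer `(nse, 2)` array.

```python
import jax.numpy as jnp
from jax.experimental import sparse as jsparse

def masked_free_matrix(sK, ijK, fixed, ndof):
    """Hypothetical helper: null^T K null + (null - I) without a sparse-sparse product."""
    fixed = jnp.asarray(fixed)
    ijK = jnp.asarray(ijK)
    # n = diagonal of `null`: 1 for free DOFs, 0 for fixed DOFs
    n = jnp.ones(ndof, dtype=sK.dtype).at[fixed].set(0)
    # (null^T K null)_ij = n_i * K_ij * n_j: zero out entries touching a fixed DOF
    data = sK * n[ijK[:, 0]] * n[ijK[:, 1]]
    # (null - I) adds -1 on the diagonal of each fixed DOF (assumes `fixed` has no repeats)
    diag_idx = jnp.stack([fixed, fixed], axis=1).astype(ijK.dtype)
    data = jnp.concatenate([data, -jnp.ones(fixed.shape[0], dtype=sK.dtype)])
    indices = jnp.concatenate([ijK, diag_idx], axis=0)
    return jsparse.BCOO((data, indices), shape=(ndof, ndof))
```

The masked-out entries are kept as explicit zeros, so the output nse is simply `len(sK) + len(fixed)`, fixed at trace time, and the values remain differentiable with respect to `sK`.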

jakevdp Jun 14, 2023
Maintainer

Yeah, this is expected unfortunately. The problem is that JAX cannot know the number of specified elements at compile time, so it must allocate enough for the worst case, which is lhs.nse * rhs.nse (imagine, e.g. that the left matrix is a column vector and the right matrix is a row vector).
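The worst case described above can be reproduced with a tiny sketch: a dense (n, 1) column times a dense (1, n) row, both stored as BCOO. Every pair of stored entries lands in a distinct output position, so the product genuinely needs lhs.nse * rhs.nse slots; the sizes and values here are made up for illustration.

```python
import jax.numpy as jnp
from jax.experimental import sparse as jsparse

n = 4
col = jsparse.BCOO.fromdense(jnp.arange(1.0, n + 1).reshape(n, 1))  # shape (n, 1), nse = n
row = jsparse.BCOO.fromdense(jnp.arange(1.0, n + 1).reshape(1, n))  # shape (1, n), nse = n

# sparse @ sparse (same as col @ row): contract dim 1 of col with dim 0 of row
outer = jsparse.bcoo_dot_general(col, row, dimension_numbers=(([1], [0]), ([], [])))

print(col.nse, row.nse, outer.nse)
print(int(jnp.count_nonzero(outer.todense())))  # all n * n = 16 outputs are nonzero
```

Since JAX must pick the output nse before seeing the index values, it has to size the result for this case even when, as with a diagonal `null`, far fewer entries are actually distinct.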

In this case I think you can do better using structured sparsity: if you use n_batch=1 for the diagonal matrix, then this puts a tighter constraint on what the output nse needs to be. Hope that helps!

adambomandel Jun 14, 2023
Author

That makes sense!

Is there any chance you could show me what you mean by using a structured sparsify? Not sure I follow.

jakevdp Jun 15, 2023
Maintainer

Not structured sparsify, but structured sparsity. For example:

```python
from jax.experimental import sparse

mat = sparse.BCOO.fromdense(mat_dense, n_batch=1)
```

This produces a sparse matrix with one batch dimension. If mat_dense is diagonal, then this has nse per batch of 1. Using that kind of structured format can lead to more efficient operations, because essentially the compiler knows that each row has one element, rather than having to account for the defined elements being anywhere in the array.
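A small sketch of what the batched layout looks like for a diagonal 0/1 selector like `null` (the 4×4 `diag_dense` below, with DOF 1 fixed, is a made-up example):

```python
import jax.numpy as jnp
from jax.experimental import sparse

# diagonal selector: 1 on free DOFs, 0 on the fixed DOF 1
diag_dense = jnp.diag(jnp.array([1.0, 0.0, 1.0, 1.0]))

flat = sparse.BCOO.fromdense(diag_dense)            # unbatched: the 3 nonzeros could sit anywhere
mat = sparse.BCOO.fromdense(diag_dense, n_batch=1)  # one batch dim: one stored slot per row

print(flat.nse, flat.indices.shape)                 # 3, (3, 2)
print(mat.nse, mat.data.shape, mat.indices.shape)   # 1, (4, 1), (4, 1, 1)
```

In the batched form each row owns exactly one (possibly zero-padded) slot, and that per-row bound is the structural information the compiler can exploit instead of assuming entries may appear anywhere in the array.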

DoTulip · 2024-04-06T16:40:23Z


Hi, Adam. I have the same problem as you. Do you have an elegant solution? @adambomandel

2 replies

jakevdp Apr 6, 2024
Maintainer

No, I don't believe there is currently any elegant solution to efficiently expressing set-join semantics in XLA.

DoTulip Apr 7, 2024

Thank you very much for your reply!
