Indexing a sparse matrix #30883

cnguyen10 · 2025-08-11T08:58:49Z


I am relabeling a large dataset with a large number of classes. The matrix that stores the class labels is sparse, so naively using a dense matrix wastes memory or runs out of memory. I am therefore using the jax.experimental.sparse module to keep it memory efficient. To update the matrix, at each iteration:

1. take a mini-batch of the rows of the sparse matrix
2. update new values and their indices
3. store them in two arrays

At the end, the two arrays (data and indices) are used to update the whole matrix.
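
The accumulation step described above could be sketched as follows. This is only a sketch under assumptions: the sizes, the per-batch values, and the names batch_data, batch_indices, and labels are hypothetical, and each index row is taken to be a (row, column) position in the full matrix.

```python
import jax.numpy as jnp
from jax.experimental import sparse

n_rows, n_classes = 4, 5  # hypothetical dataset size and number of classes

# Hypothetical results of two iterations: new label values and the
# (row, column) positions they belong to in the full matrix.
batch_data = [jnp.array([1.0, 0.5]), jnp.array([2.0])]
batch_indices = [jnp.array([[0, 1], [1, 3]]), jnp.array([[3, 0]])]

# Concatenate the per-batch pieces into one data array and one index array.
data = jnp.concatenate(batch_data)
indices = jnp.concatenate(batch_indices)

# Build the full sparse label matrix from the two arrays.
labels = sparse.BCOO((data, indices), shape=(n_rows, n_classes))
```

Only the stored entries take up memory here; the dense (n_rows, n_classes) array is never materialized unless todense() is called.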

However, I have run into a problem when indexing or slicing the sparse matrix. The sparse submatrix that is returned also stores explicit zero elements:

import jax.numpy as jnp
from jax.experimental import sparse

M = jnp.array([[0., 1., 0., 2.], [3., 0., 0., 0.], [0., 0., 4., 0.]])

M_sp = sparse.BCOO.fromdense(M)
print(M_sp)  # print result: BCOO(float32[3, 4], nse=4)

m = M_sp[0]
print(m)  # print result: BCOO(float32[4], nse=4)
print(m.data)  # print result: Array([1., 2., 0., 0.], dtype=float32)
print(m.indices)  # print result: Array([[1], [3], [4], [4]], dtype=int32)

I expected indexing to return a sparse submatrix whose data holds only the nonzero elements, with indices giving their positions in the new submatrix. The current behavior is surprising.

A workaround is to prune the zero values from m.data, and remove their corresponding entries from m.indices, every time I create a sparse submatrix. I wonder whether this could be done automatically when the submatrix is created, instead of as a manual extra step.
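
For reference, the manual pruning described above might look like the sketch below. The names keep and pruned are hypothetical, and the approach assumes eager execution: boolean-mask indexing produces a result whose shape depends on the data, so it is not expected to work under jit.

```python
import jax.numpy as jnp
from jax.experimental import sparse

M = jnp.array([[0., 1., 0., 2.], [3., 0., 0., 0.], [0., 0., 4., 0.]])
m = sparse.BCOO.fromdense(M)[0]

# Keep only the stored entries whose value is nonzero.
keep = m.data != 0

# Rebuild the sparse row from the surviving values and their indices.
pruned = sparse.BCOO((m.data[keep], m.indices[keep]), shape=m.shape)
```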

Answered by jakevdp


jakevdp · 2025-08-11T12:21:01Z

Maintainer

This behavior comes from the static shape requirements of jax.jit and other transformations. For a general sparse array, it is impossible to know at compile time how many elements are nonzero in, say, the first row. So when you index the first row, the code returns a padded representation that will always be able to contain the row's contents.
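
To make the padding concrete, here is a small check on the matrix from the question. It is a sketch rather than a statement about internals: it only relies on the padded entries holding zero values, as in the output printed in the question, and the names n_stored, n_nonzero, and row are hypothetical.

```python
import jax.numpy as jnp
from jax.experimental import sparse

M = jnp.array([[0., 1., 0., 2.], [3., 0., 0., 0.], [0., 0., 4., 0.]])
M_sp = sparse.BCOO.fromdense(M)
m = M_sp[0]

# nse counts the stored entries (a static upper bound), not the nonzeros.
n_stored = m.nse
n_nonzero = int((m.data != 0).sum())

# The padded entries hold zeros, so densifying recovers the original row.
row = m.todense()
```

In other words, the padded representation is larger than necessary but still represents the same row.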

If you want to remove these explicit zeros, you can do so via the sum_duplicates method:

m = m.sum_duplicates(remove_zeros=True)
print(m.data)
print(m.indices)

[1. 2.]
[[1]
 [3]]

Note, however, that the output of this call is dynamically shaped (the shapes of the data and index arrays depend on the array contents), so you won't be able to do this operation under jit or other JAX transformations.
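
If a static upper bound on the number of nonzeros is known, one possible way around this is the nse argument of sum_duplicates, which fixes the output size ahead of time. The sketch below assumes that no row has more than two nonzeros; the function name compact_first_row is hypothetical, and if nse were chosen smaller than the true count, entries may be dropped.

```python
from functools import partial

import jax
import jax.numpy as jnp
from jax.experimental import sparse

M = jnp.array([[0., 1., 0., 2.], [3., 0., 0., 0.], [0., 0., 4., 0.]])
M_sp = sparse.BCOO.fromdense(M)

@partial(jax.jit, static_argnames="nse")
def compact_first_row(mat, nse):
    # nse is a static Python int, so the output shapes are known at trace time.
    return mat[0].sum_duplicates(nse=nse)

# Assumption: at most 2 nonzero elements in the row being extracted.
m = compact_first_row(M_sp, nse=2)
```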

Alternatively, you can use a structured sparse layout (in this case a BCOO with one batch dimension). Indexing is then more constrained, because we know a priori that there are at most two nonzero elements per row:

M_sp = sparse.BCOO.fromdense(M, n_batch=1)
print(M_sp)

m = M_sp[0]
print(m.data)
print(m.indices)

BCOO(float32[3, 4], nse=2, n_batch=1)
[1. 2.]
[[1]
 [3]]
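
As a rough illustration of why the batched layout helps, the sketch below inspects the shapes of the stored arrays and takes a mini-batch of rows by slicing along the batch dimension. The names data_shape, indices_shape, and batch are hypothetical.

```python
import jax.numpy as jnp
from jax.experimental import sparse

M = jnp.array([[0., 1., 0., 2.], [3., 0., 0., 0.], [0., 0., 4., 0.]])
M_sp = sparse.BCOO.fromdense(M, n_batch=1)

# With one batch dimension, every row stores the same number of entries.
data_shape = M_sp.data.shape        # (rows, nse)
indices_shape = M_sp.indices.shape  # (rows, nse, number of sparse dimensions)

# A mini-batch of rows is a slice along the batch dimension.
batch = M_sp[0:2]
```

Rows with fewer nonzeros than nse are still padded, but the padding is bounded by the per-row maximum rather than by the total number of stored elements in the matrix.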

I hope that helps!

1 reply

cnguyen10 Aug 11, 2025
Author

Since the number of non-zero elements in each row is known, your suggestion with the batch dimension being 1 seems to be a good solution for me.
Thank you very much for the prompt reply.
