ARROW-17933: [C++] SparseCOOTensor raises error when created with zero elements#14378
ARROW-17933: [C++] SparseCOOTensor raises error when created with zero elements#14378rok wants to merge 3 commits intoapache:mainfrom
Conversation
pitrou
left a comment
There was a problem hiding this comment.
Thanks for the fix.
- Could you add tests on the C++ side?
- Could you also add similar tests (if not already existing) for other types of sparse tensors?
There was a problem hiding this comment.
Does this actually create a tensor with zero elements?
There was a problem hiding this comment.
Yeah, coo_matrix automatically ignores zeros, I'll add an assertion checking for number of elements to make this explicit.
There was a problem hiding this comment.
Well actually it's:
scipy_matrix = coo_matrix(([0], ([0], [0])), shape=(2, 4))
assert scipy_matrix.nnz == 1
scipy_matrix = coo_matrix([[0, 0], [0, 0]])
assert scipy_matrix.nnz == 0
There was a problem hiding this comment.
Right, but the shape is all non-zero.
There was a problem hiding this comment.
Well those are actually different constructors (shape seems to be ignored in this case):
1.
coo_matrix((data, (i, j)), [shape=(M, N)])
to construct from three arrays:
data[:] the entries of the matrix, in any order
i[:] the row indices of the matrix entries
j[:] the column indices of the matrix entries
scipy_matrix = coo_matrix(([0], ([0], [0])))
assert scipy_matrix.nnz == 1
2.
coo_matrix(D)
with a dense matrix D
scipy_matrix = coo_matrix([[0, 0], [0, 0]])
assert scipy_matrix.nnz == 0
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
There was a problem hiding this comment.
But this doesn't address the issue, does it?
There was a problem hiding this comment.
The reported issue was due to scipy.coo_matrix returning a sparse matrix with a dimension of zero size for an all zeros dense tensor. This is not an issue if the sparse matrix is created from components. Propose change tests from dense creation path in Python with scipy.coo_matrix, scipy.csr_matrix and sparse.COO. It also tests C++ SparseTensor creation from a dense tensor with one zero-sized dimension for SparseCOOTensor, SparseCSRMatrix, SparseCSRMatrix and SparseCSFTensor.
There was a problem hiding this comment.
I might be missing the point though :D
There was a problem hiding this comment.
I believe @rok has the correct test here but I could be wrong. The original issue is about a sparse matrix with a valid non-zero shape but no elements:
>>> scipy.sparse.coo_matrix(numpy.zeros((2,4)), dtype=numpy.float32)
<2x4 sparse matrix of type '<class 'numpy.float32'>'
with 0 stored elements in COOrdinate format>
>>> numpy.zeros((2,4))
array([[0., 0., 0., 0.],
[0., 0., 0., 0.]])
>>> numpy.zeros((2,4)).shape
(2, 4)
|
@pitrou I added c++ tests for a case where we create a sparse tensor from a dense one filled with zeros. |
|
|
||
| TYPED_TEST_SUITE_P(TestSparseTensorFromDense); | ||
|
|
||
| TYPED_TEST_P(TestSparseTensorFromDense, TestNonZeroLength) { |
There was a problem hiding this comment.
Ok, but is it the situation that the PR is meant to be fixing?
AFAIU, the problem was not when there were no non-zero values, but when the logical tensor was empty (one of the dimensions in the shape being zero).
There was a problem hiding this comment.
Indeed. I changed the c++ test so that one dimension is now 0.
|
@pitrou is there something left to address here? |
|
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |
|
This is waiting for review. |
|
We are actively using the sparse tensor classes and would greatly appreciate the fix. Thanks! |
|
@rok who are you waiting review from? Only @AlenkaF ? @pitrou is probably unable to review this one on a short term, but maybe @westonpace can take a look |
|
I was waiting for @pitrou. I suppose @AlenkaF, @westonpace or @jorisvandenbossche would be good candidates. |
| sparse_tensor.to_tensor().to_numpy()) | ||
|
|
||
| sparse_array = sparse.COO.from_numpy([[0, 0], [0, 0]]) | ||
| sparse_tensor = pa.SparseCOOTensor.from_pydata_sparse(sparse_array, |
There was a problem hiding this comment.
Is there a reason why from_scipy() and to_scipy() is not used here?
|
|
||
| RETURN_NOT_OK(internal::CheckSparseIndexMaximumValue(type, shape)); | ||
|
|
||
| // Indexes with no values are considered valid |
There was a problem hiding this comment.
I don't understand what is happening here (I don't understand much about the tensors so not surprising). Why is it ok if one of the shape elements is zero? I would expect an empty sparse matrix to still have a shape:
>>> scipy.sparse.coo_matrix(numpy.zeros((2,4)), dtype=numpy.float32).shape
(2, 4)
There was a problem hiding this comment.
I believe @rok has the correct test here but I could be wrong. The original issue is about a sparse matrix with a valid non-zero shape but no elements:
>>> scipy.sparse.coo_matrix(numpy.zeros((2,4)), dtype=numpy.float32)
<2x4 sparse matrix of type '<class 'numpy.float32'>'
with 0 stored elements in COOrdinate format>
>>> numpy.zeros((2,4))
array([[0., 0., 0., 0.],
[0., 0., 0., 0.]])
>>> numpy.zeros((2,4)).shape
(2, 4)
| scipy_matrix = coo_matrix([[0, 0], [0, 0]]) | ||
| sparse_tensor = pa.SparseCOOTensor.from_scipy(scipy_matrix, | ||
| dim_names=dim_names) | ||
| out_scipy_matrix = sparse_tensor.to_scipy() |
There was a problem hiding this comment.
Can we verify the shape of sparse_tensor here? It should be 2 x 2 correct?
|
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer. |
This is to resolve ARROW-17933.