
ARROW-17933: [C++] SparseCOOTensor raises error when created with zero elements #14378

Open
rok wants to merge 3 commits into apache:main from rok:ARROW-17933


@rok
Member

@rok rok commented Oct 11, 2022

This is to resolve ARROW-17933.

@rok rok marked this pull request as ready for review October 11, 2022 23:19
@pitrou pitrou changed the title ARROW-17933: SparseCOOTensor raises error when created with zero elements ARROW-17933: [C++] SparseCOOTensor raises error when created with zero elements Oct 12, 2022
Member

@pitrou pitrou left a comment

Thanks for the fix.

  1. Could you add tests on the C++ side?
  2. Could you also add similar tests (if not already existing) for other types of sparse tensors?

Member

Does this actually create a tensor with zero elements?

Member Author

Yeah, coo_matrix automatically ignores zeros. I'll add an assertion on the number of stored elements to make this explicit.

Member Author

Well actually it's:

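    from scipy.sparse import coo_matrix

    # Built from (data, (row, col)): the explicit zero is stored.
    scipy_matrix = coo_matrix(([0], ([0], [0])), shape=(2, 4))
    assert scipy_matrix.nnz == 1
    # Built from a dense array: zeros are dropped.
    scipy_matrix = coo_matrix([[0, 0], [0, 0]])
    assert scipy_matrix.nnz == 0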

Member

Right, but the shape is all non-zero.

Member Author

Well those are actually different constructors (shape seems to be ignored in this case):

  1. coo_matrix((data, (i, j)), [shape=(M, N)]) to construct from three arrays:
     - data[:] the entries of the matrix, in any order
     - i[:] the row indices of the matrix entries
     - j[:] the column indices of the matrix entries

    scipy_matrix = coo_matrix(([0], ([0], [0])))
    assert scipy_matrix.nnz == 1

  2. coo_matrix(D) with a dense matrix D

    scipy_matrix = coo_matrix([[0, 0], [0, 0]])
    assert scipy_matrix.nnz == 0

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html

Member

But this doesn't address the issue, does it?

Member Author

The reported issue was due to scipy.coo_matrix returning a sparse matrix with a dimension of zero size for an all-zeros dense tensor. This is not an issue if the sparse matrix is created from components. The proposed change tests the dense creation path in Python with scipy.coo_matrix, scipy.csr_matrix and sparse.COO. It also tests C++ SparseTensor creation from a dense tensor with one zero-sized dimension for SparseCOOTensor, SparseCSRMatrix, SparseCSCMatrix and SparseCSFTensor.
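
To make the dense creation path concrete, here is a minimal pure-Python sketch of a dense-to-COO conversion. It is not scipy's or Arrow's implementation; `dense_to_coo` is a hypothetical helper that only illustrates why an all-zeros input yields zero stored entries while the logical shape is kept.

```python
def dense_to_coo(dense):
    """Convert a rectangular 2-D list of numbers to (shape, coords, values).

    Hypothetical helper for illustration only; zeros are not stored.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    coords, values = [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                coords.append((i, j))
                values.append(value)
    return (n_rows, n_cols), coords, values


shape, coords, values = dense_to_coo([[0, 0], [0, 0]])
# The index list is empty, but the logical shape is still 2 x 2.
assert shape == (2, 2) and coords == [] and values == []
```

Any code that infers the tensor shape from the stored coordinates alone would have nothing to work with here, which is why the shape has to be carried separately.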

Member Author

I might be missing the point though :D

Member Author

@pitrou ping

Member

I believe @rok has the correct test here but I could be wrong. The original issue is about a sparse matrix with a valid non-zero shape but no elements:

>>> scipy.sparse.coo_matrix(numpy.zeros((2,4)), dtype=numpy.float32)
<2x4 sparse matrix of type '<class 'numpy.float32'>'
	with 0 stored elements in COOrdinate format>
>>> numpy.zeros((2,4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> numpy.zeros((2,4)).shape
(2, 4)

@rok
Member Author

rok commented Oct 13, 2022

@pitrou I added C++ tests for the case where we create a sparse tensor from a dense one filled with zeros.


TYPED_TEST_SUITE_P(TestSparseTensorFromDense);

TYPED_TEST_P(TestSparseTensorFromDense, TestNonZeroLength) {
Member

Ok, but is it the situation that the PR is meant to be fixing?
AFAIU, the problem was not when there were no non-zero values, but when the logical tensor was empty (one of the dimensions in the shape being zero).
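
The two situations being distinguished here can be sketched as follows. This is an illustrative pure-Python check, not Arrow code; `logical_size` and `describe` are hypothetical names introduced only for this example.

```python
import math


def logical_size(shape):
    """Number of logical elements: the product of the dimensions."""
    return math.prod(shape)


def describe(shape, nnz):
    """Classify a sparse tensor by its shape and stored-entry count."""
    if logical_size(shape) == 0:
        return "empty tensor (a dimension is zero)"
    if nnz == 0:
        return "non-empty tensor with no stored values"
    return "non-empty tensor with stored values"


assert describe((2, 4), nnz=0) == "non-empty tensor with no stored values"
assert describe((0, 4), nnz=0) == "empty tensor (a dimension is zero)"
```

Both cases have zero stored entries, so a test has to vary the shape, not only the values, to tell them apart.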

Member Author

Indeed. I changed the C++ test so that one dimension is now 0.

@rok
Member Author

rok commented Jan 7, 2023

@pitrou is there something left to address here?

@rok rok requested a review from pitrou February 10, 2023 20:56
@amol-
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while. If it's still relevant, feel free to reopen and move it forward 👍

@amol- amol- closed this Mar 30, 2023
@rok rok reopened this Mar 30, 2023
@rok rok requested a review from AlenkaF as a code owner March 30, 2023 17:49
@rok
Member Author

rok commented Mar 30, 2023

This is waiting for review.

@bkmartinjr

We are actively using the sparse tensor classes and would greatly appreciate the fix. Thanks!

@amol-
Member

amol- commented Apr 13, 2023

@rok who are you waiting on for a review? Only @AlenkaF? @pitrou is probably unable to review this one in the short term, but maybe @westonpace can take a look.

@rok
Member Author

rok commented Apr 13, 2023

I was waiting for @pitrou. I suppose @AlenkaF, @westonpace or @jorisvandenbossche would be good candidates.

sparse_tensor.to_tensor().to_numpy())

sparse_array = sparse.COO.from_numpy([[0, 0], [0, 0]])
sparse_tensor = pa.SparseCOOTensor.from_pydata_sparse(sparse_array,
Member

Is there a reason why from_scipy() and to_scipy() are not used here?

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Apr 13, 2023

RETURN_NOT_OK(internal::CheckSparseIndexMaximumValue(type, shape));

// Indexes with no values are considered valid
Member

I don't understand what is happening here (I don't understand much about the tensors so not surprising). Why is it ok if one of the shape elements is zero? I would expect an empty sparse matrix to still have a shape:

>>> scipy.sparse.coo_matrix(numpy.zeros((2,4)), dtype=numpy.float32).shape
(2, 4)
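
One possible reading of the "Indexes with no values are considered valid" comment in the diff is that bounds checks on coordinates pass vacuously when there are no coordinates. A minimal sketch of that idea, assuming a COO index stored as a list of coordinate tuples (the function name and data layout are hypothetical, and this is not Arrow's C++ validation code):

```python
def validate_coo_index(shape, coords):
    """Return True if every coordinate lies inside `shape`.

    With no stored entries there is nothing to check, so the index is
    valid regardless of the shape (including a zero-sized dimension).
    """
    for coord in coords:
        if len(coord) != len(shape):
            return False
        if any(not 0 <= c < dim for c, dim in zip(coord, shape)):
            return False
    return True


assert validate_coo_index((2, 4), [])            # no entries: valid
assert validate_coo_index((0, 4), [])            # empty tensor: still valid
assert validate_coo_index((2, 4), [(1, 3)])      # in bounds
assert not validate_coo_index((2, 4), [(2, 0)])  # row index out of bounds
```

Under this reading the shape is still stored and checked. Only the per-entry bounds check is skipped when there are no entries.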

scipy_matrix = coo_matrix([[0, 0], [0, 0]])
sparse_tensor = pa.SparseCOOTensor.from_scipy(scipy_matrix,
dim_names=dim_names)
out_scipy_matrix = sparse_tensor.to_scipy()
Member

Can we verify the shape of sparse_tensor here? It should be 2 x 2, correct?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Apr 20, 2023
@github-actions

Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer.

@github-actions github-actions bot added the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Nov 18, 2025
@github-actions github-actions bot closed this Dec 9, 2025
@rok rok reopened this Dec 9, 2025
@rok rok requested a review from raulcd as a code owner December 9, 2025 12:07
@github-actions github-actions bot removed the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Dec 10, 2025