[CUSPARSE] Implement dense * coo matmul without prior sorting #3005
+54
−89
The current implementation of matrix-matrix multiplication `mm!` of a dense and a sparse matrix (with a coo-matrix as the second argument) requires the user to sort the sparse matrix before multiplication. This was tested behavior, see generic.jl or interfaces.jl.

This is because CUSPARSE only provides an API for `C = a*A*B + b*C` (where `A` is sparse). The case where `B` is sparse is implemented by transposing the identity to `Ct = a*Bt*At + b*Ct`. For csc/csr we can exchange the type to realize the transpose, but for coo we would have to re-sort the column and row indices. Sorting currently copies the whole matrix.

The requirement to sort before multiplication is somewhat unexpected, especially in the higher-level interfaces `mul!` and `*`, which should produce the correct result without prior sorting (or should check for correct ordering and warn/error). Currently the matrix multiplication returns without error but with a wrong result (see #2820).

However, in almost all cases we can realize the multiplication of dense * coo without sorting by flipping the 'N'/'T'/'C' argument of the sparse matrix correspondingly. See this PR.
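The flag flipping can be checked on the CPU with plain dense arrays. The following is only a minimal sketch of the idea, not the code in this PR: `applyop`, `flip_op` and `dense_times_sparse` are hypothetical names, and the "sparse" operand `B` is a dense stand-in. It computes `Ct = op'(B) * At` with the flipped flag `op'` and falls back to copying the conjugated values only when `B` is complex and the requested flag is 'C'.

```julia
using LinearAlgebra

# Apply a BLAS-style op flag to a matrix (lazy wrappers, no copy).
applyop(M, flag::Char) =
    flag == 'N' ? M : flag == 'T' ? transpose(M) : adjoint(M)

# Hypothetical helper (not from the PR): map the flag requested for B to the
# flag used in the transposed product, plus whether the values of B must be
# conjugated first (only for complex eltype with 'C').
function flip_op(transb::Char, ::Type{T}) where {T}
    transb == 'N' && return ('T', false)
    transb == 'T' && return ('N', false)
    transb == 'C' && return ('N', T <: Complex)
    throw(ArgumentError("unknown op flag: $transb"))
end

# Compute A * op(B) through the sparse-first form Ct = op'(B) * At.
# B is a dense stand-in for the coo matrix here.
function dense_times_sparse(A, B, transb::Char)
    flag, need_conj = flip_op(transb, eltype(B))
    Bv = need_conj ? conj.(B) : B          # the only branch that copies values
    Ct = applyop(Bv, flag) * transpose(A)  # product with the "sparse" factor first
    return transpose(Ct)
end

A = ComplexF64[1+2im 0; 3 4-1im]
B = ComplexF64[0 1im 2; 1-1im 0 0]

for flag in ('N', 'T', 'C')
    # Shape B so that op(Bf) is always 2×3.
    Bf = flag == 'N' ? B : copy(transpose(B))
    @assert dense_times_sparse(A, Bf, flag) ≈ A * applyop(Bf, flag)
end
```

For real element types the 'C' flag coincides with 'T', so no branch needs a copy.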
The only case that cannot be easily realized is if the `eltype(B)` of the matrix is `<:Complex` and the transb argument `== 'C'`. The lazy transpose would need an "only conjugate" (no transpose) flag. This PR implements this case by materializing the conjugate of the coo-matrix entries. (Compared to sorting, this already reduces memory consumption, because only a copy of the entries is required instead of a full copy.)

An alternative implementation (fully in-place) could: 1. conjugate the values of `C` (if `!iszero(b)`), 2. compute the matrix product, and 3. conjugate the values of `C` again. Without an "only conjugate" flag provided by CUSPARSE, I don't see a way to realize this case without additional work or implementing our own matmul (which would probably be slow).

Both of these options are suboptimal for a matmul implementation. We could also error for this specific case, asking the user to supply the matrix in csc/csr format. Any opinions on that?
Closes #2820.
This also removes `mm_wrapper`, which only duplicated the size checks in `mm!` and took a shortcut if `isempty(B)`, but returned a zero matrix of the wrong shape in that case.