957 sparse data transfer #972
Conversation
pdamme left a comment
Thanks for this contribution, @LuloDuarte! This PR enables the efficient exchange of sparse matrices/tensors with widely used Python libraries like SciPy (CSR, CSC, COO), TensorFlow (COO), and PyTorch (CSR, CSC, COO). For CSR, the data transfer is zero-copy. For CSC and COO, a layout transformation is required as DAPHNE currently only supports CSR. Anyway, not having to resort to a dense representation for the transfer is a big advantage for all three layouts. Thus, this pull request contributes to connecting DAPHNE to the Python data science ecosystem, which is a welcome contribution.
Overall, your code looks very good to me and the tests pass on my local machine (but I haven’t tried the TensorFlow/PyTorch-related tests yet). Changes were applied throughout most layers of DAPHNE, from the Python API (DaphneLib) and the DaphneDSL parser, over the IR and compiler, to the data structures and kernels. Very thorough work, well done!
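For context, a minimal sketch of what such a transfer could look like from the DaphneLib side. The `from_scipy` method name and the exact return type of `compute()` are assumptions for illustration and may differ from the API actually added in this PR:

```python
import numpy as np
import scipy.sparse as sp
from daphne.context.daphne_context import DaphneContext  # DaphneLib

dctx = DaphneContext()

# A small CSR matrix on the Python side.
A = sp.csr_matrix(np.array([[0.0, 1.0, 0.0],
                            [2.0, 0.0, 3.0],
                            [0.0, 0.0, 4.0]]))

# Hypothetical transfer method (name assumed for illustration): for CSR the
# underlying value/column-index/row-offset arrays can be shared zero-copy,
# while CSC/COO inputs require a layout transformation to DAPHNE's CSR.
X = dctx.from_scipy(A)

# Do some computation in DAPHNE and pull the result back to Python.
res = (X + X).compute()
```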
I have a few remaining questions on the code and some changes are required before we can merge this pull request:
Questions: (please clarify)
- You updated `CSRMatrix::shrinkNumNonZeros()` to allocate new arrays that are just large enough and to copy over the data. This may release some memory, but can be costly due to the copying. At the same time, this function doesn't seem to be used anywhere in the code base. Is there a specific reason why you changed it?
- You added a specialization of the `saveDaphneLibResult` kernel for `CSRMatrix`. Among other things, this kernel removes potential explicitly stored zero values from the matrix. To this end, it counts the number of non-zeros, allocates new arrays, and copies the non-zero elements. All of these steps can take quite some time (and there seems to be a memory leak, as the original arrays of the matrix are never freed), and they make the data transfer non-zero-copy. At the same time, at least SciPy does support explicitly stored zero elements (see the sketch below). So I would expect that a few stored zeros (if any) won't hurt much, while the copying can be quite costly. Is there a specific reason why you decided to do it that way?
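As a side note on the explicit-zeros question: SciPy tolerates explicitly stored zeros and offers `eliminate_zeros()` to drop them on the Python side if ever needed, which is why the extra copy in the kernel seems avoidable. A small illustration:

```python
import numpy as np
import scipy.sparse as sp

# CSR arrays with one explicitly stored zero (the 0.0 at row 0, column 1).
data    = np.array([1.0, 0.0, 2.0])
indices = np.array([0, 1, 2])
indptr  = np.array([0, 2, 3])
A = sp.csr_matrix((data, indices, indptr), shape=(2, 3))

print(A.nnz)         # 3 -- explicit zeros count as stored entries
A.eliminate_zeros()  # SciPy can drop them in place when desired
print(A.nnz)         # 2
```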
Required changes: (need to be addressed before merging, I will take care of it)
- Make the CI checks pass. Currently, they fail because SciPy is not installed in the CI container. In general, SciPy should be an optional dependency of DAPHNE, just like TensorFlow and PyTorch. That is, if SciPy is installed, the related DAPHNE features can be used; otherwise, they can't, but the rest of DAPHNE should still work. You already made the imports of SciPy optional in DaphneLib. However, what's still missing is making the SciPy-related test cases conditional on whether SciPy is installed, just like we do for TensorFlow and PyTorch (see the sketch after this list). `test.sh` and `test/api/python/DaphneLibTest.cpp` must be updated accordingly. (In the future, we could add SciPy to the DAPHNE containers, but that's not required for merging this PR.)
- Mention the new features in the DaphneLib documentation, so that users become aware of them.
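A hedged sketch of the guard pattern meant here (the actual gating lives in `test.sh` and `DaphneLibTest.cpp`; this only illustrates the Python-side check that decides whether the SciPy test scripts may run):

```python
import importlib.util
import sys

# Detect SciPy without importing it, mirroring how the TensorFlow- and
# PyTorch-related test scripts are already made conditional.
HAS_SCIPY = importlib.util.find_spec("scipy") is not None

if not HAS_SCIPY:
    print("SciPy not installed, skipping SciPy-related DaphneLib tests")
    sys.exit(0)  # exit code chosen for illustration; align with test.sh's convention
```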
Optional changes: (could be addressed now or in a follow-up pull request, I will do the minimum solution now)
- Ensure that DaphneLib can gracefully handle Python sparse matrices/tensors that are not in canonical format. SciPy defines a canonical format for CSR, CSC, and COO matrices. For instance, a CSR matrix is in canonical format if "within each row, indices are sorted by column" and "there are no duplicate entries". SciPy does not require the canonical format. In contrast, DAPHNE assumes such a canonical format for CSR matrices. On the one hand, we need additional test cases that transfer non-canonical-format sparse matrices from Python to DAPHNE, including some processing of the data on the DAPHNE side, which expects a canonical format. On the other hand, we need to ensure that any non-canonical-format sparse matrix from Python ends up as a canonical CSR matrix in DAPHNE, which may require additional transformations. As an alternative, we could just check whether the input matrix from Python is in canonical format and raise an error otherwise (see the sketch below).
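For reference, SciPy exposes the canonical-format notion directly on its CSR type, so both the proposed fail-fast check and a Python-side canonicalization can be expressed compactly. A sketch (variable names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

# Build a CSR matrix whose column indices are unsorted within row 0 --
# perfectly legal in SciPy, but not the canonical format DAPHNE assumes.
data    = np.array([2.0, 1.0])
indices = np.array([1, 0])
indptr  = np.array([0, 2, 2])
A = sp.csr_matrix((data, indices, indptr), shape=(2, 2))

print(A.has_canonical_format)   # False: unsorted indices (or duplicates)

if not A.has_canonical_format:
    # Fail-fast variant, as proposed above:
    # raise ValueError("expected a CSR matrix in canonical format "
    #                  "(sorted column indices, no duplicate entries)")
    # Alternative: canonicalize on the Python side before the transfer.
    A.sum_duplicates()          # sorts indices and merges duplicates in place

assert A.has_canonical_format
```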
As it requires only minor adjustments, I will address the required changes myself. Furthermore, I will make a pass over the code and polish a few minor, non-critical things. Regarding the proposed optional change, I will go for the approach of raising an error in case of a non-canonical format. With that, we achieve fail-fast behavior, and support for non-canonical-format inputs can be addressed in a follow-up issue/project.
Thus, there is no more implementation work required on your side, @LuloDuarte. However, it would be great if you could clarify the two questions above.
By the way, I’ve rebased your branch and resolved some little conflicts. In case you still want to add anything, please don’t force-push anymore, just add commits.
Added round-trip support for CSR, COO and CSC sparse matrices over shared memory.