Skip to content

Conversation

@LuloDuarte
Copy link

Added round-trip support for CSR, COO and CSC sparse matrices over shared memory.

@pdamme pdamme self-requested a review July 21, 2025 16:30
@pdamme pdamme added student project Suitable for a bachelor/master student's programming project. AMLS summer 2025 Student project for the Architecture of ML Systems lecture at TU Berlin (summer 2025). labels Jul 21, 2025
@pdamme pdamme force-pushed the 957-sparse-data-transfer branch from 2977a80 to 52383f3 Compare July 22, 2025 16:01
Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution, @LuloDuarte! This PR enables the efficient exchange of sparse matrices/tensors with widely used Python libraries like SciPy (CSR, CSC, COO), TensorFlow (COO), and PyTorch (CSR, CSC, COO). For CSR, the data transfer is zero-copy. For CSC and COO, a layout transformation is required as DAPHNE currently only supports CSR. Anyway, not having to resort to a dense representation for the transfer is a big advantage for all three layouts. Thus, this pull request contributes to connecting DAPHNE to the Python data science ecosystem, which is a welcome contribution.

Overall, your code looks very good to me and the tests pass on my local machine (but I haven’t tried the TensorFlow/PyTorch-related tests yet). Changes were applied throughout most layers of DAPHNE, from the Python API (DaphneLib) and the DaphneDSL parser, over the IR and compiler, to the data structures and kernels. Very thorough work, well done!

I have a few remaining questions on the code and some changes are required before we can merge this pull request:

Questions: (please clarify)

  1. You updated CSRMatrix::shrinkNumNonZeros() to allocate new arrays that are just large enough and copy over the data. This may release some memory, but can be costly due to the copying. At the same time, this functions doesn’t seem to be used anywhere in the code base. Is there a specific reason why you changed it?
  2. You added a specialization of the saveDaphneLibResult-kernel for CSRMatrix. Besides other things, this kernel removes potential explicitly stored zero values from the matrix. To this end, it counts the number of non-zeros, allocates new arrays, and copies the non-zero elements. All of these steps can take quite some time (and there seems to be a memory leak as the original arrays of the matrix are never freed) and makes the data transfer non-zero-copy. At the same time, at least SciPy does support explicitly stored zero elements. So I would expect that a few stored zeros (if any) won’t hurt much, while the copying can be quite costly. Is there any specific reason why you decided to do it that way?

Required changes: (need to be addressed before merging, I will take care of it)

  1. Make the CI checks pass. Currently it fails because SciPy is not installed in the CI container. In general, SciPy should be an optional dependency of DAPHNE, just like TensorFlow and PyTorch. That is, if SciPy is installed, the related DAPHNE features can be used; otherwise, they can’t but the rest of DAPHNE should still work. You already made the imports of SciPy optional in DaphneLib. However, what’s still missing is making the SciPy-related test cases conditional upon whether SciPy is installed (just like we do it for TensorFlow and PyTorch). test.sh and test/api/python/DaphneLibTest.cpp must be updated accordingly. (In the future, we could add SciPy to the DAPHNE containers, but that’s not required for merging this PR.)
  2. Mention the new features in the DaphneLib documentation, such that users become aware of them.

Optional changes: (could be addressed now or in a follow-up pull request, I will do the minimum solution now)

  1. Ensure that DaphneLib can gracefully handle Python sparse matrices/tensors that don’t have a canonical format. SciPy defines a canonical format for CSR, CSC, and COO matrices. For instance, a CSR matrix has a canonical format if “Within each row, indices are sorted by column.“ and „There are no duplicate entries.“ SciPy does not require the canonical format. In contrast, DAPHNE assumes such a canonical format for CSR matrices. On the one hand, we need additional test cases that transfer non-canonical-format sparse matrices from Python to DAPHNE, including some processing of the data at the side of DAPHNE, which expects a canonical format. On the other hand, we need to ensure that any non-canonical-format sparse matrix from Python ends up as a canonical CSR matrix in DAPHNE, which may require additional transformations. As an alternative, we could just check if the input matrix from Python has a canonical format and raise an error, otherwise.

As it requires only minor adjustments, I will address the required changes myself. Furthermore, I will make a pass over the code and polish a few little, non-critical things. Regarding the proposed optional change, I will go for the approach of raising an error in case of non-canonical format. With that, we achieve fail-fast behavior and support for non-canonical format inputs can be addressed in a follow-up issue/project.

Thus, there is no more implementation work required on your side, @LuloDuarte. However, it would be great if you could clarify the two questions above.

By the way, I’ve rebased your branch and resolved some little conflicts. In case you still want to add anything, please please don’t force-push anymore, just add commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AMLS summer 2025 Student project for the Architecture of ML Systems lecture at TU Berlin (summer 2025). student project Suitable for a bachelor/master student's programming project.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants