feat: Add ak.sort() for CUDA backend
#3750
base: main
Conversation
The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3750
ianna left a comment:
@shwina - this is great! Thanks for implementing it. Let’s keep it open and discuss.
```python
assert virtual_numpyarray.is_all_materialized
...
@pytest.mark.xfail(reason="awkward_sort is not implemented")
```
@shwina - we should probably keep the virtual arrays out of the POC. Our design goal is to support delayed (lazy) evaluation of operations on virtual arrays, which ensures that array transformations are represented symbolically until they are explicitly materialized.
I agree with keeping virtual arrays out of this for now; adding it later is an easy step, since we can just explicitly "maybe_materialize" when calling CCCL kernels.
Just to clarify: virtual arrays are not for delaying kernels/operations. They only lazify the IO when using ak.from_buffers to create an ak.Array. Once the buffer has been delivered by the IO source, everything is eager (there's no symbolic representation of the program or any compute graph here).
It's not the goal of virtual arrays to create a symbolic representation of the program; that doesn't exist in awkward so far. It's what we've done over in https://github.com/dask-contrib/dask-awkward at the level of high-level Arrays. In principle it would be interesting to have, but then it should likely live at the level of buffers and the kernels that act on them.
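A minimal sketch of the "materialize, then call the kernel" step described above; `maybe_materialize` and `run_cuda_kernel` are hypothetical names for illustration, not awkward's actual internals:

```python
def maybe_materialize(buffer):
    # A virtual buffer exposes materialize(); concrete buffers
    # are passed through unchanged.
    return buffer.materialize() if hasattr(buffer, "materialize") else buffer


def sort_buffer(buffer, run_cuda_kernel):
    # Eager CCCL kernels need concrete device memory, so any virtual
    # buffer is materialized immediately before the kernel is invoked.
    data = maybe_materialize(buffer)
    return run_cuda_kernel(data)
```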
Looking at the CI errors for virtual arrays, btw, the tests seem to be hitting cupy/cupy#9089, which has been fixed but will only ship in CuPy 14. I see
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think if it weren't for this problem, they would probably pass too. So they can't be enabled until CuPy 14 is out, at least.
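For reference, a minimal NumPy repro of that error class (an illustration, not the actual failing test):

```python
import numpy as np

a = np.array([1, 2, 3])
try:
    bool(a)  # coercing a multi-element array to bool is ambiguous
except ValueError as err:
    print(err)
    # The truth value of an array with more than one element is ambiguous.
    # Use a.any() or a.all()
```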
```python
else:
    raise AssertionError(f"CuPyKernel not found: {index!r}")
...
# CuPy kernel not found, try cuda.compute as fallback
```
Perhaps we should prioritize CCCL kernels for GPU operations, with CuPy serving as a fallback when CCCL is unavailable?
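A minimal sketch of that priority order; only the cuda.compute module name comes from this PR, and the `CCCL_KERNELS` registry plus `resolve_kernel` are hypothetical stand-ins for awkward's kernel lookup:

```python
try:
    import cuda.compute  # noqa: F401  (CCCL's Python interface)
    HAVE_CCCL = True
except ImportError:
    HAVE_CCCL = False

CCCL_KERNELS: dict = {}  # hypothetical registry of CCCL-backed kernels


def resolve_kernel(index, cupy_kernels):
    # Prefer a CCCL implementation when one is registered; otherwise
    # fall back to the existing CuPy kernel table.
    if HAVE_CCCL and index in CCCL_KERNELS:
        return CCCL_KERNELS[index]
    if index in cupy_kernels:
        return cupy_kernels[index]
    raise AssertionError(f"kernel not found: {index!r}")
```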
```yaml
strategy:
  fail-fast: false
  matrix:
    cuda-version:
      - 11
      - 12
      - 13

steps:
  - name: Clean the workspace and mamba
    run: |
      rm -rf * .[!.]* || echo "Nothing to clean"
      rm -rf ~/micromamba* || echo "Nothing to clean"
  - uses: actions/checkout@v6
    with:
      submodules: true

  - name: Get micromamba
    uses: mamba-org/setup-micromamba@v2
    with:
      environment-name: test-env
      init-shell: bash
      create-args: >-
        -c rapidsai
        -c nvidia
        python=3.13
        cccl-python
        cudf
        cupy
        cuda-version=${{ matrix.cuda-version }}
        cuda-toolkit
        cxx-compiler
```
Suggested change:
```yaml
strategy:
  fail-fast: false
  matrix:
    cuda-version: [11, 12, 13]
steps:
  - name: Clean the workspace and mamba
    run: |
      rm -rf * .[!.]* || echo "Nothing to clean"
      rm -rf ~/micromamba* || echo "Nothing to clean"
  - uses: actions/checkout@v6
    with:
      submodules: true
  - name: Get micromamba (CUDA <= 11)
    if: ${{ matrix.cuda-version < 12 }}
    uses: mamba-org/setup-micromamba@v2
    with:
      environment-name: test-env
      init-shell: bash
      create-args: >-
        -c rapidsai
        -c nvidia
        python=3.13
        cudf
        cupy
        cuda-version=${{ matrix.cuda-version }}
        cuda-toolkit
        cxx-compiler
  - name: Get micromamba (CUDA >= 12, add cccl-python)
    if: ${{ matrix.cuda-version >= 12 }}
    uses: mamba-org/setup-micromamba@v2
    with:
      environment-name: test-env
      init-shell: bash
      create-args: >-
        -c rapidsai
        -c nvidia
        python=3.13
        cccl-python
        cudf
        cupy
        cuda-version=${{ matrix.cuda-version }}
        cuda-toolkit
        cxx-compiler
```
This is a hopefully correct fixed matrix: it installs cccl-python only when CUDA ≥ 12, and avoids the solver error when CUDA 11 is selected.
@ariostas - this makes sense, given that we still want to support CUDA 11.
Closes #3749.
Following the study in #3748, this PR adds support for ak.sort() for the CUDA backend, using cuda.compute (https://nvidia.github.io/cccl/python/compute.html).
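A usage sketch, assuming this PR is merged and a CUDA-capable GPU with CuPy is available:

```python
import awkward as ak

# Build a small ragged array directly on the CUDA backend.
arr = ak.Array([[3.0, 1.0, 2.0], [], [5.0, 4.0]], backend="cuda")

# ak.sort dispatches to the GPU implementation added in this PR.
sorted_arr = ak.sort(arr)
print(sorted_arr.to_list())  # [[1.0, 2.0, 3.0], [], [4.0, 5.0]]
```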