-
Notifications
You must be signed in to change notification settings - Fork 99
feat: add reduce kernels #3136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
feat: add reduce kernels #3136
Changes from all commits
Commits
Show all changes
32 commits
Select commit
Hold shift + click to select a range
860868d
feat: add tree reduction implementation of argmin and argmax
ManasviGoyal 3cdbd7e
feat: add awkward_ListOffsetArray_reduce_local_outoffsets_64 kernel
ManasviGoyal c1a846b
test: integration tests for cuda
ManasviGoyal 7be3f98
test: some more integration tests for cuda
ManasviGoyal 98fb7ed
feat: add awkward_reduce_count_64 kernel
ManasviGoyal 0ed94ef
fix: indexing and indentation
ManasviGoyal 02c03bc
feat: add awkward_reduce_countnonzero kernel
ManasviGoyal 34fc82b
feat: add reduce sum, min and max kernels
ManasviGoyal 4e00f07
feat: add reduce prod and sum_int_bool
ManasviGoyal b28a605
feat: add sum_bool and prod_bool kernels
ManasviGoyal 9e7abc7
fix: use cpt.assert_allclose
ManasviGoyal 458165c
test: reducer integration tests
ManasviGoyal c75cb79
fix: typr conversion
ManasviGoyal 427670c
fix: use atomic to avoid race conditions
ManasviGoyal 127e035
fix: remove unnessary variable
ManasviGoyal 8dee2ae
fix: minor fixes
ManasviGoyal b957bee
Merge branch 'main' into ManasviGoyal/add-reducer-kernels
ManasviGoyal 896770f
fix: all reducer for atomics
ManasviGoyal f3d1cdc
fix: missing template
ManasviGoyal ef47ead
fix: remove complex
ManasviGoyal c881f1d
fix: atomicMin() for float 32 and indentation
ManasviGoyal 38d30b9
fix: pass correct dtype of identity
ManasviGoyal 51b0e15
fix: remove combinations test
ManasviGoyal 7e7fdc4
fix: manage resources and disable failing test
ianna 1148b95
fix: uncomment fixed test for slicing
ManasviGoyal 8e926ab
fix: correctly interpret typetracer array for cuda backend
ManasviGoyal 38d314d
fix: tests-spec error for bool
ManasviGoyal 15068b6
fix: check for the backend of head
ManasviGoyal d864481
Merge branch 'main' into ManasviGoyal/add-reducer-kernels
ianna b2c0a89
Merge branch 'main' into ManasviGoyal/add-reducer-kernels
ianna 8921b82
Update dev/generate-tests.py
ianna c9bff0f
Merge branch 'main' into ManasviGoyal/add-reducer-kernels
jpivarski File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
100 changes: 100 additions & 0 deletions
100
src/awkward/_connect/cuda/cuda_kernels/awkward_ListOffsetArray_reduce_local_outoffsets_64.cu
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| // BSD 3-Clause License; see https://github.com/scikit-hep/awkward-1.0/blob/main/LICENSE | ||
|
|
||
| // BEGIN PYTHON | ||
| // def f(grid, block, args): | ||
| // (outoffsets, parents, lenparents, outlength, invocation_index, err_code) = args | ||
| // if block[0] > 0: | ||
| // segment = math.floor((outlength + block[0] - 1) / block[0]) | ||
| // grid_size = math.floor((lenparents + block[0] - 1) / block[0]) | ||
| // else: | ||
| // grid_size = 1 | ||
| // temp = cupy.zeros(lenparents, dtype=cupy.int64) | ||
| // scan_in_array = cupy.zeros(outlength, dtype=cupy.uint64) | ||
| // cuda_kernel_templates.get_function(fetch_specialization(["awkward_ListOffsetArray_reduce_local_outoffsets_64_a", cupy.dtype(outoffsets.dtype).type, parents.dtype]))((grid_size,), block, (outoffsets, parents, lenparents, outlength, scan_in_array, temp, invocation_index, err_code)) | ||
| // cuda_kernel_templates.get_function(fetch_specialization(["awkward_ListOffsetArray_reduce_local_outoffsets_64_b", cupy.dtype(outoffsets.dtype).type, parents.dtype]))((grid_size,), block, (outoffsets, parents, lenparents, outlength, scan_in_array, temp, invocation_index, err_code)) | ||
| // scan_in_array = cupy.cumsum(scan_in_array) | ||
| // cuda_kernel_templates.get_function(fetch_specialization(["awkward_ListOffsetArray_reduce_local_outoffsets_64_c", cupy.dtype(outoffsets.dtype).type, parents.dtype]))((grid_size,), block, (outoffsets, parents, lenparents, outlength, scan_in_array, temp, invocation_index, err_code)) | ||
| // out["awkward_ListOffsetArray_reduce_local_outoffsets_64_a", {dtype_specializations}] = None | ||
| // out["awkward_ListOffsetArray_reduce_local_outoffsets_64_b", {dtype_specializations}] = None | ||
| // out["awkward_ListOffsetArray_reduce_local_outoffsets_64_c", {dtype_specializations}] = None | ||
| // END PYTHON | ||
|
|
||
| template <typename T, typename C> | ||
| __global__ void | ||
| awkward_ListOffsetArray_reduce_local_outoffsets_64_a( | ||
| T* outoffsets, | ||
| const C* parents, | ||
| int64_t lenparents, | ||
| int64_t outlength, | ||
| uint64_t* scan_in_array, | ||
| int64_t* temp, | ||
| uint64_t invocation_index, | ||
| uint64_t* err_code) { | ||
| if (err_code[0] == NO_ERROR) { | ||
| int64_t thread_id = blockIdx.x * blockDim.x + threadIdx.x; | ||
|
|
||
| if (thread_id < outlength) { | ||
| outoffsets[thread_id] = 0; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| template <typename T, typename C> | ||
| __global__ void | ||
| awkward_ListOffsetArray_reduce_local_outoffsets_64_b( | ||
| T* outoffsets, | ||
| const C* parents, | ||
| int64_t lenparents, | ||
| int64_t outlength, | ||
| uint64_t* scan_in_array, | ||
| int64_t* temp, | ||
| uint64_t invocation_index, | ||
| uint64_t* err_code) { | ||
| if (err_code[0] == NO_ERROR) { | ||
| int64_t idx = threadIdx.x; | ||
| int64_t thread_id = blockIdx.x * blockDim.x + idx; | ||
|
|
||
| if (thread_id < lenparents) { | ||
| temp[thread_id] = 1; | ||
| } | ||
| __syncthreads(); | ||
|
|
||
| for (int64_t stride = 1; stride < blockDim.x; stride *= 2) { | ||
| int64_t val = 0; | ||
| if (idx >= stride && thread_id < lenparents && parents[thread_id] == parents[thread_id - stride]) { | ||
| val = temp[thread_id - stride]; | ||
| } | ||
| __syncthreads(); | ||
| temp[thread_id] += val; | ||
| __syncthreads(); | ||
| } | ||
|
|
||
| if (thread_id < lenparents) { | ||
| int64_t parent = parents[thread_id]; | ||
| if (idx == blockDim.x - 1 || thread_id == lenparents - 1 || parents[thread_id] != parents[thread_id + 1]) { | ||
| atomicAdd(&scan_in_array[parent], temp[thread_id]); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| template <typename T, typename C> | ||
| __global__ void | ||
| awkward_ListOffsetArray_reduce_local_outoffsets_64_c( | ||
| T* outoffsets, | ||
| const C* parents, | ||
| int64_t lenparents, | ||
| int64_t outlength, | ||
| uint64_t* scan_in_array, | ||
| int64_t* temp, | ||
| uint64_t invocation_index, | ||
| uint64_t* err_code) { | ||
| if (err_code[0] == NO_ERROR) { | ||
| int64_t thread_id = blockIdx.x * blockDim.x + threadIdx.x; | ||
| outoffsets[0] = 0; | ||
|
|
||
| if (thread_id < outlength) { | ||
| outoffsets[thread_id + 1] = (T)(scan_in_array[thread_id]); | ||
| } | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.