Add env overloads for DeviceSegmentedRadixSort#7999
Add env overloads for DeviceSegmentedRadixSort#7999gonidelis wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
2c8d80a to
55ca027
Compare
This comment has been minimized.
This comment has been minimized.
NaderAlAwar
left a comment
There was a problem hiding this comment.
Important: we are missing the DoubleBuffer overloads
| offsets.begin() + 1, | ||
| 0, | ||
| sizeof(int) * 8, | ||
| env); |
There was a problem hiding this comment.
Question: do we want to add a test where we do not explicitly pass an env? Other env API tests seem to have both kinds, explicit and implicit env objects.
There was a problem hiding this comment.
i think it's fine. unit tests cover that case and users can see it throughout docs as soon as they start using our env overloads. don't wanna over bloat our test base for
There was a problem hiding this comment.
We have that test in the other file added in this PR.
@NaderAlAwar will be added in subsequent pr for economy of reviewing effort per pr |
| offsets.begin() + 1, | ||
| 0, | ||
| sizeof(int) * 8, | ||
| env); |
There was a problem hiding this comment.
We have that test in the other file added in this PR.
🥳 CI Workflow Results🟩 Finished in 1h 10m: Pass: 100%/249 | Total: 3d 19h | Max: 55m 34s | Hits: 94%/157462See results here. |
Only because our radix-sort is stable by design. Unstable sort could be non-deterministic 😉 |
| //! | ||
| //! Snippet | ||
| //! | ||
| //! .. literalinclude:: ../../../cub/test/catch2_test_device_segmented_radix_sort_env_api.cu |
There was a problem hiding this comment.
| //! .. literalinclude:: ../../../cub/test/catch2_test_device_segmented_radix_sort_env_api.cu | |
| //! .. literalinclude:: ../../test/catch2_test_device_segmented_radix_sort_env_api.cu |
| //! This is an environment-based API that allows customization of: | ||
| //! | ||
| //! - Stream: Query via ``cuda::get_stream`` | ||
| //! - Memory resource: Query via ``cuda::mr::get_memory_resource`` |
There was a problem hiding this comment.
Not sure I like this better than the - Can use a specific stream or cuda memory resource through the ``env`` parameter I saw in the previous PR. I'm also not sure this list wont be rendered/read as part as the following list.
| //! yield a corresponding performance improvement. | ||
| //! - Note, the size of any segment may not exceed ``INT_MAX``. Please consider using ``DeviceSegmentedSort`` instead, | ||
| //! if the size of at least one of your segments could exceed ``INT_MAX``. | ||
| //! |
There was a problem hiding this comment.
Maybe we should have a @env like the @devicestorage on the old APIs to avoid the repetition. Even if it needs multiple like @env-with-guarantees.
There was a problem hiding this comment.
I considered suggesting the same! No need to do it in this PR though.
| //! - An optional bit subrange ``[begin_bit, end_bit)`` of differentiating key | ||
| //! bits can be specified. This can reduce overall sorting overhead and | ||
| //! yield a corresponding performance improvement. | ||
| //! - Note, the size of any segment may not exceed ``INT_MAX``. Please consider using ``DeviceSegmentedSort`` instead, |
There was a problem hiding this comment.
Missing
//! - Let ``in`` be one of ``{d_keys_in, d_values_in}`` and `out` be any of
//! ``{d_keys_out, d_values_out}``. The range ``[out, out + num_items)`` shall
//! not overlap ``[in, in + num_items)``,
//! ``[d_begin_offsets, d_begin_offsets + num_segments)`` nor
//! ``[d_end_offsets, d_end_offsets + num_segments)`` in any way.
and
//! - Segments are not required to be contiguous. For all index values ``i``
//! outside the specified segments ``d_keys_in[i]``, ``d_values_in[i]``,
//! ``d_keys_out[i]``, ``d_values_out[i]`` will not be accessed nor modified.
Maybe also
//! - @devicestorageNP For sorting using only ``O(P)`` temporary storage, see
//! the sorting interface using DoubleBuffer wrappers below.
| } | ||
|
|
||
| //! @rst | ||
| //! Overview |
There was a problem hiding this comment.
This "title" is new in comparison to the existing docs. If it stays, it looks like it should be underlined (rendered as a sub-title)
| //! Overview | |
| //! Overview | |
| //! +++++++++++++++++++++++++++++++++++++++++++++ |
Same for Snippet below (which is not new but has this underline in existing docs)
| typename BeginOffsetIteratorT, | ||
| typename EndOffsetIteratorT, | ||
| typename EnvT = ::cuda::std::execution::env<>> | ||
| [[nodiscard]] CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE static cudaError_t SortKeys( |
There was a problem hiding this comment.
Is _CCCL_FORCEINLINE necessary here? Existing overload does not have it.
There was a problem hiding this comment.
Not necessary, please remove!
| //! - An optional bit subrange ``[begin_bit, end_bit)`` of differentiating key | ||
| //! bits can be specified. This can reduce overall sorting overhead and | ||
| //! yield a corresponding performance improvement. | ||
| //! - Note, the size of any segment may not exceed ``INT_MAX``. Please consider using ``DeviceSegmentedSort`` instead, |
| thrust::raw_pointer_cast(keys_out.data()), | ||
| thrust::raw_pointer_cast(values_in.data()), | ||
| thrust::raw_pointer_cast(values_out.data()), | ||
| static_cast<int>(keys_in.size()), |
There was a problem hiding this comment.
| static_cast<int>(keys_in.size()), | |
| static_cast<cuda::std::int64_t>(keys_in.size()), |
| REQUIRE(keys_out == expected_keys); | ||
| } | ||
|
|
||
| TEST_CASE("DeviceSegmentedRadixSort::SortPairs uses custom stream", "[segmented_radix_sort][device]") |
There was a problem hiding this comment.
Should this be tested for all overloads?
| thrust::raw_pointer_cast(keys_out.data()), | ||
| thrust::raw_pointer_cast(values_in.data()), | ||
| thrust::raw_pointer_cast(values_out.data()), | ||
| static_cast<int>(keys_in.size()), |
There was a problem hiding this comment.
| static_cast<int>(keys_in.size()), | |
| static_cast<cuda::std::int64_t>(keys_in.size()), |
There was a problem hiding this comment.
Do we need a cast at all here? To suppress the unsigned -> signed warning, right?
bernhardmgruber
left a comment
There was a problem hiding this comment.
Please apply the remaining feedback from other reviewers. Otherwise LGTM
fixes #7549
No deterministic guarantees. It's sorting, if it's not deterministic it's just wrong.