    typename ::cuda::std::enable_if_t<
      ::cuda::std::is_integral_v<NumItemsT> && !::cuda::std::is_same_v<InputIteratorT, void*>,
      int> = 0>
No need to constrain NumItemsT
Suggested change:
    - typename ::cuda::std::enable_if_t<
    -   ::cuda::std::is_integral_v<NumItemsT> && !::cuda::std::is_same_v<InputIteratorT, void*>,
    -   int> = 0>
    + ::cuda::std::enable_if_t<
    +   !::cuda::std::is_same_v<InputIteratorT, void*>,
    +   int> = 0>
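As a rough illustration of the suggested constraint, the host-only sketch below keeps only the `void*` check and leaves `NumItemsT` unconstrained. It is an assumption-laden stand-in, not the actual CUB code: `encode_sketch` and `accepts` are hypothetical names, and `std::` traits replace `::cuda::std::` so that it compiles without the CUDA toolkit.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the env overload: only the void* check remains,
// NumItemsT is left unconstrained as the review suggests.
template <typename InputIteratorT,
          typename NumItemsT,
          std::enable_if_t<!std::is_same_v<InputIteratorT, void*>, int> = 0>
int encode_sketch(InputIteratorT /*d_in*/, NumItemsT /*num_items*/)
{
  return 0;
}

// Detection helper (also hypothetical): true when encode_sketch(It, N) is a viable call.
template <typename It, typename N, typename = void>
struct accepts : std::false_type
{};

template <typename It, typename N>
struct accepts<It, N, std::void_t<decltype(encode_sketch(std::declval<It>(), std::declval<N>()))>>
    : std::true_type
{};

// Any item-count type is accepted for a real iterator type...
static_assert(accepts<const int*, int>::value);
static_assert(accepts<const int*, long long>::value);
static_assert(accepts<const int*, unsigned>::value);
// ...while a void* first argument removes the overload from consideration.
static_assert(!accepts<void*, int>::value);
```

In this sketch the `void*` check alone decides whether the overload participates; whether that is sufficient for the real overload set depends on the other CUB signatures, which are not shown here.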
    #include <cub/device/dispatch/dispatch_select_if.cuh>
    #include <cub/device/dispatch/dispatch_unique_by_key.cuh>

    #include <cuda/__execution/determinism.h>
This slipped in while I was working on a workaround. Let it pass; it's a bugfix.
🥳 CI Workflow Results: 🟩 Finished in 1h 21m: Pass: 100%/249 | Total: 3d 01h | Max: 1h 21m | Hits: 98%/156109
    _CCCL_NVTX_RANGE_SCOPE("cub::DeviceRunLengthEncode::NonTrivialRuns");

    using global_offset_t = detail::choose_signed_offset_t<NumItemsT>;
    using equality_op = ::cuda::std::equal_to<>;
I believe this is fine, because it always just returns a bool and does not promote integers
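A small host-side check of this claim, using `std::equal_to<>` as an assumed stand-in for `::cuda::std::equal_to<>` (the two are expected to behave alike, but that is not verified here):

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

using equality_op = std::equal_to<>;

// The transparent comparator yields bool for integer operands,
// whether their types match or differ in width.
static_assert(std::is_same_v<decltype(equality_op{}(std::int32_t{1}, std::int64_t{1})), bool>);
static_assert(std::is_same_v<decltype(equality_op{}(std::uint8_t{1}, std::uint8_t{1})), bool>);
```

Because the result is `bool` in both cases, the comparator's result type does not widen with its operands.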
    _CCCL_NVTX_RANGE_SCOPE("cub::DeviceRunLengthEncode::Encode");

    using equality_op = ::cuda::std::equal_to<>;
    using reduction_op = ::cuda::std::plus<>;
Ditto: Should this rather be
    - using reduction_op = ::cuda::std::plus<>;
    + using reduction_op = ::cuda::std::plus<length_t>;
Otherwise this will always promote offset_t to a larger integer
There was a problem hiding this comment.
The non-env overload where I guess this implementation comes from also uses plus<>. I agree that that's maybe not what we want, since plus<length_t> does not promote and influence the accumulator type. But this should be addressed in a separate PR.
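The difference between the two functors can be checked on the host. The sketch below uses `std::plus` as an assumed stand-in for `::cuda::std::plus`; `small_t` and `wide_t` are hypothetical illustration types, not the `length_t` or `offset_t` of the PR, and the second assertion assumes `int` is wider than 16 bits.

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

using small_t = std::int16_t; // hypothetical narrow length type
using wide_t  = std::int64_t; // hypothetical wide offset type

// plus<> is transparent: the result type follows the usual arithmetic conversions.
static_assert(std::is_same_v<decltype(std::plus<>{}(std::int32_t{}, wide_t{})), wide_t>);
// Two narrow operands are promoted to int (assumes int is wider than 16 bits).
static_assert(std::is_same_v<decltype(std::plus<>{}(small_t{}, small_t{})), int>);

// plus<T> is fixed: operands are converted to T and the result is always T.
static_assert(std::is_same_v<decltype(std::plus<small_t>{}(small_t{}, small_t{})), small_t>);
```

Any code that deduces an accumulator type from the operator's result would therefore see a different type depending on which functor is chosen.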
    cudaStream_t custom_stream;
    REQUIRE(cudaSuccess == cudaStreamCreate(&custom_stream));
Suggestion: Please use cuda::stream so it gets higher test coverage. If you want to pass a cudaStream_t, you can always call .get() (I think) on the cuda::stream to get the raw underlying stream.
This suggestion applies generally for all env-overload PRs.
There was a problem hiding this comment.
@gonidelis since the PR auto-merged, please create a note or a tracking issue to replace all manual stream creation by cuda::stream in our unit tests.
fixes #7547