Skip to content

Add env RLE#7908

Merged
gonidelis merged 4 commits intoNVIDIA:mainfrom
gonidelis:rle_env
Mar 10, 2026
Merged

Add env RLE#7908
gonidelis merged 4 commits intoNVIDIA:mainfrom
gonidelis:rle_env

Conversation

@gonidelis
Copy link
Copy Markdown
Member

fixes #7547

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot bot commented Mar 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Mar 5, 2026
Comment on lines +293 to +295
typename ::cuda::std::enable_if_t<
::cuda::std::is_integral_v<NumItemsT> && !::cuda::std::is_same_v<InputIteratorT, void*>,
int> = 0>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to constrain NumItemsT

Suggested change
typename ::cuda::std::enable_if_t<
::cuda::std::is_integral_v<NumItemsT> && !::cuda::std::is_same_v<InputIteratorT, void*>,
int> = 0>
::cuda::std::enable_if_t<
!::cuda::std::is_same_v<InputIteratorT, void*>,
int> = 0>

Comment on lines +552 to +554
typename ::cuda::std::enable_if_t<
::cuda::std::is_integral_v<NumItemsT> && !::cuda::std::is_same_v<InputIteratorT, void*>,
int> = 0>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@gonidelis gonidelis marked this pull request as ready for review March 9, 2026 23:57
@gonidelis gonidelis requested a review from a team as a code owner March 9, 2026 23:57
@gonidelis gonidelis requested a review from pauleonix March 9, 2026 23:57
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Mar 9, 2026
#include <cub/device/dispatch/dispatch_select_if.cuh>
#include <cub/device/dispatch/dispatch_unique_by_key.cuh>

#include <cuda/__execution/determinism.h>
Copy link
Copy Markdown
Member Author

@gonidelis gonidelis Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slipped in while on a workaround. let it pass it's a byfix

@github-actions
Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 21m: Pass: 100%/249 | Total: 3d 01h | Max: 1h 21m | Hits: 98%/156109

See results here.

@gonidelis gonidelis enabled auto-merge (squash) March 10, 2026 03:21
_CCCL_NVTX_RANGE_SCOPE("cub::DeviceRunLengthEncode::NonTrivialRuns");

using global_offset_t = detail::choose_signed_offset_t<NumItemsT>;
using equality_op = ::cuda::std::equal_to<>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is fine, because it always just returns a bool and does not promote integers

_CCCL_NVTX_RANGE_SCOPE("cub::DeviceRunLengthEncode::Encode");

using equality_op = ::cuda::std::equal_to<>;
using reduction_op = ::cuda::std::plus<>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto: Should this rather be

Suggested change
using reduction_op = ::cuda::std::plus<>;
using reduction_op = ::cuda::std::plus<length_t>;

Otherwise this will always promote offset_t to a larger integer

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-env overload where I guess this implementation comes from also uses plus<>. I agree that that's maybe not what we want, since plus<length_t> does not promote and influence the accumulator type. But this should be addressed in a separate PR.

Comment on lines +155 to +156
cudaStream_t custom_stream;
REQUIRE(cudaSuccess == cudaStreamCreate(&custom_stream));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Please use cuda::stream so it gets higher test coverage. If you want to pass a cudaStream_tyou can always call .get() (I think) on the cuda::stream to get the raw underlying stream.

This suggestion applies generally for all env-overload PRs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gonidelis since the PR auto-merged, please create a note or a tracking issue to replace all manual stream creation by cuda::stream in our unit tests.

@gonidelis gonidelis merged commit defbe55 into NVIDIA:main Mar 10, 2026
268 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add env-based API for cub::DeviceRunLengthEncode

3 participants