Skip to content

Add env SegmentedReduce (non fixed-size overloads)#7795

Merged
gonidelis merged 18 commits intoNVIDIA:mainfrom
gonidelis:segmented_redude_env
Mar 25, 2026
Merged

Add env SegmentedReduce (non fixed-size overloads)#7795
gonidelis merged 18 commits intoNVIDIA:mainfrom
gonidelis:segmented_redude_env

Conversation

@gonidelis
Copy link
Copy Markdown
Member

@gonidelis gonidelis commented Feb 25, 2026

Split (1/2)

Adds env based overloads for non fixed-size segments DeviceSegmentedReduce::* algorithms

Merge before #8097

Segmented Reduce is inherently run_to_run deterministic thus this is the largest deterministic guarantee allowed. If you believe there at some point can be an a perf optimization that will ruin this contract let me know and we will remove this promise in this PR. Otherwise we stay bound to that.

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot bot commented Feb 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Feb 25, 2026
@gonidelis gonidelis force-pushed the segmented_redude_env branch from 2aa7447 to d442948 Compare February 25, 2026 20:29
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot bot commented Feb 25, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@gonidelis gonidelis marked this pull request as ready for review February 26, 2026 01:49
@gonidelis gonidelis requested a review from a team as a code owner February 26, 2026 01:49
@gonidelis gonidelis requested a review from pauleonix February 26, 2026 01:49
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Feb 26, 2026
@github-actions

This comment has been minimized.

@gonidelis gonidelis force-pushed the segmented_redude_env branch from 83c6791 to 535da7d Compare March 5, 2026 14:47
@github-actions

This comment has been minimized.

@gonidelis gonidelis force-pushed the segmented_redude_env branch from 535da7d to 7a15fdf Compare March 9, 2026 21:57
@gonidelis
Copy link
Copy Markdown
Member Author

I removed the helper underlying implementation function for fixed segment size overloads as it pre required knowledge of the AccumT and added extra logic that was unnecessary. Non fixed-size overloads still do use the *_impl function though

@gonidelis
Copy link
Copy Markdown
Member Author

adding missing unit tests just now

@github-actions

This comment has been minimized.

@gonidelis gonidelis enabled auto-merge (squash) March 10, 2026 03:18
Copy link
Copy Markdown
Contributor

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@bernhardmgruber I observe that we are really loose with the naming conventions We have InitValueT, init_value_t, init_t, no alias at all Same for AccumT and so on

We really should be more consistent

@gonidelis
Copy link
Copy Markdown
Member Author

@miscco #7974 (comment) ok?

@gonidelis gonidelis force-pushed the segmented_redude_env branch from f4048ec to e62f191 Compare March 10, 2026 15:33
@github-actions

This comment has been minimized.

…r to common impl

  - Add private segmented_reduce_impl that centralizes determinism
    validation (static_assert rejecting gpu_to_gpu), dispatch_with_env,
    and tuning extraction, eliminating boilerplate across all env overloads
  - Refactor Reduce, Sum, Min, Max env overloads to delegate to
    segmented_reduce_impl
  - Add new env overloads for ArgMin and ArgMax with full documentation
    including literalinclude snippet tags
  - Rewrite env_api tests covering all 6 APIs (Reduce, Sum, Min, Max,
    ArgMin, ArgMax) with determinism and stream_ref acceptance tests
  - Unify _env.cu and _env_launch.cu into a single _env.cu test file
    with default env, launch wrapper, custom stream, and tuning tests
* Use explicit types for plus/minimum/maximum in env overloads to avoid integer promotion
* Add cuda::stream-based env to all env API tests
  - Add missing non-overlap precondition to env ArgMin and ArgMax docs
  - Reorder env tests: group all env tests before custom stream tests
  - Add not_guaranteed determinism test for Reduce env API
@gonidelis gonidelis force-pushed the segmented_redude_env branch from 3e66fa1 to 20a1056 Compare March 25, 2026 07:27
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@bernhardmgruber bernhardmgruber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header looks fine, except one issue:

@gonidelis gonidelis disabled auto-merge March 25, 2026 09:27
@gonidelis gonidelis force-pushed the segmented_redude_env branch from 20a1056 to 81312a6 Compare March 25, 2026 09:29
@gonidelis gonidelis enabled auto-merge (squash) March 25, 2026 09:30
@github-actions
Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 10m: Pass: 100%/249 | Total: 2d 15h | Max: 53m 29s | Hits: 97%/159713

See results here.

@gonidelis gonidelis merged commit 81d4be2 into NVIDIA:main Mar 25, 2026
267 of 268 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants