Add env SegmentedReduce (non fixed-size overloads) #7795
gonidelis merged 18 commits into NVIDIA:main
I removed the underlying helper implementation function for the fixed-segment-size overloads, as it required prior knowledge of the
Adding the missing unit tests just now.
miscco
left a comment
Looks good.
@bernhardmgruber I observe that we are really loose with the naming conventions. We have InitValueT, init_value_t, init_t, and sometimes no alias at all. The same goes for AccumT and so on.
We really should be more consistent
…r to common impl
- Add private segmented_reduce_impl that centralizes determinism
validation (static_assert rejecting gpu_to_gpu), dispatch_with_env,
and tuning extraction, eliminating boilerplate across all env overloads
- Refactor Reduce, Sum, Min, Max env overloads to delegate to
segmented_reduce_impl
- Add new env overloads for ArgMin and ArgMax with full documentation
including literalinclude snippet tags
- Rewrite env_api tests covering all 6 APIs (Reduce, Sum, Min, Max,
ArgMin, ArgMax) with determinism and stream_ref acceptance tests
- Unify _env.cu and _env_launch.cu into a single _env.cu test file
with default env, launch wrapper, custom stream, and tuning tests
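The "validate once, then delegate" shape described in the commit message above can be sketched in plain host-side C++. This is only an illustration of the pattern, not the CUB implementation: `determinism`, `env_sketch`, `segmented_reduce_impl_sketch`, `segmented_sum_sketch`, and `segmented_max_sketch` are hypothetical names invented for this example, and the reduction runs sequentially on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the CUB/CCCL types.
enum class determinism { not_guaranteed, run_to_run, gpu_to_gpu };

template <determinism D>
struct env_sketch
{
  static constexpr determinism requested = D;
};

// Common implementation: the determinism check lives here exactly once,
// so every public overload inherits it without repeating the boilerplate.
template <typename Env, typename T, typename Op>
std::vector<T> segmented_reduce_impl_sketch(
  const std::vector<T>& in,
  const std::vector<std::size_t>& begin_offsets,
  const std::vector<std::size_t>& end_offsets,
  Op op,
  T init,
  Env)
{
  static_assert(Env::requested != determinism::gpu_to_gpu,
                "gpu_to_gpu determinism is rejected in this sketch");
  assert(begin_offsets.size() == end_offsets.size());

  std::vector<T> out(begin_offsets.size(), init);
  for (std::size_t s = 0; s < begin_offsets.size(); ++s)
  {
    // Segment s covers the half-open range [begin_offsets[s], end_offsets[s]).
    for (std::size_t i = begin_offsets[s]; i < end_offsets[s]; ++i)
    {
      out[s] = op(out[s], in[i]);
    }
  }
  return out;
}

// Public-style overloads only choose the operator and the initial value.
template <typename Env, typename T>
std::vector<T> segmented_sum_sketch(
  const std::vector<T>& in,
  const std::vector<std::size_t>& begin_offsets,
  const std::vector<std::size_t>& end_offsets,
  Env env)
{
  return segmented_reduce_impl_sketch(in, begin_offsets, end_offsets, std::plus<T>{}, T{}, env);
}

template <typename Env, typename T>
std::vector<T> segmented_max_sketch(
  const std::vector<T>& in,
  const std::vector<std::size_t>& begin_offsets,
  const std::vector<std::size_t>& end_offsets,
  Env env)
{
  auto max_op = [](T a, T b) { return a < b ? b : a; };
  return segmented_reduce_impl_sketch(
    in, begin_offsets, end_offsets, max_op, std::numeric_limits<T>::lowest(), env);
}
```

In this sketch, passing `env_sketch<determinism::gpu_to_gpu>{}` to either overload fails to compile at the single `static_assert`, which is the benefit of centralizing the check.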
…ed extra redundant logic for no reason
- Add missing non-overlap precondition to env ArgMin and ArgMax docs
- Reorder env tests: group all env tests before custom stream tests
- Add not_guaranteed determinism test for Reduce env API
bernhardmgruber
left a comment
The header looks fine, except one issue:
🥳 CI Workflow Results: 🟩 Finished in 1h 10m: Pass: 100%/249 | Total: 2d 15h | Max: 53m 29s | Hits: 97%/159713
Split (1/2)
Adds env-based overloads for the non-fixed-size-segment DeviceSegmentedReduce::* algorithms.
Merge before #8097.
Segmented Reduce is inherently run_to_run deterministic, so this is the strongest determinism guarantee allowed. If you believe that at some point there could be a perf optimization that would break this contract, let me know and we will remove this promise in this PR. Otherwise we stay bound to it.
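As general background on why a determinism promise constrains future optimizations (this is not a description of how CUB reduces a segment): floating-point addition is not associative, so the result of a reduction can depend on the order in which values are combined. The sketch below uses two hypothetical helper functions, `ordered_sum` and `reversed_sum`, to show that the same segment summed in two different orders can give different answers, while repeating the same order always reproduces the same answer.

```cpp
#include <cstddef>
#include <vector>

// Sequential left-to-right sum over the half-open range [begin, end).
double ordered_sum(const std::vector<double>& v, std::size_t begin, std::size_t end)
{
  double acc = 0.0;
  for (std::size_t i = begin; i < end; ++i)
  {
    acc += v[i];
  }
  return acc;
}

// Same values, summed right to left: a different association order.
double reversed_sum(const std::vector<double>& v, std::size_t begin, std::size_t end)
{
  double acc = 0.0;
  for (std::size_t i = end; i > begin; --i)
  {
    acc += v[i - 1];
  }
  return acc;
}
```

For the segment `{1e100, -1e100, 1.0}`, the left-to-right order cancels the two large values first and keeps the `1.0`, whereas the right-to-left order absorbs the `1.0` into `-1e100` before the cancellation and loses it. An optimization that changes the combination order between runs would therefore break a run_to_run promise for floating-point inputs.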