Commit 81d4be2
Add env SegmentedReduce (non fixed-size overloads) (#7795)
* Add env SegmentedReduce
* Add env overloads for DeviceSegmentedReduce ArgMin/ArgMax and refactor to common impl
- Add private segmented_reduce_impl that centralizes determinism
validation (static_assert rejecting gpu_to_gpu), dispatch_with_env,
and tuning extraction, eliminating boilerplate across all env overloads
- Refactor Reduce, Sum, Min, Max env overloads to delegate to
segmented_reduce_impl
- Add new env overloads for ArgMin and ArgMax with full documentation
including literalinclude snippet tags
- Rewrite env_api tests covering all 6 APIs (Reduce, Sum, Min, Max,
ArgMin, ArgMax) with determinism and stream_ref acceptance tests
- Unify _env.cu and _env_launch.cu into a single _env.cu test file
with default env, launch wrapper, custom stream, and tuning tests
* Add env overloads for fixed size segment APIs
* Add env API literalinclude example for Reduce only and remove the non-guaranteed API test
* Remove the underlying fixed_size_segmented_reduce_impl function, as it only added redundant logic
* Add unit tests for fixed-seg-size overloads and argmin argmax
* Fix GCC 7 auto deduction in generic lambda for fixed-size ArgMin/ArgMax env overloads
* Use __query_result_or_t to query tuning environment
* Static assert on numeric_limits specialization
* Address review comments
* Use explicit types for plus/minimum/maximum in env overloads to avoid integer promotion
* Add cuda::stream-based env to all env API tests
* Replace stream.wait() with stream.sync()
* Remove fixed-size overloads to simplify PR
* Fix breaking change from specializations
* Sum(env...) already existed; reintroduce it with the same constraints
* Address review nits for DeviceSegmentedReduce env overloads
- Add missing non-overlap precondition to env ArgMin and ArgMax docs
- Reorder env tests: group all env tests before custom stream tests
- Add not_guaranteed determinism test for Reduce env API
* Add not_guaranteed query
* Docs nits
* Add run_to_run and not_guaranteed API tests

1 parent 50a1189
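The centralized segmented_reduce_impl described in the first bullet can be illustrated with a small host-only C++ model. Everything here is hypothetical: the determinism tags, the `env` template, and the `Sum`/`Max` signatures are stand-ins, not the CUB/CCCL API. The sketch shows one shared function that rejects `gpu_to_gpu` at compile time and performs the reduction, with each public overload reduced to choosing an operator and an initial value.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical determinism levels (illustrative stand-ins, not the CCCL types).
struct not_guaranteed {};
struct run_to_run {};
struct gpu_to_gpu {};

// Hypothetical environment that only carries a determinism requirement.
template <class Determinism = run_to_run>
struct env {
  using determinism = Determinism;
};

namespace detail {
// Shared implementation: validates the environment once, then reduces every
// segment [begin[s], end[s]) of `in` into out[s], starting from `init`.
template <class T, class Op, class Env>
void segmented_reduce_impl(const std::vector<T>& in,
                           const std::vector<std::size_t>& begin,
                           const std::vector<std::size_t>& end,
                           std::vector<T>& out, Op op, T init, Env) {
  static_assert(!std::is_same<typename Env::determinism, gpu_to_gpu>::value,
                "gpu_to_gpu determinism is rejected by this sketch");
  assert(begin.size() == end.size());
  out.assign(begin.size(), init);
  for (std::size_t s = 0; s < begin.size(); ++s) {
    T acc = init;
    for (std::size_t i = begin[s]; i < end[s]; ++i) {
      acc = op(acc, in[i]);
    }
    out[s] = acc;
  }
}
}  // namespace detail

// Public overloads only pick the operator and initial value, then delegate.
template <class T, class Env = env<>>
void Sum(const std::vector<T>& in, const std::vector<std::size_t>& begin,
         const std::vector<std::size_t>& end, std::vector<T>& out,
         Env e = {}) {
  detail::segmented_reduce_impl(
      in, begin, end, out, [](T a, T b) -> T { return static_cast<T>(a + b); },
      T{}, e);
}

template <class T, class Env = env<>>
void Max(const std::vector<T>& in, const std::vector<std::size_t>& begin,
         const std::vector<std::size_t>& end, std::vector<T>& out,
         Env e = {}) {
  detail::segmented_reduce_impl(
      in, begin, end, out, [](T a, T b) -> T { return a < b ? b : a; },
      std::numeric_limits<T>::lowest(), e);
}
```

Because validation lives in one place, adding another overload (for example a Min) cannot forget the determinism check; calling `Sum(..., env<gpu_to_gpu>{})` fails to compile rather than silently running.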
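The switch to `__query_result_or_t` for the tuning environment follows a common "ask the environment for a value, otherwise fall back to a default" pattern. The C++ sketch below reimplements that idea with a hypothetical `query_result_or_t` trait; the names `get_tuning_t`, `default_tuning`, `custom_tuning`, `empty_env` and `tuned_env` are illustrative only and do not reflect the internal CCCL definition.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical query tag and tuning policies.
struct get_tuning_t {};
struct default_tuning { static constexpr int block_threads = 256; };
struct custom_tuning  { static constexpr int block_threads = 128; };

// Primary template: the environment cannot answer Query, so use Default.
template <class Env, class Query, class Default, class = void>
struct query_result_or {
  using type = Default;
};

// Chosen when env.query(Query{}) is well-formed: use its result type.
template <class Env, class Query, class Default>
struct query_result_or<Env, Query, Default,
                       decltype(void(std::declval<const Env&>().query(Query{})))> {
  using type = decltype(std::declval<const Env&>().query(Query{}));
};

template <class Env, class Query, class Default>
using query_result_or_t = typename query_result_or<Env, Query, Default>::type;

// Example environments: one without a tuning query, one that provides it.
struct empty_env {};
struct tuned_env {
  custom_tuning query(get_tuning_t) const { return {}; }
};

static_assert(std::is_same<query_result_or_t<empty_env, get_tuning_t, default_tuning>,
                           default_tuning>::value, "falls back to the default");
static_assert(std::is_same<query_result_or_t<tuned_env, get_tuning_t, default_tuning>,
                           custom_tuning>::value, "uses the environment's answer");
```

The whole decision happens at compile time, so an overload can extract its tuning policy from any environment without requiring every caller to provide one.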
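ArgMin and ArgMax need a sentinel value for empty segments, which is why a static_assert on the `numeric_limits` specialization is useful. This host-only sketch (`segmented_arg_min` is a hypothetical name) assumes the convention CUB's ArgMin documentation describes, where an empty segment yields the pair `{1, numeric_limits<T>::max()}`, and refuses to compile for value types that lack a `numeric_limits` specialization.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical per-segment ArgMin. Each result is {offset within segment, value};
// on ties the first occurrence wins.
template <class T>
std::vector<std::pair<int, T>> segmented_arg_min(
    const std::vector<T>& in, const std::vector<std::size_t>& begin,
    const std::vector<std::size_t>& end) {
  // Empty segments need max() as a sentinel, so reject types without one.
  static_assert(std::numeric_limits<T>::is_specialized,
                "segmented_arg_min requires a std::numeric_limits specialization");
  assert(begin.size() == end.size());
  std::vector<std::pair<int, T>> out;
  out.reserve(begin.size());
  for (std::size_t s = 0; s < begin.size(); ++s) {
    // Result reported for an empty segment (assumed CUB-style convention).
    std::pair<int, T> best{1, std::numeric_limits<T>::max()};
    for (std::size_t i = begin[s]; i < end[s]; ++i) {
      if (i == begin[s] || in[i] < best.second) {
        best = {static_cast<int>(i - begin[s]), in[i]};
      }
    }
    out.push_back(best);
  }
  return out;
}
```

An ArgMax variant would use `numeric_limits<T>::lowest()` as its sentinel and `>` as the comparison.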
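The integer-promotion issue behind using explicitly typed plus/minimum/maximum operators can be reproduced with the standard library alone. In this C++14 sketch `std::plus` stands in for the CCCL function objects used by the env overloads: the transparent `std::plus<>` deduces its result from `a + b`, so two `uint8_t` operands produce an `int`, while `std::plus<std::uint8_t>` keeps the result in `uint8_t`.

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

using u8 = std::uint8_t;

// Transparent operator: u8 + u8 undergoes integral promotion, so the result is int.
using transparent_result = decltype(std::plus<>{}(u8{200}, u8{100}));

// Explicitly typed operator: the sum is converted back to u8 (wraps modulo 256).
using explicit_result = decltype(std::plus<u8>{}(u8{200}, u8{100}));

static_assert(std::is_same<transparent_result, int>::value,
              "transparent plus widens to int");
static_assert(std::is_same<explicit_result, u8>::value,
              "typed plus stays uint8_t");
```

With a transparent operator, a reduction over `uint8_t` data could silently accumulate in `int`; presumably that change in accumulator type is what the explicit operator types guard against.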
3 files changed (+1587, -142 lines)

File tree:
- cub
- cub/device
- test