Commit 42c9090
Implements an `AllReduce` collective on top of the existing `AllGather` implementation. Given its performance characteristics (see the `AllReduce` class docs for details), this may or may not be acceptable as a long-term implementation, but building on `AllGather` makes the code considerably easier to maintain.
```
/**
* @brief AllReduce collective.
*
* The current implementation is built using `coll::AllGather` and performs
* the reduction locally after allgather completes. Considering `R` is the number of
* ranks, and `N` is the number of bytes of data, per rank this incurs `O(R * N)` bytes of
* memory consumption and `O(R)` communication operations.
*
* Semantics:
* - Each rank calls `insert` exactly once to contribute data to the reduction.
* - Once all ranks call `insert`, `wait_and_extract` returns the
* globally-reduced `PackedData`.
*
* The actual reduction is implemented via a type-erased `ReduceOperator` that is
* supplied at construction time. Helper factories such as
* `detail::make_reduce_operator` (defaults to host-side) or
* `detail::make_device_reduce_operator` (device-side) can be used to build
* element-wise reductions over contiguous arrays.
*/
```
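The gather-then-reduce pattern described in the class docs can be illustrated with a minimal, single-process sketch. This is not the rapidsmpf API: `simulated_allgather`, `reduce_locally`, and `simulated_allreduce` are hypothetical names, ranks are modelled as entries of a vector, and the communication step is replaced by a plain copy. It only shows why each rank ends up holding `R` buffers of `N` elements before the local reduction runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an allgather: after it completes, a rank holds a
// copy of every rank's contribution (R buffers). Here it is just a copy; in a
// distributed run this would be the communication step.
std::vector<std::vector<int>> simulated_allgather(
    std::vector<std::vector<int>> const& contributions
) {
    return contributions;
}

// Local reduction: fold the R gathered buffers element-wise into one buffer.
std::vector<int> reduce_locally(
    std::vector<std::vector<int>> const& gathered,
    std::function<int(int, int)> const& op
) {
    assert(!gathered.empty());
    std::vector<int> result = gathered[0];
    for (std::size_t r = 1; r < gathered.size(); ++r) {
        // Every rank must contribute the same number of elements.
        assert(gathered[r].size() == result.size());
        for (std::size_t i = 0; i < result.size(); ++i) {
            result[i] = op(result[i], gathered[r][i]);
        }
    }
    return result;
}

// Allreduce expressed as allgather followed by a local reduction. Every rank
// would run the same two steps and obtain the same result.
std::vector<int> simulated_allreduce(
    std::vector<std::vector<int>> const& contributions,
    std::function<int(int, int)> const& op
) {
    return reduce_locally(simulated_allgather(contributions), op);
}
```

With three simulated ranks contributing `{1, 2, 3}`, `{10, 20, 30}`, and `{100, 200, 300}` and a sum operator, the sketch yields `{111, 222, 333}`. The intermediate `gathered` vector is where the `O(R * N)` memory cost shows up.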
Both host-side and device-side reductions are supported. Where the reduction runs depends on the operator passed in: the built-in operators are `HostReduceOp` and `DeviceReduceOp`, and custom operators must define a `ReduceOperator` with `ReduceOperatorType::Host` or `ReduceOperatorType::Device` to determine where it runs. Finally, the implementation ensures the data is in the right memory space before executing the reduction: device data is moved to host for a host reduction, and host data is moved to device for a device reduction.
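As a rough illustration of the dispatch described above, the following sketch models an operator that carries a tag saying where it runs, and a reduction step that relocates both buffers to that location first. All names (`Location`, `ToyBuffer`, `ToyReduceOperator`, `ensure_location`, `reduce_into`, `make_toy_sum`) are hypothetical and are not the rapidsmpf types. The "move" only flips a tag, whereas a real implementation would copy between host and device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical memory-space tag (not the library's enum).
enum class Location { Host, Device };

// Toy buffer: values plus a tag recording where they notionally live.
struct ToyBuffer {
    Location where;
    std::vector<int> values;
};

// Toy type-erased operator: a callable plus the location it must run in.
struct ToyReduceOperator {
    Location runs_on;
    std::function<void(std::vector<int> const&, std::vector<int>&)> fn;
};

// Relocate a buffer to the target location. Only the tag changes here; a
// real implementation would perform the host/device copy at this point.
ToyBuffer ensure_location(ToyBuffer buf, Location target) {
    buf.where = target;
    return buf;
}

// Move both operands to where the operator runs, then reduce `incoming`
// into `accumulator` and return the accumulator.
ToyBuffer reduce_into(
    ToyReduceOperator const& op, ToyBuffer incoming, ToyBuffer accumulator
) {
    incoming = ensure_location(std::move(incoming), op.runs_on);
    accumulator = ensure_location(std::move(accumulator), op.runs_on);
    assert(incoming.values.size() == accumulator.values.size());
    op.fn(incoming.values, accumulator.values);
    return accumulator;
}

// Factory for an element-wise sum that runs at the requested location.
ToyReduceOperator make_toy_sum(Location runs_on) {
    return ToyReduceOperator{
        runs_on,
        [](std::vector<int> const& in, std::vector<int>& acc) {
            for (std::size_t i = 0; i < acc.size(); ++i) {
                acc[i] += in[i];
            }
        }
    };
}
```

In this sketch the caller never chooses the memory space directly. It is implied by the operator, which mirrors the behaviour described in the commit message, where the operator type decides whether data is moved to host or to device.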
Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
- Bradley Dice (https://github.com/bdice)
- Niranda Perera (https://github.com/nirandaperera)
Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
- James Lamb (https://github.com/jameslamb)
- Lawrence Mitchell (https://github.com/wence-)
- Mads R. B. Kristensen (https://github.com/madsbk)
URL: rapidsai#683
Add AllReduce collective (rapidsai#683)
1 parent b527a08, commit 42c9090
File tree (27 files changed, +1616 -51 lines):
- cpp
  - include/rapidsmpf
    - coll
    - memory
    - streaming/coll
  - src
    - coll
    - memory
    - streaming/coll
  - tests
    - streaming
- python/rapidsmpf/rapidsmpf
  - coll
  - streaming/coll
  - tests