
Commit 42c9090

Add AllReduce collective (rapidsai#683)
Implements an `AllReduce` collective on top of the `AllGather` implementation. Given the performance characteristics it delivers, this may or may not be an acceptable long-term implementation (see the `AllReduce` class docs for details), but it certainly makes the implementation easier to maintain.

```
/**
 * @brief AllReduce collective.
 *
 * The current implementation is built using `coll::AllGather` and performs
 * the reduction locally after allgather completes. Considering `R` is the number of
 * ranks, and `N` is the number of bytes of data, per rank this incurs `O(R * N)` bytes of
 * memory consumption and `O(R)` communication operations.
 *
 * Semantics:
 * - Each rank calls `insert` exactly once to contribute data to the reduction.
 * - Once all ranks call `insert`, `wait_and_extract` returns the
 *   globally-reduced `PackedData`.
 *
 * The actual reduction is implemented via a type-erased `ReduceOperator` that is
 * supplied at construction time. Helper factories such as
 * `detail::make_reduce_operator` (defaults to host-side) or
 * `detail::make_device_reduce_operator` (device-side) can be used to build
 * element-wise reductions over contiguous arrays.
 */
```

Both host-side and device-side reductions are supported; where the reduction runs depends on the operator passed in. The built-in operators come in `HostReduceOp` and `DeviceReduceOp` variants, while custom operators must define a `ReduceOperator` with `ReduceOperatorType::Host` or `ReduceOperatorType::Device` to determine where they run. Finally, the implementation ensures data is properly located before executing the reduction: device data is moved to host when running a host reduction, and host data is moved to device when running a device reduction.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
- Bradley Dice (https://github.com/bdice)
- Niranda Perera (https://github.com/nirandaperera)

Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
- James Lamb (https://github.com/jameslamb)
- Lawrence Mitchell (https://github.com/wence-)
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#683
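To make the documented semantics concrete, here is a minimal usage sketch. It is not taken from the commit: the header path, the constructor's argument list, the exact shape of `detail::make_reduce_operator`, and how the `PackedData` contribution is produced are all assumptions; only the `insert`/`wait_and_extract` contract and the factory names come from the class docs above.

```cpp
// Hedged sketch only: the constructor signature, the factory's template
// argument, and the PackedData plumbing are assumptions; insert() /
// wait_and_extract() semantics come from the AllReduce class docs.
#include <rapidsmpf/coll/allreduce.hpp>  // assumed path, mirroring coll/allgather.hpp

#include <functional>
#include <memory>
#include <utility>

using namespace rapidsmpf;

PackedData element_wise_sum(
    std::shared_ptr<Communicator> comm, PackedData local_contribution
) {
    // Build a type-erased, element-wise reduction over a contiguous float
    // array; make_reduce_operator defaults to host-side per the docs.
    auto op = coll::detail::make_reduce_operator<float>(std::plus<float>{});

    // Constructor arguments beyond the communicator and operator are
    // omitted; the real class likely needs further buffer/statistics
    // plumbing at construction time.
    coll::AllReduce allreduce{comm, std::move(op)};

    // Each rank contributes its data exactly once.
    allreduce.insert(std::move(local_contribution));

    // Blocks until every rank has inserted, then returns the
    // globally-reduced PackedData. Internally this allgathers all R
    // contributions and reduces them locally, which is where the
    // O(R * N) memory and O(R) communication costs come from.
    return allreduce.wait_and_extract();
}
```

Note the trade-off the docstring flags: because the reduction happens only after a full allgather, each rank materializes all `R` contributions at once, hence the `O(R * N)` memory footprint rather than the `O(N)` that a tree- or ring-based allreduce would need.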
1 parent b527a08 commit 42c9090

27 files changed: +1616 −51 lines

cpp/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -167,7 +167,8 @@ target_link_options(maybe_asan INTERFACE "$<$<BOOL:${RAPIDSMPF_ASAN}>:-fsanitize

 add_library(
   rapidsmpf
-  src/allgather/allgather.cpp
+  src/coll/allgather.cpp
+  src/coll/allreduce.cpp
   src/bootstrap/bootstrap.cpp
   src/bootstrap/file_backend.cpp
   src/bootstrap/utils.cpp

cpp/include/rapidsmpf/allgather/allgather.hpp renamed to cpp/include/rapidsmpf/coll/allgather.hpp

Lines changed: 4 additions & 4 deletions
@@ -27,13 +27,13 @@
 #include <rapidsmpf/statistics.hpp>

 /**
- * @namespace rapidsmpf::allgather
- * @brief Allgather communication interfaces.
+ * @namespace rapidsmpf::coll
+ * @brief Collective communication interfaces.
  *
  * An allgather service for distributed communication where all ranks collect
  * data from all other ranks.
  */
-namespace rapidsmpf::allgather {
+namespace rapidsmpf::coll {
 namespace detail {

 /// @brief Type alias for chunk identifiers.
@@ -557,4 +557,4 @@ class AllGather {
     std::vector<std::unique_ptr<Communicator::Future>> receive_futures_{};
 };

-} // namespace rapidsmpf::allgather
+} // namespace rapidsmpf::coll
