--static Build RCCL as a static library instead of shared library
-t|--tests_build Build rccl unit tests, but do not run
--time-trace Plot the build time of RCCL (requires `ninja-build` package installed on the system)
--rocshmem Build with rocSHMEM support (for GDA AllToAll)
--verbose Show compile commands
```
@@ -126,6 +127,35 @@ will run only AllReduce correctness tests with float16 datatype. A list of avail
There are also other performance and error-checking tests for RCCL. These are maintained separately at https://github.com/ROCm/rccl-tests.
See the rccl-tests README for more information on how to build and run those tests.

## rocSHMEM support

RCCL can use rocSHMEM's GPU Direct Async (GDA) backend to accelerate the **AllToAll** collective on supported multi-node setups. This is the only collective that currently uses rocSHMEM GDA inside RCCL.

Please consult the [rocSHMEM documentation](https://rocm.docs.amd.com/projects/rocSHMEM/en/latest/install.html#gda-nic-dependencies) to see which NICs and drivers are required for GDA alltoall support.

**Building with rocSHMEM**

- Using the install script:
```shell
./install.sh --rocshmem
```
If the rocSHMEM submodule is present (`ext-src/rocSHMEM`), it will be built and linked automatically. To use a pre-built rocSHMEM installation instead, set `ROCSHMEM_INSTALL_DIR` to the install prefix before running the script.
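
As a minimal sketch of the pre-built case, assuming `ROCSHMEM_INSTALL_DIR` is read from the environment and that rocSHMEM was installed under `/opt/rocshmem` (a placeholder prefix, not a path taken from this README):

```shell
# Placeholder prefix: replace with the actual rocSHMEM install location.
export ROCSHMEM_INSTALL_DIR=/opt/rocshmem

# Run the install script from the RCCL source root; skip with a hint otherwise.
if [ -x ./install.sh ]; then
  ./install.sh --rocshmem
else
  echo "install.sh not found: run this from the RCCL source root" >&2
fi
```
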
Users must set the following environment variables:

- **`RCCL_ROCSHMEM_ENABLE`** (default: `1`): Set to `0` to disable rocSHMEM usage in RCCL.
- **`RCCL_ROCSHMEM_THRESHOLD`** (default: `262144` bytes): Maximum AllToAll message size (in bytes) for which the GDA path is used. The GDA path is only considered when this value is ≤ 1 MiB (1048576); larger thresholds fall back to the standard AllToAll implementation.
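
As a minimal sketch, both variables can be exported in the shell that launches the job. The threshold of 131072 bytes is an arbitrary illustrative value, not a recommendation; it stays below the 1 MiB (1048576-byte) cap described above.

```shell
# Keep rocSHMEM enabled (this is already the documented default).
export RCCL_ROCSHMEM_ENABLE=1
# Illustrative value only: use the GDA path for AllToAll messages up to 128 KiB.
export RCCL_ROCSHMEM_THRESHOLD=131072

# Print the settings for a quick check before launching the job.
echo "RCCL_ROCSHMEM_ENABLE=${RCCL_ROCSHMEM_ENABLE} RCCL_ROCSHMEM_THRESHOLD=${RCCL_ROCSHMEM_THRESHOLD}"
```

Setting `RCCL_ROCSHMEM_ENABLE=0` instead disables the rocSHMEM path. On multi-node jobs the variables need to reach every rank; how they are forwarded depends on the launcher in use.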

The GDA AllToAll path is selected only when all of the following hold: rocSHMEM is enabled at build and runtime, the GPU architecture is gfx942 (e.g. MI300X), the job is multi-node with 8 GPUs per node, and the AllToAll message size is ≤ `RCCL_ROCSHMEM_THRESHOLD`.
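
The conditions above can be restated as a small shell predicate for sanity-checking a planned job configuration. The function `gda_alltoall_eligible` and its arguments are hypothetical and only mirror the documented rules; this is not RCCL's internal selection logic.

```shell
# Hypothetical helper: restates the documented conditions, not RCCL's own code.
# Usage: gda_alltoall_eligible <built_with_rocshmem:0|1> <gfx_arch> <num_nodes> <gpus_per_node> <msg_bytes>
gda_alltoall_eligible() {
  built=$1
  arch=$2
  nodes=$3
  gpus=$4
  bytes=$5
  # Fall back to the documented defaults when the variables are unset.
  enable=${RCCL_ROCSHMEM_ENABLE:-1}
  threshold=${RCCL_ROCSHMEM_THRESHOLD:-262144}

  [ "$built" -eq 1 ] || return 1            # built with --rocshmem
  [ "$enable" != "0" ] || return 1          # not disabled at runtime
  [ "$threshold" -le 1048576 ] || return 1  # threshold must be <= 1 MiB
  [ "$arch" = "gfx942" ] || return 1        # e.g. MI300X
  [ "$nodes" -gt 1 ] || return 1            # multi-node job
  [ "$gpus" -eq 8 ] || return 1             # 8 GPUs per node
  [ "$bytes" -le "$threshold" ] || return 1 # message fits the threshold
  return 0
}

# Example: 2 nodes x 8 gfx942 GPUs, 64 KiB AllToAll messages.
if gda_alltoall_eligible 1 gfx942 2 8 65536; then
  echo "GDA AllToAll path expected"
else
  echo "standard AllToAll path expected"
fi
```

Each check returns early, so the first unmet condition decides the outcome; adding an `echo` before a given `return 1` is an easy way to see which rule excluded a configuration.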

## Library and API Documentation

Please refer to the [RCCL Documentation Site](https://rocm.docs.amd.com/projects/rccl/en/latest/) for current documentation.