Commit 67f32b7

Updating RCCL readme to include rocSHMEM GDA info
* Updating RCCL readme to include rocSHMEM GDA info
* Add links to rocSHMEM docs

[rocm-systems] ROCm/rocm-systems#3186 (commit f43b99e)
1 parent f48bb06 commit 67f32b7

1 file changed: +30 -0 lines changed

README.md

Lines changed: 30 additions & 0 deletions
@@ -63,6 +63,7 @@ RCCL build & installation helper script
```
     --static            Build RCCL as a static library instead of shared library
  -t|--tests_build       Build rccl unit tests, but do not run
     --time-trace        Plot the build time of RCCL (requires `ninja-build` package installed on the system)
+    --rocshmem          Build with rocSHMEM support (for GDA AllToAll)
     --verbose           Show compile commands
```

@@ -126,6 +127,35 @@ will run only AllReduce correctness tests with float16 datatype. A list of avail
There are also other performance and error-checking tests for RCCL. These are maintained separately at https://github.com/ROCm/rccl-tests.
See the rccl-tests README for more information on how to build and run those tests.

## rocSHMEM support

RCCL can use rocSHMEM's GPU Direct Async (GDA) backend to accelerate the **AllToAll** collective on supported multi-node setups. AllToAll is currently the only RCCL collective that uses rocSHMEM GDA.

See the [rocSHMEM documentation](https://rocm.docs.amd.com/projects/rocSHMEM/en/latest/install.html#gda-nic-dependencies) for the NICs and drivers required for GDA AllToAll support.

**Building with rocSHMEM**

- Using the install script:
  ```shell
  ./install.sh --rocshmem
  ```
  If the rocSHMEM submodule is present (`ext-src/rocSHMEM`), it is built and linked automatically. To use a pre-built rocSHMEM installation instead, set `ROCSHMEM_INSTALL_DIR` to the install prefix before running the script.
- Using CMake:
  ```shell
  cmake -DENABLE_ROCSHMEM=ON ..
  # Optional: use an existing rocSHMEM install
  cmake -DENABLE_ROCSHMEM=ON -DROCSHMEM_INSTALL_DIR=/path/to/rocshmem ..
  ```

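For the pre-built case with the install script, the following is a minimal sketch rather than an official workflow. The wrapper name `build_rccl_with_prebuilt_rocshmem` is hypothetical, and the sketch assumes it is run from the RCCL source root, where `install.sh` lives, and that `ROCSHMEM_INSTALL_DIR` is read from the environment as described above.

```shell
# Hypothetical wrapper, not part of RCCL: builds RCCL against an existing
# rocSHMEM install by passing ROCSHMEM_INSTALL_DIR to the install script.
# Assumes the current directory is the RCCL source root.
build_rccl_with_prebuilt_rocshmem() {
  prefix=$1

  # Fail early if the given install prefix does not exist.
  if [ ! -d "$prefix" ]; then
    echo "rocSHMEM prefix not found: $prefix" >&2
    return 1
  fi

  # The variable is set only for this one invocation of the script.
  ROCSHMEM_INSTALL_DIR="$prefix" ./install.sh --rocshmem
}
```

Usage would look like `build_rccl_with_prebuilt_rocshmem /path/to/rocshmem`, where the path is a placeholder for the actual install prefix.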
**Runtime behavior**

The following environment variables control the rocSHMEM path at runtime. Both have defaults, so setting them is optional:

- **`RCCL_ROCSHMEM_ENABLE`** (default: `1`): Set to `0` to disable rocSHMEM usage in RCCL.
- **`RCCL_ROCSHMEM_THRESHOLD`** (default: `262144` bytes): Maximum AllToAll message size (in bytes) for which the GDA path is used. The GDA path is only considered when this value is ≤ 1 MiB (1048576 bytes); if the threshold is set higher, RCCL falls back to the standard AllToAll implementation.

The GDA AllToAll path is selected only when all of the following hold: rocSHMEM is enabled at both build time and runtime, the GPU architecture is gfx942 (e.g. MI300X), the job is multi-node with 8 GPUs per node, and the AllToAll message size is ≤ `RCCL_ROCSHMEM_THRESHOLD`.

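The selection conditions above can be sketched as a small shell predicate. This is an illustration of the documented rules, not RCCL's actual selection code: `gda_alltoall_eligible` and its arguments are hypothetical, the sketch assumes RCCL was already built with rocSHMEM support, and it assumes that any `RCCL_ROCSHMEM_ENABLE` value other than `0` counts as enabled.

```shell
# Hypothetical helper, not part of RCCL: returns 0 when the documented
# conditions for the GDA AllToAll path hold, and 1 otherwise.
# Arguments: <gpu_arch> <num_nodes> <gpus_per_node> <message_bytes>
# Assumption: the RCCL build already includes rocSHMEM support.
gda_alltoall_eligible() {
  arch=$1
  nnodes=$2
  gpus_per_node=$3
  msg_bytes=$4

  # Documented defaults: enabled, 262144-byte threshold.
  enable=${RCCL_ROCSHMEM_ENABLE:-1}
  threshold=${RCCL_ROCSHMEM_THRESHOLD:-262144}

  [ "$enable" != "0" ] || return 1            # runtime switch (assumed: only 0 disables)
  [ "$threshold" -le 1048576 ] || return 1    # threshold must be at most 1 MiB
  [ "$arch" = "gfx942" ] || return 1          # e.g. MI300X
  [ "$nnodes" -gt 1 ] || return 1             # multi-node only
  [ "$gpus_per_node" -eq 8 ] || return 1      # 8 GPUs per node
  [ "$msg_bytes" -le "$threshold" ] || return 1
  return 0
}
```

For example, `gda_alltoall_eligible gfx942 2 8 65536` succeeds with the default settings, while the same call fails after `export RCCL_ROCSHMEM_ENABLE=0` or after raising `RCCL_ROCSHMEM_THRESHOLD` above 1048576.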
## Library and API Documentation

Please refer to the [RCCL Documentation Site](https://rocm.docs.amd.com/projects/rccl/en/latest/) for current documentation.
