@@ -51,7 +51,7 @@ Runtime querying of ROCm support in Open MPI
 --------------------------------------------
 
 Querying the availability of ROCm support in Open MPI at runtime is
-possible through the memory-kind info object, see :ref:`memory-kind`
+possible through the memory allocation kind info object, see :ref:`memkind`
 page for details.
 
 In addition, starting with Open MPI v5.0.0 :ref:`MPIX_Query_rocm_support(3)
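A minimal C sketch of this runtime query, assuming (not stated on this page) that the generated ``mpi-ext.h`` defines ``OMPI_HAVE_MPI_EXT_ROCM`` when the ROCm extension is built and that a non-zero return value from ``MPIX_Query_rocm_support()`` indicates ROCm-aware support, could look like the following; consult the ``MPIX_Query_rocm_support(3)`` man page for the authoritative description.

.. code-block:: c

   /* Sketch only: OMPI_HAVE_MPI_EXT_ROCM and the non-zero-means-supported
    * convention are assumptions, not taken from this page. */
   #include <stdio.h>
   #include <mpi.h>
   #if defined(OPEN_MPI) && OPEN_MPI
   #include <mpi-ext.h>   /* Open MPI extensions, declares MPIX_Query_rocm_support() */
   #endif

   int main(int argc, char *argv[])
   {
       MPI_Init(&argc, &argv);

   #if defined(OMPI_HAVE_MPI_EXT_ROCM) && OMPI_HAVE_MPI_EXT_ROCM
       if (MPIX_Query_rocm_support()) {
           printf("This Open MPI reports ROCm support at runtime.\n");
       } else {
           printf("This Open MPI does not report ROCm support at runtime.\n");
       }
   #else
       printf("The ROCm extension is not exposed by this mpi-ext.h.\n");
   #endif

       MPI_Finalize();
       return 0;
   }

Compiled with ``mpicc`` against an Open MPI installation, each rank prints whether ROCm support is reported at runtime.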
@@ -66,15 +66,15 @@ function, the code needs to include ``mpi-ext.h``. Note that
 /////////////////////////////////////////////////////////////////////////
 
 Running single node jobs with ROCm support
--------------------------------------------------------
+------------------------------------------
 
 The user has multiple options for running an Open MPI job with GPU support
 in a single node scenario:
 
 * the default shared memory component ``btl/sm`` has support for
   accelerators, but will by default use a bounce buffer on the CPU
   for data transfers. Hence, while this works, it will not be able to
-  take advantage of the high-speed GPU-to-GPU InfinityFabric (TM)
+  take advantage of the high-speed GPU-to-GPU InfinityFabric
   interconnect (if available).
 
 * to use the high-speed GPU-to-GPU interconnect within a node, the user has to
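A minimal single-node launch sketch for the first option above, with a placeholder executable and process count (the later example on this page raises ``smsc_accelerator_priority`` to favor the GPU-aware single-copy path):

.. code-block:: sh

   # Single-node run over the shared memory btl; device buffers are staged
   # through a CPU bounce buffer by default.
   mpirun --mca pml ob1 --mca btl sm,self -n 8 ./<my_executable>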
@@ -166,7 +166,7 @@ ROCm support in Open MPI with libfabric
 ---------------------------------------
 
 Some network interconnects are supported through the libfabric library.
-Configurating libfabric and Open MPI with ROCm support looks something like:
+Configuring libfabric and Open MPI with ROCm support looks something like:
 
 .. code-block:: sh
 
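The configure lines themselves fall outside this excerpt; as a rough sketch only, with placeholder installation prefixes and with ``--with-rocr`` assumed to be the libfabric option that enables its ROCm (ROCr) support, the two builds could look like:

.. code-block:: sh

   # libfabric: enable ROCm (ROCr) support -- option name assumed,
   # verify with ./configure --help
   ./configure --prefix=/opt/libfabric --with-rocr=/opt/rocm

   # Open MPI: point the build at the ROCm and libfabric installations
   ./configure --prefix=/opt/ompi --with-rocm=/opt/rocm --with-ofi=/opt/libfabric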
@@ -191,7 +191,7 @@ There are two mechanisms for using libfabric and Open MPI with ROCm support.
 
 * Specifying the ``mtl/ofi`` component is sufficient to take advantage
   of the ROCm support in the libraries. In this case, both intra- and
-  inter-node communication will be performed by the libfabric. In
+  inter-node communication will be performed by the libfabric library. In
   order to ensure that the application will make use of the shared
   memory provider for intra-node communication and the network
   interconnect specific provider for inter-node communication, the
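A launch sketch for this first mechanism, with a placeholder executable and process count; selecting a specific provider, for example through libfabric's ``FI_PROVIDER`` environment variable, may additionally be needed on some systems:

.. code-block:: sh

   # Force the ofi mtl (it runs underneath the cm pml); libfabric then
   # handles both intra-node and inter-node communication.
   mpirun --mca pml cm --mca mtl ofi -n 64 ./<my_executable>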
@@ -211,7 +211,8 @@ There are two mechanisms for using libfabric and Open MPI with ROCm support.
 
 .. code-block:: sh
 
-   # Force using the ofi mtl component
+   # Use the ofi btl for inter-node and sm btl
+   # for intra-node communication
    mpirun --mca pml ob1 --mca btl ofi,sm,tcp,self \
          --mca smsc_accelerator_priority 80 \
          -n 64 ./<my_executable>
@@ -227,7 +228,7 @@ The ``coll/accelerator`` component supports collective operations on
 ROCm device buffers for many commonly used collective
 operations. The component works by copying data into a temporary host
 buffer, executing the collective operation on the host buffer, and
-copying the data back to the device buffer at completion. This
+copying the result back to the device buffer at completion. This
 component will lead to adequate performance for short to medium data
 sizes, but performance is often suboptimal especially for large reduction
 operations.
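For large reductions where the host-staging approach is suboptimal, one way to compare against the other collective components, using Open MPI's standard MCA component-exclusion syntax and a placeholder executable, is:

.. code-block:: sh

   # Exclude the accelerator collective component so that the remaining
   # coll components operate on the device buffers directly.
   mpirun --mca coll ^accelerator -n 64 ./<my_executable>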