Commit 1b0d86a

some minor updates
Signed-off-by: Edgar Gabriel <[email protected]>
1 parent eeb410e commit 1b0d86a

3 files changed: +22 -16 lines changed


docs/tuning-apps/accelerators/initialize.rst

Lines changed: 2 additions & 2 deletions
@@ -16,15 +16,15 @@ that is set by Open MPI at launch time and can be retrieved before
 ``MPI_Init``. An example code sample using the HIP programming model
 looks as follows:
 
-.. code-block:: sh
+.. code-block:: c
 
    int num_devices;
    hipGetDeviceCount(&num_devices);
    assert (num_devices > 0);
 
    char* ompi_local_rank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    if (nullptr != ompi_local_rank) {
-       hipSetDevice(atoi(ompi_local_rank % num_devices));
+       hipSetDevice(atoi(ompi_local_rank) % num_devices);
    }
 
    MPI_Init (&argc, &argv);
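
For reference, a self-contained version of the corrected device-selection pattern might look as follows. This is a minimal sketch assuming the HIP runtime API and Open MPI's ``OMPI_COMM_WORLD_LOCAL_RANK`` environment variable; the error handling is illustrative and not part of the documented snippet.

#include <stdio.h>
#include <stdlib.h>

#include <hip/hip_runtime_api.h>
#include <mpi.h>

/* Minimal sketch: select a HIP device based on the node-local rank
 * exported by Open MPI before calling MPI_Init, as in the snippet
 * fixed above.  Error handling is illustrative only. */
int main(int argc, char *argv[])
{
    int num_devices = 0;

    if (hipGetDeviceCount(&num_devices) != hipSuccess || num_devices <= 0) {
        fprintf(stderr, "no HIP devices found\n");
        return 1;
    }

    /* OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI at launch time
     * and is therefore available before MPI_Init. */
    char *ompi_local_rank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    if (NULL != ompi_local_rank) {
        hipSetDevice(atoi(ompi_local_rank) % num_devices);
    }

    MPI_Init(&argc, &argv);
    /* ... application code using the selected device ... */
    MPI_Finalize();
    return 0;
}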

docs/tuning-apps/accelerators/memkind.rst

Lines changed: 12 additions & 7 deletions
@@ -1,16 +1,19 @@
 Support for Memory-kind Info Objects
 ====================================
 
-`MPI version 4.1. <https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf>`_ introduced the notion of memory-kinds, which allow an
+`MPI version 4.1. <https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf>`_
+introduced the notion of memory allocation kinds, which allow an
 application to specify what memory types it plans to use, and to query
 what memory types are supported by the MPI library in a portable
 manner. In addition, the application can place restrictions on certain
 objects such as creating a separate communicator for using with
 host-memory and a communicator that will be used with device memory
 only. This approach allows the MPI library to perform certain
 optimizations, such as bypassing checking the memory-type of buffer
-pointers. Please refer to the MPI specification as well as the
-`Memory Allocation Kinds Side Document <https://www.mpi-forum.org/docs/sidedocs/mem-alloc10.pdf>`_ for more details and examples.
+pointers. Please refer to the MPI specification as well as the `Memory
+Allocation Kinds Side Document
+<https://www.mpi-forum.org/docs/sidedocs/mem-alloc10.pdf>`_ for more
+details and examples.
 
 Open MPI starting from version 6.0.0 supports the following values for the memory allocation kind Info object:
 
@@ -19,21 +22,22 @@ Open MPI starting from version 6.0.0 supports the following values for the memor
 * cuda:device
 * cuda:host
 * cuda:managed
-* level_zero:host
 * level_zero:device
+* level_zero:host
 * level_zero:shared
 * rocm:device
 * rocm:host
 * rocm:managed
 
-.. note:: Support for accelerator memory-kind info objects will depend
-          on the accelerator support compiled into Open MPI.
+.. note:: Support for accelerator memory allocation kind info objects
+          will depend on the accelerator support compiled into Open
+          MPI.
 
 
 Passing memory-kind info to mpiexec
 ===================================
 
-The following example demonstrates how to pass memory-allocation kind
+The following example demonstrates how to pass memory allocation kind
 information to Open MPI at application launch:
 
 .. code:: sh
@@ -57,3 +61,4 @@ communicator will only be used for ROCm device buffers:
 
    MPI_Comm comm_dup
    MPI_Comm_dup_with_info (MPI_COMM_WORLD, info_assert, &comm_dup);
+   ...
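
To show how the ``info_assert`` object referenced in the last hunk might be built up, here is a rough, self-contained sketch of the ``MPI_Comm_dup_with_info`` pattern. The info key ``mpi_assert_memory_alloc_kinds`` is taken from the MPI 4.1 side document and is an assumption here; the memkind page itself should be consulted for the key Open MPI expects.

#include <mpi.h>

/* Sketch only: assert that a duplicated communicator will only be
 * used with ROCm device buffers.  The info key name below follows the
 * MPI 4.1 Memory Allocation Kinds side document and is an assumption,
 * not taken from this commit. */
int main(int argc, char *argv[])
{
    MPI_Info info_assert;
    MPI_Comm comm_dup;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info_assert);
    MPI_Info_set(info_assert, "mpi_assert_memory_alloc_kinds", "rocm:device");

    MPI_Comm_dup_with_info(MPI_COMM_WORLD, info_assert, &comm_dup);

    /* ... communication on comm_dup using ROCm device buffers only ... */

    MPI_Comm_free(&comm_dup);
    MPI_Info_free(&info_assert);
    MPI_Finalize();
    return 0;
}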

docs/tuning-apps/accelerators/rocm.rst

Lines changed: 8 additions & 7 deletions
@@ -51,7 +51,7 @@ Runtime querying of ROCm support in Open MPI
 --------------------------------------------
 
 Querying the availability of ROCm support in Open MPI at runtime is
-possible through the memory-kind info object, see ::ref::`memory-kind`
+possible through the memory allocation kind info object, see ::ref::`memkind`
 page for details.
 
 In addition, starting with Open MPI v5.0.0 :ref:`MPIX_Query_rocm_support(3)
@@ -66,15 +66,15 @@ function, the code needs to include ``mpi-ext.h``. Note that
 /////////////////////////////////////////////////////////////////////////
 
 Running single node jobs with ROCm support
--------------------------------------------------------
+------------------------------------------
 
 The user has multiple options for running an Open MPI job with GPU support
 in a single node scenario:
 
 * the default shared memory component ``btl/sm`` has support for
   accelerators, will use however by default a bounce buffer on the CPU
   for data transfers. Hence, while this works, it will not be able to
-  take advantage of the high-speed GPU-to-GPU InfinityFabric (TM)
+  take advantage of the high-speed GPU-to-GPU InfinityFabric
   interconnect (if available).
 
 * to use the high-speed GPU-to-GPU interconnect within a node, the user has to
@@ -166,7 +166,7 @@ ROCm support in Open MPI with libfabric
 ---------------------------------------
 
 Some network interconnects are supported through the libfabric library.
-Configurating libfabric and Open MPI with ROCm support looks something like:
+Configuring libfabric and Open MPI with ROCm support looks something like:
 
 .. code-block:: sh
 
@@ -191,7 +191,7 @@ There are two mechanism for using libfabric and Open MPI with ROCm support.
 
 * Specifying the ``mtl/ofi`` component is sufficient to take advantage
   of the ROCm support in the libraries. In this case, both intra- and
-  inter-node communication will be performed by the libfabric. In
+  inter-node communication will be performed by the libfabric library. In
   order to ensure that the application will make use of the shared
   memory provider for intra-node communication and the network
   interconnect specific provider for inter-node communication, the
@@ -211,7 +211,8 @@ There are two mechanism for using libfabric and Open MPI with ROCm support.
 
 .. code-block:: sh
 
-   # Force using the ofi mtl component
+   # Use the ofi btl for inter-node and sm btl
+   # for intra-node communication
    mpirun --mca pml ob1 --mca btl ofi,sm,tcp,self \
           --mca smsc_accelerator_priority 80 \
           -n 64 ./<my_executable>
@@ -227,7 +228,7 @@ The ``coll/accelerator`` component supports collective operations on
 ROCm device buffers for many commonly used collective
 operations. The component works by copying data into a temporary host
 buffer, executing the collective operation on the host buffer, and
-copying the data back to the device buffer at completion. This
+copying the result back to the device buffer at completion. This
 component will lead to adequate performance for short to medium data
 sizes, but performance is often suboptimal especially for large reduction
 operations.
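
As a companion to the runtime-query wording changed in the first hunk, a sketch of calling ``MPIX_Query_rocm_support`` could look like the following. The ``OMPI_HAVE_MPI_EXT_ROCM`` guard and the extension-header usage are assumptions based on Open MPI's ``mpi-ext.h`` mechanism, not content of this commit.

#include <stdio.h>

#include <mpi.h>
#if defined(OPEN_MPI)
#include <mpi-ext.h>   /* declares the MPIX_ extensions, if built */
#endif

/* Sketch: query ROCm support at runtime via the Open MPI extension.
 * The OMPI_HAVE_MPI_EXT_ROCM guard and MPIX_Query_rocm_support() are
 * assumed to be provided by Open MPI >= 5.0.0 builds with ROCm support. */
int main(int argc, char *argv[])
{
    int rocm_aware = 0;

    MPI_Init(&argc, &argv);

#if defined(OMPI_HAVE_MPI_EXT_ROCM) && OMPI_HAVE_MPI_EXT_ROCM
    rocm_aware = MPIX_Query_rocm_support();
#endif

    printf("ROCm support available: %s\n", rocm_aware ? "yes" : "no");

    MPI_Finalize();
    return 0;
}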
