@@ -73,146 +73,70 @@ to send bytes across different types underlying networks. The ``tcp``
7373``btl ``, for example, sends messages across TCP-based networks; the
7474``ucx `` ``pml `` sends messages across InfiniBand-based networks.
7575
76+ MCA parameter notes
77+ -------------------
78+
7679Each component typically has some tunable parameters that can be
77- changed at run-time. Use the ``ompi_info `` command to check a component
78- to see what its tunable parameters are. For example:
80+ changed at run-time. Use the :ref:`ompi_info(1) <man1-ompi_info>`
81+ command to check a component to see what its tunable parameters are.
82+ For example:
7983
8084.. code-block :: sh
8185
8286 shell$ ompi_info --param btl tcp
8387
8488 shows some of the parameters (and default values) for the ``tcp `` ``btl ``
85- component (use ``--level `` to show *all * the parameters; see below).
86-
87- Note that ``ompi_info `` only shows a small number a component's MCA
88- parameters by default. Each MCA parameter has a "level" value from 1
89- to 9, corresponding to the MPI-3 MPI_T tool interface levels. In Open
90- MPI, we have interpreted these nine levels as three groups of three:
91-
92- #. End user / basic
93- #. End user / detailed
94- #. End user / all
95- #. Application tuner / basic
96- #. Application tuner / detailed
97- #. Application tuner / all
98- #. MPI/OpenSHMEM developer / basic
99- #. MPI/OpenSHMEM developer / detailed
100- #. MPI/OpenSHMEM developer / all
101-
102- Here's how the three sub-groups are defined:
103-
104- #. End user: Generally, these are parameters that are required for
105- correctness, meaning that someone may need to set these just to
106- get their MPI/OpenSHMEM application to run correctly.
107- #. Application tuner: Generally, these are parameters that can be
108- used to tweak MPI application performance.
109- #. MPI/OpenSHMEM developer: Parameters that either don't fit in the
110- other two, or are specifically intended for debugging /
111- development of Open MPI itself.
112-
113- Each sub-group is broken down into three classifications:
114-
115- #. Basic: For parameters that everyone in this category will want to
116- see.
117- #. Detailed: Parameters that are useful, but you probably won't need
118- to change them often.
119- #. All: All other parameters -- probably including some fairly
120- esoteric parameters.
121-
122- To see *all * available parameters for a given component, specify that
123- ompi_info should use level 9:
124-
125- .. code-block :: sh
126-
127- shell$ ompi_info --param btl tcp --level 9
128-
129- .. error :: TODO The following content seems redundant with the FAQ.
130- Additionally, information about how to set MCA params should be
131- prominently documented somewhere that is easy for users to find --
132- not buried here in the developer's section.
133-
134- These values can be overridden at run-time in several ways. At
135- run-time, the following locations are examined (in order) for new
136- values of parameters:
137-
138- #. ``PREFIX/etc/openmpi-mca-params.conf ``:
139- This file is intended to set any system-wide default MCA parameter
140- values -- it will apply, by default, to all users who use this Open
141- MPI installation. The default file that is installed contains many
142- comments explaining its format.
143-
144- #. ``$HOME/.openmpi/mca-params.conf ``:
145- If this file exists, it should be in the same format as
146- ``PREFIX/etc/openmpi-mca-params.conf ``. It is intended to provide
147- per-user default parameter values.
148-
149- #. environment variables of the form ``OMPI_MCA_<name> `` set equal to a
150- ``VALUE ``:
151-
152- Where ``<name> `` is the name of the parameter. For example, set the
153- variable named ``OMPI_MCA_btl_tcp_frag_size `` to the value 65536
154- (Bourne-style shells):
155-
156- .. code-block :: sh
157-
158- shell$ OMPI_MCA_btl_tcp_frag_size=65536
159- shell$ export OMPI_MCA_btl_tcp_frag_size
160-
161- .. error :: TODO Do we need content here about PMIx and PRTE env vars?
162-
163- #. the ``mpirun ``/``oshrun `` command line: ``--mca NAME VALUE ``
164-
165- Where ``<name> `` is the name of the parameter. For example:
166-
167- .. code-block :: sh
168-
169- shell$ mpirun --mca btl_tcp_frag_size 65536 -n 2 hello_world_mpi
170-
171- .. error :: TODO Do we need content here about PMIx and PRTE MCA vars
172- and corresponding command line switches?
173-
174- These locations are checked in order. For example, a parameter value
175- passed on the ``mpirun `` command line will override an environment
176- variable; an environment variable will override the system-wide
177- defaults.
178-
179- Each component typically activates itself when relevant. For example,
180- the usNIC component will detect that usNIC devices are present and
181- will automatically be used for MPI communications. The Slurm
182- component will automatically detect when running inside a Slurm job
183- and activate itself. And so on.
184-
185- Components can be manually activated or deactivated if necessary, of
186- course. The most common components that are manually activated,
187- deactivated, or tuned are the ``btl `` components -- components that are
188- used for MPI point-to-point communications on many types common
189- networks.
190-
191- For example, to *only * activate the ``tcp `` and ``self `` (process loopback)
192- components are used for MPI communications, specify them in a
193- comma-delimited list to the ``btl `` MCA parameter:
194-
195- .. code-block :: sh
196-
197- shell$ mpirun --mca btl tcp,self hello_world_mpi
198-
199- To add shared memory support, add ``sm `` into the command-delimited list
200- (list order does not matter):
201-
202- .. code-block :: sh
203-
204- shell$ mpirun --mca btl tcp,sm,self hello_world_mpi
205-
206- .. note :: There used to be a ``vader`` ``btl`` component for shared
207- memory support; it was renamed to ``sm `` in Open MPI v5.0.0,
208- but the alias ``vader `` still works as well.
209-
210- To specifically deactivate a specific component, the comma-delimited
211- list can be prepended with a ``^ `` to negate it:
212-
213- .. code-block :: sh
214-
215- shell$ mpirun --mca btl ^tcp hello_mpi_world
216-
217- The above command will use any other ``btl `` component other than the
218- ``tcp `` component.
89+ component (use ``--all`` or ``--level 9`` to show *all* the parameters).
90+
91+ Note that ``ompi_info`` (without ``--all`` or a specified level) only
92+ shows a small number of a component's MCA parameters by default. Each
93+ MCA parameter has a "level" value from 1 to 9, corresponding to the
94+ MPI-3 MPI_T tool interface levels. :ref:`See the LEVELS section in
95+ the ompi_info(1) man page <man1-ompi_info-levels>` for an explanation
96+ of the levels and how they correspond to Open MPI's code.
97+
98+ Here are some rules of thumb to keep in mind when using Open MPI's levels:
99+
100+ * Levels 1-3:
101+
102+ * These levels should contain only a few MCA parameters.
103+ * Generally, only put MCA parameters in these levels that matter to
104+ users who just need to *run* Open MPI applications (and don't
105+ know/care anything about MPI). Examples (these are not
106+ comprehensive):
107+
108+ * Selection of which network interfaces to use.
109+ * Selection of which MCA components to use.
110+ * Selective disabling of warning messages (e.g., show warning
111+ message XYZ unless a specific MCA parameter is set, which
112+ disables showing that warning message).
113+ * Enabling additional stderr logging verbosity. This allows a
114+ user to run with this logging enabled, and then use that output
115+ to get technical assistance.
116+
117+ * Levels 4-6:
118+
119+ * These levels should contain any other MCA parameters that are
120+ useful to expose to end users.
121+ * There is an expectation that "power users" will utilize these MCA
122+ parameters |mdash| e.g., those who are trying to tune the system
123+ and extract more performance.
124+ Here are some examples of MCA parameters suitable for these levels
125+ (these are not comprehensive):
126+
127+ * When you could have hard-coded a constant size of a resource
128+ (e.g., a resource pool size or buffer length), make it an MCA
129+ parameter instead.
130+ * When there are multiple different algorithms available for a
131+ particular operation, code them all up and provide an MCA
132+ parameter to let the user select between them.
133+
134+ * Levels 7-9:
135+
136+ * Put any other MCA parameters here.
137+ * It's ok for these MCA parameters to be esoteric and only relevant
138+ to deep magic / the internals of Open MPI.
139+ * There is little expectation of users using these MCA parameters.
140+
141+ See :ref:`this section <label-running-setting-mca-param-values>` for
142+ details on how to set MCA parameters at run time.
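
As a rough illustration of how the nine levels split into three audiences of three classifications each, here is a minimal shell sketch. The function name ``mca_level_name`` is hypothetical and is not part of Open MPI. The labels are taken from the level list that this change removes from the section ("End user", "Application tuner", "MPI/OpenSHMEM developer", each with basic / detailed / all). The ompi_info(1) man page remains the authoritative description.

```shell
# Hypothetical helper (not an Open MPI command): print the audience and
# classification for an MCA parameter level between 1 and 9.
mca_level_name() {
    # Levels 1-3, 4-6, and 7-9 form the three audience groups.
    case "$1" in
        [1-3]) mca_group="End user" ;;
        [4-6]) mca_group="Application tuner" ;;
        [7-9]) mca_group="MPI/OpenSHMEM developer" ;;
        *) echo "invalid level: $1" >&2; return 1 ;;
    esac

    # Within each group, the position selects basic / detailed / all.
    case $(( ($1 - 1) % 3 )) in
        0) mca_detail="basic" ;;
        1) mca_detail="detailed" ;;
        2) mca_detail="all" ;;
    esac

    echo "$mca_group / $mca_detail"
}

mca_level_name 5
```

With this sketch, a developer deciding where a new parameter belongs can check that, for example, level 5 falls in the "Application tuner" group rather than in the small set of parameters shown to every user by default.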