@@ -80,15 +80,78 @@ running Open MPI's ``configure`` script.

.. _label-install-packagers-dso-or-not:

-Components ("plugins"): DSO or no?
------------------------------------
+Components ("plugins"): static or DSO?
+---------------------------------------

Open MPI contains a large number of components (sometimes called
"plugins") to effect different types of functionality in MPI. For
example, some components effect Open MPI's networking functionality:
they may link against specialized libraries to provide
highly-optimized network access.

+Open MPI can build its components as Dynamic Shared Objects (DSOs) or
+include them statically in its core libraries (regardless of whether
+those libraries are built as shared or static libraries).

+.. note:: As of Open MPI |ompi_ver|, ``configure``'s global default is
+          to build all components as static (i.e., part of the Open
+          MPI core libraries, not as DSOs). Prior to Open MPI v5.0.0,
+          the global default behavior was to build most components as
+          DSOs.

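+To check which way an existing installation was built, you can look
+for the individually-installed component files. This is only a
+sketch; it assumes the default installation layout, where component
+DSOs are installed under ``$libdir/openmpi``:
+
+.. code:: sh
+
+   # Components built as DSOs appear as individual mca_*.so files;
+   # components built into the core libraries do not appear here
+   shell$ ls $libdir/openmpi/mca_*.so
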
+Why build components as DSOs?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+There are advantages to building components as DSOs:

+* Open MPI's core libraries |mdash| and therefore MPI applications
+  |mdash| will have very few dependencies. For example, if you build
+  Open MPI with support for a specific network stack, the libraries in
+  that network stack will be dependencies of the DSOs, not Open MPI's
+  core libraries (or MPI applications).

+* Removing Open MPI functionality that you do not want is as simple as
+  removing a DSO from ``$libdir/openmpi`` (see the sketch below).

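+A minimal sketch of the second point, assuming a default installation
+layout and an installation whose components were built as DSOs (the
+component name below is only an example):
+
+.. code:: sh
+
+   # Remove a single component that you do not want to ship; Open MPI
+   # simply will not find (or use) that component at run time
+   shell$ rm $libdir/openmpi/mca_btl_uct.so
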
+Why build components as part of Open MPI's core libraries?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+The biggest advantage of building the components as part of Open MPI's
+core libraries appears when running at (very) large scale with Open
+MPI installed on a network filesystem (vs. being installed on a local
+filesystem).

+For example, consider launching a single MPI process on each of 1,000
+nodes. In this scenario, the following are accessed from the network
+filesystem:

+#. The MPI application
+#. The core Open MPI libraries and their dependencies (e.g.,
+   ``libmpi``)

+   * Depending on your configuration, this is probably on the order of
+     10-20 library files.

+#. All DSO component files and their dependencies

+   * Depending on your configuration, this can be 200+ component
+     files.

+If all components are physically located in the libraries, then the
+third step loads zero DSO component files. When using a networked
+filesystem while launching at scale, this can translate to large
+performance savings.

+.. note:: If not using a networked filesystem, or if not launching at
+          scale, loading a large number of DSO files may not consume a
+          noticeable amount of time during MPI process launch. Put
+          simply: loading DSOs as individual files generally only
+          matters when using a networked filesystem while launching at
+          scale.

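+To get a feel for the numbers involved, you can count the component
+DSO files in an installation that built its components as DSOs (a
+sketch only; the exact count depends on how Open MPI was configured):
+
+.. code:: sh
+
+   # Each of these files (plus its own dependencies) may be read from
+   # the filesystem by every MPI process at launch time
+   shell$ ls $libdir/openmpi/mca_*.so | wc -l
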
+Direct controls for building components as DSOs or not
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Open MPI |ompi_ver| has two ``configure``-time defaults regarding the
treatment of components that may be of interest to packagers:

@@ -135,19 +198,121 @@ using ``--enable-mca-dso`` to selectively build some components as
DSOs and leave the others included in their respective Open MPI
libraries.

+:ref:`See the section on building accelerator support
+<label-install-packagers-building-accelerator-support-as-dsos>` for a
+practical example where this can be useful.

+.. _label-install-packagers-gnu-libtool-dependency-flattening:

+GNU Libtool dependency flattening
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+When compiling Open MPI's components statically as part of Open MPI's
+core libraries, `GNU Libtool <https://www.gnu.org/software/libtool/>`_
+|mdash| which is used as part of Open MPI's build system |mdash| will
+attempt to "flatten" dependencies.

+For example, the :ref:`ompi_info(1) <man1-ompi_info>` command links
+against the Open MPI core library ``libopen-pal``. This library will
+have dependencies on various HPC-class network stack libraries. For
+simplicity, the discussion below assumes that Open MPI was built with
+support for `Libfabric <https://libfabric.org/>`_ and `UCX
+<https://openucx.org/>`_, and therefore ``libopen-pal`` has direct
+dependencies on ``libfabric`` and ``libucx``.

+In this scenario, GNU Libtool will automatically attempt to "flatten"
+these dependencies by linking :ref:`ompi_info(1) <man1-ompi_info>`
+directly to ``libfabric`` and ``libucx`` (vs. letting ``libopen-pal``
+pull the dependencies in at run time).

+* In some environments (e.g., Ubuntu 22.04), the compiler and/or
+  linker will automatically utilize the linker CLI flag
+  ``-Wl,--as-needed``, which will effectively cause these dependencies
+  to *not* be flattened: :ref:`ompi_info(1) <man1-ompi_info>` will
+  *not* have direct dependencies on either ``libfabric`` or
+  ``libucx``.

+* In other environments (e.g., Fedora 38), the compiler and linker
+  will *not* utilize the ``-Wl,--as-needed`` linker CLI flag. As
+  such, :ref:`ompi_info(1) <man1-ompi_info>` will show direct
+  dependencies on ``libfabric`` and ``libucx``.

+**Just to be clear:** these flattened dependencies *are not a
+problem*. Open MPI will function correctly with or without the
+flattened dependencies. There is no performance impact associated
+with having |mdash| or not having |mdash| the flattened dependencies.
+We mention this situation here in the documentation simply because it
+surprised some Open MPI downstream package managers to see that
+:ref:`ompi_info(1) <man1-ompi_info>` in Open MPI |ompi_ver| had more
+shared library dependencies than it did in prior Open MPI releases.

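+One way to see whether a given build has the flattened dependencies is
+to inspect the installed binary directly. This is only an
+illustrative sketch (the install prefix is a placeholder), and either
+outcome is acceptable:
+
+.. code:: sh
+
+   # If libfabric / libucx appear in this output, the dependencies
+   # were flattened; if they do not, the linker effectively behaved as
+   # if --as-needed had been specified
+   shell$ ldd $prefix/bin/ompi_info | grep -E 'libfabric|libucx'
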
+If packagers do not want :ref:`ompi_info(1) <man1-ompi_info>` to have
+these flattened dependencies, they can use either of the following
+mechanisms:

+#. Use ``--enable-mca-dso`` to force all components to be built as
+   DSOs (this was actually the default behavior before Open MPI v5.0.0).

+#. Add ``LDFLAGS=-Wl,--as-needed`` to the ``configure`` command line
+   when building Open MPI (see the sketch below).

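+For the second mechanism, a sketch of what the ``configure``
+invocation might look like (the installation prefix is a placeholder;
+all other arguments are whatever you would normally pass):
+
+.. code:: sh
+
+   # Ask the linker to record only the shared library dependencies
+   # that are actually needed, which prevents the flattening
+   shell$ ./configure LDFLAGS=-Wl,--as-needed --prefix=/opt/openmpi ...
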
+.. note:: The Open MPI community specifically chose not to
+          automatically utilize this linker flag for the following
+          reasons:

+          #. Having the flattened dependencies does not cause any
+             correctness or performance problems.
+          #. There are multiple mechanisms (see above) for users or
+             packagers to change this behavior, if desired.
+          #. Certain environments have chosen to have |mdash| or
+             not have |mdash| this flattened dependency behavior.
+             It is not Open MPI's place to override these choices.
+          #. In general, Open MPI's ``configure`` script only
+             utilizes compiler and linker flags if they are
+             *needed*. All other flags should be the user's /
+             packager's choice.

+.. _label-install-packagers-building-accelerator-support-as-dsos:

+Building accelerator support as DSOs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+If you are building a package that includes support for one or more
+accelerators, it may be desirable to build accelerator-related
+components as DSOs (see the :ref:`static or DSO?
+<label-install-packagers-dso-or-not>` section for details).

+.. admonition:: Rationale
+   :class: tip
+
+   Accelerator hardware is expensive, and may only be present on some
+   compute nodes in an HPC cluster. Specifically: there may not be
+   any accelerator hardware on "head" or compile nodes in an HPC
+   cluster. As such, invoking Open MPI commands on a "head" node
+   without accelerator hardware, using an Open MPI that was built with
+   static accelerator support, may fail because of run-time linker
+   issues (because the accelerator support libraries are likely not
+   present).
+
+   Building Open MPI's accelerator-related components as DSOs allows
+   Open MPI to *try* opening the accelerator components, but proceed
+   if those DSOs fail to open due to the lack of support libraries.

+Using the ``--enable-mca-dso`` command line parameter to Open MPI's
+``configure`` command allows packagers to build all accelerator-related
+components as DSOs. For example:

.. code:: sh

-   # Build all the "accelerator" components as DSOs (all other
+   # Build all the accelerator-related components as DSOs (all other
   # components will default to being built in their respective
   # libraries)
-   shell$ ./configure --enable-mca-dso=accelerator ...

-This allows packaging ``$libdir`` as part of the "main" Open MPI
-binary package, but then packaging
-``$libdir/openmpi/mca_accelerator_*.so`` as sub-packages. These
-sub-packages may inherit dependencies on the CUDA and/or ROCM
-packages, for example. User can always install the "main" Open MPI
-binary package, and can install the additional "accelerator" Open MPI
-binary sub-package if they actually have accelerator hardware
-installed (which will cause the installation of additional
-dependencies).
+   shell$ ./configure --enable-mca-dso=btl-smcuda,rcache-rgpusm,rcache-gpusm,accelerator

+Per the example above, this allows packaging ``$libdir`` as part of
+the "main" Open MPI binary package, but then packaging
+``$libdir/openmpi/mca_accelerator_*.so`` and the other named
+components as sub-packages. These sub-packages may inherit
+dependencies on the CUDA and/or ROCm packages, for example. The
+"main" package can be installed on all nodes, and the
+accelerator-specific sub-package can be installed on only the nodes
+with accelerator hardware and support libraries.
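
+A hedged sketch of how the installed files might be split between
+those packages (the exact set of DSO filenames depends on which
+components were actually built; the names below are only
+illustrative):
+
+.. code:: sh
+
+   # Files that would typically land in the accelerator sub-package
+   shell$ ls $libdir/openmpi/mca_accelerator_*.so \
+             $libdir/openmpi/mca_btl_smcuda.so \
+             $libdir/openmpi/mca_rcache_rgpusm.so \
+             $libdir/openmpi/mca_rcache_gpusm.so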