Commit 91146c5

[docs] updated parallelization into (#5097)
1 parent 79dead3 commit 91146c5

1 file changed: +35 -16 lines changed

package/doc/sphinx/source/documentation_pages/analysis_modules.rst

Lines changed: 35 additions & 16 deletions
@@ -22,6 +22,9 @@ See the `User Guide Analysis section`_ for interactive examples and additional c
 Getting started with analysis
 =============================

+General usage pattern
+---------------------
+
 Most analysis tools are implemented as single classes and follow this usage pattern:

 #. Import the module (e.g., :mod:`MDAnalysis.analysis.rms`).
@@ -40,30 +43,42 @@ Please see the individual module documentation for any specific caveats
 and also read and cite the reference papers associated with these algorithms.


-Using parallelization for built-in analysis runs
-------------------------------------------------
+Using parallelization for analysis tools
+----------------------------------------

 .. versionadded:: 2.8.0

-:class:`~MDAnalysis.analysis.base.AnalysisBase` subclasses can run on a backend
-that supports parallelization (see :mod:`MDAnalysis.analysis.backends`). All
-analysis runs use ``backend='serial'`` by default, i.e., they do not use
-parallelization by default, which has been standard before release 2.8.0 of
-MDAnalysis.
+Many analysis tools (based on :class:`~MDAnalysis.analysis.base.AnalysisBase`)
+can be :ref:`run in parallel <parallel-analysis>` using a simple
+split-apply-combine scheme whereby slices of the trajectory ("split") are analyzed in
+parallel ("apply" the analysis function) and the data from the parallel executions
+are "combined" at the end.
+
+MDAnalysis supports different :ref:`backends <backends>` for the parallel execution such as
+:mod:`multiprocessing` or `dask`_ (see :mod:`MDAnalysis.analysis.backends`).
+As a special case, serial execution is handled by the default ``backend='serial'``, i.e.,
+by default, none of the analysis tools run in parallel and one has to explicitly request
+parallel execution. Without any additionally installed dependencies, only one parallel backend
+is supported -- Python :mod:`multiprocessing` (which is available in the Python standard
+library), which processes each slice of a trajectory by running a separate *process* on a
+different core of a multi-core CPU.

-Without any dependencies, only one backend is supported -- built-in
-:mod:`multiprocessing`, that processes parts of a trajectory running separate
-*processes*, i.e. utilizing multi-core processors properly.
+.. _dask: https://dask.org/

 .. Note::

-   For now, parallelization has only been added to
-   :class:`MDAnalysis.analysis.rms.RMSD`, but by release 3.0 version it will be
-   introduced to all subclasses that can support it.
+   Not all analysis tools in MDAnalysis can be parallelized and others have
+   not yet been updated to make use of the :ref:`parallelization framework <parallel-analysis>`,
+   which was introduced in release 2.8.0. MDAnalysis aims to have parallelization enabled for
+   all analysis tools that support it by release 3.0.

-In order to use that feature, simply add ``backend='multiprocessing'`` to your
-run, and supply it with proper ``n_workers`` (use ``multiprocessing.cpu_count()``
-for maximum available on your machine):
+In order to use parallelization, add ``backend='multiprocessing'`` to the arguments of the
+:meth:`~MDAnalysis.analysis.base.AnalysisBase.run` method together with ``n_workers=N`` where
+``N`` is the number of CPUs that you want to use for parallelization.
+(You can use ``multiprocessing.cpu_count()`` to get the maximum available number of CPUs on your
+machine but this may not always lead to the best performance because of computational overheads and
+the fact that parallel access to a single trajectory file is often a performance bottleneck.) As an
+example we show how to run an RMSD calculation in parallel:

 .. code-block:: python

@@ -89,6 +104,10 @@ workers or using large trajectory frames may lead to an out-of-memory error.

 You can also implement your own backends -- see :mod:`MDAnalysis.analysis.backends`.

+.. SeeAlso::
+   :ref:`parallel-analysis` for technical details
+
+

 Additional dependencies
 -----------------------
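
Illustration of the scheme described in the added text: the split-apply-combine idea (slices of the trajectory are analyzed in separate processes and the partial results are merged at the end) can be sketched with the standard library alone. This is a toy model, not MDAnalysis code. The names `split_frames`, `analyze_slice`, and `run_analysis` are hypothetical, and the per-frame "analysis" is a stand-in computation. The `n_workers=1` branch loosely mirrors the role of the default serial backend.

```python
import math
import multiprocessing


def split_frames(n_frames, n_parts):
    """'Split': cut frame indices 0..n_frames-1 into at most n_parts contiguous slices."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Slice length, rounded up so that every frame is covered.
    size = max(1, math.ceil(n_frames / n_parts))
    return [range(start, min(start + size, n_frames))
            for start in range(0, n_frames, size)]


def analyze_slice(frames):
    """'Apply': compute one value per frame (placeholder for a real analysis)."""
    return [frame ** 2 for frame in frames]


def run_analysis(n_frames, n_workers=1):
    """Hypothetical driver: split, apply (serially or in a process pool), combine."""
    slices = split_frames(n_frames, n_workers)
    if n_workers == 1:
        # Serial execution: no worker processes are started.
        partial = [analyze_slice(s) for s in slices]
    else:
        # One slice per task; Pool.map returns results in slice order.
        with multiprocessing.Pool(n_workers) as pool:
            partial = pool.map(analyze_slice, slices)
    # 'Combine': concatenate the per-slice results in frame order.
    return [value for part in partial for value in part]
```

Because the slices are contiguous and `Pool.map` preserves their order, `run_analysis(10, n_workers=2)` should return the same list as `run_analysis(10, n_workers=1)`. When calling the parallel branch from a script, place the call under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a fresh interpreter.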
