|
| 1 | +.. only:: html |
| 2 | + |
| 3 | + .. contents:: |
| 4 | + |
| 5 | +==================== |
| 6 | +Ensemble parallelism |
| 7 | +==================== |
| 8 | + |
| 9 | +Ensemble parallelism means solving simultaneous copies of a model |
| 10 | +with different coefficients, right hand sides, or initial data, in |
| 11 | +situations that require communication between the copies. Use cases |
| 12 | +include ensemble data assimilation, uncertainty quantification, and |
| 13 | +time parallelism. This manual section assumes some familiarity with |
| 14 | +parallel programming with MPI. |
| 15 | + |
| 16 | +The Ensemble communicator |
| 17 | +========================= |
| 18 | + |
| 19 | +In ensemble parallelism, we split the MPI communicator into a number |
| 20 | +of spatial subcommunicators, each of which we refer to as an |
| 21 | +ensemble member (shown in blue in the figure below). Within each |
| 22 | +ensemble member, existing Firedrake functionality allows us to specify |
| 23 | +the finite element problems which use spatial parallelism across the spatial |
| 24 | +subcommunicator in the usual way. Another set of |
| 25 | +subcommunicators - the ensemble subcommunicators - then allow |
| 26 | +communication between ensemble members (shown in grey in the figure |
| 27 | +below). Together, the spatial and ensemble subcommunicators form a |
| 28 | +Cartesian product over the original global communicator. |
| 29 | + |
| 30 | +.. figure:: images/ensemble.svg |
| 31 | + :align: center |
| 32 | + |
| 33 | + Spatial and ensemble parallelism for an ensemble with 5 members, |
| 34 | + each of which is executed in parallel over 5 processors. |
| 35 | + |
| 36 | +The additional functionality required to support ensemble parallelism |
| 37 | +is the ability to send instances of :class:`~.Function` from one |
| 38 | +ensemble to another. This is handled by the :class:`~.Ensemble` class. |
| 39 | + |
| 40 | +Each ensemble member must have the same spatial parallel domain decomposition, so |
| 41 | +instantiating an :class:`~.Ensemble` requires a communicator to split |
| 42 | +(usually, but not necessarily, ``MPI_COMM_WORLD``) plus the number of |
| 43 | +MPI processes to be used in each member of the ensemble (5 in the |
| 44 | +figure above, and 2 in the example code below). The number of ensemble |
| 45 | +members is implicitly calculated by dividing the size of the original |
| 46 | +communicator by the number processes in each ensemble member. The |
| 47 | +total number of processes launched by ``mpiexec`` must therefore be |
| 48 | +equal to the product of the number of ensemble members with the number of |
| 49 | +processes to be used for each ensemble member, and an exception will be |
| 50 | +raised if this is not the case. |
| 51 | + |
| 52 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 53 | + :language: python3 |
| 54 | + :dedent: |
| 55 | + :start-after: [test_ensemble_manual_example 1 >] |
| 56 | + :end-before: [test_ensemble_manual_example 1 <] |
| 57 | + |
| 58 | +Then, the spatial sub-communicator ``Ensemble.comm`` must be passed |
| 59 | +to :func:`~.mesh.Mesh` (possibly via inbuilt mesh generators in |
| 60 | +:mod:`~.utility_meshes`), so that it will then be used by any |
| 61 | +:func:`~.FunctionSpace` and :class:`~.Function` derived from the mesh. |
| 62 | + |
| 63 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 64 | + :language: python3 |
| 65 | + :dedent: |
| 66 | + :start-after: [test_ensemble_manual_example 2 >] |
| 67 | + :end-before: [test_ensemble_manual_example 2 <] |
| 68 | + |
| 69 | +The ensemble sub-communicator is then available through the attribute |
| 70 | +``Ensemble.ensemble_comm``. |
| 71 | + |
| 72 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 73 | + :language: python3 |
| 74 | + :dedent: |
| 75 | + :start-after: [test_ensemble_manual_example 3 >] |
| 76 | + :end-before: [test_ensemble_manual_example 3 <] |
| 77 | + |
| 78 | +MPI communications across the spatial sub-communicator (i.e., within |
| 79 | +an ensemble member) are handled automatically by Firedrake, whilst MPI |
| 80 | +communications across the ensemble sub-communicator (i.e., between ensemble |
| 81 | +members) are handled through methods of :class:`~.Ensemble`. Currently |
| 82 | +send/recv, reductions and broadcasts are supported, as well as their |
| 83 | +non-blocking variants. |
| 84 | +The rank of the the ensemble member (``my_ensemble.ensemble_comm.rank``) |
| 85 | +and the number of ensemble members (``my_ensemble.ensemble_comm.rank``) |
| 86 | +can be accessed via the ``ensemble_rank`` and ``ensemble_size`` attributes. |
| 87 | + |
| 88 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 89 | + :language: python3 |
| 90 | + :dedent: |
| 91 | + :start-after: [test_ensemble_manual_example 4 >] |
| 92 | + :end-before: [test_ensemble_manual_example 4 <] |
| 93 | + |
| 94 | +.. warning:: |
| 95 | + |
| 96 | + In the ``Ensemble`` communication methods, each rank sends data |
| 97 | + only across the ``ensemble_comm`` that it is a part of. This |
| 98 | + assumes not only that the total mesh is identical on each ensemble |
| 99 | + member, but also that the ``ensemble_comm`` connects identical |
| 100 | + parts of the mesh on each ensemble member. Because of this, the |
| 101 | + spatial partitioning of the mesh on each ``Ensemble.comm`` must be |
| 102 | + identical. |
| 103 | + |
| 104 | + |
| 105 | +EnsembleFunction and EnsembleFunctionSpace |
| 106 | +========================================== |
| 107 | + |
| 108 | +A :class:`~.Function` is logically collective over a single spatial |
| 109 | +communicator ``Ensemble.comm``. However, for some applications we want |
| 110 | +to treat multiple :class:`~.Function` instances on different ensemble |
| 111 | +members as a single collective object over the entire global |
| 112 | +communicator ``Ensemble.global_comm``. For example, in time-parallel |
| 113 | +methods we may have a :class:`~.Function` for each timestep in a |
| 114 | +timeseries, and each timestep may live on a separate ensemble member. |
| 115 | +In this case we want to treat the entire timeseries as a single |
| 116 | +object. |
| 117 | + |
| 118 | +Firedrake implements this using :class:`~.EnsembleFunctionSpace` |
| 119 | +and :class:`~.EnsembleFunction` (along with the dual objects |
| 120 | +:class:`~.EnsembleDualSpace` and :class:`~.EnsembleCofunction`). |
| 121 | +The :class:`~.EnsembleFunctionSpace` can be thought of as a mixed |
| 122 | +function space which is parallelised across the `components`, as |
| 123 | +opposed to just being parallelised in `space`, as would usually be the |
| 124 | +case with :func:`~.FunctionSpace`. Each component of an |
| 125 | +:class:`~.EnsembleFunctionSpace` is a Firedrake :func:`~.FunctionSpace` |
| 126 | +on a single spatial communicator. |
| 127 | + |
| 128 | +To create an :class:`~.EnsembleFunctionSpace` you must provide an |
| 129 | +:class:`~.Ensemble` and, on each spatial communicator, a list of |
| 130 | +:func:`~.FunctionSpace` instances for the components on the local |
| 131 | +``Ensemble.comm``. There can be a different number of local |
| 132 | +:func:`~.FunctionSpace` on each ``Ensemble.comm``. In the example |
| 133 | +below we create an :class:`~.EnsembleFunctionSpace` with two |
| 134 | +components on the first ensemble member, and three components on |
| 135 | +every other ensemble member. Note that, unlike a |
| 136 | +:func:`~.FunctionSpace`, a component of an |
| 137 | +:class:`~.EnsembleFunctionSpace` may itself be a |
| 138 | +:func:`~.MixedFunctionSpace`. |
| 139 | + |
| 140 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 141 | + :language: python3 |
| 142 | + :dedent: |
| 143 | + :start-after: [test_ensemble_manual_example 5 >] |
| 144 | + :end-before: [test_ensemble_manual_example 5 <] |
| 145 | + |
| 146 | +Analogously to accessing the components of a :func:`~.MixedFunctionSpace` |
| 147 | +using ``subspaces``, the :func:`~.FunctionSpace` for each local component |
| 148 | +of an :class:`~.EnsembleFunctionSpace` can be accessed via |
| 149 | +``EnsembleFunctionSpace.local_spaces``. Various other methods and |
| 150 | +properties such as ``dual`` (to create an :class:`~.EnsembleDualSpace`) |
| 151 | +and ``nglobal_spaces`` (total number of components across all ranks) |
| 152 | +are also available. |
| 153 | + |
| 154 | +An :class:`~.EnsembleFunction` and :class:`~.EnsembleCofunction` can be |
| 155 | +created from the :class:`~.EnsembleFunctionSpace`. These have a ``subfunctions`` |
| 156 | +property that can be used to access the components on the local ensemble |
| 157 | +member. Each element in ``EnsembleFunction.subfunctions`` is itself just a |
| 158 | +normal Firedrake :class:`~.Function`. If a component of the |
| 159 | +``EnsembleFunctionSpace`` is a ``MixedFunctionSpace``, then the corresponding |
| 160 | +component in ``EnsembleFunction.subfunctions`` will be a mixed ``Function`` in |
| 161 | +that ``MixedFunctionSpace``. |
| 162 | + |
| 163 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 164 | + :language: python3 |
| 165 | + :dedent: |
| 166 | + :start-after: [test_ensemble_manual_example 6 >] |
| 167 | + :end-before: [test_ensemble_manual_example 6 <] |
| 168 | + |
| 169 | +:class:`~.EnsembleFunction` and :class:`~.EnsembleCofunction` have |
| 170 | +a range of methods equivalent to those of :class:`~.Function` and |
| 171 | +:class:`~.Cofunction`, such as ``assign``, ``zero``, |
| 172 | +``riesz_representation``, arithmetic operators e.g. ``+``, ``+=``, |
| 173 | +etc. These act component-wise on each local component. |
| 174 | + |
| 175 | +Because the components in ``EnsembleFunction.subfunctions`` |
| 176 | +(``EnsembleCofunction.subfunctions``) are just :class:`~.Function` |
| 177 | +(:class:`~.Cofunction`) instances, they can be used directly |
| 178 | +with variational forms and solvers. In the example code below, |
| 179 | +We create a :class:`~.LinearVariationalSolver` where the right |
| 180 | +hand side is a component of an :class:`~.EnsembleCofunction`, |
| 181 | +and the solution is written into a component of an |
| 182 | +:class:`~.EnsembleFunction`. Using the ``subfunctions`` |
| 183 | +directly like this can simplify ensemble code and reduce |
| 184 | +unnecessary copies. |
| 185 | +Note that the ``options_prefix`` is set using both the local ensemble |
| 186 | +rank and the index of the local space, which means that separate |
| 187 | +PETSc parameters can be passed from the command line to the solver |
| 188 | +on each ensemble member. |
| 189 | + |
| 190 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 191 | + :language: python3 |
| 192 | + :dedent: |
| 193 | + :start-after: [test_ensemble_manual_example 7 >] |
| 194 | + :end-before: [test_ensemble_manual_example 7 <] |
| 195 | + |
| 196 | +.. warning:: |
| 197 | + |
| 198 | + Although the ``Function`` (``Cofunction``) instances in |
| 199 | + ``EnsembleFunction.subfunctions`` (``EnsembleCofunction.subfunctions``) |
| 200 | + can be used in UFL expressions, ``EnsembleFunction`` and |
| 201 | + ``EnsembleCofunction`` themselves do not carry any symbolic |
| 202 | + information so cannot be used in UFL expressions. |
| 203 | + |
| 204 | +Internally, the :class:`~.EnsembleFunction` creates a ``PETSc.Vec`` |
| 205 | +on the ``Ensemble.global_comm`` which contains the data for all |
| 206 | +local components on all ensemble members. This ``Vec`` can be accessed |
| 207 | +with a context manager, similarly to the ``Function.dat.vec`` context |
| 208 | +managers used to access :class:`~.Function` data. There are also |
| 209 | +analogous ``vec_ro`` and ``vec_wo`` context managers for read/write |
| 210 | +only accesses. However note that, unlike the ``Function.dat.vec`` |
| 211 | +context managers, the ``EnsembleFunction.vec`` context managers |
| 212 | +need braces i.e. ``vec()`` not ``vec``. |
| 213 | + |
| 214 | +.. literalinclude:: ../../tests/firedrake/ensemble/test_ensemble_manual.py |
| 215 | + :language: python3 |
| 216 | + :dedent: |
| 217 | + :start-after: [test_ensemble_manual_example 8 >] |
| 218 | + :end-before: [test_ensemble_manual_example 8 <] |
0 commit comments