Commit abc0f0e

Merge pull request #190 from python-adaptive/mpi4py_support
add support for mpi4py as an executor.
2 parents: 01fb120 + bb319d9

File tree: 2 files changed, +71 −3 lines

adaptive/runner.py

Lines changed: 12 additions & 3 deletions
@@ -26,6 +26,12 @@
 except ModuleNotFoundError:
     with_distributed = False
 
+try:
+    import mpi4py.futures
+    with_mpi4py = True
+except ModuleNotFoundError:
+    with_mpi4py = False
+
 with suppress(ModuleNotFoundError):
     import uvloop
     asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
@@ -66,7 +72,7 @@ class BaseRunner(metaclass=abc.ABCMeta):
         the learner as its sole argument, and return True when we should
         stop requesting more points.
     executor : `concurrent.futures.Executor`, `distributed.Client`,\
-        or `ipyparallel.Client`, optional
+        `mpi4py.futures.MPIPoolExecutor`, or `ipyparallel.Client`, optional
         The executor in which to evaluate the function to be learned.
         If not provided, a new `~concurrent.futures.ProcessPoolExecutor`
         is used on Unix systems while on Windows a `distributed.Client`
@@ -281,7 +287,7 @@ class BlockingRunner(BaseRunner):
         the learner as its sole argument, and return True when we should
         stop requesting more points.
     executor : `concurrent.futures.Executor`, `distributed.Client`,\
-        or `ipyparallel.Client`, optional
+        `mpi4py.futures.MPIPoolExecutor`, or `ipyparallel.Client`, optional
         The executor in which to evaluate the function to be learned.
         If not provided, a new `~concurrent.futures.ProcessPoolExecutor`
         is used on Unix systems while on Windows a `distributed.Client`
@@ -386,7 +392,7 @@ class AsyncRunner(BaseRunner):
         stop requesting more points. If not provided, the runner will run
         forever, or until ``self.task.cancel()`` is called.
     executor : `concurrent.futures.Executor`, `distributed.Client`,\
-        or `ipyparallel.Client`, optional
+        `mpi4py.futures.MPIPoolExecutor`, or `ipyparallel.Client`, optional
        The executor in which to evaluate the function to be learned.
        If not provided, a new `~concurrent.futures.ProcessPoolExecutor`
        is used on Unix systems while on Windows a `distributed.Client`
@@ -693,6 +699,9 @@ def _get_ncores(ex):
         return 1
     elif with_distributed and isinstance(ex, distributed.cfexecutor.ClientExecutor):
         return sum(n for n in ex._client.ncores().values())
+    elif with_mpi4py and isinstance(ex, mpi4py.futures.MPIPoolExecutor):
+        ex.bootup()  # wait until all workers are up and running
+        return ex._pool.size  # not public API!
     else:
         raise TypeError('Cannot get number of cores for {}'
                         .format(ex.__class__))
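Taken together, the guarded import and the new ``_get_ncores`` branch let adaptive detect an MPI pool without making mpi4py a hard dependency. Here is a minimal standalone sketch of the same pattern; the ``count_cores`` helper is illustrative and not part of adaptive's API, and ``ex._pool.size`` is private mpi4py API, as the diff itself warns:

.. code:: python

    try:
        import mpi4py.futures
        with_mpi4py = True
    except ModuleNotFoundError:
        with_mpi4py = False


    def count_cores(ex):
        # dispatch on executor type, mirroring adaptive's _get_ncores
        if with_mpi4py and isinstance(ex, mpi4py.futures.MPIPoolExecutor):
            ex.bootup(wait=True)  # block until all MPI workers have started
            return ex._pool.size  # private attribute; may change between releases
        raise TypeError('Cannot get number of cores for {}'.format(ex.__class__))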

docs/source/tutorial/tutorial.parallelism.rst

Lines changed: 59 additions & 0 deletions
@@ -53,3 +53,62 @@ On Windows by default `adaptive.Runner` uses a `distributed.Client`.
     runner = adaptive.Runner(learner, executor=client, goal=lambda l: l.loss() < 0.01)
     runner.live_info()
     runner.live_plot(update_interval=0.1)
+
+`mpi4py.futures.MPIPoolExecutor`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This makes sense if you want to run a ``Learner`` on a cluster non-interactively using a job script.
+
+For example, create the following file called ``run_learner.py``:
+
+.. code:: python
+
+    import mpi4py.futures
+
+    learner = adaptive.Learner1D(f, bounds=(-1, 1))
+
+    # load the data
+    learner.load(fname)
+
+    # run until `goal` is reached with an `MPIPoolExecutor`
+    runner = adaptive.Runner(
+        learner,
+        executor=mpi4py.futures.MPIPoolExecutor(),
+        shutdown_executor=True,
+        goal=lambda l: l.loss() < 0.01,
+    )
+
+    # periodically save the data (in case the job dies)
+    runner.start_periodic_saving(dict(fname=fname), interval=600)
+
+    # block until the runner goal is reached
+    runner.ioloop.run_until_complete(runner.task)
+
+
+On your laptop/desktop you can run this script like:
+
+.. code:: bash
+
+    export MPI4PY_MAX_WORKERS=15
+    mpiexec -n 1 python run_learner.py
+
+Or you can pass ``max_workers=15`` programmatically when creating the executor instance.
+
+Inside a job script for a job queuing system, use:
+
+.. code:: bash
+
+    export MPI4PY_MAX_WORKERS=15
+    mpiexec -n 16 python -m mpi4py.futures run_learner.py
+
+How you call MPI might depend on your specific queuing system; with SLURM, for example, it's:
+
+.. code:: bash
+
+    #!/bin/bash
+    #SBATCH --job-name adaptive-example
+    #SBATCH --ntasks 100
+
+    export MPI4PY_MAX_WORKERS=$SLURM_NTASKS
+    srun -n $SLURM_NTASKS --mpi=pmi2 ~/miniconda3/envs/py37_min/bin/python -m mpi4py.futures run_learner.py
+
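The ``run_learner.py`` in the diff relies on ``f`` and ``fname`` being defined earlier in the tutorial. For reference, a self-contained variant might look like the sketch below; the function ``f`` and the filename are illustrative placeholders, and the try/except around ``load`` is one way to let a fresh run start before any save file exists:

.. code:: python

    import adaptive
    import mpi4py.futures


    def f(x):
        return x ** 2  # toy function to learn; replace with your own


    fname = 'learner1d.pickle'  # placeholder save file

    learner = adaptive.Learner1D(f, bounds=(-1, 1))

    # resume from a previous run, if there is one
    try:
        learner.load(fname)
    except FileNotFoundError:
        pass

    runner = adaptive.Runner(
        learner,
        executor=mpi4py.futures.MPIPoolExecutor(),
        shutdown_executor=True,
        goal=lambda l: l.loss() < 0.01,
    )

    # periodically save the data (in case the job dies)
    runner.start_periodic_saving(dict(fname=fname), interval=600)

    # block until the runner goal is reached
    runner.ioloop.run_until_complete(runner.task)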
