Improving performance using MPI #194
Approximating π using parallelisation
Introduction
This exercise builds on #185. It is part of a series that looks at the execution time of different ways to calculate π using the same Monte Carlo approach. In this approach, π is approximated by sampling n random points inside a square with side 1, counting how many of those points fall inside the unit circle, and multiplying that count by 4/n.
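As a concrete reference, here is a minimal serial sketch of that approach; the actual code from #185 may differ in details such as function and argument names:

```python
import random

def calc_pi(n):
    """Estimate pi by sampling n random points in the unit square."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        # Points with x^2 + y^2 <= 1 fall inside the unit circle
        if x * x + y * y <= 1:
            inside += 1
    # inside / n approximates pi / 4, so multiply the count by 4 / n
    return 4 * inside / n

print(calc_pi(1_000_000))
```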
This exercise uses the Message Passing Interface (MPI) to carry out this approximation of π. The code is already written, and you can find it in calc_pi_mpi.py on the week10 branch of this repository. Your job is to install MPI and measure how long it takes to complete compared with the version from #185.
MPI
MPI allows parallelisation of computation. An MPI program consists of multiple processes, existing within a group called a communicator. The default communicator contains all available processes and is called MPI_COMM_WORLD.
Each process has its own rank and can execute different code. A typical way of using MPI is to divide the computation into smaller chunks, have each process deal with one chunk, and have one "main" process coordinate this and gather all the results. The processes can communicate with each other in pre-determined ways specified by the MPI standard -- for example, sending data to and receiving it from a particular process, or broadcasting a message to all processes.
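To make these concepts concrete, here is a minimal mpi4py sketch (not part of the exercise code) showing the communicator, ranks, and a "main" process gathering results:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD   # the default communicator, containing all processes
rank = comm.Get_rank()  # this process's rank: 0, 1, ..., size - 1
size = comm.Get_size()  # total number of processes in the communicator

# Each process computes its own partial result...
partial = rank ** 2

# ...and the "main" process (rank 0) gathers them all into a list.
results = comm.gather(partial, root=0)
if rank == 0:
    print(f"Gathered from {size} processes: {results}")
```

Run it with, for example, `mpiexec -n 4 python example.py` to see four processes executing the same script with different ranks.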
Preparation
We are going to run the original (non-numpy) version in parallel, and compare it to the non-parallel version.
We will be using mpi4py, a Python library that gives us access to MPI functionality.
Install mpi4py using conda:

```
conda install mpi4py -c conda-forge
```

or pip:

```
pip install mpi4py
```

On Windows you will also need to install MS MPI.
The MPI version of the code is available at calc_pi_mpi.py. Look at the file and try to identify what it is doing -- it's fine if you don't understand all the details! Can you see how the concepts in the brief description of MPI above are reflected in the code?
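If you want a mental model before opening the file, below is a simplified sketch of how a Monte Carlo π calculation could be structured with MPI. The real calc_pi_mpi.py will differ in its details, but the same ideas -- splitting the points across processes and combining the counts on one "main" process -- should be recognisable:

```python
import random
from mpi4py import MPI

def count_inside(n):
    """Count how many of n random points land inside the unit circle."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1:
            inside += 1
    return inside

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_total = 10_000_000
n_local = n_total // size          # each process handles its own chunk
inside_local = count_inside(n_local)

# Sum the per-process counts onto the main process (rank 0)
inside_total = comm.reduce(inside_local, op=MPI.SUM, root=0)
if rank == 0:
    print("pi ~", 4 * inside_total / (n_local * size))
```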
Execution
- Run the MPI version as:

  ```
  mpiexec -n 4 python calc_pi_mpi.py
  ```

  The `-n` argument controls how many processes you start.
- Increase the number of points and processes, and compare the time it takes against the normal version. Note that to pass arguments to the Python file (like `-np` below), we have to give those after the file name:

  ```
  mpiexec -n 4 python calc_pi_mpi.py -np 10_000_000
  python calc_pi.py -np 10_000_000 -n 1 -r 1
  ```

  Tip: To avoid waiting for a long time, reduce the number of repetitions and iterations of `timeit` (1 and 1 in this example); a sketch of how these flags might map onto `timeit` appears after this list.
- Think about these questions:
  - Is the MPI-based implementation faster than the basic one?
  - Is it faster than the `numpy`-based implementation?
  - When (for what programs or what settings) might it be faster or slower?
  - How different is this version from the original? How easy was it to adapt the code to use MPI?
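For reference, the `-np`, `-n` and `-r` flags used above plausibly map onto the number of sampled points and onto `timeit`'s `number` and `repeat` parameters. A hypothetical sketch of such a command-line wrapper (the actual scripts may wire this up differently) is:

```python
import argparse
import random
import timeit

def calc_pi(n):
    """Serial Monte Carlo estimate of pi, as sketched earlier."""
    inside = sum(1 for _ in range(n)
                 if random.random() ** 2 + random.random() ** 2 <= 1)
    return 4 * inside / n

# Hypothetical flag names, chosen to match the commands shown above
parser = argparse.ArgumentParser()
parser.add_argument("-np", type=int, default=10_000_000,
                    help="number of random points to sample")
parser.add_argument("-n", type=int, default=1,
                    help="iterations per timeit repetition")
parser.add_argument("-r", type=int, default=1,
                    help="number of timeit repetitions")
args = parser.parse_args()

# timeit.repeat runs calc_pi args.n times per repetition, args.r repetitions
times = timeit.repeat(lambda: calc_pi(args.np), number=args.n, repeat=args.r)
print(f"best of {args.r} repetitions: {min(times) / args.n:.3f} s per call")
```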
