Richardson-Lucy Algorithm

Directory Contents

Richardson-Lucy is abbreviated to RL.

  • datapreprocessing.py: runs a file check. If any required file is missing, the script downloads it from the COSIpy server and preprocesses it. input.yaml is a dependency.
  • datavisualization.ipynb: Python notebook that visualizes code inputs and outputs and facilitates a comparison with the current COSIpy implementation.
  • input.yaml: configuration file for datapreprocessing.py.
  • RLparallel.py: main function for the parallel RL implementation. It spawns worker processes through the relevant MPI calls; the master process facilitates the transfer of the intermediate vectors $\epsilon_i$, $C_j$, and $M_j^{(k)}$, and coordinates parallel reads of the response matrix. In its current form, the observed data vector $d_i$ combines a simulated point-source signal and a full-sky background model, and the initial guess $M_j^{(0)}$ is set to $10^{-4}$ counts. NUMROWS and NUMCOLS need to be predefined, although a later version may support directly reading the shape of the response matrix dataset. More details on the algorithm as well as the implementation are available in the final report (code documentation).
  • data/: directory where all the input data should ideally be hosted. DATA_DIR specified in RLparallel.py points to this directory. datapreprocessing.py will automatically download and place the preprocessed files here, provided the directory path exists.
  • outputs/: output directory for results generated by RLparallel.py. An out-of-the-box run of RLparallel.py populates it with the final, converged signal vector.
  • toymodel/: simplified, toy model implementation of the RL algorithm.
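The multiplicative update at the heart of the RL implementation can be sketched in plain numpy. This is a serial reference only, with variable names chosen to mirror the $\epsilon_i$, $C_j$, $M_j^{(k)}$ notation above; the synthetic response matrix and loop count are illustrative, not taken from the repository:

```python
import numpy as np

def rl_iteration(R, d, M):
    """One Richardson-Lucy update M^(k) -> M^(k+1).

    R : (NUMROWS, NUMCOLS) response matrix R_ij
    d : (NUMROWS,) observed data vector d_i
    M : (NUMCOLS,) current model vector M_j^(k)
    """
    eps = R @ M             # expected counts: eps_i = sum_j R_ij M_j
    C = R.T @ (d / eps)     # correction factors: C_j = sum_i R_ij d_i / eps_i
    s = R.sum(axis=0)       # sensitivity: s_j = sum_i R_ij
    return M * C / s        # multiplicative update

# tiny synthetic example (noiseless data from a known model)
rng = np.random.default_rng(0)
R = rng.random((6, 4))
M_true = np.array([1.0, 2.0, 0.5, 3.0])
d = R @ M_true
M = np.full(4, 1e-4)        # flat initial guess, as in RLparallel.py
for _ in range(500):
    M = rl_iteration(R, d, M)
```

A useful sanity check on any implementation: each update exactly conserves the sensitivity-weighted flux, i.e. $\sum_j s_j M_j^{(k+1)} = \sum_i d_i$.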

Dependencies

  • numpy
  • mpi4py
  • h5py with parallel read access enabled. It is enabled by default in the standard installation [source].
  • cosipy and histpy are required to run datapreprocessing.py. The main function does not require these libraries.
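Before submitting a job, it can save time to confirm the core modules are importable in the active environment. A small convenience sketch (not part of the repository):

```python
import importlib.util

def check_deps(modules=("numpy", "mpi4py", "h5py")):
    """Return a dict mapping module name -> True if importable."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

status = check_deps()
missing = [m for m, ok in status.items() if not ok]
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All core dependencies found.")
```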

Executing on Expanse

The data preprocessing step is yet to be set up to run out-of-the-box. To exploit the increased read/write speeds of the Lustre filesystem on Expanse, the user should predefine the stripe count of the data directory.

The user should also modify file paths BASE_DIR and DATA_DIR in both datapreprocessing.py as well as RLparallel.py appropriately.
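The path variables follow a pattern like the following (the values here are hypothetical placeholders; the actual assignments inside the two scripts may differ and should be edited to match your checkout and data location):

```python
from pathlib import Path

# Hypothetical layout -- edit to match your setup.
BASE_DIR = Path.home() / "Richardson-Lucy" / "code"
DATA_DIR = BASE_DIR / "data"   # or a Lustre scratch path on Expanse
```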

Required source and response files

The following files are required, depending on the science case:

  1. $^{44}Ti$ from supernova remnants:
    • response matrix: psr_gal_flattened_Ti44_E_1150_1164keV_DC2.h5
    • background model: total_bg_dense.hdf5
    • at least one signal: Ti44_CasA_dense.hdf5, Ti44_G1903_dense.hdf5, Ti44_SN1987A_dense.hdf5
  2. Positron annihilation:
    • response matrix: psr_gal_flattened_511_DC2.h5
    • background model: albedo_bg_dense.h5 or total_bg_dense.hdf5
    • signal: 511_thin_disk_dense.h5

More data files are available on the Wasabi server. Their download links can be found here.

Downloading the required files

Running datapreprocessing.py should at least download all the files, even if the preprocessing itself fails. The script can run out of memory when loading very large files into numpy arrays. In that case, step through the script manually to perform the appropriate data binning, flatten the multidimensional quantities, and obtain each file in the "dense" representation.
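If memory is the bottleneck, large datasets can be reduced block-by-block instead of being loaded whole. h5py datasets support numpy-style slicing, so a row-blocked loop like the one below avoids materializing the full matrix; an in-memory array stands in for an h5py dataset here, and the function name is my own, not from the repository:

```python
import numpy as np

def column_sums_chunked(dset, chunk_rows=1024):
    """Accumulate column sums over row blocks without loading dset whole."""
    nrows = dset.shape[0]
    total = np.zeros(dset.shape[1])
    for start in range(0, nrows, chunk_rows):
        block = dset[start:start + chunk_rows]   # reads only this slab
        total += block.sum(axis=0)
    return total

# stand-in for an h5py dataset; real code would slice h5py.File(...)[name]
arr = np.arange(20.0).reshape(5, 4)
print(column_sums_chunked(arr, chunk_rows=2))    # -> [40. 45. 50. 55.]
```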

Note that datapreprocessing.py depends on cosipy. The installation page is available here. If you have pip, it can be set up easily via:

$ module load py-pip/21.1.2         # load pip
$ pip install cosipy
$ python3 datapreprocessing.py

Moving the files to the Lustre filesystem

$ cd /expanse/lustre/scratch/$USER/temp_project     # go to the Lustre filesystem directory
$ mkdir data                                        # make a new directory for organizational purposes
$ lfs setstripe -c 16 data/                         # set the stripe count of the new directory to 16. See the lecture DataManagement_ML_2023.pdf by Dr. Mahidar Tatineni for more information.
$ cd data                                           
$ cp ~/path/to/downloaded/data/* .                  # copy the downloaded files into the new directory

Actual execution

It is recommended to run the code interactively:

$ cd /expanse/lustre/scratch/$USER/temp_project
$ salloc --nodes=1 --ntasks-per-node=32 --mem=64G -A csd759 -t 0:30:00 -p shared    # request for node allocation
$ module reset                                                                      # reset your environment and load the required modules. Thank you Dr. Tatineni!
$ module load gcc/10.2.0
$ module load openmpi/4.1.3
$ module load python/3.8.12
$ module load py-mpi4py/3.1.2
$ module load hdf5/1.10.7
$ module load py-numpy/1.20.3
$ module load py-h5py/3.4.0
$ mpiexec -n 16 python ~/Richardson-Lucy/code/RLparallel.py                         # works when the required preprocessed data are available (suffix "_dense.h5") and DATA_DIR is appropriately set
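The decomposition that RLparallel.py implements with MPI can be illustrated without mpi4py: each rank reads a contiguous slab of response-matrix rows, forms a partial correction vector from its slab, and the partial results are summed across ranks (an Allreduce in the real code). A serial numpy sketch of that idea, with hypothetical names and shapes:

```python
import numpy as np

def partial_correction(R_block, d_block, eps_block):
    """One worker's contribution to C_j from its row slab."""
    return R_block.T @ (d_block / eps_block)

rng = np.random.default_rng(1)
R = rng.random((8, 3))            # full response matrix (NUMROWS x NUMCOLS)
d = rng.random(8) + 0.5           # observed data vector
M = np.full(3, 1e-4)              # current model vector
eps = R @ M                       # expected counts

# split the rows across 4 "ranks" and sum the partials, mimicking Allreduce
slabs = (slice(0, 2), slice(2, 4), slice(4, 6), slice(6, 8))
C = sum(partial_correction(R[s], d[s], eps[s]) for s in slabs)
```

Because the correction factors are plain sums over rows, the block-wise result is identical to the single-process computation `R.T @ (d / eps)`, which is what makes the row decomposition safe.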

Executing on a Personal Computer

To execute the pipeline on a local computer, ensure that the h5py installation in your python environment supports parallel read access. It is recommended to use a computer with at least 16 GB RAM.

$ conda activate <venv>       # activate your python environment
$ export TMPDIR=/tmp          # truncation can occur on macOS with the default TMPDIR; a similar step may be needed on other OSes
$ python3 datapreprocessing.py                  # to check if all file dependencies are satisfied or download them
$ mpiexec -n <numproc> python RLparallel.py     # run the code with the intended number of processes