PhytoOracle is designed for distributed scaling on cloud platforms and High-Performance Computers. The minimum requirements are:
- One Master instance, with the required data staged, that will broadcast jobs
- One or more instances running Worker Factories that will connect to the Master
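For orientation, here is a minimal sketch of that topology, using the commands detailed later in this guide (`MASTER_IP` is a placeholder for your Master's address):

```
# On the Master (data staged locally): run the workflow entrypoint.
# It broadcasts jobs and listens for Workers (usually on port 9123).
./entrypoint.sh -r 0

# On each Worker instance: start a factory pointed at the Master.
work_queue_factory -T local MASTER_IP 9123 -w 1 -W 10
```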
Required Software
- Click here for an Atmosphere image that comes with the recommended CCTools (7.0.21) and Singularity (3.5.1) versions installed.
Here are instructions for installation on Jetstream and other clouds.
- You can install the dependencies for compiling from source (Ubuntu 18.04 LTS) here:
- These commands will compile and install CCTools (version 7.0.21) to `/usr/bin`, so that the binaries are on the `$PATH`.
```
wget http://ccl.cse.nd.edu/software/files/cctools-7.0.21-source.tar.gz
tar -xvf cctools-7.0.21-source.tar.gz
cd cctools-release-7.0.21
./configure --prefix /usr
make -j$(nproc)
sudo make install
```
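To confirm the binaries landed on your `$PATH`, a quick sanity check (assuming the install above succeeded):

```
which makeflow work_queue_factory   # both should resolve to /usr/bin
makeflow --version                  # should report 7.0.21
```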
- Install dependencies for Singularity
```
sudo apt-get update && sudo apt-get install -y \
    build-essential \
    libssl-dev \
    uuid-dev \
    libgpgme11-dev \
    squashfs-tools \
    libseccomp-dev \
    wget \
    pkg-config \
    git \
    cryptsetup
wget https://dl.google.com/go/go1.13.5.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.13.5.linux-amd64.tar.gz
echo "export PATH=\$PATH:/usr/local/go/bin" | sudo tee -a /etc/profile
export PATH=$PATH:/usr/local/go/bin
```
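Before building, it is worth verifying that Go is visible on the `PATH`, since the Singularity build will fail without it:

```
go version   # should report go1.13.5
```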
- Build Singularity
```
wget https://github.com/sylabs/singularity/releases/download/v3.5.1/singularity-3.5.1.tar.gz
tar -xvf singularity-3.5.1.tar.gz
cd singularity
./mconfig && \
    make -C builddir && \
    sudo make -C builddir install
```
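A quick check that the build installed correctly:

```
singularity version   # should report 3.5.1
```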
- Git clone the PhytoOracle GitHub repository and check out the `dev` branch
```
git clone https://github.com/uacic/PhytoOracle
cd PhytoOracle
git checkout dev
```
- Download the test data (tarball) and decompress it
```
iinit # Enter your iRODS credentials
cd stereoTop
iget -K /iplant/home/shared/iplantcollaborative/example_data/starTerra/2018-05-15_5sets.tar
tar -xvf 2018-05-15_5sets.tar
```
Note: you can also get the data via other methods, as long as it ends up in this directory (`PhytoOracle/stereoTop`) and follows the same folder structure.
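As a quick check that the tarball extracted where the workflow expects it (the directory name comes from the tarball; individual file names will vary):

```
ls 2018-05-15/ | head   # run from PhytoOracle/stereoTop
```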
- Hosting the data on an HTTP server (Nginx)
Why host this server? The Workers fetch the raw data over HTTP, so the dataset must be served from the Master (or another instance the Workers can reach) by a web server.
```
sudo apt-get install nginx apache2-utils
wget https://raw.githubusercontent.com/uacic/PhytoOracle/dev/phyto_oracle.conf
sudo mv phyto_oracle.conf /etc/nginx/sites-available/phyto_oracle.conf
sudo ln -s /etc/nginx/sites-available/phyto_oracle.conf /etc/nginx/sites-enabled/phyto_oracle.conf
sudo rm /etc/nginx/sites-enabled/default
sudo nginx -s reload
```
- Set a username and password for the HTTP file server
```
sudo htpasswd -c /etc/apache2/.htpasswd YOUR_USERNAME # Set password
```
- In the file `/etc/nginx/sites-available/phyto_oracle.conf`, change the `root` line (~line 21) to the path where the data was decompressed, e.g. `/home/uacic/PhytoOracle/stereoTop`:
```
root /scratch/www;
```
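If you prefer to make this edit non-interactively, a one-liner such as the following should work, assuming the shipped config contains the exact `root /scratch/www;` line shown above (substitute your own data path):

```
sudo sed -i 's|root /scratch/www;|root /home/uacic/PhytoOracle/stereoTop;|' /etc/nginx/sites-available/phyto_oracle.conf
sudo nginx -t && sudo nginx -s reload   # validate the config before reloading
```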
- Change permissions of the data to allow serving by the HTTP server
```
sudo chmod -R +r 2018-05-15/
sudo chmod +x 2018-05-15/*
```
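To verify the server is reachable and the credentials work, you can probe it with curl (hypothetical URL; substitute your Master's address and the username/password you set; a 401 response means the credentials are wrong):

```
curl -u YOUR_USERNAME:YOUR_PASSWORD -I http://MASTER_IP/2018-05-15/
```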
- Change the URL inside `main_wf.php` (~line 30) to the IP address or URL of the Master VM instance running the HTTP server. The URL needs a slash at the end:
```
$DATA_BASE_URL = "http://vm142-80.cyverse.org/";
```
- Change the username and password inside `process_one_set.sh` (~line 27) to the ones that you set above:
```
HTTP_USER="YOUR_USERNAME"
HTTP_PASSWORD="PhytoOracle"
```
- Generate a list of the input raw-data files, `raw_data_files.json`, from a local path as below:
```
python3 gen_files_list.py 2018-05-15/ > raw_data_files.json
```
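The exact JSON layout produced by `gen_files_list.py` is not shown here, but a quick peek confirms the list is non-empty before moving on:

```
head -c 300 raw_data_files.json; echo
```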
- Generate a JSON workflow using the `main_wf_phase1.php` script, which parses the `raw_data_files.json` file created above:
```
sudo apt-get install php-cli
php main_wf_phase1.php > main_wf_phase1.jx
jx2json main_wf_phase1.jx > main_workflow_phase1.json
```
- Run the workflow. Use `-r 0` for 0 retry attempts on failure (it is for testing purposes only):
```
chmod 755 entrypoint.sh
./entrypoint.sh -r 0
```
At this point, the Master will broadcast jobs on a catalog server and wait for Workers to connect. Note the IP address of the VM and the port number on which Makeflow is listening, usually 9123. You will need these to tell the Workers where to find the Master.
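If you are unsure which port Makeflow is listening on, you can check from the Master (9123 is the usual default, but the port may differ):

```
ss -tlnp | grep makeflow   # or: ss -tln | grep 9123
```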
- Launch one or more large instances with CCTools and Singularity installed as instructed above.
- Connect a Worker Factory using the command below:
```
work_queue_factory -T local IP_ADDRESS 9123 -w 40 -W 44 --workers-per-cycle 10 -E "-b 20 --wall-time=3600" --cores=1 --memory=2000 --disk 10000 -dall -t 900
```

| argument | description |
|---|---|
| `-T local` | specifies the mode of execution for the factory |
| `-w` | minimum number of workers |
| `-W` | maximum number of workers |
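The remaining flags follow standard `work_queue_factory` usage: `--workers-per-cycle` caps how many new workers start per cycle, `-E` passes extra options through to each worker, `--cores`, `--memory`, and `--disk` size the resources each worker advertises (memory and disk in MB), `-t` is the worker idle timeout in seconds, and `-dall` enables debug output for all subsystems.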
Once the workers are spawned by the factories, you will see a message like the one below:
```
connected to master
```
- Makeflow monitor on your Master VM:
```
makeflow_monitor main_wf_phase1.jx.makeflowlog
```
- Work Queue status, to see how many Workers are currently connected to the Master:
```
work_queue_status
```
- Makeflow clean-up of output and logs:
```
./entrypoint.sh -c
rm -f makeflow.jx.args.*
```
- Below is a PBS script to connect Worker Factories from the UArizona HPC. Modify it to add the IP_ADDRESS of your Master VM:
```
#!/bin/bash
#PBS -W group_list=ericlyons
#PBS -q windfall
#PBS -l select=2:ncpus=6:mem=24gb
#PBS -l place=pack:shared
#PBS -l walltime=02:00:00
#PBS -l cput=02:00:00
module load unsupported
module load ferng/glibc
module load singularity
export CCTOOLS_HOME=/home/u15/sateeshp/cctools
export PATH=${CCTOOLS_HOME}/bin:$PATH
cd /home/u15/sateeshp/
/home/u15/sateeshp/cctools/bin/work_queue_factory -T local IP_ADDRESS 9123 -w 80 -W 200 --workers-per-cycle 10 -E "-b 20 --wall-time=3600" --cores=1 -t 900
```
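To use it, save the script to a file (e.g. `worker_factory.pbs`, a name chosen here for illustration) and submit it through PBS:

```
qsub worker_factory.pbs   # submit the factory job
qstat -u $USER            # check that it is queued or running
```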