
Install and Run on Empire AI ‐ Alpha System

Cameron Smith edited this page Jul 18, 2025 · 4 revisions

The following instructions install reconClassifier and its dependencies on the Empire AI alpha system. System documentation: https://empireai.freshdesk.com/support/solutions

Initial Install and Test

mkdir mlReconnection
cd mlReconnection

## allocate 30 cores for the Python build (no GPU is needed for the install)
salloc -N 1 -n 30 -p rpi -A rpi -t 120

## create python install script
## (quoted 'EOF' so $PWD expands when the script runs, inside Python-3.13.2)
cat << 'EOF' > installPython.sh
wget https://www.python.org/ftp/python/3.13.2/Python-3.13.2.tgz
tar xf Python-3.13.2.tgz
cd Python-3.13.2
./configure --prefix=$PWD/install
make -j 30 install
EOF

chmod +x installPython.sh 
./installPython.sh 
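
The scripts on this page are generated with here-documents, so it matters whether the delimiter is quoted. With an unquoted `EOF`, the shell expands `$PWD` and other variables while the file is being written; with a quoted `'EOF'`, they are written literally and expand only when the generated script runs. The sketch below illustrates the difference in a scratch directory; the file names `unquoted.sh` and `quoted.sh` are placeholders, not part of the install.

```shell
# Compare unquoted and quoted here-document delimiters in a scratch directory.
demo=$(mktemp -d)

# Unquoted delimiter: $PWD is expanded now, while the file is written.
cat << EOF > "$demo/unquoted.sh"
echo prefix=$PWD/install
EOF

# Quoted delimiter: $PWD is written literally and expands when the script runs.
cat << 'EOF' > "$demo/quoted.sh"
echo prefix=$PWD/install
EOF

cat "$demo/unquoted.sh"   # contains the current directory, already expanded
cat "$demo/quoted.sh"     # contains the literal text $PWD/install
```

For an install prefix that should be relative to the unpacked source tree, the quoted form is the one that resolves `$PWD` after the `cd` inside the script.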

## test the install
export PATH=$PATH:$PWD/Python-3.13.2/install/bin/
python3.13 -c "import ctypes" # should exit silently
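
A silent `import ctypes` confirms only one optional module. A slightly broader check, sketched here, loops over several standard-library modules that depend on system headers at build time. The function name `check_modules` and any module list passed to it are illustrative assumptions, not part of the original instructions.

```shell
# check_modules INTERPRETER MODULE...
# Tries to import each module and reports the ones that fail.
# Returns non-zero if any import fails.
check_modules() {
  py=$1
  shift
  status=0
  for mod in "$@"; do
    if "$py" -c "import $mod" 2>/dev/null; then
      echo "ok: $mod"
    else
      echo "MISSING: $mod"
      status=1
    fi
  done
  return $status
}
```

For example, `check_modules python3.13 ctypes ssl sqlite3` prints one line per module; `ssl` is worth including because `pip` needs it to download packages over HTTPS.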

## create pgkyl install script
cat << EOF > installPgkyl.sh
python3.13 -m venv pgkyl
source pgkyl/bin/activate
git clone https://github.com/ammarhakim/postgkyl.git
cd postgkyl/
pip install -e .[adios,test]
EOF

chmod +x installPgkyl.sh 
./installPgkyl.sh 

## clone pgkylFrontEnd
git clone -b cws/scorec git@github.com:scorec/pgkylFrontEnd.git

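## create an environment file for python
## (quoted 'EOF' so the variables expand when the file is sourced;
##  root assumes mlReconnection was created in $HOME, as the job script below does)
cat << 'EOF' > envPython.sh
root=$HOME/mlReconnection
export PATH=$PATH:$root/Python-3.13.2/install/bin/
export PYTHONPATH=$PYTHONPATH:$root/pgkylFrontEnd
source $root/pgkyl/bin/activate
EOF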

source envPython.sh 
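
After sourcing `envPython.sh`, it can be worth confirming that the `pgkyl` virtual environment is actually active before installing packages into it. The sketch below relies on the `VIRTUAL_ENV` variable that a venv `activate` script exports; the function name `require_venv` is hypothetical.

```shell
# require_venv NAME
# Succeeds only if the active virtual environment's directory is called NAME.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtual environment is active" >&2
    return 1
  fi
  if [ "$(basename "$VIRTUAL_ENV")" != "$1" ]; then
    echo "active environment is $VIRTUAL_ENV, expected $1" >&2
    return 1
  fi
  echo "using $VIRTUAL_ENV"
}
```

Calling `require_venv pgkyl && pip install torch` would skip the install if a different environment, or none, is active.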

## install pytorch packages
pip install torch
pip install torchvision
pip install --upgrade pip
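
Once the packages are installed, a one-line import confirms that PyTorch loads and reports whether a CUDA device is visible. On the CPU-only allocation used for the install, `torch.cuda.is_available()` will likely print `False`; running it again inside a GPU allocation confirms that the GPU is seen. The wrapper name `torch_report` and the `PY` override are assumptions for illustration.

```shell
# Report the PyTorch version and whether a CUDA device is visible.
# PY is a hypothetical override for the interpreter name (default: python).
PY=${PY:-python}

torch_report() {
  "$PY" -c 'import torch; print(torch.__version__, torch.cuda.is_available())' \
    || { echo "torch is not importable with $PY" >&2; return 1; }
}
```

Run `torch_report` after `source envPython.sh` so that `python` resolves to the interpreter inside the virtual environment.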

## clone and test reconClassifier
git clone -b cws/rotateAndReflect git@github.com:SCOREC/reconClassifier
data=/mnt/lustre/rpi/smithc11/spaceWeatherNsfCssi/mlReconnection/1024Res_v0
python reconClassifier/XPointMLTest.py \
--paramFile=$data/pkpm_2d_turb_p2-params.txt \
--xptCacheDir=$data/cache \
--trainFrameFirst 1 --trainFrameLast 2 \
--validationFrameFirst 2 --validationFrameLast 3 \
--epochs 2 --minTrainingLoss 0
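
A quick pre-flight check of the input paths can catch a mistyped `data` path before Python starts. This is a minimal sketch; `check_inputs` is a hypothetical helper, and it only warns about a missing cache directory because whether `XPointMLTest.py` creates that directory itself is not stated here.

```shell
# check_inputs PARAM_FILE CACHE_DIR
# Fails if the parameter file is unreadable; only warns if the cache is absent.
check_inputs() {
  if [ ! -r "$1" ]; then
    echo "cannot read parameter file: $1" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "note: cache directory does not exist yet: $2" >&2
  fi
  echo "inputs look usable"
}
```

With `data` set as above, the call would be `check_inputs "$data/pkpm_2d_turb_p2-params.txt" "$data/cache"`.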

Running

Batch - good for long training runs

Create the job script (the quoted 'EOF' keeps $HOME and $data literal so they expand when the job runs):

cat << 'EOF' > trainingJob.sh
#!/bin/bash
cd $HOME/mlReconnection
source envPython.sh 

data=/mnt/lustre/rpi/smithc11/spaceWeatherNsfCssi/mlReconnection/1024Res_v0
python reconClassifier/XPointMLTest.py \
--paramFile=$data/pkpm_2d_turb_p2-params.txt \
--xptCacheDir=$data/cache \
--epochs 2000 --plot
EOF

chmod +x trainingJob.sh

Submit the job:

sbatch -N 1 -n 8 -p rpi -A rpi -t 360 --gpus-per-node=1 ./trainingJob.sh
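
As an alternative to passing resource options on the `sbatch` command line, the same requests can be recorded in the script as `#SBATCH` directives, which keeps the job reproducible. The sketch below mirrors the options used above; the file name `trainingJobDirectives.sh` is an assumption, and the quoted `'EOF'` keeps `$HOME` and `$data` literal so they expand when the job runs.

```shell
# Write a job script that carries its own resource requests.
cat << 'EOF' > trainingJobDirectives.sh
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 8
#SBATCH -p rpi
#SBATCH -A rpi
#SBATCH -t 360
#SBATCH --gpus-per-node=1
cd $HOME/mlReconnection
source envPython.sh

data=/mnt/lustre/rpi/smithc11/spaceWeatherNsfCssi/mlReconnection/1024Res_v0
python reconClassifier/XPointMLTest.py \
--paramFile=$data/pkpm_2d_turb_p2-params.txt \
--xptCacheDir=$data/cache \
--epochs 2000 --plot
EOF
chmod +x trainingJobDirectives.sh
```

With the directives in place, `sbatch ./trainingJobDirectives.sh` needs no further options; any option given on the command line overrides the matching directive.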

Monitor the job:

watch -n 60 squeue -u $USER
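
`squeue` shows only the job state. To follow the training output, one option is to capture the job ID at submission with `sbatch --parsable` and tail the log file, which Slurm names `slurm-<jobid>.out` in the submission directory by default. The helper `job_id_from_parsable` below is a hypothetical name; it strips the optional `;cluster` suffix that `--parsable` may append.

```shell
# job_id_from_parsable OUTPUT
# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the job ID.
job_id_from_parsable() {
  echo "${1%%;*}"
}

# Intended use on the cluster (commented out so the sketch runs anywhere):
#   out=$(sbatch --parsable -N 1 -n 8 -p rpi -A rpi -t 360 --gpus-per-node=1 ./trainingJob.sh)
#   jobid=$(job_id_from_parsable "$out")
#   tail -f "slurm-$jobid.out"
```

If the job script sets its own output path with `-o`, tail that file instead.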

Interactive - good for development and testing

salloc -N 1 -n 1 -p rpi -A rpi -t 360 --gpus-per-node=1
# wait for the allocation to be granted
cd mlReconnection
source envPython.sh 

data=/mnt/lustre/rpi/smithc11/spaceWeatherNsfCssi/mlReconnection/1024Res_v0
python reconClassifier/XPointMLTest.py \
--paramFile=$data/pkpm_2d_turb_p2-params.txt \
--xptCacheDir=$data/cache \
--epochs 2000 --plot
