There is a channel #ivi_cluster in the IvI Slack.
Read the pinned posts for more info; this readme only provides the basics.
SSH to ivi-h0.science.uva.nl to access the cluster. Use your UvAnetID as credentials.
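For example, from a terminal on your own machine (UvAnetID is a placeholder for your own account):
ssh UvAnetID@ivi-h0.science.uva.nl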
Add the following lines to your .bashrc, picking the CUDA version you want by looking at what's available in /usr/local/:
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-8.0/lib64
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-8.0/cudnn/cuda/lib64
alias mywatch='/home/dkoelma1/bin/mywatch'
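To see which CUDA versions are installed (assuming they follow the /usr/local/cuda-<version> naming shown above):
ls -d /usr/local/cuda-*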
When submitting a job, you need to specify the number of GPUs, the amount of RAM, the number of CPUs, and the duration. Please adjust these arguments according to your needs.
Each node has 4 GPUs, 128 GB of RAM, 48 CPU threads (of which 2 are reserved for Slurm), and 2x10 TB of local HDD storage (/hddstore). The maximum run time is 7 days.
Here is an example of how to get an interactive session with 1 GPU for 2h30:
srun -u --pty --gres=gpu:1 --mem=30G --cpus-per-task=10 --time=2:30:00 -D `pwd` bash -i
Same but with 4 GPUs for 1 day and 8 hours:
srun -u --pty --gres=gpu:4 --mem=120G --cpus-per-task=40 --time=1-8 -D `pwd` bash -i
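Once the session starts you are on a GPU node; you can check which GPUs were allocated with nvidia-smi (the standard NVIDIA tool, assumed to be available on the GPU nodes):
nvidia-smi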
Ideally you should submit a job instead of using an interactive session:
srun --gres=gpu:1 --mem=30G --cpus-per-task=10 --time=2:30:00 -D `pwd` python myscript.py --myargument=foo
Without the -D `pwd` argument, Slurm will start the job in the /tmp directory.
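If you prefer a script you can submit and detach from, the same job can also be submitted as a batch script with sbatch (a standard Slurm feature not covered above; the file name myjob.sh and the resource values below are just examples mirroring the 1-GPU command):
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=30G
#SBATCH --cpus-per-task=10
#SBATCH --time=2:30:00
python myscript.py --myargument=foo
Submit it from your working directory with sbatch myjob.sh; by default sbatch starts the job in the directory it was submitted from, so no -D `pwd` should be needed.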
A specific node can also be requested, e.g., to get ivi-cn001 add the argument -w ivi-cn001.
Quoting a wise man: the priority of a job depends on the size of the job (smaller jobs have higher priority) and on the amount of resources you have used in the past (if you have consumed fewer resources, you will have a higher priority).
Slurm will provide a job id when you submit. Use that id if you want to remove your job from the queue with scancel [job id].
To check the queue, either use squeue or mywatch (see the last line of the .bashrc snippet above).
Additionally, use sinfo -h -N -o "%12n %8O %11T" to monitor CPU usage.
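For example, to list only your own jobs (a standard squeue option, not specific to this cluster):
squeue -u $USER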
To run a Jupyter notebook on a GPU node and access it from your local machine:
- Get an interactive session on a GPU node.
- Run jupyter notebook --no-browser --port=20105 on the GPU node. You will get a token URL like http://localhost:20105/?token=31427b811f3f3fdaef9d5da5cce8c81ba4df2f2ed4535960.
- Run ssh -L 20103:localhost:20104 UvAnetID@ivi-h0.science.uva.nl ssh -L 20104:localhost:20105 ivi-cn009 on your local machine (replace ivi-cn009 with the node of your session). Here you forward local port 20103 to IvI port 20104, and IvI port 20104 to GPU node port 20105.
- Open your browser, go to http://localhost:20103, and paste the token from step 2 (31427b811f3f3fdaef9d5da5cce8c81ba4df2f2ed4535960).
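Alternatively, with a recent OpenSSH (7.3 or newer) the two hops can be combined using a ProxyJump; this assumes SSH from the head node to the compute node works with the same credentials, which the two-hop command above already relies on:
ssh -J UvAnetID@ivi-h0.science.uva.nl -L 20103:localhost:20105 UvAnetID@ivi-cn009
Then browse to http://localhost:20103 and paste the token as before.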
You get 1 TB of storage in your home directory.
Move your data to the node-local /hddstore or /ssdstore for faster data access.
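A typical pattern is to stage data onto the node-local disk at the start of a job and read it from there; the per-user subdirectory and dataset path below are just examples, not something the cluster creates for you:
mkdir -p /hddstore/$USER
cp -r ~/my_dataset /hddstore/$USER/
Keep in mind that this storage is local to the node your job runs on.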
The default GCC version is 4.8.
Use GCC 6 with
source /opt/rh/devtoolset-6/enable
or GCC 7 with
source /opt/rh/devtoolset-7/enable
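After sourcing a devtoolset you can check that the newer compiler is active (the version reported depends on the toolset you enabled):
gcc --version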