Using Australia's Gadi HPC #4584
This is a great resource @navidcy! Even though I don't have access to these particular servers, this walkthrough will help me in setting things up on other servers. I may think about doing something similar for a cluster we have in Canada that a few research groups are using to run Oceananigans, and I suspect more will follow. One question. Your …
Hi @navidcy, should we include information about bindings for multi-GPU runs here, or in taimoorsohail/ocean-ensembles#74?
Thanks, this is great! @navidcy or @taimoorsohail, have you had this error on Gadi before? Can this be solved by downgrading CUDA?
Overview
Australia's Gadi supercomputer is housed at the National Computational Infrastructure (NCI) on the Australian National University's campus.
Gadi has 160 GPU nodes, each containing four NVIDIA V100 GPUs and two 24-core Intel Xeon Scalable 'Cascade Lake' processors. It also has 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs per node.
Gadi uses the Portable Batch System (PBS) queueing system.
[Note: this post is subject to change. Let's try to keep it up to date; please comment below if something does not work.]
Scope
This discussion can cover anything related to running Oceananigans on Gadi --- including installing Julia, setting up CUDA and MPI, configuring PBS scripts, and using other Julia packages in conjunction with Oceananigans.
Links
Getting started on Gadi
It's assumed as a prerequisite that you have access to Gadi and an NCI username.
The first task is to install Julia. We suggest using juliaup to install Julia in one of your project's directories.
Note: Avoid installing in your home directory (`$HOME` or `~/`) since there is a 10 GB limit on each user's home directory and that can fill up quickly!

Thus, to install Julia 1.10.9 using juliaup, first create a directory to install juliaup and Julia into. For example, if the NCI project you are part of is `xy12` and your NCI username is `ab1234`, then:

```
cd /g/data/xy12/ab1234
mkdir .julia
```

This will also be the directory where Julia installs all its packages. This directory can grow quite a bit in size, which is why it's best to have it somewhere outside your `$HOME` directory.

Then we install juliaup. We provide the `--path` argument to ensure the installation happens in the path we just created:

```
curl -fsSL https://install.julialang.org | sh -s -- --path /g/data/xy12/ab1234/.julia/juliaup --default-channel 1.10
```

The installation should have modified your profile files. You might need to start a new session or source the shell startup scripts (e.g., `.bashrc`, `.bash_profile`) that were modified by juliaup. After doing so, Julia can be launched by typing `julia`.

We then need to tell Julia that its depot path is the `.julia` directory we just created. (The depot path is where Julia installs packages, saves compiled versions of packages, etc.; by default the depot resides in `$HOME/.julia`, which creates issues due to the size limits of `$HOME`.) To do so, we add an environment variable in our `.bash_profile`:

```
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia
```

We also add …

Moving the depot into `/g/data` further helps when software downloads big data sets into the depot (like ClimaOcean does).

Julia is now installed! 🎉
An example script
Next let's test that things work by creating a test project:
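For example (the directory name `hello-oceananigans` here is just an illustration; put the project wherever suits you):

```
cd /g/data/xy12/ab1234
mkdir hello-oceananigans
cd hello-oceananigans
```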
We created an empty project.
Let's use Julia's package manager to add Oceananigans to this project and instantiate it.
We can do that within Julia's REPL or via:
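For example, a one-liner from the shell (a sketch; `--project=.` activates the project in the current directory):

```
julia --project=. -e 'using Pkg; Pkg.add("Oceananigans"); Pkg.instantiate()'
```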
Note: installing Julia packages requires internet access, and on Gadi only the login nodes have internet access.
Now let's create a script that uses Oceananigans and run it. Let's call this script `hello-oceananigans.jl` and include in it something along the lines of the sketch below.
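A minimal sketch (the grid size and the choice of a tracer field are just illustrations):

```
using Oceananigans

# Build a small grid on the CPU and a field that lives on it
grid = RectilinearGrid(CPU(); size=(16, 16, 16), extent=(1, 1, 1))
c = CenterField(grid)

@show grid
@show c
```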
From the login node, you should be able to run this via `julia --project hello-oceananigans.jl` and see the grid and the field printed. You just ran your first Julia script on Gadi! 🎉
Submit a job via PBS
Next let's submit the same script to run via PBS.
We create a submission script, e.g. named `submit-hello-oceananigans.sh`, that contains the PBS directives and the command that runs the script.
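A sketch of such a script (the project code `xy12`, the resource requests, and the juliaup installation path are assumptions to adapt to your setup):

```
#!/bin/bash
#PBS -q normal
#PBS -P xy12
#PBS -l ncpus=1
#PBS -l mem=8GB
#PBS -l walltime=00:10:00
#PBS -l storage=gdata/xy12
#PBS -l wd
#PBS -o output.stdout
#PBS -e output.stderr

# Start from a clean module environment
module purge

# Use the Julia depot we created under /g/data
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia

# Run the script with the project in the submission directory
/g/data/xy12/ab1234/.julia/juliaup/bin/julia --project hello-oceananigans.jl
```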
The storage flag `gdata/xy12` is needed because Julia is installed there. Add more storage flags as required. The `module purge` command ensures that there is no other (possibly conflicting) module loaded by the user's startup files.

Then we submit the PBS job:
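(from the directory containing the script and the project):

```
qsub submit-hello-oceananigans.sh
```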
After the job runs you should have an `output.stdout` file containing the script's output.

Success! 🎉
Run on GPU
To run the same script on a GPU you only need to modify the grid in `hello-oceananigans.jl` to be constructed with the `GPU()` argument, e.g.,
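Assuming the grid from the sketch above, the only change is the architecture argument:

```
grid = RectilinearGrid(GPU(); size=(16, 16, 16), extent=(1, 1, 1))
```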
and then also modify the `submit-hello-oceananigans.sh` script to use the `gpuvolta` queue and ask for at least one GPU, as sketched after this paragraph. The 12 CPUs requested there are not a coincidence; Gadi's `gpuvolta` queue requires that you request 12 CPUs per GPU; see the Gadi queue limits docs.
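A sketch of the modified resource requests (the memory request is just an illustration):

```
#PBS -q gpuvolta
#PBS -l ngpus=1
#PBS -l ncpus=12
#PBS -l mem=32GB
```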
After the above modifications, submitting the GPU job will now give an `output.stdout` with the new output. Success again! Woooo! 🎉
Note the difference! The grid (and by consequence also the field) you created now live on `CUDAGPU`!

Run on many GPUs
We are now ready to configure Oceananigans to use multiple GPUs via CUDA-aware MPI communication. This is a bit harder to set up... But we'll do it together.
The instructions below for setting up CUDA-aware MPI on Gadi are heavily inspired by the discussion at taimoorsohail/ocean-ensembles#74 and the heroic efforts of @taimoorsohail.
We first unload all modules (just to ensure that we all start from the same page):
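```
module purge
```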
and load the required modules for the CUDA-aware MPI configuration:
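For example (the unversioned module names below load Gadi's defaults; check `module avail openmpi` for a CUDA-aware build and pin versions as needed):

```
module load cuda
module load openmpi
```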
We then want to ensure that the MPI version that is called is the system default. To do that, we use the MPIPreferences.jl package. This package identifies the MPI implementation on the machine and creates a small TOML file with preferences that MPI.jl will use.
Now we run:

```
$ julia -e 'using Pkg; Pkg.add("MPIPreferences"); using MPIPreferences; MPIPreferences.use_system_binary()'
```

The above should have generated a file called `LocalPreferences.toml` in the `/g/data/xy12/ab1234/.julia/environments/v1.10/` directory that looks something like:
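A sketch of its contents (the exact fields and library path depend on your MPIPreferences version and the loaded `openmpi` module):

```
[MPIPreferences]
_format = "1.1"
abi = "OpenMPI"
binary = "system"
libmpi = "libmpi"
mpiexec = "mpiexec"
```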
Note 1: You don't need to run this step every time; this should only be done once and then the `LocalPreferences.toml` lives in your general Julia environment and is available to any other project you want to run on Gadi. You might need to rerun this step if the MPI installation on Gadi changes or is upgraded.

Note 2: With the `LocalPreferences.toml` created, you might start getting warnings or errors if you don't load the corresponding MPI modules on Gadi. See the updated PBS bash script below for the required modifications. You might need to load the `openmpi` and `cuda` modules as well as define `LD_LIBRARY_PATH` even if you only want to use a single GPU/CPU.

Now let's install other packages we'll need, like MPI.jl. We can install either via
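```
# adds MPI.jl to the project in the current directory; drop --project to add it to the default environment
julia --project -e 'using Pkg; Pkg.add("MPI")'
```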
or from the Julia REPL via the package manager (which we enter by pressing `]` at the REPL):
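```
pkg> add MPI
```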
Next, we ensure some more relevant environment variables are set (consider adding them in your `.bash_profile` file); a sketch follows.
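A sketch of plausible settings (the `JULIA_CUDA_MEMORY_POOL` setting is a common recommendation when using CUDA-aware MPI with CUDA.jl; the library path below is a placeholder, so check `module show openmpi` for the actual location):

```
# Keep the Julia depot outside $HOME, as set up earlier
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia

# Often recommended when using CUDA-aware MPI with CUDA.jl
export JULIA_CUDA_MEMORY_POOL=none

# Make the loaded MPI libraries visible at run time (<version> is a placeholder)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/openmpi/<version>/lib
```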
We are ready to run a script that will exercise CUDA-aware MPI communication; let's call it `hello-cuda-mpi.jl`. Let's also write a bash script to submit it through the queue with multiple GPUs. Sketches of both follow.
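A minimal sketch of such a script (assuming a recent Oceananigans version where the distributed architecture lives in `Oceananigans.DistributedComputations`; the grid size and the rank-dependent field values are just illustrations):

```
using MPI
MPI.Init()

using Oceananigans
using Oceananigans.DistributedComputations
using Statistics

# Distributed GPU architecture; the domain is partitioned across the MPI ranks
arch = Distributed(GPU())
rank = arch.local_rank

println("Hello from rank $rank!")

grid = RectilinearGrid(arch; size=(16, 16, 16), extent=(1, 1, 1))

# Two fields filled with rank-dependent values
c = CenterField(grid)
u = XFaceField(grid)
set!(c, rank + 1)
set!(u, 10 * (rank + 1))

# Local means differ from rank to rank
println("rank $rank: mean(c) = $(mean(interior(c))), mean(u) = $(mean(interior(u)))")
```

And a sketch of the submission script (the project code, resource requests, module names, and the julia path are assumptions to adapt):

```
#!/bin/bash
#PBS -q gpuvolta
#PBS -P xy12
#PBS -l ngpus=4
#PBS -l ncpus=48
#PBS -l mem=128GB
#PBS -l walltime=00:15:00
#PBS -l storage=gdata/xy12
#PBS -l wd
#PBS -o output.stdout
#PBS -e output.stderr

module purge
module load cuda
module load openmpi

export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia
export JULIA_CUDA_MEMORY_POOL=none

# One MPI rank per GPU
mpirun -np 4 /g/data/xy12/ab1234/.julia/juliaup/bin/julia --project hello-cuda-mpi.jl
```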
When this job runs, the `output.stdout` should contain a few hellos from the various ranks and also the mean values of the fields on each rank. It's essential to notice that the `c` and `u` fields have different mean values on each rank.

There you go! You now have a CUDA-aware MPI Oceananigans configuration!! 🎉