Using Australia's Gadi HPC #4584
This is a great resource @navidcy! Even though I don't have access to these particular servers, this walkthrough will help me in setting things up on other servers. I may think about doing something similar for a cluster we have in Canada that a few research groups are using to run Oceananigans, and I suspect more will follow. One question. Your …
Hi @navidcy, should we include information about bindings for multi-GPU runs here, or in taimoorsohail/ocean-ensembles#74?
Thanks, this is great! @navidcy or @taimoorsohail, have you had this error on Gadi before? Can this be solved by downgrading CUDA?
Overview
Australia's Gadi supercomputer is housed at the National Computational Infrastructure (NCI) on the Australian National University's campus.
Gadi has 160 GPU nodes, each containing four NVIDIA V100 GPUs and two 24-core Intel Xeon Scalable 'Cascade Lake' processors. It also has 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs per node.
Gadi uses the Portable Batch System (PBS) queueing system.
[Note: this post is subject to change. Let's try to keep it up to date; please comment below if something does not work.]
Scope
This discussion can cover anything related to running Oceananigans on Gadi --- including installing Julia, setting up CUDA and MPI, configuring PBS scripts, and using other Julia packages in conjunction with Oceananigans.
Links
Getting started on Gadi
It's assumed as a prerequisite that you have access to Gadi and an NCI username.
The first task is to install Julia. We suggest using juliaup to install Julia in one of your project's directories.
Note: Avoid installing in your home directory (`$HOME` or `~/`) since there is a 10 GB limit on each user's home directory and that can fill up quickly!

Thus, to install Julia 1.10.9 using juliaup, first create a directory to install juliaup and Julia into. For example, if the NCI project you are part of is `xy12` and your NCI username is `ab1234`, then:

```
cd /g/data/xy12/ab1234
mkdir .julia
```

This will also be the directory where Julia installs all its packages. This directory can grow quite a bit in size, which is why it's best to have it somewhere outside your `$HOME` directory.

Then we install juliaup. We provide the `--path` argument to ensure the installation happens in the path we just created:

```
curl -fsSL https://install.julialang.org | sh -s -- --path /g/data/xy12/ab1234/.julia/juliaup --default-channel 1.10
```

The installation should have modified your profile files. You might need to start a new session or source the shell startup scripts (e.g., `.bashrc`, `.bash_profile`) that were modified by juliaup. After doing so, Julia can be launched by typing `julia`.

We then need to tell Julia that its depot path is the `.julia` directory we just created. (The depot path is where Julia installs packages, saves compiled versions of packages, etc.; by default the depot resides in `$HOME/.julia`, which creates issues due to the size limits of `$HOME`.) To do so, we add an environment variable in our `.bash_profile`:

```
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia
```

We also add …

Moving the depot into `/g/data` further helps when software downloads big data sets into the depot (like ClimaOcean does).

Julia is now installed! 🎉
An example script
Next let's test that things work by creating a test project:
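For example (the directory name `hello-oceananigans` here is just an illustration; put the project wherever suits you):

```
cd /g/data/xy12/ab1234
mkdir hello-oceananigans
cd hello-oceananigans
```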
We created an empty project.
Let's use Julia's package manager to add Oceananigans to this project and instantiate it.
We can do that within Julia's REPL or via:
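For example, a one-liner from the shell (a sketch; `--project=.` activates the project in the current directory):

```
julia --project=. -e 'using Pkg; Pkg.add("Oceananigans"); Pkg.instantiate()'
```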
Note: installing Julia packages requires internet access, and on Gadi only the login nodes have internet access.
Now let's create a script that uses Oceananigans and run it. Let's call this script `hello-oceananigans.jl` and include in it something along the lines of the sketch below.
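A minimal sketch (the grid size and the choice of a tracer field are just illustrations):

```
using Oceananigans

# Build a small grid on the CPU and a field that lives on it
grid = RectilinearGrid(CPU(); size=(16, 16, 16), extent=(1, 1, 1))
c = CenterField(grid)

@show grid
@show c
```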
From the login node, you should be able to run this via `julia --project hello-oceananigans.jl` and see the grid and the field printed. You just ran your first Julia script on Gadi! 🎉
Submit a job via PBS
Next let's submit the same script to run via PBS.
We create a submission script, e.g. named `submit-hello-oceananigans.sh`, that contains the PBS directives and the command that runs the script.
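A sketch of such a script (the project code `xy12`, the resource requests, and the juliaup installation path are assumptions to adapt to your setup):

```
#!/bin/bash
#PBS -q normal
#PBS -P xy12
#PBS -l ncpus=1
#PBS -l mem=8GB
#PBS -l walltime=00:10:00
#PBS -l storage=gdata/xy12
#PBS -l wd
#PBS -o output.stdout
#PBS -e output.stderr

# Start from a clean module environment
module purge

# Use the Julia depot we created under /g/data
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia

# Run the script with the project in the submission directory
/g/data/xy12/ab1234/.julia/juliaup/bin/julia --project hello-oceananigans.jl
```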
The storage flag `gdata/xy12` is needed because Julia is installed there. Add more storage flags as required. The `module purge` command ensures that there is no other (possibly conflicting) module loaded by the user's startup files.

Then we submit the PBS job:
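(from the directory containing the script and the project):

```
qsub submit-hello-oceananigans.sh
```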
After the job runs you should have an `output.stdout` file containing the script's output.

Success! 🎉
Run on GPU
To run the same script on a GPU you only need to modify the grid in `hello-oceananigans.jl` to be constructed with the `GPU()` argument, e.g.,
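Assuming the grid from the sketch above, the only change is the architecture argument:

```
grid = RectilinearGrid(GPU(); size=(16, 16, 16), extent=(1, 1, 1))
```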
and then also modify the `submit-hello-oceananigans.sh` script to use the `gpuvolta` queue and ask for at least one GPU, as sketched after this paragraph. The 12 CPUs requested there are not a coincidence; Gadi's `gpuvolta` queue requires that you request 12 CPUs per GPU; see the Gadi queue limits docs.
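A sketch of the modified resource requests (the memory request is just an illustration):

```
#PBS -q gpuvolta
#PBS -l ngpus=1
#PBS -l ncpus=12
#PBS -l mem=32GB
```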
After the above modifications, submitting the GPU job will now give an `output.stdout` with the new output. Success again! Woooo! 🎉
Note the difference! The grid (and by consequence also the field) you created now live on `CUDAGPU`!

Run on many GPUs
We are now ready to configure Oceananigans to use multiple GPUs via CUDA-aware MPI communication. This is a bit harder to set up... But we'll do it together.
The instructions below for setting up CUDA-aware MPI on Gadi are heavily inspired by the discussion at taimoorsohail/ocean-ensembles#74 and the heroic efforts of @taimoorsohail.
We first unload all modules (just to ensure that we all start from the same page):
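```
module purge
```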
and load the required modules for the CUDA-aware MPI configuration:
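For example (the unversioned module names below load Gadi's defaults; check `module avail openmpi` for a CUDA-aware build and pin versions as needed):

```
module load cuda
module load openmpi
```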
We then want to ensure that the MPI version that is called is the system default. To do that, we use the MPIPreferences.jl package. This package identifies the MPI implementation on the machine and creates a small TOML file with preferences that MPI.jl will use.
Now we run:

```
$ julia -e 'using Pkg; Pkg.add("MPIPreferences"); using MPIPreferences; MPIPreferences.use_system_binary()'
```

The above should have generated a file called `LocalPreferences.toml` in the `/g/data/xy12/ab1234/.julia/environments/v1.10/` directory that looks something like:
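A sketch of its contents (the exact fields and library path depend on your MPIPreferences version and the loaded `openmpi` module):

```
[MPIPreferences]
_format = "1.1"
abi = "OpenMPI"
binary = "system"
libmpi = "libmpi"
mpiexec = "mpiexec"
```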
Note 1: You don't need to run this step every time; this should only be done once and then the `LocalPreferences.toml` lives in your general Julia environment and is available to any other project you want to run on Gadi. You might need to rerun this step if the MPI installation on Gadi changes or is upgraded.

Note 2: With the `LocalPreferences.toml` created, you might start getting warnings or errors if you don't load the corresponding MPI modules on Gadi. See the updated PBS bash script below for the required modifications. You might need to load the `openmpi` and `cuda` modules as well as define `LD_LIBRARY_PATH` even if you only want to use a single GPU/CPU.

Now let's install other packages we'll need, like MPI.jl. We can install either via
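```
# adds MPI.jl to the project in the current directory; drop --project to add it to the default environment
julia --project -e 'using Pkg; Pkg.add("MPI")'
```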
or from the Julia REPL via the package manager (which we enter by pressing `]` at the REPL):
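```
pkg> add MPI
```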
Next, we ensure some more relevant environment variables are set (consider adding them in your `.bash_profile` file); a sketch follows.
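A sketch of plausible settings (the `JULIA_CUDA_MEMORY_POOL` setting is a common recommendation when using CUDA-aware MPI with CUDA.jl; the library path below is a placeholder, so check `module show openmpi` for the actual location):

```
# Keep the Julia depot outside $HOME, as set up earlier
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia

# Often recommended when using CUDA-aware MPI with CUDA.jl
export JULIA_CUDA_MEMORY_POOL=none

# Make the loaded MPI libraries visible at run time (<version> is a placeholder)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/openmpi/<version>/lib
```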
We are ready to run a script that will exercise CUDA-aware MPI communication; let's call it `hello-cuda-mpi.jl`. Let's also write a bash script to submit it through the queue with multiple GPUs. Sketches of both follow.
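A minimal sketch of such a script (assuming a recent Oceananigans version where the distributed architecture lives in `Oceananigans.DistributedComputations`; the grid size and the rank-dependent field values are just illustrations):

```
using MPI
MPI.Init()

using Oceananigans
using Oceananigans.DistributedComputations
using Statistics

# Distributed GPU architecture; the domain is partitioned across the MPI ranks
arch = Distributed(GPU())
rank = arch.local_rank

println("Hello from rank $rank!")

grid = RectilinearGrid(arch; size=(16, 16, 16), extent=(1, 1, 1))

# Two fields filled with rank-dependent values
c = CenterField(grid)
u = XFaceField(grid)
set!(c, rank + 1)
set!(u, 10 * (rank + 1))

# Local means differ from rank to rank
println("rank $rank: mean(c) = $(mean(interior(c))), mean(u) = $(mean(interior(u)))")
```

And a sketch of the submission script (the project code, resource requests, module names, and the julia path are assumptions to adapt):

```
#!/bin/bash
#PBS -q gpuvolta
#PBS -P xy12
#PBS -l ngpus=4
#PBS -l ncpus=48
#PBS -l mem=128GB
#PBS -l walltime=00:15:00
#PBS -l storage=gdata/xy12
#PBS -l wd
#PBS -o output.stdout
#PBS -e output.stderr

module purge
module load cuda
module load openmpi

export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia
export JULIA_CUDA_MEMORY_POOL=none

# One MPI rank per GPU
mpirun -np 4 /g/data/xy12/ab1234/.julia/juliaup/bin/julia --project hello-cuda-mpi.jl
```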
When this job runs, the `output.stdout` should contain a few hellos from the various ranks and also the mean values of the fields on each rank. It's essential to notice that the `c` and `u` fields have different mean values on each rank.

There you go! You now have a CUDA-aware MPI Oceananigans configuration!! 🎉