# Job Submission Scripts for HPC Clusters

This page provides concrete examples and best practices for running calibrations on HPC clusters using ClimaCalibrate.jl. The examples assume basic familiarity with either Slurm or PBS job schedulers.

## Overview

ClimaCalibrate.jl supports two main approaches for running calibrations on HPC clusters:

1. **WorkerBackend**: Uses Julia's distributed computing capabilities with workers managed by the job scheduler
2. **HPC Backends**: Directly submits individual model runs as separate jobs to the scheduler

The choice between these approaches depends on your cluster's resource allocation policies and your model's computational requirements.
For more information, see the Backends page.
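
To make the distinction concrete, here is a minimal sketch of how the backend choice appears in a calibration script. The `calibrate` entry point and the backend types are exported by ClimaCalibrate, but the trailing arguments (ensemble size, iterations, prior, output directory, ...) depend on your ClimaCalibrate version and experiment setup, so the call shapes below are left as comments rather than a fixed API:

```julia
import ClimaCalibrate

# Both approaches use the same entry point; only the backend type changes.
# Check the ClimaCalibrate API docs for the exact `calibrate` signature.

# 1. Workers inside the current scheduler allocation:
#    ClimaCalibrate.calibrate(ClimaCalibrate.WorkerBackend, args...)

# 2. One scheduler job per forward-model run (e.g. on a Slurm cluster):
#    ClimaCalibrate.calibrate(ClimaCalibrate.CaltechHPCBackend, args...)
```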

## WorkerBackend on a Slurm cluster

When using `WorkerBackend` on a Slurm cluster, allocate resources at the top level, since Slurm allows nested resource allocations. Each worker will inherit one task from the Slurm allocation.

```bash
#!/bin/bash
#SBATCH --job-name=slurm_calibration
#SBATCH --output=calibration_%j.out
#SBATCH --time=12:00:00
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=1
#SBATCH --mem=8G

# Set environment variables for CliMA
export CLIMACOMMS_DEVICE="CUDA"
export CLIMACOMMS_CONTEXT="SINGLETON"

# Load required modules
module load climacommon

# Build and run the Julia code
julia --project=calibration -e 'using Pkg; Pkg.instantiate(;verbose=true)'
julia --project=calibration calibration_script.jl
```

**Key points:**
- `--ntasks=5`: Requests 5 tasks; each worker gets one task
- `--cpus-per-task=4`: Each worker gets 4 CPU cores
- `--gpus-per-task=1`: Each worker gets 1 GPU
- Uses `%j` in the output file name to interpolate the job ID

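Inside `calibration_script.jl`, the workers are then attached to this allocation. A minimal sketch, assuming ClimaCalibrate's `SlurmManager` (the Slurm counterpart of the `PBSManager` used below) and the usual `Distributed.addprocs` pattern:

```julia
using Distributed
import ClimaCalibrate

# One worker per Slurm task: with the script above, each of the 5 workers
# inherits 4 CPU cores and 1 GPU from the existing allocation.
addprocs(ClimaCalibrate.SlurmManager(5))
```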

## WorkerBackend on a PBS cluster

Since PBS does not support nested resource allocations, request minimal resources for the top-level script. Each worker will acquire its own resource allocation through the `PBSManager`.

```bash
#!/bin/bash
#PBS -N pbs_calibration
#PBS -l walltime=12:00:00
#PBS -l select=1:ncpus=1:mem=2GB

# Set environment variables for CliMA
export CLIMACOMMS_DEVICE="CUDA"
export CLIMACOMMS_CONTEXT="SINGLETON"

# Load required modules
module load climacommon

# Build and run the Julia code
julia --project=calibration -e 'using Pkg; Pkg.instantiate(;verbose=true)'
julia --project=calibration calibration_script.jl
```

**Key points:**
- Requests only 1 CPU core for the main script
- Workers will be launched as separate PBS jobs with their own resource allocations
- PBS writes the job's output to `<jobname>.o<jobid>` by default; environment variables such as `${PBS_JOBID}` are not expanded inside `#PBS` directives, so they cannot be used to name output files

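As on Slurm, the workers are added inside `calibration_script.jl`. A minimal sketch using the `PBSManager` mentioned above (the exact constructor arguments may differ across ClimaCalibrate versions):

```julia
using Distributed
import ClimaCalibrate

# Each worker is submitted as its own PBS job with its own resources,
# since the top-level job only holds 1 CPU core and 2GB of memory.
addprocs(ClimaCalibrate.PBSManager(5))
```
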
## HPC Backend Approach

HPC backends directly submit individual forward model runs as separate jobs to the scheduler. This approach is ideal when:
- Your forward model requires multiple CPU cores or GPUs
- You need fine-grained control over resource allocation per model run
- Your cluster doesn't support nested allocations

Since each model run receives its own independent resource allocation, the top-level calibration script needs only minimal resources.
For a Slurm cluster, here is a minimal submission script:
```bash
#!/bin/bash
#SBATCH --job-name=slurm_calibration
#SBATCH --output=calibration_%j.out
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# Load required modules
module load climacommon

# Build and run the Julia code
julia --project=calibration -e 'using Pkg; Pkg.instantiate(;verbose=true)'
julia --project=calibration calibration_script.jl
```
For a PBS cluster, the script in the WorkerBackend section can be reused since it already specifies a minimal resource allocation.
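
With an HPC backend, per-run resources are passed through the `hpc_kwargs` dictionary described in the next section. A hedged sketch of how this fits together; the surrounding `calibrate` arguments are placeholders, so check the ClimaCalibrate docs for the exact signature on your backend:

```julia
import ClimaCalibrate

# Scheduler request applied to each forward-model job (keys detailed below)
hpc_kwargs = Dict(:time => 60, :ntasks => 1, :cpus_per_task => 4)

# Illustrative call shape only:
# ClimaCalibrate.calibrate(ClimaCalibrate.CaltechHPCBackend, args...; hpc_kwargs)
```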

## Resource Configuration

### CPU-Only Jobs

For CPU-only forward models:

```julia
hpc_kwargs = Dict(
    :time => 30,  # walltime in minutes
    :ntasks => 1,
    :cpus_per_task => 8,
    :mem => "16G"
)
```
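
Under Slurm, these keys map essentially one-to-one onto `sbatch` directives. Assuming `:time` is interpreted in minutes, the request above corresponds roughly to:

```bash
#SBATCH --time=30
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
```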

### GPU Jobs

For GPU-accelerated forward models:

```julia
hpc_kwargs = Dict(
    :time => 60,  # walltime in minutes
    :ntasks => 1,
    :cpus_per_task => 4,
    :gpus_per_task => 1,
    :mem => "32G"
)
```

### Multi-Node Jobs

For models requiring multiple nodes:

```julia
hpc_kwargs = Dict(
    :time => 120,   # walltime in minutes
    :ntasks => 16,  # 4 tasks on each of the 4 nodes
    :cpus_per_task => 4,
    :nodes => 4,
    :mem => "64G"   # memory per node
)
```

## Environment Variables

Set these environment variables in your submission script:

- `CLIMACOMMS_DEVICE`: Set to `"CUDA"` for GPU runs or `"CPU"` for CPU-only runs
- `CLIMACOMMS_CONTEXT`: Set to `"SINGLETON"` for WorkerBackend. The context is automatically set to `"MPI"` for HPC backends
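
For example, a CPU-only `WorkerBackend` run would set:

```bash
export CLIMACOMMS_DEVICE="CPU"
export CLIMACOMMS_CONTEXT="SINGLETON"
```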

## Troubleshooting

### Common Issues

1. **Worker Timeout**: Increase `ENV["JULIA_WORKER_TIMEOUT"]` in your Julia session if workers are timing out (see the snippet after this list)
2. **Memory Issues**: Monitor memory usage and adjust `--mem` (Slurm) or `-l mem` (PBS) accordingly
3. **GPU Allocation**: Ensure `--gpus-per-task` (Slurm) or `-l select` (PBS) is set correctly
4. **Module Conflicts**: Use `module purge` and ensure your `MODULEPATH` is set before loading required modules
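
For the worker-timeout case, set the variable before adding any workers; the value is in seconds:

```julia
# Allow slow-starting workers up to 5 minutes to connect
# (the Distributed default is 60 seconds).
ENV["JULIA_WORKER_TIMEOUT"] = "300"
```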

### Debugging Commands

```bash
# Check job status (Slurm)
squeue -u $USER

# Check job status (PBS)
qstat -u $USER

# View job logs
tail -f calibration_<jobid>.out

# Check resource usage
seff <jobid>      # Slurm
qstat -f <jobid>  # PBS
```