|
| 1 | +--- |
| 2 | +title: "Introduction to Baskerville" |
| 3 | +author: "Advanced Research Computing (ARC) Team" |
| 4 | +format: |
| 5 | + revealjs: |
| 6 | + theme: bask2.scss |
| 7 | + footer: Baskerville Hackathon 24/03/2025 |
| 8 | + logo: images/BaskervilleW.svg |
| 9 | +--- |
| 10 | + |
| 11 | +# Introduction (*And Welcome*) to Baskerville {.smaller} |
| 12 | + |
| 13 | +# Baskerville Hackathon |
| 14 | + |
| 15 | +## Baskerville Content {.smaller} |
| 16 | + |
| 17 | +::: {} |
| 18 | +1. What is Baskerville |
| 19 | +1. Baskerville resources |
| 20 | +1. How to access Baskerville |
| 21 | +1. How to submit jobs to Baskerville |
| 22 | +::: |
| 23 | + |
| 24 | +## What is Baskerville {.smaller} |
| 25 | + |
| 26 | +- Baskerville is a Tier 2 HPC (High Performance Computing) system |
| 27 | +- It is a GPU-focused system with 228 GPUs (57 nodes, 4 GPUs per node) in total |
| 28 | + - 46 nodes with A100 GPUs |
| 29 | + - 11 nodes with A100-80 GPUs |
| 30 | +- Ice Lake CPUs with 72 cores per node |
| 31 | +- 5.4 PB storage |
| 32 | + |
| 33 | +::: {.callout-note appearance="simple"} |
| 34 | +HyperThreading: |
| 35 | + |
| 36 | + Baskerville has hyperthreading enabled, which means that 1 core = 2 threads/tasks |
| 37 | + |
| 38 | +::: |
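As a quick check of the hyperthreading arithmetic, the sketch below multiplies nodes by the 72 cores per node and the 2 threads per core quoted above. The function name `node_totals` is made up for this illustration; it is not a Baskerville command.

```bash
#!/bin/bash
cores_per_node=72     # Ice Lake cores per node, from the slide above
threads_per_core=2    # hyperthreading: 1 core = 2 threads

# Print the core and thread totals for a given number of nodes
# (node_totals is a hypothetical helper, not a Baskerville command)
node_totals() {
    cores=$(( $1 * cores_per_node ))
    threads=$(( cores * threads_per_core ))
    echo "nodes=$1 cores=${cores} threads=${threads}"
}

node_totals 2    # two nodes, the size of the hackathon reservation
```

For two nodes this prints `nodes=2 cores=144 threads=288`.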
| 39 | + |
| 40 | +## Baskerville Resources {.smaller visibility="hidden"} |
| 41 | + |
| 42 | +Baskerville has a number of resources available to you: |
| 43 | + |
| 44 | +::: {.incremental} |
| 45 | +- <https://www.baskerville.ac.uk/> - The Baskerville homepage, with case studies and papers published by Baskerville users |
| 46 | +- <https://admin.baskerville.ac.uk/> - The admin site is a place to manage your Baskerville account and view user and project information |
| 47 | +- <https://apps.baskerville.ac.uk/> - The apps website contains a list of all the installed applications on Baskerville |
| 48 | +- <https://portal.baskerville.ac.uk/> - The portal site allows users to use interactive apps in a web browser |
| 49 | +::: |
| 50 | + |
| 51 | + |
| 52 | +## Baskerville Resources {.smaller} |
| 53 | + |
| 54 | +Baskerville has a number of resources available to you: |
| 55 | + |
| 56 | +:::: {.columns} |
| 57 | +::: {.column width="50%"} |
| 58 | +::: {.incremental} |
| 59 | +- <https://www.baskerville.ac.uk/> - The Baskerville homepage, with case studies and papers published by Baskerville users |
| 60 | +- <https://admin.baskerville.ac.uk/> - The admin site is a place to manage your Baskerville account and view user and project information |
| 61 | +::: |
| 62 | +::: |
| 63 | + |
| 64 | +::: {.column width="50%"} |
| 65 | +::: {.incremental} |
| 66 | +- <https://apps.baskerville.ac.uk/> - The apps website contains a list of all the installed applications on Baskerville |
| 67 | +- <https://portal.baskerville.ac.uk/> - The portal site allows users to use interactive apps in a web browser |
| 68 | +::: |
| 69 | +::: |
| 70 | +:::: |
| 71 | + |
| 72 | +## Baskerville Hackathon {.smaller} |
| 73 | + |
| 74 | +::: {.incremental} |
| 75 | +- You will have access to Baskerville from 24th March to 26th March |
| 76 | +- Reservation of 2 nodes (8 GPUs, 144 cores and 288 threads) |
| 77 | + - A reservation restricts these nodes to users from selected projects; you request it with the `--reservation` flag |
| 78 | +- Everyone is part of the project `ranaaaa-hackathon` |
| 79 | +- Other projects have more storage (5 TB) and contain the data. |
| 80 | +::: |
| 81 | + |
| 82 | +## Baskerville Login {.smaller} |
| 83 | + |
| 84 | +- Everyone receives a welcome email when first added to Baskerville |
| 85 | +- The email points to our documentation and instructions for first-time access <https://docs.baskerville.ac.uk/logging-on/> |
| 86 | + - The instructions and video show both what to do and what not to do |
| 87 | +- To set up your one-time passcode (OTP) you need an authenticator app; please have it open and ready to scan your OTP |
| 88 | +- To log in you will need your username, password and OTP |
| 89 | +- Whilst an SSH key option is available, it is not recommended during the hackathon |
| 90 | + |
| 91 | +## Baskerville Login Example |
| 92 | + |
| 93 | +Insert video here |
| 94 | + |
| 95 | +## Baskerville Storage {.smaller} |
| 96 | + |
| 97 | +:::: {.columns} |
| 98 | +::: {.column width="50%"} |
| 99 | +::: {.incremental} |
| 100 | +- **Home directory** `/bask/homes/_initial_/_username_` |
| 101 | + - When you first log in, this is where you start |
| 102 | + - Only 20 GB of space - this is not where you run jobs |
| 103 | + - Should be used for local and cached files |
| 104 | +::: |
| 105 | +::: |
| 106 | + |
| 107 | +::: {.column width="50%"} |
| 108 | +::: {.incremental} |
| 109 | +- **Project directory** `/bask/projects/_initial_/_projectname_` |
| 110 | + - Has 5 TB of space - this is where you run jobs |
| 111 | + - Contains your data, job scripts and results |
| 112 | +::: |
| 113 | +::: |
| 114 | +:::: |
| 115 | + |
| 116 | + |
| 117 | +## Baskerville User Details {.smaller} |
| 118 | + |
| 119 | +- <https://docs.baskerville.ac.uk/storage/> |
| 120 | +- You can inspect your details either in the [admin site](https://admin.baskerville.ac.uk/) or in the terminal: |
| 121 | + - `my_quota` - This shows how much of your home directory is utilised |
| 122 | + - `my_baskerville` - This gives your project account and QoS details |
| 123 | + |
| 124 | +## Baskerville User Details Example |
| 125 | + |
| 126 | +Insert video here |
| 127 | + |
| 128 | +# Baskerville jobs |
| 129 | + |
| 130 | +## Login and Compute nodes {.smaller} |
| 131 | + |
| 132 | +- There are 2 types of nodes on Baskerville |
| 133 | + |
| 134 | +::: {.incremental} |
| 135 | + - A **login node** is where you start. Baskerville has 3: `bask-pg-login01`, `bask-pg-login02` and `bask-pg-login03` |
| 136 | + - Accessible to all Baskerville users |
| 137 | + - Intended for simple tasks like managing files |
| 138 | + - Does not have access to GPUs |
| 139 | + - A job submitted goes from a login node to a compute node |
| 140 | + - A **compute node** is the node that does the work. Compute nodes do not have `login` in their name |
| 141 | + - When you submit a job, this is where it goes |
| 142 | +::: |
| 143 | + |
| 144 | +## Job script {.smaller} |
| 145 | + |
| 146 | +- The most common way to run jobs is with the `sbatch` command followed by the name of your job script |
| 147 | +- Your job script will contain: |
| 148 | + |
| 149 | +::: {.incremental} |
| 150 | + - Job account |
| 151 | + - Job time |
| 152 | + - Job compute resources (CPUs, GPUs etc.) |
| 153 | + - Quality of Service (QoS); for everyone in this hackathon this is `bham` |
| 154 | + - Other options are also available <https://slurm.schedmd.com/sbatch.html> |
| 155 | +::: |
| 156 | + |
| 157 | +## Job script example {auto-animate=true} |
| 158 | + |
| 159 | +```bash |
| 160 | +#!/bin/bash |
| 161 | +#SBATCH --qos=bham |
| 162 | +#SBATCH --account=ranaaaa-hackathon |
| 163 | +#SBATCH --time=1:0:0 |
| 164 | +#SBATCH --ntasks=4 |
| 165 | +#SBATCH --gres=gpu:1 |
| 166 | +#SBATCH --reservation=ranaaaa-hackathon |
| 167 | + |
| 168 | +module purge |
| 169 | +module load baskerville |
| 170 | + |
| 171 | +# your commands |
| 172 | +``` |
| 173 | + |
| 174 | +## Job script example {auto-animate=true} |
| 175 | + |
| 176 | +```bash |
| 177 | +#!/bin/bash <--- run the job using GNU Bourne Again Shell |
| 178 | +#SBATCH --qos=bham |
| 179 | +#SBATCH --account=ranaaaa-hackathon |
| 180 | +#SBATCH --time=1:0:0 |
| 181 | +#SBATCH --ntasks=4 |
| 182 | +#SBATCH --gres=gpu:1 |
| 183 | +#SBATCH --reservation=ranaaaa-hackathon |
| 184 | + |
| 185 | +module purge |
| 186 | +module load baskerville |
| 187 | + |
| 188 | +# your commands |
| 189 | +``` |
| 190 | + |
| 191 | +## Job script example {auto-animate=true} |
| 192 | + |
| 193 | +```bash |
| 194 | +#!/bin/bash |
| 195 | +#SBATCH --qos=bham <--- Using bham job queue |
| 196 | +#SBATCH --account=ranaaaa-hackathon <--- Hackathon project account |
| 197 | +#SBATCH --time=1:0:0 <--- Job to run for up to 1 hour |
| 198 | +#SBATCH --ntasks=4 <--- Job requests 4 tasks |
| 199 | +#SBATCH --gres=gpu:1 <--- Requesting 1 GPU |
| 200 | +#SBATCH --reservation=ranaaaa-hackathon <--- Use reserved resources |
| 201 | + |
| 202 | +module purge |
| 203 | +module load baskerville |
| 204 | + |
| 205 | +# your commands |
| 206 | +``` |
| 207 | + |
| 208 | +## Job script example {auto-animate=true} |
| 209 | + |
| 210 | +```bash |
| 211 | +#!/bin/bash |
| 212 | +#SBATCH --qos=bham |
| 213 | +#SBATCH --account=ranaaaa-hackathon |
| 214 | +#SBATCH --time=1:0:0 |
| 215 | +#SBATCH --ntasks=4 |
| 216 | +#SBATCH --gres=gpu:1 |
| 217 | +#SBATCH --reservation=ranaaaa-hackathon |
| 218 | + |
| 219 | +module purge <--- Purging environment |
| 220 | +module load baskerville <--- Loading default Baskerville modules |
| 221 | + <--- Add additional modules here |
| 222 | +# your commands <--- Commands you want to run |
| 223 | +``` |
| 224 | + |
| 225 | +## Submitting a job {.smaller} |
| 226 | + |
| 227 | +When you submit a job you will see: |
| 228 | +``` |
| 229 | +sbatch job.sh |
| 230 | +Submitted batch job 992337 |
| 231 | +``` |
| 232 | + |
| 233 | +The number (in this case `992337`) is your job id, a unique number that identifies the job |
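If you want to reuse the job id in later commands, one option is to pull it out of the confirmation line. The sketch below does this with plain shell parameter expansion on the sample message above; `job_id_from_sbatch` is a hypothetical helper name, and the output file name follows the `slurm-XXXXX.out` pattern used on Baskerville.

```bash
#!/bin/bash
# Extract the job id from sbatch's confirmation line.
# (job_id_from_sbatch is a hypothetical helper, not a Slurm command)
job_id_from_sbatch() {
    # "${1##* }" removes everything up to the last space,
    # leaving only the final word of the message
    echo "${1##* }"
}

message="Submitted batch job 992337"    # sample confirmation line
job_id=$(job_id_from_sbatch "$message")

echo "Job id: ${job_id}"
echo "Output file: slurm-${job_id}.out"
```

In a real session you would capture the message with `message=$(sbatch job.sh)`. Slurm also provides `sbatch --parsable`, which prints the job id without the surrounding text.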
| 234 | + |
| 235 | + |
| 236 | +## SLURM {.smaller} |
| 237 | + |
| 238 | +- SLURM = Simple Linux Utility for Resource Management <https://docs.baskerville.ac.uk/jobs/#slurm-jobs> |
| 239 | + - SLURM is our job scheduler; it decides the priority of jobs and when they start |
| 240 | + - Because the hackathon has a reservation, jobs submitted with it run outside the general queue |
| 241 | + - Batch jobs are submitted with the `sbatch` command and you can monitor them with `squeue` |
| 242 | + - If your job is in the queue you can use `squeue --start` to see an estimate of when it will start |
| 243 | + - If you want to cancel a job you can use `scancel XXXX` where XXXX is the job id |
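The monitoring and cancel commands above all take the job id. As a minimal sketch, the hypothetical helper below (`job_commands`, not part of Slurm) prints the commands for one job id rather than running them, so you can check them before copying them into your terminal. It assumes a plain numeric job id such as `992337`.

```bash
#!/bin/bash
# Print the monitoring and cancel commands for one job id.
# (job_commands is a hypothetical helper; it only prints, it runs nothing)
job_commands() {
    # Reject anything that is not a plain integer job id
    case "$1" in
        ''|*[!0-9]*)
            echo "error: '$1' is not a numeric job id" >&2
            return 1
            ;;
    esac
    echo "squeue --jobs=$1"            # current state of the job
    echo "squeue --start --jobs=$1"    # estimated start time while pending
    echo "scancel $1"                  # cancel the job
}

job_commands 992337
```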
| 244 | + |
| 245 | +## Job results file {.smaller} |
| 246 | + |
| 247 | +::: {.panel-tabset} |
| 248 | + |
| 249 | +## Output file .out |
| 250 | + |
| 251 | +- A file `slurm-XXXXX.out` is created where XXXXX is the job id |
| 252 | +- This file contains the standard output of whatever your job runs |
| 253 | + |
| 254 | + |
| 255 | +## Stats file .stats |
| 256 | + |
| 257 | +- A file `slurm-XXXXX.stats` is created where XXXXX is the job id |
| 258 | +- Contains information on the resources used (time, memory, number of CPUs and GPUs) |
| 259 | +- Also contains the exit code, so you can see whether the job ended correctly |
| 260 | + |
| 261 | +```bash |
| 262 | ++--------------------------------------------------------------------------+ |
| 263 | +| Job on the Baskerville cluster: |
| 264 | +| Starting at Tue Sep 28 15:00:30 2021 for auser(123456) |
| 265 | +| Identity jobid 12345 jobname cudaopenmp.sh |
| 266 | +| Running against project ace-project and in partition baskerville-shared |
| 267 | +| Requested cpu=288,mem=864G,node=2,billing=288,gres/gpu=8 - 00:10:00 walltime |
| 268 | +| Assigned to nodes bask-pg0308u30a,bask-pg0308u31a |
| 269 | +| Command /bask/projects/a/ace-project/cudaopenmp.sh |
| 270 | +| WorkDir /bask/projects/a/ace-project |
| 271 | ++--------------------------------------------------------------------------+ |
| 272 | ++--------------------------------------------------------------------------+ |
| 273 | +| Finished at Tue Sep 28 15:00:35 2021 for auser(123456) on the Baskerville Cluster |
| 274 | +| Required (00:01.942 cputime, 3580K memory used) - 00:00:05 walltime |
| 275 | +| JobState COMPLETING - Reason None |
| 276 | +| Exitcode 0:0 |
| 277 | ++--------------------------------------------------------------------------+ |
| 278 | +``` |
| 279 | + |
| 280 | +::: |
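To check the exit code without opening the stats file, you could read the `Exitcode` line with `awk`. The sketch below assumes the file layout shown in the example above and treats `0:0` as success, following Slurm's `exit_code:signal` convention. `stats_exit_code` is a hypothetical helper name, and the temporary file stands in for a real `slurm-XXXXX.stats` file.

```bash
#!/bin/bash
# Print the exit code recorded in a .stats file (e.g. "0:0").
# (stats_exit_code is a hypothetical helper, not a Baskerville command)
stats_exit_code() {
    # Fields on the matching line are: "|", "Exitcode", "<code>"
    awk '$2 == "Exitcode" { print $3 }' "$1"
}

# Stand-in for a real stats file, using lines from the example above
stats_file=$(mktemp)
cat > "$stats_file" <<'EOF'
| Finished at Tue Sep 28 15:00:35 2021 for auser(123456) on the Baskerville Cluster
| Required (00:01.942 cputime, 3580K memory used) - 00:00:05 walltime
| JobState COMPLETING - Reason None
| Exitcode 0:0
EOF

code=$(stats_exit_code "$stats_file")
if [ "$code" = "0:0" ]; then
    echo "job ended correctly"
else
    echo "job failed with exit code ${code}"
fi
rm -f "$stats_file"
```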
| 281 | + |
| 282 | + |
| 283 | +# Thank You |