
Commit a927cf4

Setup page
1 parent 88a3f33 commit a927cf4


2 files changed: +317, -0 lines changed


_toc.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -335,6 +335,7 @@ parts:
 sections:
 - file: individual_modules/section_landing_pages/introduction_to_GPUs
 sections:
+- file: individual_modules/intro_to_GPUs/setup
 - file: individual_modules/intro_to_GPUs/theory
 - file: individual_modules/intro_to_GPUs/spack
 - file: individual_modules/intro_to_GPUs/slurm
```
Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
# Context and Setup Guide

## Course Philosophy

Throughout this course, two guiding principles have been kept in mind:

1) **Complete Pipeline Approach**. Getting to the point where you can simply write:

```python
import cupy as cp
```

is far harder, in the context of GPUs, than writing the GPU code itself.

2) **Focus on practical GPU use**. This is a course about **using GPUs**, not about the low-level details of **programming GPUs**.

By the end of this guide, you will have everything in place to use GPU acceleration immediately. We will walk you through installing the tools, configuring your environment, and running your first CUDA-powered code.
## University of Exeter ISCA HPC Installation Instructions

If you have not used an HPC platform before, you may benefit from first working through the "Helpful Auxiliary Software" section on this page, which guides you through connecting to an HPC platform. You can then continue with these setup instructions.

## Clone the Repo

To engage with all of the content in this GPU Training course, you will need to clone the repo:

```bash
cd /lustre/projects/Research_Project-RSATeam  # Project directory the RSA Team uses for this course
mkdir -p "$USER"  # Create a directory for yourself within the project space
cd "$USER"
git clone https://github.com/UniExeterRSE/GPU_Training.git
cd GPU_Training
```
### Two Ways to Run the Course Code

#### Method 1: Interactive GPU Session

If you prefer to work interactively, follow these steps.

Request an interactive session:

```bash
srun \
  --partition=gpu \
  -A Research_Project-RSATeam \
  --time=12:00:00 \
  --nodes=1 \
  --ntasks=1 \
  --gres=gpu:1 \
  --cpus-per-task=4 \
  --pty /bin/bash
```

Load the required modules:

```bash
module load nvidia-cuda/12.1.1
module load Python/3.11.3
```
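Once Python is loaded, a quick way to confirm that your shell is running inside the GPU allocation is to inspect the environment variables Slurm sets for the job. The sketch below is a hypothetical helper (`describe_allocation` is our own name, not part of the course code), and it assumes your cluster's Slurm configuration exports `CUDA_VISIBLE_DEVICES` for jobs that request `--gres=gpu`, which is common but site-dependent.

```python
import os
from collections.abc import Mapping


def describe_allocation(env: Mapping[str, str] = os.environ) -> str:
    """Summarise the Slurm job and the GPUs visible to this process."""
    job_id = env.get("SLURM_JOB_ID")
    if job_id is None:
        return "Not inside a Slurm job (SLURM_JOB_ID is unset)."
    # Slurm typically restricts visible GPUs via CUDA_VISIBLE_DEVICES (site-dependent).
    gpus = env.get("CUDA_VISIBLE_DEVICES", "")
    return f"Slurm job {job_id}; CUDA_VISIBLE_DEVICES={gpus or '(none)'}"


# Usage inside the interactive session: print(describe_allocation())
```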
Install the Python requirements:

```bash
poetry install
```

Once your environment is ready, you can invoke any of the project’s entry points via Poetry. For example:

```bash
poetry run cuda_check
```
#### Method 2: Batch Submission via Slurm

All of the key Slurm submission scripts live in the `exeter_isca_slurm_submission_scripts/` directory. You can submit a job with:

```bash
cd exeter_isca_slurm_submission_scripts
sbatch <script-name>.slurm
```
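If you want to submit jobs from Python (for example, to chain several runs), `sbatch --parsable` prints only the job ID, optionally followed by `;<cluster>`, which is easy to capture. A minimal standard-library sketch; `submit` and `parse_job_id` are illustrative helper names, not part of the course repository:

```python
import subprocess


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from `sbatch --parsable` output ("<id>" or "<id>;<cluster>")."""
    return int(sbatch_output.strip().split(";")[0])


def submit(script_path: str) -> int:
    """Submit a Slurm batch script and return its job ID.

    Must run where `sbatch` is available (e.g. an HPC login node).
    """
    result = subprocess.run(
        ["sbatch", "--parsable", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the script
    )
    return parse_job_id(result.stdout)
```

For example, `submit("<script-name>.slurm")` would return the job ID, which you can then pass to `squeue -j <id>` to monitor the job.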
## General Installation Instructions

The following steps install the compilers and packages needed to engage with the material in this course.

```{important}
Please keep in mind that nearly all of the commands used in this section are covered in detail within the course itself. They are included here to make sure you have all of the necessary resources (e.g. a GPU and the relevant compilers) to complete the whole course. **The intention is for you to run these commands and confirm that the output matches the contents of this page, not to fully understand each step you are taking.** If you get stuck and are unsure how to proceed, please reach out to the authors and we can help you debug.

If you are self-studying, please read up to the section "Project: Conway's Game of Life - CPU vs GPU Implementation" to understand more about the commands being used. If you are taking the workshop, these commands are here to confirm that you can run code on the designated platform, which saves time in the workshop and identifies any permission errors when accessing the required resources.
```

## Spack - Installing system-level requirements

This course uses [Spack](https://spack.io/) to manage system-level requirements such as the CUDA toolkit. Installing these normally requires privileged permissions, such as access to `sudo`, which users rarely have on the HPC platforms where GPUs are most often available. Spack lets you build and install these requirements in your own user space instead. Spack also has a range of other benefits, which will be discussed in this course.

First, clone the `spack` repo into your home directory (e.g. run `cd ~` first) at a stable release; the extra config and depth flags follow the suggestion in Spack's README:

```bash
git clone -c feature.manyFiles=true --depth=2 -b v0.23.1 https://github.com/spack/spack.git
```

Then activate `spack` with:

```bash
source spack/share/spack/setup-env.sh
```

```{note}
You can check that `spack` has been successfully installed by running `spack --version`, which should return the version of Spack that you have available.
```
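The same kind of check can be scripted for each command-line tool this page relies on. The following is a minimal standard-library sketch; the tool list is an assumption, and `nvidia-smi` is normally only found on nodes with an NVIDIA driver installed, so expect it to be missing on most login nodes. `spack` is typically only on your `PATH` after `setup-env.sh` has been sourced in the current shell.

```python
import shutil
from typing import Dict, Iterable, Optional

# Tools used on this page; adjust the list for your platform.
REQUIRED_TOOLS = ("git", "spack", "poetry", "nvidia-smi")


def check_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


def report(tools: Iterable[str] = REQUIRED_TOOLS) -> None:
    """Print one line per tool saying where it was found, if anywhere."""
    for tool, path in check_tools(tools).items():
        status = f"found at {path}" if path else "NOT FOUND"
        print(f"{tool:<12} {status}")


# Usage: call report() from Python started in the shell where setup-env.sh was sourced.
```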
Next, create a `spack` environment named "gpu_course":

```bash
spack env create gpu_course
```

which can then be activated with:

```bash
spack env activate -p gpu_course
```

In this course, Spack is used for system-level requirements, so the CUDA toolkit is added to the environment with the following command:

```bash
spack add cuda
```

```{note}
This step only records that you intend to install the package; nothing is built yet, and `spack` is still waiting for any further packages to be added to the environment specification. You can check the current specification (e.g. package list, dependencies, compilers to be used) with `spack spec`.
```

Finally, install all of the packages into the `spack` environment with:

```bash
spack install
```

```{note}
On an HPC platform, you should put the above Spack commands into a shell script and submit it to the scheduler (e.g. with `sbatch` on ISCA or ARCHER2), as `spack install` can take on the order of hours for the above specification.
```
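As a sketch of what such a script could contain, the helper below places the Spack commands from this page behind a Slurm header. The partition name, time limit, CPU count, and output file names are all assumptions to adapt to your platform; the account is passed in (the ISCA `srun` example on this page uses `Research_Project-RSATeam`). It also assumes Spack was cloned into your home directory.

```python
def spack_batch_script(account: str, partition: str, hours: int = 4, cpus: int = 8) -> str:
    """Return the text of a Slurm batch script that runs the Spack install."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",  # site-specific; choose a CPU partition
        f"#SBATCH --time={hours:02d}:00:00",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        "#SBATCH --output=spack_install.out",
        "#SBATCH --error=spack_install.err",
        "",
        'cd "$HOME"',  # Spack was cloned into the home directory
        "source spack/share/spack/setup-env.sh",
        # Only run the install if the environment activated successfully.
        "spack env activate gpu_course && spack install",
    ]
    return "\n".join(lines) + "\n"
```

You could write the returned text to a file (e.g. with `pathlib.Path.write_text`) and submit that file with `sbatch`.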
```{note}
The `.spack` directory is a hidden folder in your home directory that stores Spack's user-level configuration and caches, such as configured mirrors, detected compilers, and any custom settings you have applied. These files can become outdated or inconsistent, especially if you have been experimenting with different environments, modifying package recipes, or changing `spack` versions. When a "weird" or hard-to-troubleshoot error occurs, one way to rule out bad configuration or cache data is to remove the `.spack` directory. This gives Spack a clean slate: it recreates the directory and its necessary files the next time it runs, which often resolves mysterious issues stemming from old or corrupted data. Removing only the non-hidden `spack` directory (the cloned repository) will likely not give you a clean slate, as the data from previous experiments in `~/.spack` will still be present.
```
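If you would rather not lose your old settings outright, a cautious variant is to move `~/.spack` aside instead of deleting it, so it can be restored if the problem turns out to be elsewhere. A minimal standard-library sketch; the `.spack.bak-<timestamp>` naming is our own convention:

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def reset_spack_user_config(home: Optional[Path] = None) -> Optional[Path]:
    """Move ~/.spack aside (rather than deleting it) so Spack starts from a clean slate.

    Returns the backup path, or None if there was no .spack directory to move.
    """
    home = Path.home() if home is None else Path(home)
    user_config = home / ".spack"
    if not user_config.exists():
        return None
    backup = home / f".spack.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.move(str(user_config), str(backup))
    return backup
```

To undo the reset, move the backup directory back to `~/.spack`.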
## Poetry - Installing user-level requirements

Within this course, [Poetry](https://python-poetry.org/) is used to manage the user-level requirements.

The following command will install Poetry:

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

```{note}
Poetry can be uninstalled with `curl -sSL https://install.python-poetry.org | python3 - --uninstall`.
```

```{note}
`poetry install` needs to be run from within the training course repo. If you have not already cloned it, do so with `git clone https://github.com/UniExeterRSE/GPU_Training.git` and then navigate to its root with `cd GPU_Training`.
```

All of the user-level requirements can be installed via Poetry with the command:

```bash
poetry install
```

`````{admonition} IMPORTANT: If running locally...
:class: important
You can check that the installation has been successful by running `poetry run cuda_check`, which should print the number of CUDA devices that are currently available, such as `Number of CUDA devices: 1`. For more information about the connected device, you can run a command such as `nvidia-smi` for an NVIDIA GPU.
`````
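The implementation of the course's `cuda_check` entry point is not shown on this page. As an illustration only, a check that prints output in the same format could be written with CuPy's runtime bindings (`cupy.cuda.runtime.getDeviceCount`), assuming CuPy is installed. This sketch reports zero rather than crashing when CuPy or a working CUDA driver is unavailable, relying on CuPy's CUDA runtime errors deriving from `RuntimeError`.

```python
def count_cuda_devices() -> int:
    """Return the number of visible CUDA devices, or 0 if CUDA is unavailable."""
    try:
        import cupy as cp
    except ImportError:
        return 0  # CuPy is not installed in this environment
    try:
        return int(cp.cuda.runtime.getDeviceCount())
    except RuntimeError:
        return 0  # e.g. no GPU visible or no usable driver (common on login nodes)


def main() -> None:
    print(f"Number of CUDA devices: {count_cuda_devices()}")
```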
`````{admonition} IMPORTANT: If running on an HPC...
:class: important
If you are working on an HPC cluster via Slurm, submit the `cuda_check.slurm` script instead of running the commands directly. The script runs the same commands as above (`poetry run cuda_check` and `nvidia-smi`) and stores the output and errors in the files `out.log` and `err.log` respectively. Submit it with `sbatch slurm_submission_scripts/cuda_check.slurm`.
`````
## Data

### Data Download

```{note}
For the RSA Team Day, the data files are available on the shared ISCA file system.
```

To download the dataset, follow these steps:

- **Create a Copernicus Marine Account**:
  - You will need an account to access the data. Register here: [Register for Account](https://data.marine.copernicus.eu/register?redirect=%2Fproduct%2FGLOBAL_ANALYSISFORECAST_PHY_001_024%2Fdownload%3Fdataset%3Dcmems_mod_glo_phy-thetao_anfc_0.083deg_PT6H-i_202406).

- **Run the CLI Command to Download the Dataset**:
  - Use the following command to download the subset of data:

    ```bash
    poetry run download_data
    ```

  - This command will prompt you to enter your username and password. Once authenticated, the data file will download to the data directory. Please note that the download may take some time, as the file size is approximately 250 MB.

### Data Description

The dataset used during the course contains three-dimensional ocean temperature data. It is described in detail on the [Copernicus Marine Data Service](https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/description).

**Filename**: `cmems_mod_glo_phy-thetao_anfc_0.083deg_PT6H-i_1730799065517.nc`

**Description**:
This dataset was downloaded from the **Global Ocean Physics Analysis and Forecast** service. It provides data for global ocean physics, focusing on seawater potential temperature.

- **Product Identifier**: `GLOBAL_ANALYSISFORECAST_PHY_001_024`
- **Product Name**: Global Ocean Physics Analysis and Forecast
- **Dataset Identifier**: `cmems_mod_glo_phy-thetao_anfc_0.083deg_PT6H-i`

**Variable Visualized**:

- **Sea Water Potential Temperature (thetao)**: Measured in degrees Celsius [°C].

**Geographical Area of Interest**:

- **Region**: Around the United Kingdom
- **Coordinates**:
  - **Northern Latitude**: 65.312
  - **Eastern Longitude**: 6.1860
  - **Southern Latitude**: 46.829
  - **Western Longitude**: -13.90

**Depth Range**:

- **Minimum Depth**: 0.49 meters
- **Maximum Depth**: 5727.9 meters

**File Size**:

- **267.5 MB**
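
The coordinates above define a latitude-longitude box. The sketch below (our own helper, not part of the course code) encodes it and estimates how many horizontal grid points it spans, assuming a regular grid at the nominal 1/12° spacing implied by `0.083deg` in the dataset identifier; the actual file may differ by a point or so depending on how the grid is aligned. With these numbers the estimate is roughly 223 × 242 points per depth level and time step.

```python
from dataclasses import dataclass

# Nominal grid spacing implied by "0.083deg" in the dataset identifier (about 1/12 degree).
GRID_SPACING_DEG = 1 / 12


@dataclass(frozen=True)
class BoundingBox:
    north: float
    south: float
    east: float
    west: float

    def contains(self, lat: float, lon: float) -> bool:
        """True if (lat, lon) lies inside the box (does not handle the antimeridian)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def approx_grid_shape(self, spacing: float = GRID_SPACING_DEG) -> tuple[int, int]:
        """Approximate (n_lat, n_lon) points on a regular grid with the given spacing."""
        n_lat = round((self.north - self.south) / spacing) + 1
        n_lon = round((self.east - self.west) / spacing) + 1
        return n_lat, n_lon


UK_REGION = BoundingBox(north=65.312, south=46.829, east=6.1860, west=-13.90)
```

For example, `UK_REGION.contains(50.72, -3.53)` (approximately Exeter) returns `True`.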
## Helpful Auxiliary Software

This section details a number of useful pieces of software that make developing GPU code easier. Most of these are Visual Studio Code extensions, chosen because they are the tools the author was first exposed to when starting out in GPU development.

### Using Visual Studio Code (VSCode)

Visual Studio Code (VSCode) can be installed from [its website](https://code.visualstudio.com/).

#### Remote-SSH

This guide walks you through setting up and using **Remote-SSH** in Visual Studio Code (VSCode) to connect to a remote machine.

##### Install the Remote - SSH Extension

Install from [Remote-SSH](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh) or via the following steps:

1. Open **VSCode**.
2. Go to the **Extensions** view by clicking on the square icon in the sidebar or pressing `Ctrl+Shift+X` (Windows/Linux) or `Cmd+Shift+X` (Mac).
3. Search for "**Remote - SSH**" and install the extension from Microsoft.

##### Configure SSH on Your Local Machine

Ensure you can SSH into the remote machine from your terminal. If SSH is not already configured:

1. **Generate SSH Keys** (if not already done):
   - Open a terminal on your local machine.
   - Run the command `ssh-keygen` and follow the prompts to generate a key pair. This will create keys in `~/.ssh/` by default.

2. **Copy Your Public Key to the Remote Machine**:
   - Run the command `ssh-copy-id user@hostname`, replacing `user` and `hostname` with your remote machine’s username and IP address or hostname.
   - Enter your password when prompted. This step ensures you can log in without repeatedly typing your password.

##### Add SSH Configuration in VSCode

1. Open **VSCode**.
2. Press `Ctrl+Shift+P` (Windows/Linux) or `Cmd+Shift+P` (Mac) to open the command palette.
3. Type and select **Remote-SSH: Open Configuration File**.
4. Choose the SSH configuration file (usually located at `~/.ssh/config`).

5. Add a new SSH configuration to the file, specifying the remote machine’s details. Here’s an example configuration (comments go on their own lines, as older OpenSSH versions reject trailing comments):

```ssh-config
Host my-remote-machine
    HostName <remote-ip-or-hostname>
    User <your-username>
    # Path to your SSH private key
    IdentityFile ~/.ssh/id_rsa
    # Default SSH port; change if needed
    Port 22
```
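Remote-SSH builds its list of connectable machines from the `Host` entries in this file. The following standard-library sketch (a hypothetical helper, not part of VSCode) lists the concrete aliases such a file defines, skipping wildcard and negated patterns like `Host *`, which apply settings rather than name a machine. It handles only the common `Host alias` and `Host=alias` forms and full-line comments.

```python
import re
from typing import List

WILDCARD_CHARS = set("*?!")


def list_ssh_hosts(config_text: str) -> List[str]:
    """Return concrete Host aliases from ssh_config text, skipping wildcard patterns."""
    hosts: List[str] = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Keywords are separated from their arguments by whitespace and/or "=".
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0].lower() == "host":
            hosts.extend(
                pattern for pattern in parts[1].split()
                if not WILDCARD_CHARS & set(pattern)
            )
    return hosts
```

For the configuration above, `list_ssh_hosts(pathlib.Path("~/.ssh/config").expanduser().read_text())` would return `["my-remote-machine"]`.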
##### Connecting to remote from within VSCode

You should now be able to connect to the remote machine from within VSCode by pressing `Ctrl+Shift+P` (Windows/Linux) or `Cmd+Shift+P` (Mac) and then selecting `Remote-SSH: Connect to Host...`. This presents a list containing the name you gave the machine in the config file, in the above case `my-remote-machine`. If you protected your SSH key with a passphrase, you will then be asked for it. Once connected, a new VSCode window opens, and you will have a fully functioning IDE on the remote machine.

#### Live Server

As this course produces 3D outputs, some supporting code generates interactive HTML dashboards to make exploring the output data easier. The VSCode Live Server extension makes it easier to view these dashboards in your local web browser.

##### Install the Live Server Extension

Install from [Live Server](https://marketplace.visualstudio.com/items?itemName=ritwickdey.LiveServer) or via the following steps:

1. Open **VSCode**.
2. Go to the **Extensions** view by clicking on the square icon in the sidebar or pressing `Ctrl+Shift+X` (Windows/Linux) or `Cmd+Shift+X` (Mac).
3. Search for "**Live Server**" and install the extension by **Ritwick Dey**.

---

##### Start the Live Server

1. **Right-click** on the HTML file in the editor and select **Open with Live Server**.

##### View Changes in Real-Time

- As you edit and save your HTML, CSS, or JavaScript files, the browser will automatically refresh to display your changes.
- This eliminates the need to refresh the browser manually, speeding up development.
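
If you cannot install the extension, Python's built-in `http.server` module offers a basic alternative: it serves a directory of HTML files but, unlike Live Server, does not refresh the browser when files change. From a terminal, `python -m http.server 8000 --directory <dashboard-dir>` does this; the sketch below does the same from Python in a background thread (`serve_directory` is our own helper name). When connected through Remote-SSH, VSCode can forward the port so the page opens in your local browser.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP on localhost from a background thread.

    Unlike Live Server, this does not reload the browser when files change.
    Pass port=0 to let the operating system pick a free port.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The actual port is available as `server.server_address[1]`; call `server.shutdown()` and then `server.server_close()` when you are done.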
