Commit 8667be2

Merge branch 'eth-cscs:main' into cicd-kb-port
2 parents c51b161 + c3caf9e commit 8667be2

36 files changed

+1740
-1090
lines changed

docs/access/vscode.md

Lines changed: 111 additions & 35 deletions
Original file line number | Diff line number | Diff line change
@@ -1,22 +1,34 @@
11
[](){#ref-access-vscode}
2-
# Connecting with VSCode
2+
# Connecting with VS Code
33

44
[Visual Studio Code](https://code.visualstudio.com/) provides flexible support for remote development.
5-
VSCode's [remote tunnel feature](https://code.visualstudio.com/docs/remote/tunnels) starts a server on a remote system, and connects the editor to this server.
5+
VS Code's [remote tunnel feature](https://code.visualstudio.com/docs/remote/tunnels) starts a server on a remote system, and connects the editor to this server.
66
There are two ways to set up the connection:
77

88
* using the code CLI: the most flexible method if using containers or uenv.
9-
* using the VSCode interface: VSCode will connect onto the system, download and start the server
9+
* using the VS Code interface: VS Code will connect to the system, then download and start the server.
1010

11-
The main challenge with using VSCode is that the most convenient method for starting a remote session is to start a remote tunnel from the VS Code GUI.
11+
The main challenge with using VS Code is that the most convenient method for starting a remote session is to start a remote tunnel from the VS Code GUI.
1212
This approach starts a session in the standard login environment on that node; however, it won't work if you want to develop in a container, in a uenv, or on a compute node.
1313

14+
This process is also demonstrated in a webinar on [Interactive computing on "Alps"](https://www.cscs.ch/publications/tutorials/2025/video-of-the-webinar-interactive-computing-on-alps):
15+
16+
<iframe width="100%"
17+
height="315"
18+
src="https://www.youtube.com/embed/cLVpJO_fE6I?si=bTmmsS_9QvTHpUqK&amp;start=2257"
19+
title="YouTube video player"
20+
frameborder="0"
21+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
22+
referrerpolicy="strict-origin-when-cross-origin"
23+
allowfullscreen>
24+
</iframe>
25+
1426
## Flexible method: remote server
1527

16-
The most flexible method for connecting VSCode is to log in to the Alps system, set up your environment (start a container or uenv, start a session on a compute node), and start the remote server in that environment pre-configured.
28+
The most flexible method for connecting VS Code is to log in to the Alps system, set up your environment (start a container or uenv, start a session on a compute node), and start the remote server in that environment pre-configured.
1729

18-
!!! note
19-
This approach requires that you have a GitHub account, and that the GitHub account is configured with your VS Code editor.
30+
[](){#ref-vscode-install}
31+
### Installing the server
2032

2133
The first step is to download the VS Code CLI tool `code`, which CSCS provides for easy download.
2234
There are two executables, one for using on systems with x86 or ARM CPUs respectively.
@@ -33,16 +45,15 @@ There are two executables, one for using on systems with x86 or ARM CPUs respect
3345
tar -xf vscode_cli_alpine_x64_cli.tar.gz
3446
```
3547

36-
!!! note
37-
See the guide on how to manage [architecture-specific binaries][ref-guides-terminal-arch] if you plan to use VScode on both x86 and ARM clusters.
48+
After downloading, copy the `code` executable to a location in your PATH, so that it is available for future sessions.
3849

39-
Alternatively, download the CLI tool from the [VS Code site](https://code.visualstudio.com/Download) -- take care to select either x86 or Arm64 version that matches the target system.
50+
Clusters on Alps share a common [home][ref-storage-home] path `HOME=/users/$USER` that is mounted on all clusters.
4051

41-
After downloading, copy the `code` executable to a location in your PATH, so that it is available for future sessions.
52+
If you want to use VS Code on multiple clusters, possibly with different CPU architectures (Daint, Clariden and Santis use `aarch64` CPUs, and [Eiger][ref-cluster-eiger] uses `x86_64` CPUs), you need to take some additional steps to ensure that the VS Code installation and configuration are kept separate for each cluster.
4253

43-
??? note "guidance on where to put architecture-specific executables"
44-
The home directory can be shared by multiple clusters that might have different micro-architectures, so it is important to separate executables for x86 and aarch64 (ARM) targets.
54+
First, install the `code` executable in an [architecture-specific path][ref-guides-terminal-arch].
4555

56+
!!! example "Installing VS Code for `x86_64` and `aarch64`"
4657
In `~/.bashrc`, add the following line (you will need to log in again for this to take effect):
4758
```
4859
export PATH=$HOME/.local/$(uname -m)/bin:$PATH
@@ -54,27 +65,65 @@ After downloading, copy the `code` executable to a location in your PATH, so tha
5465
mkdir -p $HOME/.local/$(uname -m)/bin
5566
cp ./code $HOME/.local/$(uname -m)/bin
5667
```
68+
Repeat this for both `x86_64` and `aarch64` binaries.
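The per-architecture install above can be sketched as a small helper that picks the tarball for the current node. This is only an illustration: the function name `code_cli_archive` is hypothetical, and the `arm64` tarball name is an assumption made by analogy with the `x64` name used earlier on this page.

```bash
# Map the output of `uname -m` to the name of a VS Code CLI tarball.
# The x64 name appears earlier on this page; the arm64 name is an ASSUMPTION.
code_cli_archive() {
    case "$1" in
        x86_64)  echo "vscode_cli_alpine_x64_cli.tar.gz" ;;
        aarch64) echo "vscode_cli_alpine_arm64_cli.tar.gz" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Report which tarball and which install directory apply on this node.
arch=$(uname -m)
bindir="$HOME/.local/$arch/bin"
echo "on $arch: unpack $(code_cli_archive "$arch") and copy ./code to $bindir"
```

Running this once on an `x86_64` login node and once on an `aarch64` login node shows the two separate install directories that the `PATH` setting in `~/.bashrc` selects between.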
69+
70+
By default VS Code will store configuration, data and executables in `$HOME/.vscode-server`.
71+
To use VS Code on multiple clusters, it is strongly recommended that you create a separate `vscode-server` path for each cluster
72+
by adding the following environment variable definitions to your `~/.bashrc`:
73+
74+
```bash
75+
export VSCODE_AGENT_FOLDER="$HOME/.vscode-server/$CLUSTER_NAME-tunnel/.vscode-server"
76+
export VSCODE_CLI_DATA_DIR="$VSCODE_AGENT_FOLDER/cli"
77+
```
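If the same `~/.bashrc` is also read on a system where `CLUSTER_NAME` is not defined, the paths above would contain an empty cluster name. A minimal sketch of a guard is shown here; the function name `set_vscode_dirs` is hypothetical, and whether `CLUSTER_NAME` can be unset in your environment is an assumption.

```bash
# Set the per-cluster VS Code paths only when CLUSTER_NAME is defined.
# set_vscode_dirs is a hypothetical helper, not part of VS Code or CSCS tooling.
set_vscode_dirs() {
    if [ -z "${CLUSTER_NAME:-}" ]; then
        echo "CLUSTER_NAME is not set; leaving VS Code paths at their defaults" >&2
        return 1
    fi
    export VSCODE_AGENT_FOLDER="$HOME/.vscode-server/$CLUSTER_NAME-tunnel/.vscode-server"
    export VSCODE_CLI_DATA_DIR="$VSCODE_AGENT_FOLDER/cli"
}

# Call it from ~/.bashrc; a missing CLUSTER_NAME must not abort the login shell.
set_vscode_dirs || true
```

With this in place, `echo $VSCODE_AGENT_FOLDER` after logging in shows which per-cluster directory the server will use.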
78+
79+
!!! warning
80+
You will need to log out and back in after updating `$HOME/.bashrc`, before trying to start the VS Code server for the first time.
81+
82+
[](){#ref-vscode-update}
83+
### Updating VS Code server
84+
85+
VS Code is continuously being updated, and the version of VS Code on your laptop will most likely be more recent than the version provided by CSCS.
86+
87+
Once you have installed the server, you can easily update it to the latest version:
88+
89+
```console title="Updating VS Code server"
90+
$ code --version
91+
code 1.97.2 (commit e54c774e0add60467559eb0d1e229c6452cf8447)
92+
$ code update
93+
Successfully updated to 1.101.0 (commit dfaf44141ea9deb3b4096f7cd6d24e00c147a4b1)
94+
$ code --version
95+
code 1.101.0 (commit dfaf44141ea9deb3b4096f7cd6d24e00c147a4b1)
96+
```
97+
98+
It is good practice to periodically update `code` to keep it in sync with the version on your laptop.
99+
100+
[](){#ref-vscode-starting}
101+
### Starting and configuring the server
102+
103+
!!! note
104+
You need to have a GitHub account to connect a remote tunnel to VS Code.
57105

58106
To set up a remote server on the target system,
59-
run the `code` executable that you downloaded the `tunnel` argument.
107+
run the `code` executable that you downloaded with the `tunnel` argument.
60108
You will be asked to choose whether to log in to Microsoft or GitHub (we have tested with GitHub):
61109

62-
```
63-
> code tunnel --name=$CLUSTER_NAME-tunnel
110+
```console
111+
$ code tunnel --name=$CLUSTER_NAME-tunnel
64112
...
65113
? How would you like to log in to Visual Studio Code? ›
66114
Microsoft Account
67115
GitHub Account
68116
```
69117

70118
!!! tip
71-
Give the tunnel a unique name using the `--name` flag, which will later be listed on the VSCode UI.
119+
Give the tunnel a unique name using the `--name` flag, which will later be listed on the VS Code UI.
72120

73121
You will be asked to go to [github.com/login/device](https://github.com/login/device) and enter an 8-character code.
74-
Once you have finished registering the service with GitHub, in VSCode on your PC/laptop open the "remote explorer" pane on the left hand side of the main window, and the connection will be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
122+
Once you have finished registering the service with GitHub, in VS Code on your PC/laptop open the "remote explorer" pane on the left hand side of the main window, and the connection will be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
123+
124+
!!! note "First time setting up a remote service"
125+
If this is the first time you have followed this procedure, you may have to sign in to GitHub in VS Code.
75126

76-
!!! note "first time setting up a remote service"
77-
If this is the first time you have followed this procedure, you may have to sign in to GitHub in VSCode.
78127
Click on the Remote Explorer button on the left hand side, and then find the following option:
79128

80129
```
@@ -85,11 +134,12 @@ Once you have finished registering the service with GitHub, in VSCode on your PC
85134

86135
If you have not signed in to GitHub with the VS Code editor, you will be redirected to the browser to sign in.
87136

88-
After signing in and authorizing VSCode, the open tunnel should be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
137+
After signing in and authorizing VS Code, the open tunnel should be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
89138

139+
[](){#ref-vscode-uenv}
90140
### Using with uenv
91141

92-
To use a uenv with VSCode, the uenv must be started before calling `code tunnel`.
142+
To use a uenv with VS Code, the uenv must be started before calling `code tunnel`.
93143
Log into the target system and start the uenv, then start the remote server, for example:
94144
```
95145
# log into daint (this could be any other Alps cluster)
@@ -106,20 +156,16 @@ ssh daint
106156
uenv run --view=default prgenv-gnu/24.11:v1 -- code tunnel --name=$CLUSTER_NAME-tunnel
107157
```
108158

109-
Once the tunnel is configured, you can access it from VSCode.
159+
Once the tunnel is configured, you can access it from VS Code.
110160

111161
!!! warning
112162
If you plan to do any intensive work: repeated compilation of large projects or running python code in Jupyter, please see the guide to running on a compute node below.
113163
Running intensive workloads on login nodes, which are shared between all users, is against the CSCS [fair usage][ref-policies-fair-use] policy for shared resources.
114164

115-
### Using with containers
116-
117-
!!! todo
118-
write a guide
119-
165+
[](){#ref-vscode-compute-nodes}
120166
### Running on a compute node
121167

122-
If you plan to do computation using your VSCode, then you should first allocate resources on a compute node and set up your environment there.
168+
If you plan to do computation in VS Code, then you should first allocate resources on a compute node and set up your environment there.
123169

124170
!!! example "directly create the tunnel using srun"
125171
You can directly execute the `code tunnel` command using srun:
@@ -130,7 +176,7 @@ If you plan to do computation using your VSCode, then you should first allocate
130176

131177
* `--uenv` and `--view` set up the uenv
132178
* `-t120` requests a 2 hour (120 minute) reservation
133-
* `-n1` requests a single rank - only one rank/process is required for VSCode
179+
* `-n1` requests a single rank - only one rank/process is required for VS Code
134180
* `--pty` allows forwarding of terminal I/O, required to sign in to GitHub
135181

136182
Once the job allocation is granted, you will be prompted to log into GitHub, the same as starting a session on the login node.
@@ -155,13 +201,43 @@ If you plan to do computation using your VSCode, then you should first allocate
155201
```
156202

157203
* `-t120` requests a 2 hour (120 minute) reservation
158-
* `-n1` requests a single rank - only one rank/process is required for VSCode
204+
* `-n1` requests a single rank - only one rank/process is required for VS Code
159205
* `--pty` allows forwarding of terminal I/O, for bash to work interactively
160206

161-
## Connecting via VSCode UI
207+
[](){#ref-vscode-containers}
208+
### Using with containers
209+
210+
This approach uses the CSCS [Container Engine][ref-container-engine] to launch the container on a compute node and start the VS Code server there.
211+
212+
```toml title="EDF file with image and mount paths"
213+
image = "nvcr.io#nvidia/pytorch:24.01-py3" # example of PyTorch NGC image
214+
writable = true
215+
mounts = ["/paths/on/scratch/or/home:/path/on/the/container",
216+
"/path/if/same/on/both",
217+
"/path/of/code/executable:/path/for/code/executable/in/container"]
218+
workdir = "default/working/dir/path"
219+
```
220+
221+
!!! note
222+
Ensure that the `code` executable is accessible in the container.
223+
It can either be contained in the image, or you can [install][ref-vscode-install] and [update][ref-vscode-update] the server in a path that you [mount][ref-ce-edf-reference-mounts] inside the container in the `mounts` field of the EDF file.
224+
225+
Log into the target system, and launch an interactive session with the container image:
226+
```console
227+
# launch container on compute node
228+
$ srun -N 1 --environment=/absolute/path/to/tomlfile.toml --pty bash
229+
```
230+
231+
Then on the compute node, you can start the tunnel manually, following the prompts to log in via GitHub:
232+
```console
233+
$ cd /path/for/code/executable/in/container
234+
$ ./code tunnel --name=$CLUSTER_NAME-tunnel
235+
```
236+
237+
## Connecting via VS Code UI
162238

163239
!!! warning
164-
This approach is not recommended, because while it may be easier to connect via the VS Code UI, it is much more difficult to configure the connection so that you can use uenv, containers or compute nodes.
240+
This approach is not recommended, and is not supported by CSCS.
165241

166-
!!! todo
167-
Write the guide
242+
It is relatively easy to connect to a login node using the "Connect to Host... (Remote-SSH)" option in the VS Code GUI on your laptop.
243+
However, it is complicated and difficult to configure the connection so that the environment used by the VS Code session is in a uenv/container or on a compute node.

docs/alps/clusters.md

Lines changed: 5 additions & 1 deletion
@@ -1,11 +1,15 @@
11
[](){#ref-alps-clusters}
22
# Alps Clusters
33

4-
A vCluster (versatile software-defined cluster) is a logical partition of the supercomputing resources where platform services are deployed. It serves as a dedicated environment supporting a specific platform. The composition of resources and services for each vCluster is defined in a configuration file used by an automated pipeline for deployment. Once deployed by CSCS, the vCluster becomes immutable.
4+
A vCluster (versatile software-defined cluster) is a logical partition of the supercomputing resources where platform services are deployed.
5+
It serves as a dedicated environment supporting a specific platform.
6+
The composition of resources and services for each vCluster is defined in a configuration file used by an automated pipeline for deployment.
7+
Once deployed by CSCS, the vCluster becomes immutable.
58

69
## Clusters on Alps
710

811
Clusters on Alps are provided as part of different [platforms][ref-alps-platforms].
12+
The following clusters are part of the platforms that are fully operated by CSCS.
913

1014
<div class="grid cards" markdown>
1115
- :fontawesome-solid-mountain: __Machine Learning Platform__

docs/alps/hardware.md

Lines changed: 42 additions & 13 deletions
@@ -3,8 +3,8 @@
33

44
Alps is an HPE Cray EX3000 system: a liquid-cooled, blade-based, high-density system.
55

6-
!!! todo
7-
this is a skeleton - all of the details need to be filled in
6+
!!! under-construction
7+
This page is a work in progress - contact us if you want us to prioritise documenting specific information that would be useful for your work.
88

99
## Alps Cabinets
1010

@@ -40,13 +40,13 @@ Alps was installed in phases, starting with the installation of 1024 AMD Rome du
4040

4141
There are currently five node types in Alps:
4242

43-
| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
44-
| ---- | ------- | ------:| -----:| -----------:| -----------:|
45-
| NVIDIA GH200 | gh200 | 1344 | 2688 | 10,752 | 10,752 |
46-
| AMD Rome | zen2 | 256 | 1024 | 2,048 | -- |
47-
| NVIDIA A100 | a100 | 72 | 144 | 144 | 576 |
48-
| AMD MI250x | mi200 | 12 | 24 | 24 | 96 |
49-
| AMD MI300A | mi300 | 64 | 128 | 512 | 512 |
43+
| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
44+
| ---- | ------- | ------:| -----:| -----------:| -----------:|
45+
| [NVIDIA GH200][ref-alps-gh200-node] | gh200 | 1344 | 2688 | 10,752 | 10,752 |
46+
| [AMD Rome][ref-alps-zen2-node] | zen2 | 256 | 1024 | 2,048 | -- |
47+
| [NVIDIA A100][ref-alps-a100-node] | a100 | 72 | 144 | 144 | 576 |
48+
| [AMD MI250x][ref-alps-mi200-node] | mi200 | 12 | 24 | 24 | 96 |
49+
| [AMD MI300A][ref-alps-mi300-node] | mi300 | 64 | 128 | 512 | 512 |
5050

5151
[](){#ref-alps-gh200-node}
5252
### NVIDIA GH200 GPU Nodes
@@ -57,6 +57,7 @@ There are currently five node types in Alps:
5757
Please [get in touch](https://github.com/eth-cscs/cscs-docs/issues) if there is information that you want to see here.
5858

5959
There are 24 cabinets, in 4 rows with 6 cabinets per row, and each cabinet contains 112 nodes (for a total of 448 GH200):
60+
6061
* 8 chassis per cabinet
6162
* 7 blades per chassis
6263
* 2 nodes per blade
@@ -80,16 +81,44 @@ Each node contains four Grace-Hopper modules and four corresponding network inte
8081
[](){#ref-alps-zen2-node}
8182
### AMD Rome CPU Nodes
8283

83-
!!! todo
84+
These nodes have two [AMD Epyc 7742](https://en.wikichip.org/wiki/amd/epyc/7742) 64-core CPU sockets, and are used primarily for the [Eiger][ref-cluster-eiger] system. They come in two memory configurations:
85+
86+
* *Standard-memory*: 256 GB in 16x16 GB DDR4 DIMMs.
87+
* *Large-memory*: 512 GB in 16x32 GB DDR4 DIMMs.
88+
89+
!!! note "Not all memory is available"
90+
The total memory available to jobs on the nodes is roughly 245 GB and 497 GB on the standard and large memory nodes respectively.
91+
92+
The amount of memory available to your job also depends on the number of MPI ranks per node -- each MPI rank has a memory overhead.
8493

85-
EX425
94+
A schematic of a *standard memory node* below illustrates the CPU cores and [NUMA nodes](https://www.kernel.org/doc/html/v4.18/vm/numa.html).(1)
95+
{.annotate}
96+
97+
1. Obtained with the command `lstopo --no-caches --no-io --no-legend eiger-topo.png` on Eiger.
98+
99+
![Screenshot](../images/slurm/eiger-topo.png)
100+
101+
* The two sockets are labelled Package L#0 and Package L#1.
102+
* Each socket has 4 NUMA nodes, with 16 cores each, for a total of 64 cores per socket.
103+
104+
Each core supports [simultaneous multithreading (SMT)](https://www.amd.com/en/blogs/2025/simultaneous-multithreading-driving-performance-a.html): it can execute two threads concurrently, and these are presented as two processing units (PUs) per physical core:
105+
106+
* the first PUs of the cores are numbered 0:63 on socket 0, and 64:127 on socket 1;
107+
* the second PUs of the cores are numbered 128:191 on socket 0, and 192:255 on socket 1;
108+
* hence, core `n` has PUs `n` and `n+128`.
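The numbering rule can be checked with a short worked example. The helper names `pus_for_core` and `socket_for_core` are hypothetical; the sketch only encodes the arithmetic stated in the list above (two sockets of 64 cores each, second PU offset by 128).

```bash
# Print the two PU indices that belong to physical core n (0 <= n <= 127).
pus_for_core() {
    local n=$1
    if [ "$n" -lt 0 ] || [ "$n" -gt 127 ]; then
        echo "core index must be in 0..127" >&2
        return 1
    fi
    echo "$n $((n + 128))"
}

# Print the socket that owns core n: cores 0..63 are on socket 0, 64..127 on socket 1.
socket_for_core() {
    echo $(( $1 / 64 ))
}

pus_for_core 70      # prints "70 198"
socket_for_core 70   # prints "1"
```

Core 70 is the seventh core on socket 1, so its two PUs are 70 and 198, which is the pair to keep together when binding both hardware threads of one core to the same task.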
109+
110+
Each node has two Slingshot 11 network interface cards (NICs), which are not illustrated on the diagram.
86111

87112
[](){#ref-alps-a100-node}
88113
### NVIDIA A100 GPU Nodes
89114

90-
!!! todo
115+
Each Grizzly Peak blade contains two nodes, where each node has:
91116

92-
Grizzly Peak
117+
* One 64-core Zen3 CPU socket
118+
* 512 GB DDR4 Memory
119+
* 4 NVIDIA A100 GPUs with 80 GB HBM2e memory each
120+
* The MCH system is the same, except that its A100 GPUs have 96 GB of memory.
121+
* 4 NICs -- one per GPU.
93122

94123
[](){#ref-alps-mi200-node}
95124
### AMD MI250x GPU Nodes
