eth-cscs
diff --git a/docs/access/vscode.md b/docs/access/vscode.md
Lines changed: 111 additions & 35 deletions
diff --git a/docs/alps/clusters.md b/docs/alps/clusters.md
Lines changed: 5 additions & 1 deletion
diff --git a/docs/alps/hardware.md b/docs/alps/hardware.md
Lines changed: 42 additions & 13 deletions
@@ -1,22 +1,34 @@
 [](){#ref-access-vscode}
-# Connecting with VSCode
+# Connecting with VS Code
 
 [Visual Studio Code](https://code.visualstudio.com/) provides flexible support for remote development.
-VSCode's [remote tunnel feature](https://code.visualstudio.com/docs/remote/tunnels) starts a server on a remote system, and connects the editor to this server.
+VS Code's [remote tunnel feature](https://code.visualstudio.com/docs/remote/tunnels) starts a server on a remote system, and connects the editor to this server.
 There are two ways to set up the connection:
 
 * using the code CLI: the most flexible method if using containers or uenv.
-* using the VSCode interface: VSCode will connect onto the system, download and start the server
+* using the VS Code interface: VS Code will connect to the system, then download and start the server.
 
-The main challenge with using VSCode is that the most convenient method for starting a remote session is to start a remote tunnel from the VS Code GUI.
+The main challenge with using VS Code is that the most convenient method for starting a remote session is to start a remote tunnel from the VS Code GUI.
 This approach starts a session in the standard login environment on that node; however, this won't work if you want to develop in a container, in a uenv, or on a compute node.
 
+This process is also demonstrated in a webinar on [Interactive computing on "Alps"](https://www.cscs.ch/publications/tutorials/2025/video-of-the-webinar-interactive-computing-on-alps):
+
+<iframe width="100%"
+        height="315"
+        src="https://www.youtube.com/embed/cLVpJO_fE6I?si=bTmmsS_9QvTHpUqK&amp;start=2257"
+        title="YouTube video player"
+        frameborder="0"
+        allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
+        referrerpolicy="strict-origin-when-cross-origin"
+        allowfullscreen>
+</iframe>
+
 ## Flexible method: remote server
 
-The most flexible method for connecting VSCode is to log in to the Alps system, set up your environment (start a container or uenv, start a session on a compute node), and start the remote server in that environment pre-configured.
+The most flexible method for connecting VS Code is to log in to the Alps system, set up your environment (start a container or uenv, start a session on a compute node), and start the remote server in that pre-configured environment.
 
-!!! note
-    This approach requires that you have a GitHub account, and that the GitHub account is configured with your VS Code editor.
+[](){#ref-vscode-install}
+### Installing the server
 
 The first step is to download the VS Code CLI tool `code`, which CSCS provides for easy download.
 There are two executables, one for using on systems with x86 or ARM CPUs respectively.
@@ -33,16 +45,15 @@ There are two executables, one for using on systems with x86 or ARM CPUs respect
     tar -xf vscode_cli_alpine_x64_cli.tar.gz
     ```
 
-!!! note
-    See the guide on how to manage [architecture-specific binaries][ref-guides-terminal-arch] if you plan to use VScode on both x86 and ARM clusters.
+After downloading, copy the `code` executable to a location in your PATH, so that it is available for future sessions.
 
-Alternatively, download the CLI tool from the [VS Code site](https://code.visualstudio.com/Download) -- take care to select either x86 or Arm64 version that matches the target system.
+Clusters on Alps share a common [home][ref-storage-home] path `HOME=/users/$USER` that is mounted on all clusters.
 
-After downloading, copy the `code` executable to a location in your PATH, so that it is available for future sessions.
+If you want to use VS Code on multiple clusters, possibly with different CPU architectures (Daint, Clariden and Santis use `aarch64` CPUs, and [Eiger][ref-cluster-eiger] uses `x86_64` CPUs), you need to take some additional steps to ensure that the VS Code installation and configuration are kept separate.
 
-??? note "guidance on where to put architecture-specific executables"
-    The home directory can be shared by multiple clusters that might have different micro-architectures, so it is important to separate executables for x86 and aarch64 (ARM) targets.
+First, install the `code` executable in an [architecture-specific path][ref-guides-terminal-arch].
 
+!!! example "Installing VS Code for `x86_64` and `aarch64`"
     In `~/.bashrc`, add the following line (you will need to log in again for this to take effect):
     ```
     export PATH=$HOME/.local/$(uname -m)/bin:$PATH
@@ -54,27 +65,65 @@ After downloading, copy the `code` executable to a location in your PATH, so tha
     mkdir -p $HOME/.local/$(uname -m)/bin
     cp ./code $HOME/.local/$(uname -m)/bin
     ```
+    Repeat this for both `x86_64` and `aarch64` binaries.
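To confirm that the right binary is picked up on each cluster, you can compare where `code` resolves on your `PATH` with the architecture-specific directory. The following is a minimal sketch, assuming the `$HOME/.local/$(uname -m)/bin` layout from the example above; `arch_bin_dir` and `check_code_location` are hypothetical helper names, not part of any CSCS tooling.

```shell
#!/bin/sh
# Print the architecture-specific bin directory used in the example above.
arch_bin_dir() {
    printf '%s\n' "$HOME/.local/$(uname -m)/bin"
}

# Report whether the `code` found on PATH lives in that directory.
check_code_location() {
    expected="$(arch_bin_dir)/code"
    found="$(command -v code 2>/dev/null || true)"
    if [ "$found" = "$expected" ]; then
        echo "ok: $found"
    else
        echo "mismatch: expected $expected, found ${found:-nothing}"
        return 1
    fi
}
```

Running `check_code_location` after logging in to an `x86_64` cluster and again on an `aarch64` cluster should report a different path on each if both binaries were installed as described.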
+
+By default VS Code will store configuration, data and executables in `$HOME/.vscode-server`.
+To use VS Code on multiple clusters, it is strongly recommended that you create a separate `vscode-server` path for each cluster
+by adding the following environment variable definitions to your `~/.bashrc`:
+
+```bash
+export VSCODE_AGENT_FOLDER="$HOME/.vscode-server/$CLUSTER_NAME-tunnel/.vscode-server"
+export VSCODE_CLI_DATA_DIR="$VSCODE_AGENT_FOLDER/cli"
+```
+
+!!! warning
+    You will need to log out and back in after updating `$HOME/.bashrc`, before trying to start the VS Code server for the first time.
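To see what these definitions expand to on a given cluster, the path pattern can be reproduced in a small helper. This is a sketch only: it mirrors the pattern shown above, and the function names `vscode_agent_folder` and `show_vscode_dirs` are hypothetical and exist purely for illustration.

```shell
#!/bin/sh
# Build the per-cluster server directory following the pattern shown above.
# $1: cluster name (for example, the value of $CLUSTER_NAME)
vscode_agent_folder() {
    printf '%s\n' "$HOME/.vscode-server/$1-tunnel/.vscode-server"
}

# Print the two variable definitions that result for a given cluster name.
show_vscode_dirs() {
    agent="$(vscode_agent_folder "$1")"
    echo "VSCODE_AGENT_FOLDER=$agent"
    echo "VSCODE_CLI_DATA_DIR=$agent/cli"
}
```

For example, `show_vscode_dirs daint` and `show_vscode_dirs eiger` print two disjoint directory trees under `$HOME/.vscode-server`, which is what keeps the server installations of the two clusters from overwriting each other.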
+
+[](){#ref-vscode-update}
+### Updating VS Code server
+
+VS Code is continuously being updated, and the version of VS Code on your laptop will most likely be more recent than the version provided by CSCS.
+
+Once you have installed the server, you can easily update it to the latest version:
+
+```console title="Updating VS Code server"
+$ code --version
+code 1.97.2 (commit e54c774e0add60467559eb0d1e229c6452cf8447)
+$ code update
+Successfully updated to 1.101.0 (commit dfaf44141ea9deb3b4096f7cd6d24e00c147a4b1)
+$ code --version
+code 1.101.0 (commit dfaf44141ea9deb3b4096f7cd6d24e00c147a4b1)
+```
+
+It is good practice to periodically update `code` to keep it in sync with the version on your laptop.
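Whether an update is due can be checked by comparing the version string printed by `code --version` with the version on your laptop. The sketch below assumes the output format shown in the console listing above (`code <version> (commit <hash>)`); the helper names `parse_code_version` and `version_older_than` are hypothetical, and the laptop version has to be supplied by hand.

```shell
#!/bin/sh
# Extract the version number from a line such as:
#   code 1.97.2 (commit e54c774e...)
parse_code_version() {
    printf '%s\n' "$1" | awk '$1 == "code" { print $2; exit }'
}

# Succeed if dotted version $1 is strictly older than dotted version $2.
# Fields are compared numerically, so 1.97.2 sorts before 1.101.0.
version_older_than() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}
```

A possible use on the cluster is `remote="$(parse_code_version "$(code --version | head -n 1)")"` followed by `version_older_than "$remote" 1.101.0 && code update`, where `1.101.0` stands in for whatever version your laptop reports.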
+
+[](){#ref-vscode-starting}
+### Starting and configuring the server
+
+!!! note
+    You need to have a GitHub account to connect a remote tunnel to VS Code.
 
 To set up a remote server on the target system,
-run the `code` executable that you downloaded the `tunnel` argument.
+run the `code` executable that you downloaded with the `tunnel` argument.
 You will be asked to choose whether to log in to Microsoft or GitHub (we have tested with GitHub):
 
-```
-> code tunnel --name=$CLUSTER_NAME-tunnel
+```console
+$ code tunnel --name=$CLUSTER_NAME-tunnel
 ...
 ? How would you like to log in to Visual Studio Code? ›
   Microsoft Account
 ❯ GitHub Account
 ```
 
 !!! tip
-    Give the tunnel a unique name using the `--name` flag, which will later be listed on the VSCode UI.
+    Give the tunnel a unique name using the `--name` flag, which will later be listed on the VS Code UI.
 
 You will be requested to go to [github.com/login/device](https://github.com/login/device) and enter an 8-character code.
-Once you have finished registering the service with GitHub, in VSCode on your PC/laptop open the "remote explorer" pane on the left hand side of the main window, and the connection will be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
+Once you have finished registering the service with GitHub, open the "remote explorer" pane on the left-hand side of the main window in VS Code on your PC/laptop; the connection will be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
+
+!!! note "First time setting up a remote service"
+    If this is the first time you have followed this procedure, you may have to sign in to GitHub in VS Code.
 
-!!! note "first time setting up a remote service"
-    If this is the first time you have followed this procedure, you may have to sign in to GitHub in VSCode.
     Click on the Remote Explorer button on the left hand side, and then find the following option:
 
     ```
@@ -85,11 +134,12 @@ Once you have finished registering the service with GitHub, in VSCode on your PC
 
     If you have not signed in to GitHub with VS Code editor, you will be redirected to the browser to sign in.
 
-    After signing in and authorizing VSCode, the open tunnel should be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
+    After signing in and authorizing VS Code, the open tunnel should be visible under REMOTES (TUNNELS/SSH) -> Tunnels.
 
+[](){#ref-vscode-uenv}
 ### Using with uenv
 
-To use a uenv with VSCode, the uenv must be started before calling `code tunnel`.
+To use a uenv with VS Code, the uenv must be started before calling `code tunnel`.
 Log into the target system and start the uenv, then start the remote server, for example:
 ```
 # log into daint (this could be any other Alps cluster)
@@ -106,20 +156,16 @@ ssh daint
 uenv run --view=default prgenv-gnu/24.11:v1 -- code tunnel --name=$CLUSTER_NAME-tunnel
 ```
 
-Once the tunnel is configured, you can access it from VSCode.
+Once the tunnel is configured, you can access it from VS Code.
 
 !!! warning
     If you plan to do any intensive work: repeated compilation of large projects or running python code in Jupyter, please see the guide to running on a compute node below.
     Running intensive workloads on login nodes, which are resources shared between all users, is against the CSCS [fair usage][ref-policies-fair-use] policy for shared resources.
 
-### Using with containers
-
-!!! todo
-    write a guide
-
+[](){#ref-vscode-compute-nodes}
 ### Running on a compute node
 
-If you plan to do computation using your VSCode, then you should first allocate resources on a compute node and set up your environment there.
+If you plan to do computation using VS Code, then you should first allocate resources on a compute node and set up your environment there.
 
 !!! example "directly create the tunnel using srun"
     You can directly execute the `code tunnel` command using srun:
@@ -130,7 +176,7 @@ If you plan to do computation using your VSCode, then you should first allocate
 
     * `--uenv` and `--view` set up the uenv
     * `-t120` requests a 2 hour (120 minute) reservation
-    * `-n1` requests a single rank - only one rank/process is required for VSCode
+    * `-n1` requests a single rank - only one rank/process is required for VS Code
     * `--pty` allows forwarding of terminal I/O, required to sign in to GitHub
 
     Once the job allocation is granted, you will be prompted to log into GitHub, the same as starting a session on the login node.
@@ -155,13 +201,43 @@ If you plan to do computation using your VSCode, then you should first allocate
     ```
 
     * `-t120` requests a 2 hour (120 minute) reservation
-    * `-n1` requests a single rank - only one rank/process is required for VSCode
+    * `-n1` requests a single rank - only one rank/process is required for VS Code
     * `--pty` allows forwarding of terminal I/O, for bash to work interactively
 
-## Connecting via VSCode UI
+[](){#ref-vscode-containers}
+### Using with containers
+
+This approach uses the CSCS [Container Engine][ref-container-engine] to launch the container on a compute node and start the VS Code server.
+
+```toml title="EDF file with image and mount paths"
+image = "nvcr.io#nvidia/pytorch:24.01-py3" # example of PyTorch NGC image
+writable = true
+mounts = ["/paths/on/scratch/or/home:/path/on/the/container",
+          "/path/if/same/on/both",
+          "/path/of/code/executable:/path/for/code/executable/in/container"]
+workdir = "default/working/dir/path"
+```
+
+!!! note
+    Ensure that the `code` executable is accessible in the container.
+    It can either be contained in the image, or you can [install][ref-vscode-install] and [update][ref-vscode-update] the server in a path that you [mount][ref-ce-edf-reference-mounts] inside the container in the `mounts` field of the EDF file.
+
+Log into the target system, and launch an interactive session with the container image:
+```console
+# launch container on compute node
+$ srun -N 1 --environment=/absolute/path/to/tomlfile.toml --pty bash
+```
+
+Then on the compute node, you can start the tunnel manually, following the prompts to log in via GitHub:
+```console
+$ cd /path/for/code/executable/in/container
+$ ./code tunnel --name=$CLUSTER_NAME-tunnel
+```
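Before starting the tunnel inside the container, it can help to verify that the `code` executable is actually visible at the mounted path. The following is a minimal sketch; `code_available_in` is a hypothetical helper, and the directory passed to it is whatever container-side path you chose in the `mounts` field of the EDF file.

```shell
#!/bin/sh
# Succeed only if an executable named `code` exists in the given directory.
# $1: directory where the executable is expected (the container-side mount path)
code_available_in() {
    if [ -x "$1/code" ]; then
        echo "found: $1/code"
    else
        echo "missing: $1/code (check the mounts field of the EDF file)" >&2
        return 1
    fi
}
```

For example, `code_available_in /path/for/code/executable/in/container` can be run at the container prompt first, so that a wrong or missing mount is reported before you attempt to launch `./code tunnel`.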
+
+## Connecting via VS Code UI
 
 !!! warning
-    This approach is not recommended, because while it may be easier to connect via the VS Code UI, it is much more difficult to configure the connection so that you can use uenv, containers or compute nodes.
+    This approach is not recommended, and is not supported by CSCS.
 
-!!! todo
-    Write the guide
+    It is relatively easy to connect to a login node using the "Connect to Host... (Remote-SSH)" option in the VS Code GUI on your laptop.
+    However, it is complicated and difficult to configure the connection so that the environment used by the VS Code session is in a uenv/container or on a compute node.
@@ -1,11 +1,15 @@
 [](){#ref-alps-clusters}
 # Alps Clusters
 
-A vCluster (versatile software-defined cluster) is a logical partition of the supercomputing resources where platform services are deployed. It serves as a dedicated environment supporting a specific platform. The composition of resources and services for each vCluster is defined in a configuration file used by an automated pipeline for deployment. Once deployed by CSCS, the vCluster becomes immutable.
+A vCluster (versatile software-defined cluster) is a logical partition of the supercomputing resources where platform services are deployed.
+It serves as a dedicated environment supporting a specific platform.
+The composition of resources and services for each vCluster is defined in a configuration file used by an automated pipeline for deployment.
+Once deployed by CSCS, the vCluster becomes immutable.
 
 ## Clusters on Alps
 
 Clusters on Alps are provided as part of different [platforms][ref-alps-platforms].
+The following clusters are part of the platforms that are fully operated by CSCS.
 
 <div class="grid cards" markdown>
 -   :fontawesome-solid-mountain: __Machine Learning Platform__
 
@@ -3,8 +3,8 @@
 
 Alps is an HPE Cray EX3000 system: a liquid-cooled, blade-based, high-density system.
 
-!!! todo
-    this is a skeleton - all of the details need to be filled in
+!!! under-construction
+    This page is a work in progress - contact us if you want us to prioritise documenting specific information that would be useful for your work.
 
 ## Alps Cabinets
 
@@ -40,13 +40,13 @@ Alps was installed in phases, starting with the installation of 1024 AMD Rome du
 
 There are currently five node types in Alps:
 
-| type           | abbreviation  | blades | nodes | CPU sockets | GPU devices |
-| ----           | -------       | ------:| -----:| -----------:| -----------:|
-| NVIDIA GH200   | gh200         | 1344   | 2688  | 10,752      | 10,752      |
-| AMD Rome       | zen2          |  256   | 1024  |  2,048      | --          |
-| NVIDIA A100    | a100          |   72   |  144  |    144      | 576         |
-| AMD MI250x     | mi200         |   12   |   24  |     24      |  96         |
-| AMD MI300A     | mi300         |   64   |  128  |    512      | 512         |
+| type                                | abbreviation  | blades | nodes | CPU sockets | GPU devices |
+| ----                                | -------       | ------:| -----:| -----------:| -----------:|
+| [NVIDIA GH200][ref-alps-gh200-node] | gh200         | 1344   | 2688  | 10,752      | 10,752      |
+| [AMD Rome][ref-alps-zen2-node]      | zen2          |  256   | 1024  |  2,048      | --          |
+| [NVIDIA A100][ref-alps-a100-node]   | a100          |   72   |  144  |    144      | 576         |
+| [AMD MI250x][ref-alps-mi200-node]   | mi200         |   12   |   24  |     24      |  96         |
+| [AMD MI300A][ref-alps-mi300-node]   | mi300         |   64   |  128  |    512      | 512         |
 
 [](){#ref-alps-gh200-node}
 ### NVIDIA GH200 GPU Nodes
@@ -57,6 +57,7 @@ There are currently five node types in Alps:
     Please [get in touch](https://github.com/eth-cscs/cscs-docs/issues) if there is information that you want to see here.
 
 There are 24 cabinets, in 4 rows with 6 cabinets per row, and each cabinet contains 112 nodes (for a total of 448 GH200):
+
 * 8 chassis per cabinet
 * 7 blades per chassis
 * 2 nodes per blade
@@ -80,16 +81,44 @@ Each node contains four Grace-Hopper modules and four corresponding network inte
 [](){#ref-alps-zen2-node}
 ### AMD Rome CPU Nodes
 
-!!! todo
+These nodes have two [AMD Epyc 7742](https://en.wikichip.org/wiki/amd/epyc/7742) 64-core CPU sockets, and are used primarily for the [Eiger][ref-cluster-eiger] system. They come in two memory configurations:
+
+* *Standard-memory*:  256 GB in 16x16 GB DDR4 DIMMs.
+* *Large-memory*:  512 GB in 16x32 GB DDR4 DIMMs.
+
+!!! note "Not all memory is available"
+    The total memory available to jobs on the nodes is roughly 245 GB and 497 GB on the standard and large memory nodes respectively.
+
+    The amount of memory available to your job also depends on the number of MPI ranks per node -- each MPI rank has a memory overhead.
 
-EX425
+The schematic of a *standard-memory* node below illustrates the CPU cores and [NUMA nodes](https://www.kernel.org/doc/html/v4.18/vm/numa.html).(1)
+{.annotate}
+
+1. Obtained with the command `lstopo --no-caches --no-io --no-legend eiger-topo.png` on Eiger.
+
+![Screenshot](../images/slurm/eiger-topo.png)
+
+* The two sockets are labelled Package L#0 and Package L#1.
+* Each socket has 4 NUMA nodes, with 16 cores each, for a total of 64 cores per socket.
+
+Each core supports [simultaneous multithreading (SMT)](https://www.amd.com/en/blogs/2025/simultaneous-multithreading-driving-performance-a.html), whereby it can execute two threads concurrently, which are presented as two processing units (PUs) per physical core:
+
+* the first PUs of the cores are numbered 0:63 on socket 0, and 64:127 on socket 1;
+* the second PUs of the cores are numbered 128:191 on socket 0, and 192:255 on socket 1;
+* hence, core `n` has PUs `n` and `n+128`.
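The numbering rule above can be written out as a short calculation. This is an illustrative sketch of the arithmetic only, assuming the 2 x 64-core layout described here; `pus_for_core` and `socket_for_core` are hypothetical names and do not query the hardware.

```shell
#!/bin/sh
# Print the two processing units (PUs) of physical core n (0..127): n and n+128.
pus_for_core() {
    echo "$1 $(( $1 + 128 ))"
}

# Print the socket (0 or 1) that owns physical core n, with 64 cores per socket.
socket_for_core() {
    echo "$(( $1 / 64 ))"
}
```

For instance, `pus_for_core 64` prints `64 192` and `socket_for_core 64` prints `1`: the first core of the second socket is exposed as PUs 64 and 192. The last core, 127, maps to PUs 127 and 255, the highest PU index on the node.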
+
+Each node has two Slingshot 11 network interface cards (NICs), which are not illustrated on the diagram.
 
 [](){#ref-alps-a100-node}
 ### NVIDIA A100 GPU Nodes
 
-!!! todo
+The Grizzly Peak blades contain two nodes, where each node has:
 
-Grizzly Peak
+* One 64-core Zen3 CPU socket
+* 512 GB DDR4 Memory
+* 4 NVIDIA A100 GPUs with 80 GB HBM2e memory each
+    * The MCH system is the same, except that the A100 GPUs have 96 GB of memory.
+* 4 NICs -- one per GPU.
 
 [](){#ref-alps-mi200-node}
 ### AMD MI250x GPU Nodes