52 changes: 48 additions & 4 deletions docs/access/ump.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,50 @@
# User Management Portal
[](){#ump}
# Account and Resources Management Tool

!!! todo
copy over docs from [confluence](https://confluence.cscs.ch/display/KB/Account+and+Resources+Management+Tool)
The Swiss National Supercomputing Centre (CSCS) offers a web-based tool for users to manage their accounts and projects at [account.cscs.ch](https://account.cscs.ch).

15 minute job
With this tool, users can:

- Access their profile, manage institutional details, or reset their password.
- List the projects they belong to, including closed ones.
- Check details on each project, quotas, and current utilization.
- Get an overview of where their files are stored at CSCS (including home directories, scratch, etc.).

For group leaders (or PIs), the tool allows:

- Managing user membership and access control.
- Inviting users to their projects via email. Existing users can accept immediately, while new users will receive instructions to create an account and join the project.
- Removing users from their projects.
- Selecting which users can access a system (and submit jobs) and which ones can only access project data.
- Defining one or more deputies to perform such tasks.
**Note:** Responsibility for what happens within the project remains with the group leader or PI.

A short guideline on how to perform these tasks is provided below.

## Usage

The tool is designed to be intuitive and comprises the following main areas:

- **A) Account selector**: For users with multiple accounts (e.g., service accounts).
- **B) Profile management**: To view and edit the account's institutional details and change the password.
- **C) Project membership**: To show the selected project in detail.
- **D) Storage**: Where users can see where they have stored their files (home, scratch, and project areas).
- **E) Main view**

![Screenshot](../images/access/ump.png)

### Membership Management (for Group Leaders and Deputies Only)

To invite users to a selected project, group leaders or their deputies need to:

1. Select the project on the left menu.
2. Click the "Members" tab.
3. Scroll down to the "Users" (or "Deputies" to manage deputies) section.
4. Use the "+" (plus) button on the right of the section and enter the given and family names and email address of the invitee.
The invitee will receive instructions on how to join the project. The group leader will get a confirmation on whether the invitee has accepted or rejected the invitation.
If the invitee does not have an account, they will also receive instructions on how to create one, which needs to be verified by CSCS administration staff.

To remove users from a selected project, group leaders or their deputies need to:

1. Repeat steps 1 to 3 above.
2. Use the icon with the three horizontal lines (see screenshot below) that is on the right of the user and select "Remove user."
22 changes: 20 additions & 2 deletions docs/alps/hardware.md
@@ -1,3 +1,4 @@
[](){#alps-hardware}
# Alps Hardware

Alps is an HPE Cray EX3000 system, a liquid-cooled, blade-based, high-density system.
@@ -20,7 +21,19 @@ This approach to cooling provides greater efficiency for the rack-level cooling,
* Maximum of 64 quad-blade compute blades
* Maximum of 64 Slingshot switch blades

## Alps Blades
## Alps High Speed Network

!!! todo
information about the network.

* Details about Slingshot 11.
* How many NICs per node.
* Raw feeds and speeds.
* Some OSU benchmark results.
* GPU-aware communication.
* **Slingshot is not InfiniBand - there is no NVSwitch**

## Alps Nodes

Alps was installed in phases, starting with the installation of 1,024 AMD Rome dual-socket CPU nodes in 2020, through to the main installation of 2,688 Grace-Hopper nodes in 2024.

@@ -34,26 +47,31 @@ There are currently four node types in Alps, with another becoming available in
| AMD MI250x | 12 | 24 | 24 | 96 |
| AMD MI300A | 64 | 128 | 512 | 512 |

[](){#gh200-node}
### NVIDIA GH200 GPU Nodes
[](){#gh200-hardware-description}

Perry Peak

[](){#zen2-node}
### AMD Rome CPU Nodes

EX425

[](){#a100-node}
### NVIDIA A100 GPU Nodes

Grizzly Peak

[](){#mi200-node}
### AMD MI250x GPU Nodes

Bard Peak

[](){#mi300-node}
### AMD MI300A GPU Nodes

Parry Peak

!!! info "coming soon"
H1 2025

38 changes: 18 additions & 20 deletions docs/alps/index.md
@@ -1,39 +1,37 @@
# Alps Infrastructure

Alps is a general-purpose compute and data Research Infrastructure (RI) open to the broad community of researchers in Switzerland and the rest of the world. Alps will provide a high impact, challenging and innovative RI that will allow Switzerland to advance science and impact society.
Alps is a general-purpose compute and data Research Infrastructure (RI) open to the broad community of researchers in Switzerland and the rest of the world.
Alps provides a high-impact, challenging and innovative RI that allows Switzerland to advance science and impact society.

Alps enables the creation of versatile clusters (vClusters) that can be tailored to the specific needs of users while maintaining confidentiality. For example, a vCluster will be dedicated to MeteoSwiss’ numerical weather forecasts, another one to the User Lab and another one to Machine Learning and Artificial Intelligence.
Alps enables the creation of versatile clusters (vClusters) that can be tailored to the specific needs of users while maintaining confidentiality.
For example, a vCluster will be dedicated to MeteoSwiss’ numerical weather forecasts, another one to the User Lab and another one to Machine Learning and Artificial Intelligence.

A key feature of Alps is multi-tenancy, where a tenant is an organization, typically a research institution, that deploys, operates, or manages its platform on the Alps infrastructure.
Tenants have privileged access to resource nodes, enabling them to deploy their own services and resource configurations.
Additionally, network segregation ensures secure and isolated communication, with the option to connect to the tenant's private network.

<div class="grid cards" markdown>

- :fontawesome-solid-signs-post: __Hardware__
- :fontawesome-solid-signs-post: __Platforms__

Learn about the node types and networking infrastructure in Alps.
[:octicons-arrow-right-24: Alps Platforms][platforms]

[:octicons-arrow-right-24: Alps Hardware](hardware.md)
- :fontawesome-solid-signs-post: __Clusters__

- :fontawesome-solid-signs-post: __Network__
The resources on Alps are partitioned and configured into versatile software-defined clusters (vClusters).

Learn about the Slingshot 11 network on Alps.
[:octicons-arrow-right-24: Alps vClusters][clusters]

[:octicons-arrow-right-24: Alps Network](network.md)
- :fontawesome-solid-signs-post: __Hardware__

Learn about the node types and networking infrastructure in Alps.

[:octicons-arrow-right-24: Alps Hardware](hardware.md)

- :fontawesome-solid-signs-post: __Storage__

Learn about the file systems attached to Alps.

[:octicons-arrow-right-24: Alps Storage](storage.md)

- :fontawesome-solid-signs-post: __vClusters__

The resources on Alps are partitioned and configured into versatile software-defined clusters (vClusters).

[:octicons-arrow-right-24: Alps vClusters](vclusters.md)

- :fontawesome-solid-signs-post: __Tenants__

Alps is a multi-tenant system.

[:octicons-arrow-right-24: Alps Tenants](tenants.md)

</div>
28 changes: 28 additions & 0 deletions docs/alps/platforms.md
@@ -0,0 +1,28 @@
[](){#platforms}
# Platforms on Alps

A platform represents a set of scientific services along with compute and data resources hosted on the Alps research infrastructure, provided to a specific scientific community.
Each platform addresses particular research needs and domains, such as climate and weather modeling, machine learning, or high-performance computing applications.
A platform can consist of one or multiple [clusters][clusters], and its services can be managed either by CSCS or by the scientific community itself, including access control, usage policies, and support.

<div class="grid cards" markdown>

- :fontawesome-solid-mountain: __Machine Learning Platform__

The Machine Learning Platform (MLp) hosts ML and AI researchers.

[:octicons-arrow-right-24: MLp][mlp]

- :fontawesome-solid-mountain: __HPC Platform__

!!! todo

[:octicons-arrow-right-24: HPCp][hpcp]

- :fontawesome-solid-mountain: __Climate and Weather Platform__

!!! todo

[:octicons-arrow-right-24: CWp][cwp]

</div>
7 changes: 0 additions & 7 deletions docs/alps/tenants.md

This file was deleted.

36 changes: 30 additions & 6 deletions docs/alps/vclusters.md
@@ -1,10 +1,34 @@
# Alps vClusters
[](){#clusters}
# Alps Clusters

!!! todo
this page answers the question "what is a vCluster"?
A vCluster (versatile software-defined cluster) is a logical partition of the supercomputing resources where platform services are deployed. It serves as a dedicated environment supporting a specific platform. The composition of resources and services for each vCluster is defined in a configuration file used by an automated pipeline for deployment. Once deployed by CSCS, the vCluster becomes immutable.
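
As a rough illustration of the paragraph above, the idea of a declarative vCluster description that cannot be changed after deployment can be sketched in Python. This is only a sketch under stated assumptions: the `VClusterConfig` class, its field names, the example values, and the `deploy` function are all hypothetical and do not reflect the actual CSCS configuration format or deployment pipeline.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class VClusterConfig:
    """Hypothetical vCluster description; field names are illustrative only."""

    name: str
    platform: str
    node_type: str
    services: tuple[str, ...] = ()


def deploy(config: VClusterConfig) -> VClusterConfig:
    # Stand-in for the automated pipeline (assumption): a real pipeline
    # would provision nodes and services from the configuration.
    return config


def try_rename(config: VClusterConfig, new_name: str) -> bool:
    """Return True if the rename succeeded, False if the config is immutable."""
    try:
        config.name = new_name  # frozen dataclasses reject attribute assignment
    except FrozenInstanceError:
        return False
    return True


# Example values are made up for illustration.
example = VClusterConfig(
    name="example-cluster",
    platform="example-platform",
    node_type="gh200",
    services=("slurm",),
)
deployed = deploy(example)
```

Here `try_rename(deployed, "other")` returns `False`, which mirrors the statement that a deployed vCluster is immutable: a change would mean writing a new configuration and deploying again.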

* What is a vCluster?
* Examples of vClusters
## Clusters on Alps

Clusters on Alps are provided as part of different [platforms][platforms].

<div class="grid cards" markdown>
- :fontawesome-solid-mountain: __Machine Learning Platform__

Clariden is the main Grace-Hopper cluster.

[:octicons-arrow-right-24: Clariden][clariden]

Bristen is a small system with A100 nodes, used for **todo**

[:octicons-arrow-right-24: Bristen][bristen]
</div>

<div class="grid cards" markdown>
- :fontawesome-solid-mountain: __HPC Platform__ { .col-span-12 }

!!! todo
</div>

<div class="grid cards" markdown>
- :fontawesome-solid-mountain: __Climate and Weather Platform__

!!! todo
</div>

We don't document individual vClusters here - these are documented under each platform.

Binary file added docs/images/access/ump.png
2 changes: 1 addition & 1 deletion docs/index.md
@@ -8,7 +8,7 @@ Welcome to the technical documentation for Alps.

Once you have a project at CSCS, start here to find your platform:

[:octicons-arrow-right-24: Platforms overview](platforms/index.md)
[:octicons-arrow-right-24: Platforms overview][platforms]

Go straight to the documentation for the platform that hosts your project:

26 changes: 0 additions & 26 deletions docs/platforms/index.md

This file was deleted.

13 changes: 11 additions & 2 deletions docs/platforms/mlp/index.md
@@ -21,8 +21,17 @@ Once invited to a project, you will receive an email, which you can need to crea

The main cluster provided by the MLp is Clariden, a large Grace-Hopper GPU system on Alps.

!!! todo
introduction paragraph and cards that link to Clariden and Bristen
<div class="grid cards" markdown>
- :fontawesome-solid-mountain: [__Clariden__][clariden]

Clariden is the main [Grace-Hopper][gh200-node] cluster used for **todo**
</div>

<div class="grid cards" markdown>
- :fontawesome-solid-mountain: [__Bristen__][bristen]

Bristen is a smaller system with [A100 GPU nodes][a100-node] for **todo**
</div>

## Guides and Tutorials

2 changes: 1 addition & 1 deletion docs/tools/slurm.md
@@ -29,7 +29,7 @@ The following sections will provide detailed guidance on how to use SLURM to req
[using slurm on Grace-Hopper][gh200-slurm]
```

Link to the [Grace-Hopper overview][gh200-hardware-description].
Link to the [Grace-Hopper overview][gh200-node].

An example of using tabs to show srun and sbatch usage to get one GPU per MPI rank:

6 changes: 6 additions & 0 deletions docs/vclusters/bristen.md
@@ -0,0 +1,6 @@
[](){#bristen}
# Bristen

!!! todo
use the [clariden][clariden] as template.
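
The pages in this change declare cross-reference targets with an empty link followed by an attribute list, for example `[](){#platforms}`, and link to them with `[text][platforms]`. A small check like the one below could flag anchor lines whose braces do not hold a `#id`. It is a minimal sketch: the function name, the accepted identifier pattern, and the sample text are assumptions for illustration, not part of the documentation tooling.

```python
import re

# Matches a line that consists only of an empty-link anchor such as
# `[](){#alps-hardware}` and captures whatever sits inside the braces.
ANCHOR = re.compile(r"^\[\]\(\)\{([^}]*)\}\s*$")

# Assumed shape of a valid identifier: '#' followed by a letter, then
# letters, digits, underscores, or hyphens.
VALID_ID = re.compile(r"#[A-Za-z][\w-]*")


def malformed_anchors(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for anchor lines without a valid `#id`."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        match = ANCHOR.match(stripped)
        if match and not VALID_ID.fullmatch(match.group(1)):
            problems.append((number, stripped))
    return problems


# Made-up sample: the first anchor is well formed, the second lacks '#'.
sample = "[](){#platforms}\n# Platforms on Alps\n[](){example}\n# Example\n"
```

Calling `malformed_anchors(sample)` reports only the third line, because `example` is not written as `#example`.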

32 changes: 14 additions & 18 deletions mkdocs.yml
@@ -15,31 +15,27 @@ plugins:
- autorefs
nav:
- Welcome: index.md
- 'Alps':
- alps/index.md
- 'Hardware': alps/hardware.md
- 'Network': alps/network.md
- 'Storage': alps/storage.md
- 'vClusters': alps/vclusters.md
- 'Tenants': alps/tenants.md
- 'Platforms':
- platforms/index.md
- 'HPC Platform':
- platforms/hpcp/index.md
- 'Machine Learning Platform':
- platforms/mlp/index.md
# we could move all vcluster descriptions to a vcluster/name.md
# then link them into the respective platform
- 'clariden': vclusters/clariden.md
- 'Climate and Weather Platform':
- platforms/cwp/index.md
- 'Access':
- access/index.md
- 'MFA':
- access/mfa/index.md
- 'using windows': access/mfa/windows.md
- 'UMP': access/ump.md
- 'Waldur': access/waldur.md
- 'Alps':
- alps/index.md
- 'Platforms': alps/platforms.md
- 'Clusters': alps/vclusters.md
- 'Hardware': alps/hardware.md
- 'Storage': alps/storage.md
- 'Machine Learning Platform':
- platforms/mlp/index.md
- 'clariden': vclusters/clariden.md
- 'bristen': vclusters/bristen.md
- 'HPC Platform':
- platforms/hpcp/index.md
- 'Climate and Weather Platform':
- platforms/cwp/index.md
- 'Tools':
- tools/index.md
- 'slurm': tools/slurm.md