Merged

27 commits
c68ae7e
add notes from call with pasmarco
bcumming Apr 23, 2025
ea85084
Merge branch 'main' into storage-refactor
bcumming Apr 25, 2025
5f97cd3
drafting the storage docs
bcumming Apr 28, 2025
07b1199
wip
bcumming May 22, 2025
e6dcf0d
Merge branch 'main' into storage-refactor
bcumming May 22, 2025
0792aa6
Merge branch 'main' into storage-refactor
bcumming May 23, 2025
a3445eb
wip
bcumming May 23, 2025
212967e
Merge branch 'main' into storage-refactor
bcumming May 23, 2025
d4790bc
sweep and mark remaining todo and under-construction sections in stor…
bcumming May 26, 2025
62e4681
wip
bcumming May 26, 2025
994bf38
spell check; add placeholders for FAQ docs
bcumming May 26, 2025
eab4402
Merge branch 'main' into storage-refactor
bcumming May 26, 2025
1680b23
fix broken link
bcumming May 26, 2025
2451b71
add marco p to codeowners
bcumming May 26, 2025
944a640
Merge branch 'main' into storage-refactor
bcumming May 26, 2025
f83fe00
document store layout
bcumming May 26, 2025
bb53f77
Merge branch 'storage-refactor' of github.com:bcumming/cscs-docs into…
bcumming May 26, 2025
0c60849
Update docs/alps/storage.md
twrobinson May 27, 2025
02a8c1c
Update docs/alps/storage.md
bcumming May 27, 2025
405f8c6
Update docs/alps/storage.md
bcumming May 27, 2025
8e8eff0
Update docs/storage/filesystems.md
bcumming May 27, 2025
bf02e58
@msimber review suggestions
bcumming May 27, 2025
8cebb2b
@RMeli's review
bcumming May 28, 2025
dc9ad3d
@afink review comments
bcumming May 28, 2025
0cd0941
wip
bcumming May 28, 2025
2e56dca
warn against touching files to avoid clean up
bcumming May 28, 2025
c25a4b2
merge main
bcumming May 28, 2025
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -7,3 +7,5 @@ docs/software/prgenv/linalg.md @finkandreas @msimberg
docs/software/sciapps/cp2k.md @abussy @RMeli
docs/software/sciapps/gromacs.md @kanduri
docs/software/ml @boeschf
docs/storage @mpasserini
docs/alps/storage.md @mpasserini
63 changes: 59 additions & 4 deletions docs/alps/storage.md
@@ -1,6 +1,8 @@
[](){#ref-alps-storage}
# Alps Storage

!!! under-construction

Alps has several attached storage systems, each with characteristics suited to different workloads and use cases.
HPC storage is managed by separate clusters of nodes that host the storage servers and the physical storage drives.
These storage clusters are on the same Slingshot 11 network as Alps.
@@ -16,20 +18,73 @@ These separate clusters are on the same Slingshot 11 network as the Alps.
| IOPs | 1.5M | 8.6M read, 24M write | 200k read, 768k write |
| file create/s| 374k | 214k | 97k |


!!! todo
Information about Lustre: meta data servers, etc.

* how many meta data servers on capstor and iopstor
* how these are distributed between store/scratch

Also discuss how capstor and iopstor are used to provide the scratch, store, and other file systems.

[](){#ref-alps-capstor}
## Capstor

Capstor is the largest file system, used for storing large amounts of input and output data.
It provides [scratch][ref-storage-scratch] and [store][ref-storage-store] for different clusters; the precise details are platform-specific.

!!! todo "add information about meta data services, and their distribution over scratch and store"

[](){#ref-alps-capstor-scratch}
### Scratch

All users on Alps get their own scratch path, `/capstor/scratch/cscs/$USER`.
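
For example, a user `bobsmith` (a hypothetical username, matching the example further down) would find their scratch directory at:
```console
$ ls -d /capstor/scratch/cscs/$USER
/capstor/scratch/cscs/bobsmith
```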

[](){#ref-alps-capstor-store}
### Store

Store on Capstor provides stable storage with [backups][ref-storage-backups] and no [cleaning policy][ref-storage-cleanup].
It is mounted on clusters at `/capstor/store`, with a folder created for each project.

To accommodate the different customers and projects on Alps, the directory structure is more complicated than the per-user paths on Scratch.
Project paths are organised as follows:

```
/capstor/store/<tenant>/<customer>/<group_id>
```

!!! question "What are `tenant`, `customer` and `group_id` in this context?"

* **`tenant`**: there are currently two tenants, `cscs` and `mch`:
* the vast majority of projects are hosted by the `cscs` tenant.
* **`customer`**: refers to the contractual partner responsible for the project.
Examples of customers include:
* `userlab`: projects allocated in the CSCS User Lab through open calls. The majority of projects are hosted here, particularly on the [HPC platform][ref-platform-hpcp].
* `swissai`: most projects allocated on the [Machine Learning Platform][ref-platform-mlp].
* `2go`: projects allocated under the [CSCS2GO](https://2go.cscs.ch) scheme.
* **`group_id`**: refers to the Linux group created for the project.
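
Putting these together: a hypothetical User Lab project with Linux group `g152` would have its store folder at a path like the following (tenant `cscs`, customer `userlab`):
```console
$ ls -d /capstor/store/cscs/userlab/g152
/capstor/store/cscs/userlab/g152
```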

Users are often members of multiple projects, and by extension of their associated `group_id` groups.
You can get a list of your groups using the `id` command in the terminal:
```console
$ id $USER
uid=22008(bobsmith) gid=32819(g152) groups=32819(g152),33119(g174),32336(vasp6)
```
Here the user `bobsmith` is in three projects, with `g152` being their **primary project** (which can also be determined using `id -gn $USER`).

The user is also in the `vasp6` group, which contains users who have been granted access to the [VASP][ref-uenv-vasp] application.

!!! info "The `$PROJECT` environment variable"
On some clusters, for example [Eiger][ref-cluster-eiger] and [Daint][ref-cluster-daint], the project folder for your primary project can be accessed using the `$PROJECT` environment variable.
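
For example (the path shown here is hypothetical, and depends on your project's tenant and customer):
```console
$ echo $PROJECT
/capstor/store/cscs/userlab/g152
```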

[](){#ref-alps-iopsstor}
## Iopsstor

!!! todo
small text explaining what iopsstor is designed to be used for.

[](){#ref-alps-vast}
## Vast

Vast is a smaller-capacity system that is designed to host home folders.

2 changes: 1 addition & 1 deletion docs/services/jupyterlab.md
@@ -9,7 +9,7 @@ The service is accessed at [jupyter-daint.cscs.ch](https://jupyter-daint.cscs.ch)

Once logged in, you will be redirected to the JupyterHub Spawner Options form, where typical job configuration options can be selected in order to allocate resources. These options might include the type and number of compute nodes, the wall time limit, and your project account.

Single-node notebooks are launched in a dedicated queue, minimizing queueing time. For these notebooks, servers should be up and running within a few minutes. The maximum waiting time for a server to be running is 5 minutes, after which the job will be cancelled and you will be redirected back to the spawner options page. If your single-node server is not spawned within 5 minutes we encourage you to [contact us][ref-get-in-touch].

When resources are granted the page redirects to the JupyterLab session, where you can browse, open and execute notebooks on the compute nodes. A new notebook with a Python 3 kernel can be created with the menu `new` and then `Python 3` . Under `new` it is also possible to create new text files and folders, as well as to open a terminal session on the allocated compute node.
