
Commit f83fe00

document store layout
1 parent 2451b71 commit f83fe00


2 files changed: +61 -16 lines changed


docs/alps/storage.md

Lines changed: 45 additions & 1 deletion
````diff
@@ -31,7 +31,51 @@ These separate clusters are on the same Slingshot 11 network as the Alps.
 ## Capstor
 
 Capstor is the largest file system, for storing large amounts of input and output data.
-It is used to provide SCRATCH and STORE for different clusters - the precise details are platform-specific.
+It is used to provide [scratch][ref-storage-scratch] and [store][ref-storage-store].
+
+!!! todo "add information about meta data services, and their distribution over scratch and store"
+
+[](){#ref-alps-capstor-scratch}
+### Scratch
+
+All users on Alps get their own scratch path, `/capstor/scratch/cscs/$USER`.
+
+[](){#ref-alps-capstor-store}
+### Store
+
+The Store mount point on Capstor provides stable storage with [backups][ref-storage-backups] and no [cleaning policy][ref-storage-cleanup].
+It is mounted on clusters at the `/capstor/store` mount point, with folders created for each project.
+
+To accommodate the different customers and projects on Alps, the directory structure is more complicated than the per-user paths on Scratch.
+Project paths are organised as follows:
+
+```
+/capstor/store/<tenant>/<customer>/<group_id>
+```
+
+!!! question "What are `tenant`, `customer` and `group_id` in this context?"
+
+* **`tenant`**: there are currently two tenants, `cscs` and `mch`:
+    * the vast majority of projects are hosted by the `cscs` tenant.
+* **`customer`**: refers to the contractual partner responsible for the project.
+  Examples of customers include:
+    * `userlab`: projects allocated in the CSCS User Lab through open calls. The majority of projects are hosted here, particularly on the [HPC platform][ref-platform-hpcp].
+    * `swissai`: most projects allocated on the [Machine Learning Platform][ref-platform-mlp].
+    * `2go`: projects allocated under the [CSCS2GO](https://2go.cscs.ch) scheme.
+* **`group_id`**: refers to the Linux group created for the project.
+
+Users are often part of multiple projects, and by extension members of their associated `group_id` groups.
+You can get a list of your groups using the `id` command in the terminal:
+```console
+$ id $USER
+uid=22008(bobsmith) gid=32819(g152) groups=32819(g152),33119(g174),32336(vasp6)
+```
+Here the user `bobsmith` is in three projects, with the project `g152` being their **primary project** (which can also be determined using `id -gn $USER`).
+
+* They are also in the `vasp6` group, which contains users who have been granted access to the [VASP][ref-uenv-vasp] application.
+
+!!! info "The `$PROJECT` environment variable"
+    On some clusters, for example [Eiger][ref-cluster-eiger] and [Daint][ref-cluster-daint], the project folder for your primary project can be accessed using the `$PROJECT` environment variable.
 
 [](){#ref-alps-iopsstor}
 ## Iopsstor
````
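
As an illustration of the store layout and group lookup documented in the hunk above, the project folder for your primary group could be located along the following lines. This is only a sketch: the `cscs`/`userlab` path components and the printed outputs reuse the hypothetical `g152` example from the diff and will differ for real projects, and `$PROJECT` is only set on some clusters.

```console
$ id -gn $USER                                 # name of your primary project group
g152
$ ls -d /capstor/store/*/*/$(id -gn $USER)     # find that group's folder under any tenant/customer
/capstor/store/cscs/userlab/g152
$ echo $PROJECT                                # where available, points at the primary project folder
/capstor/store/cscs/userlab/g152
```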

docs/storage/filesystems.md

Lines changed: 16 additions & 15 deletions
````diff
@@ -82,13 +82,12 @@ Daily [snapshots][ref-storage-snapshots] for the last seven days are provided in
 [](){#ref-storage-scratch}
 ## Scratch
 
-The scratch file system is a fast workspace with temporary storage for use by jobs, with and emphasis on performance over reliability.
-All CSCS systems provide a scratch personal folder for users that can be accessed through the environment variable `$SCRATCH`.
+The scratch file system is a fast workspace tuned for use by parallel jobs, with an emphasis on performance over reliability, hosted on the [Capstor][ref-alps-capstor] Lustre filesystem.
 
-!!! info "`$SCRATCH` on MLP points to Iopsstore"
-    All users on Alps get their own Scratch path, `/capstor/scratch/cscs/$USER`, which is pointed to by the variable `$SCRATCH` on the [HPC Platform][ref-platform-hpcp] and [Climate and Weather Platform][ref-platform-cwp] clusters Eiger, Daint and Santis.
+All users on Alps get their own scratch path, `/capstor/scratch/cscs/$USER`, which is pointed to by the variable `$SCRATCH` on the [HPC Platform][ref-platform-hpcp] and [Climate and Weather Platform][ref-platform-cwp] clusters Eiger, Daint and Santis.
 
-On the MLP systems [clariden][ref-cluster-clariden] and [bristen][ref-cluster-bristen] the `$SCRATCH` variable points to storage on [Iopstore][ref-alps-iopsstor].
+!!! info "`$SCRATCH` on MLP points to Iopsstor"
+    On the machine learning platform (MLP) systems [clariden][ref-cluster-clariden] and [bristen][ref-cluster-bristen] the `$SCRATCH` variable points to storage on [Iopsstor][ref-alps-iopsstor].
     See the [MLP docs][ref-mlp-storage] for more information.
 
 ### Cleanup and Expiration
````
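
For orientation only, a minimal sketch of how the `$SCRATCH` variable described in this hunk is typically used; the printed path assumes the hypothetical user `bobsmith` from the earlier example, and `my_run` is a placeholder directory name:

```console
$ echo $SCRATCH
/capstor/scratch/cscs/bobsmith
$ mkdir -p $SCRATCH/my_run && cd $SCRATCH/my_run   # stage job input and output here rather than in $HOME
```
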
````diff
@@ -127,21 +126,21 @@ Space on Store is allocated per-project, with a path created for each project:
 * the capacity and inode limit is per-project, based on the initial resource request;
 * users have read and write access to the store paths for each project that they are a member of.
 
+!!! info
+    More information about how per-project paths are organised on store is available in the [Capstor][ref-alps-capstor-store] documentation.
+
 !!! warning "Avoid using store for jobs"
     Store is tuned for storing results and shared datasets, specifically it has fewer meta data servers assigned to it.
 
     Use the Scratch file systems, which are tuned for fast parallel I/O, for storing input and output for jobs.
 
-!!! todo
-    Low level information about `/capstor/store/cscs/<customer>/<group_id>` from [KB](https://confluence.cscs.ch/spaces/KB/pages/879142656/capstor+store) can be put into a folded admonition.
-
 ### Cleanup and Expiration
 
-There is no [cleanup policy][ref-storage-cleanup] on store, and the contents of are retained for three months after the project ends.
+There is no [cleanup policy][ref-storage-cleanup] on store, and the contents are retained for three months after the project ends.
 
 ### Quota
 
-Space on Store is allocated per-project, with a path is created for each project:
+Space on Store is allocated per-project, with a path created for each project:
 
 * the capacity and inode limit is per-project, based on the initial resource request;
 * users have read and write access to the store paths for each project that they are a member of.
````
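
To illustrate the per-project access model in the hunk above, one could inspect the group ownership of a project's store folder. The path, group, and permission bits shown are illustrative assumptions reusing the hypothetical `g152` project, not values taken from a real system:

```console
$ ls -ld /capstor/store/cscs/userlab/g152
drwxrws--- 12 root g152 4096 May 21 11:10 /capstor/store/cscs/userlab/g152
```

Members of the `g152` group can read and write under this path; users outside the project cannot.
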
````diff
@@ -156,7 +155,7 @@ Space on Store is allocated per-project, with a path is created for each project
 [](){#ref-storage-quota}
 ## Quota
 
-Storage quota is a limit on available storage, that is applied to:
+Storage quota is a limit on available storage applied to:
 
 * **capacity**: the total size of files;
 * and **inodes**: the total number of files and directories.
````
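
As a rough, illustrative way to see how much capacity and how many inodes a directory tree consumes using standard tools (the path reuses the hypothetical `g152` project folder; scans like this can be slow on large Lustre directories):

```console
$ du -sh /capstor/store/cscs/userlab/g152            # capacity: total size of files
$ du -s --inodes /capstor/store/cscs/userlab/g152    # inodes: number of files and directories
```
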
````diff
@@ -219,7 +218,7 @@ Usage data updated on: 2025-05-21 11:10:02
 +------------------------------------+--------+--------+------+---------+--------+------+-------------+----------+------+----------+-----------+------+-------------+
 ```
 
-The available capacity and used capacity is show for each file system that you have access to.
+The available capacity and used capacity is shown for each file system that you have access to.
 If you are in multiple projects, information for the [store][ref-storage-store] path for each project that you are a member of will be shown.
 In the example above, the user is in two projects, namely `g33` and `csstaff`.
 
````
````diff
@@ -275,7 +274,7 @@ A snapshot is a full copy of a file system at a certain point in time, that can
 ## Cleanup policies
 
 The performance of Lustre file systems is affected by file system occupancy and the number of files.
-Ideally occupancy should not exceed 60%, with severe performance degradation for all users when occupancy exceeds 80% or there are too many small files.
+Ideally occupancy should not exceed 60%, with severe performance degradation for all users when occupancy exceeds 80% and when there are too many small files.
 
 File cleanup removes files that are not being used to ensure that occupancy and file counts do not affect file system performance.
 
````
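
As a rough self-check against the occupancy and small-file pressure described in this hunk, a user could measure their own footprint on scratch with standard tools; this is only a sketch and can take a while on large directory trees:

```console
$ find $SCRATCH -type f | wc -l     # how many files you keep on scratch
$ du -sh $SCRATCH                   # how much capacity you occupy
```
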
````diff
@@ -305,8 +304,10 @@ In addition to the automatic deletion of old files, if occupancy exceeds 60% the
 ??? question "My files are gone, but the directories are still there"
     When the [cleanup policy][ref-storage-cleanup] is applied on LUSTRE file systems, the files are removed, but the directories remain.
 
-!!! todo
-    FAQ question: [why did I run out of space](https://confluence.cscs.ch/spaces/KB/pages/278036496/Why+did+I+run+out+of+space+on+HOME)
+??? question "What do messages like `mkdir: cannot create directory 'test': Disk quota exceeded` mean?"
+    You have run out of quota on the target file system.
+    Consider deleting unneeded files, or moving data to a different file system.
+    Specifically, if you see this message when using [home][ref-storage-home], which has a relatively small 50 GB limit, consider moving the data to your project's [store][ref-storage-store] path.
 
 !!! todo
     FAQ question: [writing with specific group access](https://confluence.cscs.ch/spaces/KB/pages/276955350/Writing+on+project+if+you+belong+to+more+than+one+group)
````
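
To illustrate the advice in the new quota FAQ above, data could be moved out of a nearly full home directory into the project's store path. The destination reuses the hypothetical `g152` project folder and `large_dataset` is a placeholder; substitute your own project path and directory names:

```console
$ du -sh $HOME/* | sort -h | tail -5                          # find the largest items in home
$ mv $HOME/large_dataset /capstor/store/cscs/userlab/g152/    # move them to the project's store path
```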
