Commit 187ed20

Update daint.md
minor updates
1 parent ade1a7d commit 187ed20

1 file changed: +15 -14 lines changed

docs/clusters/daint.md

Lines changed: 15 additions & 14 deletions
@@ -7,12 +7,12 @@ Daint is the main [HPC Platform][ref-platform-hpcp] cluster that provides comput
 
 ### Compute nodes
 
-Daint consists of around 793 [Grace-Hopper nodes][ref-alps-gh200-node].
+Daint consists of around 800-1000 [Grace-Hopper nodes][ref-alps-gh200-node].
 
-The number of nodes can change when nodes are added or removed from other clusters on Alps.
+The number of nodes can vary as nodes are added or removed from other clusters on Alps.
 
-There are four login nodes, labelled `daint-ln00[1-4]`.
-You will be assigned to one of the four login nodes when you ssh onto the system, from where you can edit files, compile applications and start simulation jobs.
+There are four login nodes, `daint-ln00[1-4]`.
+You will be assigned to one of the four login nodes when you ssh onto the system, from where you can edit files, compile applications and launch batch jobs.
 
 | node type | number of nodes | total CPU sockets | total GPUs |
 |-----------|-----------------| ----------------- | ---------- |
@@ -51,7 +51,7 @@ Please refer to the [uenv documentation][ref-uenv] for detailed information on h
 
 - :fontawesome-solid-layer-group: __Scientific Applications__
 
-Provide the latest versions of scientific applications, tuned for Daint, and the tools required to build your own version of the applications.
+Provide the latest versions of scientific applications, tuned for Daint, and the tools required to build your own versions of the applications.
 
 * [CP2K][ref-uenv-cp2k]
 * [GROMACS][ref-uenv-gromacs]
@@ -97,30 +97,31 @@ To build images, see the [guide to building container images on Alps][ref-build-
 
 CSCS will continue to support and update uenv and container engine, and users are encouraged to update their workflows to use these methods at the first opportunity.
 
-The CPE is still installed on Daint, however it will recieve no support or updates, and will be replaced with a container in a future update.
+The CPE is still installed on Daint, however it will receive no support or updates, and will be replaced with a container in a future update.
 
 ## Running jobs on Daint
 
 ### Slurm
 
-Santis uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.
+Daint uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor compute-intensive workloads.
 
-There are two [Slurm partitions][ref-slurm-partitions] on the system:
+There are four [Slurm partitions][ref-slurm-partitions] on the system:
 
 * the `normal` partition is for all production workloads.
 * the `debug` partition can be used to access a small allocation for up to 30 minutes for debugging and testing purposes.
-* the `xfer` partition is for [internal data transfer][ref-data-xfer-internal] at CSCS.
+* the `xfer` partition is for [internal data transfer][ref-data-xfer-internal].
+* the `low` partition is a low-priority partition, which may be enabled for specific projects at specific times.
+
 
-!!! todo "Timmy: add definition of the low partition"
 
 | name | nodes | max nodes per job | time limit |
 | -- | -- | -- | -- |
-| `normal` | 994 | - | 24 hours |
-| `low` | 994 | 2 | 24 hours |
+| `normal` | unlim | - | 24 hours |
 | `debug` | 24 | 2 | 30 minutes |
 | `xfer` | 2 | 1 | 24 hours |
+| `low` | unlim | - | 24 hours |
 
-* nodes in the `normal` and `debug` partitions are not shared
+* nodes in the `normal` and `debug` (and `low`) partitions are not shared
 * nodes in the `xfer` partition can be shared
 
 See the Slurm documentation for instructions on how to run jobs on the [Grace-Hopper nodes][ref-slurm-gh200].
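
The per-partition limits in the updated table can be encoded as a small lookup for a sanity check before submitting a job. This is a minimal sketch and not part of the commit or of any CSCS tooling: `PARTITION_LIMITS` and `check_request` are hypothetical names, the values are transcribed from the added and unchanged rows of the table above, and a `-` in the "max nodes per job" column is assumed to mean that no per-job cap is listed.

```python
from datetime import timedelta

# (max nodes per job, time limit) per partition, transcribed from the table.
# None is an assumption for "-": no per-job node cap is listed.
PARTITION_LIMITS = {
    "normal": (None, timedelta(hours=24)),
    "debug": (2, timedelta(minutes=30)),
    "xfer": (1, timedelta(hours=24)),
    "low": (None, timedelta(hours=24)),
}


def check_request(partition, nodes, walltime):
    """Return a list of problems with a job request; an empty list means it fits."""
    if partition not in PARTITION_LIMITS:
        return [f"unknown partition: {partition}"]
    max_nodes, time_limit = PARTITION_LIMITS[partition]
    problems = []
    if max_nodes is not None and nodes > max_nodes:
        problems.append(f"{nodes} nodes exceeds the {max_nodes}-node cap of '{partition}'")
    if walltime > time_limit:
        problems.append(f"walltime {walltime} exceeds the {time_limit} limit of '{partition}'")
    return problems
```

For example, `check_request("debug", 4, timedelta(hours=1))` reports both a node-count and a walltime problem, while the same request on `normal` passes. Slurm remains the authority on what is accepted; the lookup only mirrors the documented limits.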
@@ -136,7 +137,7 @@ Daint can also be accessed using [FirecREST][ref-firecrest] at the `https://api.
 ### Scheduled maintenance
 
 !!! todo "move this to HPCP top level docs"
-Wednesday morning 8-12 CET is reserved for periodic updates, with services potentially unavailable during this timeframe. If the queues must be drained (redeployment of node images, rebooting of compute nodes, etc) then a Slurm reservation will be in place that will prevent jobs from running into the maintenance window.
+Wednesday mornings 8:00-12:00 CET are reserved for periodic updates, with services potentially unavailable during this time frame. If the batch queues must be drained (for redeployment of node images, rebooting of compute nodes, etc) then a Slurm reservation will be in place that will prevent jobs from running into the maintenance window.
 
 Exceptional and non-disruptive updates may happen outside this time frame and will be announced to the users mailing list, and on the [CSCS status page](https://status.cscs.ch).
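
To illustrate the maintenance window described in this hunk, the sketch below checks whether a job's planned run time would overlap a Wednesday 8:00-12:00 slot. It is an illustration under stated assumptions, not CSCS tooling: `overlaps_maintenance` is a hypothetical helper, "CET" is taken literally as a fixed UTC+1 offset (daylight saving is ignored), and the start time must be a timezone-aware `datetime`. On the system itself, the Slurm reservation is what keeps jobs out of the window.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: "CET" read literally as a fixed UTC+1 offset, no daylight saving.
CET = timezone(timedelta(hours=1), "CET")


def overlaps_maintenance(start, duration):
    """Return True if [start, start + duration) overlaps a Wednesday 08:00-12:00 CET window.

    `start` must be a timezone-aware datetime.
    """
    start = start.astimezone(CET)
    end = start + duration
    day = start.date()
    # Walk every calendar day the job touches and test the Wednesdays.
    while day <= end.date():
        if day.weekday() == 2:  # Monday is 0, so Wednesday is 2
            window_start = datetime.combine(day, time(8, 0), tzinfo=CET)
            window_end = datetime.combine(day, time(12, 0), tzinfo=CET)
            if start < window_end and end > window_start:
                return True
        day += timedelta(days=1)
    return False
```

A 24-hour job started on a Tuesday evening overlaps the window, so it would have to wait or request a shorter time limit when a draining reservation is in place.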