[](){#ref-gb2025}
# Gordon Bell and HPL runs 2025

For Gordon Bell and HPL runs in March-April 2025, CSCS has expanded Santis to 1333 nodes (12 cabinets).

For the runs, CSCS has applied updates and changes that aim to improve performance and scaling, particularly for NCCL.
If you are already familiar with running on Daint, you might have to make some small changes to your current job scripts and parameters, which will be documented here.

The `normal` partition is used with no reservation, which means that jobs can be submitted without the `--partition` and `--reservation` flags.
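As an illustration, a minimal job script might look like the sketch below; the node count, time limit, account name, and executable are placeholders rather than CSCS-provided values:

```bash
#!/bin/bash
#SBATCH --job-name=gb-tuning
#SBATCH --nodes=16                 # placeholder node count
#SBATCH --ntasks-per-node=4        # for example, one rank per NUMA region
#SBATCH --time=01:00:00
#SBATCH --account=<your-project>   # placeholder account
# No --partition or --reservation flags are needed:
# jobs go to the default `normal` partition.

srun ./my_app                      # placeholder executable
```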
Timeline:

1. Friday 4th April:
    * HPE finishes its HPL runs at 10:30am.
    * CSCS performs testing on the reconfigured system for ~1 hour on the `GB_TESTING_2` reservation.
    * The reservation is removed, and all GB teams have access to test and tune their applications.
2. Monday 7th April:
    * At 4pm, the runs will start for the first team.

!!! note
    There will be no special reservation during the open testing and tuning between Friday and Monday.
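To check which reservations, if any, are currently defined, standard Slurm commands can be used; this is generic Slurm usage rather than a CSCS-specific instruction:

```bash
# List any reservations currently defined on the system
scontrol show reservation

# Show partitions, their state, and node availability
sinfo
```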
### Storage
Your data sets from Daint are available on Santis
## Low Noise Mode

!!! note
    Low noise mode has been disabled, so the previous requirement that you set `OMP_PLACES` and `OMP_PROC_BIND` no longer applies.
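For reference, a typical pinning setup of that kind looked roughly like the following; the values shown are illustrative, not the exact previous recommendation:

```bash
# Pinning settings of this form were needed while low noise mode was
# enabled; with it disabled they are no longer required.
export OMP_PLACES=cores
export OMP_PROC_BIND=close
```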
!!! warning "Unable to allocate resources: Requested node configuration is not available"
    If you try to use all 72 cores on each socket, SLURM will give a hard error, because only 71 are available: