# Gordon Bell 2025

This is temporary documentation for the second round of Gordon Bell (GB) benchmark runs, scheduled for the week of August 18-22, 2025.

## Schedule

| group  | date (MM-DD) | time  | activity |
| ------ | ------------ | ----- | -------- |
| -      | 08-15        | 08:00 | Daint is reconfigured and resized for GB runs |
| all    | 08-15        | ASAP  | Daint is available to all teams for final testing at scale |
| `g???` | 08-15        | 18:00 | Daint is available to all teams for final testing at scale |

## System

For the GB runs, [Daint][ref-cluster-daint] will be expanded to approximately 2350 Grace-Hopper nodes. See also:

* [Grace-Hopper nodes][ref-alps-gh200-node]
* [Using Slurm with Grace-Hopper][ref-slurm-gh200]

!!! todo "information about partition, account, time limits"

```bash
#!/bin/bash

#SBATCH --account=<account>
#SBATCH --partition=<todo>

# run the application in the prgenv-gnu/24.11:v2 uenv, with its default view loaded
srun --uenv=prgenv-gnu/24.11:v2 --view=default -n <ntasks> -N <nodes> ....
```

## Tips

### Improving job startup times

In the first round of GB runs, we identified slow job startup as a common cause of jobs crashing while they were being launched.

Together with HPE, we have identified the most likely cause: file system contention while dynamic libraries are loaded, before `main()` starts. In a large job, every node reads these libraries from the same uenv or container squashfs file at roughly the same time.
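
To get a rough idea of how many libraries an application loads before `main()`, and where they come from, a helper built on the standard `ldd` utility can be used. This is a minimal sketch: `startup_libs` is a hypothetical name, not a CSCS or uenv tool. Run it from a shell that has the same uenv and view loaded as the job, so that library paths resolve the same way; libraries under the uenv mount point (for example `/user-environment`) are read from the uenv's squashfs file.

```bash
#!/bin/bash
# Hypothetical diagnostic (not a CSCS or uenv tool): count the shared libraries an
# executable loads at startup, grouped by the top-level directory they resolve to.
startup_libs() {
    local exe=$1
    if [[ ! -f $exe || ! -x $exe ]]; then
        echo "startup_libs: '$exe' is not an executable file" >&2
        return 1
    fi
    # ldd prints lines such as "libfoo.so => /path/to/libfoo.so (0x...)";
    # keep resolved paths only and reduce each one to its first directory
    ldd "$exe" \
        | awk '$2 == "=>" && $3 ~ /^\// { split($3, p, "/"); print "/" p[2] }' \
        | sort | uniq -c | sort -rn
}
```

For example, `startup_libs ./my_app` (where `my_app` stands in for your executable) prints one line per directory with the number of libraries resolved there, largest count first.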

The fix is to change how the squashfs file of the uenv or container used by your job is stored on the file system. Setting the Lustre striping of the file with `--stripe-count -1` spreads it over all object storage targets (OSTs) in 4 MiB stripes (`--stripe-size 4M`), so that the reads from many nodes at job startup are distributed across the file system.

```console title="Set Lustre striping on the uenv squashfs file"
$ uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}'
/capstor/scratch/cscs/bcumming/.uenv-images/images/6068794b820fb4dd91019d020d6d98334a2f9fd23035a5e4a2f72f9dda5f1260/store.squashfs
$ lfs setstripe --stripe-count -1 --stripe-size 4M $(uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}')
```
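
To check which layout a squashfs file actually has, `lfs getstripe` can report its stripe count (`-c`) and stripe size (`-S`). The sketch below wraps this for either a uenv label or the path of a container image file; `show_image_striping` is a hypothetical helper, and it relies only on `uenv image inspect` as used above and on `lfs getstripe`.

```bash
#!/bin/bash
# Hypothetical helper (not part of uenv or Lustre): print the Lustre layout of the
# squashfs file behind a uenv label, or of a container image given by its path.
show_image_striping() {
    local image=$1 sqfs
    if [[ -f $image ]]; then
        # an existing file, e.g. a container image: use the path as-is
        sqfs=$image
    else
        # otherwise treat the argument as a uenv label and look up its squashfs file
        sqfs=$(uenv image inspect "$image" --format='{sqfs}') || return 1
    fi
    if [[ -z $sqfs ]]; then
        echo "show_image_striping: no squashfs file found for '$image'" >&2
        return 1
    fi
    echo "file: $sqfs"
    # -c and -S restrict the lfs getstripe output to the stripe count and stripe size
    echo "stripe count: $(lfs getstripe -c "$sqfs")"
    echo "stripe size: $(lfs getstripe -S "$sqfs")"
}
```

For example, run `show_image_striping prgenv-gnu/24.11:v2` after the `lfs setstripe` command; the stripe size should be reported in bytes as 4194304 (4 MiB). If the values are unchanged, the new layout was not applied to the file.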

!!! todo "update this with the final guidance"