Merged
5 changes: 4 additions & 1 deletion docs/guides/gb25.md
@@ -47,14 +47,17 @@ In the first round of GB runs we identified slow job startup times as a common c

With HPE we have identified that the most likely cause is file system contention loading dynamic libraries before `main()` starts.

-The fix is to update how the squashfs file for the uenv or container used by your job is stored on the filesystem.
+The fix is to update how the SquashFS file for the uenv or container used by your job is stored on the filesystem.

```console title="set lustre striping on uenv squashfs file"
$ uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}'
/capstor/scratch/cscs/bcumming/.uenv-images/images/6068794b820fb4dd91019d020d6d98334a2f9fd23035a5e4a2f72f9dda5f1260/store.squashfs
$ lfs migrate --stripe-count 20 --stripe-size 1M $(uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}')
```
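
To confirm that the migration took effect, the stripe count can be read back with `lfs getstripe`. The following is a minimal sketch, assuming the file has a plain (non-composite) layout so that a single number is printed; the helper name `check_stripe_count` is hypothetical and not part of uenv or Lustre.

```bash
# Hypothetical helper: compare a file's Lustre stripe count with an expected value.
check_stripe_count() {
    local sqfs="$1"
    local expected="${2:-20}"   # 20 matches the --stripe-count used above
    local actual
    # `lfs getstripe --stripe-count` prints only the stripe count of the file.
    actual="$(lfs getstripe --stripe-count "$sqfs")" || return 1
    # Drop any surrounding whitespace before comparing.
    actual="${actual//[[:space:]]/}"
    if [ "$actual" = "$expected" ]; then
        echo "ok: $sqfs has stripe count $actual"
    else
        echo "mismatch: $sqfs has stripe count $actual, expected $expected" >&2
        return 1
    fi
}
```

It could be called as `check_stripe_count "$(uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}')"`, reusing the inspect command shown above.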

+If you are using a [SquashFS image for your Python environment][ref-guides-storage-venv],
+you should also set the striping for that file.
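
The same `lfs migrate` settings can be applied to any SquashFS file, including one that holds a Python environment. This is a rough sketch only: the helper name `stripe_squashfs` and the example path are hypothetical, and the stripe count and size are simply copied from the uenv example above.

```bash
# Hypothetical helper: apply the striping used for the uenv image to any SquashFS file.
stripe_squashfs() {
    local sqfs="$1"
    # Refuse to run on a missing path rather than let lfs report a less clear error.
    if [ ! -f "$sqfs" ]; then
        echo "error: $sqfs is not a regular file" >&2
        return 1
    fi
    # Same settings as the uenv example: 20 stripes of 1 MiB each.
    lfs migrate --stripe-count 20 --stripe-size 1M "$sqfs"
}

# Example call (the path is an assumption; substitute the location of your image):
# stripe_squashfs "/path/to/my-venv.squashfs"
```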

As an additional precaution, we recommend increasing the default wait threshold for `MPI_Init` from 180 seconds to 300 seconds.
```console title="increase MPI initialization time-out"
$ export PMI_MMAP_SYNC_WAIT_TIME=300