From 6fed2943fbd63d426e1a1c37cac9604419f57f51 Mon Sep 17 00:00:00 2001
From: Rocco Meli
Date: Mon, 18 Aug 2025 14:17:55 +0200
Subject: [PATCH] py stripe

---
 docs/guides/gb25.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/guides/gb25.md b/docs/guides/gb25.md
index 1c67f036..93ed64da 100644
--- a/docs/guides/gb25.md
+++ b/docs/guides/gb25.md
@@ -47,7 +47,7 @@ In the first round of GB runs we identified slow job startup times as a common c
 
 With HPE we have identified that the most likely cause is file system contention loading dynamic libraries before `main()` starts.
 
-The fix is to update how the squashfs file for the uenv or container used by your job is stored on the filesystem.
+The fix is to update how the SquashFS file for the uenv or container used by your job is stored on the filesystem.
 
 ```console title="set lustre striping on uenv squashfs file"
 $ uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}'
@@ -55,6 +55,9 @@ $ uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}'
 $ lfs migrate --stripe-count 20 --stripe-size 1M $(uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}')
 ```
 
+If you are using a [SquashFS image for your Python environment][ref-guides-storage-venv],
+you should also set the striping for that file.
+
 As an additional precaution, we recommend to increase the default wait threshold for `MPI_Init` from 180 seconds to 300.
 ```console title="increase MPI initialization time-out"
 $ export PMI_MMAP_SYNC_WAIT_TIME=300