@@ -47,14 +47,17 @@ In the first round of GB runs we identified slow job startup times as a common c
 
 With HPE we have identified that the most likely cause is file system contention loading dynamic libraries before `main()` starts.
 
-The fix is to update how the squashfs file for the uenv or container used by your job is stored on the filesystem.
+The fix is to update how the SquashFS file for the uenv or container used by your job is stored on the filesystem.
 
 ```console title="set lustre striping on uenv squashfs file"
 $ uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}'
 /capstor/scratch/cscs/bcumming/.uenv-images/images/6068794b820fb4dd91019d020d6d98334a2f9fd23035a5e4a2f72f9dda5f1260/store.squashfs
 $ lfs migrate --stripe-count 20 --stripe-size 1M $(uenv image inspect prgenv-gnu/24.11:v2 --format='{sqfs}')
 ```
 
+If you are using a [SquashFS image for your Python environment][ref-guides-storage-venv],
+you should also set the striping for that file.
+
 As an additional precaution, we recommend to increase the default wait threshold for `MPI_Init` from 180 seconds to 300.
 ```console title="increase MPI initialization time-out"
 $ export PMI_MMAP_SYNC_WAIT_TIME=300