Description
What you would like to be added?
Problem: Flux relies on hwloc to detect GPUs, and detection can be inconsistent depending on the type of GPU. An easy fix is to generate the R file, including the GPUs, ourselves:
    commands:
      pre: |
        echo "Regenerated resources"
        flux R encode --hosts=${hosts} --cores=0-1 --gpu=0 > ${configroot}/etc/flux/system/R
        cat ${configroot}/etc/flux/system/R
Another optimization is shared memory for MPI. By default, container runtimes allocate only 64 MB for /dev/shm, and ideally MPI should have access to the full memory of the node. The fix is a memory-backed empty directory volume:
    volumes:
      # Ensure /dev/shm does not limit efa
      shared-memory:
        emptyDir: true
        emptyDirMedium: "memory"

Why is this needed?
We need to make sure the GPUs are reliably detected, and MPI has full access to shared memory on the host.
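
For reference, the shared-memory volume above corresponds roughly to the following plain Kubernetes fragment. This is a minimal sketch, not the manifest the operator generates: the pod name, container name, and image are placeholders, and the mount path of /dev/shm is an assumption taken from the comment in the volume definition.

```yaml
# Sketch only: a Pod with a memory-backed emptyDir mounted at /dev/shm.
# "shm-example", "app", and the image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: shm-example
spec:
  containers:
    - name: app
      image: example.org/mpi-app:latest
      volumeMounts:
        # Replaces the runtime's small default /dev/shm
        - name: shared-memory
          mountPath: /dev/shm
  volumes:
    - name: shared-memory
      emptyDir:
        # tmpfs-backed volume; files written here count against
        # the container's memory limit
        medium: Memory
```

Inside the running container, `df -h /dev/shm` should then report the larger tmpfs size rather than the runtime default.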
Love this feature?
Give it a 👍 We prioritize the features with most 👍