[feat] optimizations for flux #3321

@vsoch

Description

What you would like to be added?

Problem: Flux relies on hwloc to detect GPUs, and detection can be inconsistent depending on the type of GPU. An easy fix is to generate the R file ourselves, listing the GPUs explicitly:

    commands:
      pre: |
        echo "Regenerated resources"
        flux R encode --hosts=${hosts} --cores=0-1 --gpu=0 > ${configroot}/etc/flux/system/R
        cat ${configroot}/etc/flux/system/R

Another optimization is shared memory for MPI. By default, container runtimes allocate only 64M to /dev/shm, whereas ideally MPI could use all of the node's memory. The fix is a memory-backed emptyDir volume:

    volumes:
      # Ensure /dev/shm does not limit efa
      shared-memory:
        emptyDir: true
        emptyDirMedium: "memory"
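
For reference, here is a minimal sketch of the plain Kubernetes pod spec that a setting like the one above presumably maps to. The pod name, container name, image, and command are placeholders for illustration, and mounting the volume at /dev/shm is an assumption based on the comment above. Only the `emptyDir.medium: Memory` field and the `volumeMounts` structure are standard Kubernetes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shm-example            # hypothetical name
spec:
  containers:
    - name: flux               # hypothetical container name
      image: ubuntu:22.04      # placeholder image
      # Print the size of /dev/shm to confirm it is no longer 64M
      command: ["df", "-h", "/dev/shm"]
      volumeMounts:
        - name: shared-memory
          mountPath: /dev/shm  # assumed mount point
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory         # tmpfs backed by node memory
```

With no `sizeLimit` set, the size of the tmpfs is bounded by the pod's memory limit or the node's allocatable memory, depending on the cluster version and configuration, so it is worth checking the `df` output on the target cluster.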

Why is this needed?

We need to make sure that GPUs are reliably detected and that MPI has full access to shared memory on the host.

Love this feature?

Give it a 👍 We prioritize the features with most 👍
