--map-by ppr:2:socket fails #12852

@zerothi

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Manual installation

$> ompi_info -c
  Configure command line: 'CC=gcc' 'CXX=g++' 'FC=gfortran'
                          '--prefix=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
                          '--with-lsf=/lsf/10.1'
                          '--with-lsf-libdir=/lsf/10.1/linux3.10-glibc2.17-x86_64/lib'
                          '--without-tm' '--enable-mpi-fortran=all'
                          '--with-hwloc=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
                          '--enable-orterun-prefix-by-default'
                          '--with-ucx=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
                          '--with-ucc=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
                          '--with-knem=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
                          '--without-verbs' 'FCFLAGS=-O3 -march=haswell
                          -mtune=haswell -mavx2 -m64 
                          -Wl,-z,max-page-size=0x1000 -O3
                          -Wa,-mbranches-within-32B-boundaries
                          -falign-functions=32 -falign-loops=32' 'CFLAGS=-O3
                          -march=haswell -mtune=haswell -mavx2 -m64 
                          -Wl,-z,max-page-size=0x1000 -O3
                          -Wa,-mbranches-within-32B-boundaries
                          -falign-functions=32 -falign-loops=32'
                          'CXXFLAGS=-O3 -march=haswell -mtune=haswell -mavx2
                          -m64  -Wl,-z,max-page-size=0x1000 -O3
                          -Wa,-mbranches-within-32B-boundaries
                          -falign-functions=32 -falign-loops=32'
                          '--with-ofi=no'
                          '--with-libevent=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
                          'LDFLAGS=-L/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92/lib
                          -L/lsf/10.1/linux3.10-glibc2.17-x86_64/lib
                          -Wl,-rpath,/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92/lib
                          -Wl,-rpath,/lsf/10.1/linux3.10-glibc2.17-x86_64/lib
                          -lucp  -levent -lhwloc -latomic -llsf -lm -lpthread
                          -lnsl -lrt'
                          '--with-xpmem=/appl9/gcc/13.3.0-binutils-2.42/openmpi/5.0.3-lsf10-alma92'
...

Please describe the system on which you are running

  • Operating system/version: AlmaLinux 9.4

Details of the problem

There are several levels to the problem:

  1. I never realized that socket was a discontinued qualifier in --map-by. This probably needs more emphasis in the release notes, but I guess we can live with it. The choice is a bit unfortunate (IMHO ;)). It is not clear to me whether package means exactly the same thing as socket. I can understand wanting to be more generic, but if the rest of the terms are not generic, why remove the socket qualifier?

  2. The man mpirun page says:

      To map processes:

       • --map-by  <object>: Map to the specified object, defaults to package. Supported options include slot, hwthread, core, L1cache, L2cache, L3cache, package, numa, node, seq, rankfile,
         pe-list=#, and ppr.  Any object can include modifiers by adding a : and any combination of the following:

            • pe=n: bind n processing elements to each proc

            • span: load balance the processes across the allocation

            • oversubscribe: allow more processes on a node than processing elements

            • nooversubscribe: do not allow more processes on a node than processing elements (default)

            • nolocal: do not place processes on the same host as the mpirun process

            • hwtcpus: use hardware threads as CPU slots for mapping

            • corecpus: use processor cores as CPU slots for mapping (default)

            • file=filename: used with rankfile; use filename to specify the file to use

            • ordered: used with pe-list to bind each process to one of the specified processing elements

         NOTE:
            socket is also accepted as an alias for package.

So I would have assumed that socket works out of the box in all qualifier specifications, at least for now.

So I did:

$> mpirun -np 2 --map-by socket --report-bindings
[hpclogin1:4121948] Rank 0 bound to package[0][core:0-15]
[hpclogin1:4121948] Rank 1 bound to package[0][core:0-15]

$> mpirun -np 2 --map-by ppr:2:socket --report-bindings
--------------------------------------------------------------------------
The map-by directive contains an unrecognized qualifier:

  Qualifier: ppr:2:socket
  Valid qualifiers: ppr:2:[slot:hwthread:core:l1cache:l2cache:l3cache:numa:package:node]

Please check for a typo or ensure that the qualifier is a supported one.
--------------------------------------------------------------------------

Given the note in the man page, shouldn't the second command (ppr:2:socket) have worked as well?
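Until the alias is accepted inside ppr, one possible stopgap is to rewrite socket to package before calling mpirun, since the error message above lists package as a valid ppr qualifier. The helper below is a minimal sketch under that assumption; `normalize_map_by` is a hypothetical name for a wrapper script and is not part of Open MPI.

```python
# Hypothetical helper, NOT part of Open MPI: rewrite the "socket" alias to
# "package" in a --map-by value before it is passed to mpirun.
def normalize_map_by(spec: str) -> str:
    # A --map-by value is a colon-separated list of fields,
    # e.g. "ppr:2:socket" or "socket:span".
    fields = spec.split(":")
    # Replace only fields that are exactly "socket" (case-insensitive);
    # everything else (counts, modifiers such as pe=2) is left untouched.
    return ":".join("package" if f.lower() == "socket" else f for f in fields)


if __name__ == "__main__":
    for value in ("socket", "ppr:2:socket", "ppr:2:socket:pe=2"):
        print(value, "->", normalize_map_by(value))
```

With this, `ppr:2:socket` becomes `ppr:2:package`, which matches the list of qualifiers printed in the error.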

  3. For clarity in the warnings, errors and manuals, it would be best to use the same argument order everywhere:

In the man pages, we have:

slot, hwthread, core, L1cache, L2cache, L3cache, package, numa, node, seq, rank‐...

while in the error above we have:

  Valid qualifiers: ppr:2:[slot:hwthread:core:l1cache:l2cache:l3cache:numa:package:node]

(notice how numa and package have swapped places)
Can we assume that numa < package always?
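To make the ordering difference concrete, here is a small sketch that compares the two lists quoted above. The man page list is cut down to the nine objects that also appear in the error message (seq, rankfile, pe-list and ppr are left out), and names are lowercased so that L1cache and l1cache compare equal; both choices are assumptions made for this comparison only.

```python
# Object names as quoted from the man page, limited to the entries that
# also appear in the error message (assumption made for this comparison).
man_page = ["slot", "hwthread", "core", "L1cache", "L2cache", "L3cache",
            "package", "numa", "node"]

# Object names as quoted from the "Valid qualifiers" line of the error.
error_msg = "slot:hwthread:core:l1cache:l2cache:l3cache:numa:package:node".split(":")

# Lowercase the man page names so that only the ordering is compared.
man_norm = [name.lower() for name in man_page]

# Collect every position where the two orderings disagree.
mismatches = [(i, a, b)
              for i, (a, b) in enumerate(zip(man_norm, error_msg))
              if a != b]

print(mismatches)  # [(6, 'package', 'numa'), (7, 'numa', 'package')]
```

The only disagreement is the numa/package pair, so the two lists describe the same set of objects in a different order.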
