NCSA Delta: How to run reconverse and Charm with reconverse
This setup incorporates the reconverse communication layer into Charm++. Reconverse is a future replacement for Converse that is more sustainable, more lightweight (fewer lines of code), and incorporates LCI (github.com/uiuc-hpc/lci).
Installing and running on Delta
$ git clone <charm repository URL> && cd charm && git checkout reconverse-support
$ module load libfabric
In the charm top-level directory,
./build charm++ multicore-linux-x86_64 --with-production -j8
In your program's Makefile, point CHARMC at this charm build, for example:
CHARMC=/path/to/charm/bin/charmc
When submitting jobs (sbatch or salloc), export the following
export LD_LIBRARY_PATH=/path/to/charm/lib:$LD_LIBRARY_PATH
export LCI_ATTR_BACKEND=ofi
export FI_CXI_RX_MATCH_MODE=hybrid   # or: software
Use a larger process width than you would with the old Converse. I find that 2 or 4 processes per socket give the best times on reconverse, versus 8 with the old Converse.
Make sure to pass +pemap if you are using all or almost all of the cores. Otherwise, if two PEs are mapped to the same PU, your job will abort without explanation (unless you run srun with --unbuffered, as described below).
+setcpuaffinity is not implemented in reconverse yet, so you need to provide a manual +pemap.
Run srun jobs with --unbuffered so that output is printed before the job aborts, which helps with debugging.
The Delta documentation shows the layout of cores with respect to NUMA domains: https://docs.ncsa.illinois.edu/systems/delta/en/latest/user_guide/architecture.html
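Since a manual +pemap is needed, it can help to compute the core range for each process rather than typing it by hand. The following is a minimal sketch, not a tested Delta recipe. It assumes each process on a node should own one contiguous block of cores, and that the process's node-local rank (Slurm exposes this as SLURM_LOCALID) selects the block. The helper name pemap_range and the figure of 16 cores per process are hypothetical choices for illustration only.

```shell
# Sketch: compute a contiguous core range suitable for +pemap.
# pemap_range is a hypothetical helper, not part of Charm++ or reconverse.
#   $1 = node-local rank of the process (e.g. $SLURM_LOCALID)
#   $2 = number of cores assigned to each process (assumed value)
pemap_range() {
  first=$(( $1 * $2 ))          # first core owned by this process
  last=$(( first + $2 - 1 ))    # last core owned by this process
  echo "${first}-${last}"
}

# Show the range each of 4 processes would get with 16 cores apiece.
for rank in 0 1 2 3; do
  echo "local rank ${rank}: +pemap $(pemap_range "$rank" 16)"
done
```

Inside a per-task wrapper script launched by srun --unbuffered, the result could be passed to the program as +pemap "$(pemap_range "$SLURM_LOCALID" 16)". Whether contiguous core indices line up with the NUMA domains you intend depends on the node's core numbering, so check the Delta architecture page before settling on real values.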
Note 0: To have charm use your local copy of reconverse, build with
./build --with-fetch-reconverse-dir=/path/to/reconverse <other args>
Note 1:
Instead of exporting LCI_ATTR_BACKEND=ofi every time you run the program, you can select the backend at build time:
./build --with-cmake-args="-DLCI_NETWORK_BACKENDS=ofi" <other args>
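Both notes use <other args> as a stand-in for the rest of the build line. As a sketch of how the pieces might fit together, the snippet below assembles one build command from the target and flags shown on this page and prints it instead of running it. Combining the two options in a single invocation is an assumption that the notes do not confirm, and /path/to/reconverse is a placeholder.

```shell
# Sketch: assemble a single build line from the flags on this page.
# Assumption: the two --with-* options can be combined in one invocation.
reconverse_dir=/path/to/reconverse   # placeholder path

# Store the command as positional parameters so each argument stays intact.
set -- ./build charm++ multicore-linux-x86_64 \
  --with-production \
  "--with-fetch-reconverse-dir=${reconverse_dir}" \
  "--with-cmake-args=-DLCI_NETWORK_BACKENDS=ofi" \
  -j8

# Dry run: print the command for review. To execute it from the charm
# top-level directory, run "$@" instead of printing.
build_line="$*"
echo "${build_line}"
```

Printing first makes it easy to confirm the paths and flags before starting a long build.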