Description
When using the latest OpenMPI commit (d782542) and trying to attach to an MPI job via LaunchMON (https://github.com/LLNL/LaunchMON) I get the following error:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 60 slots
that were requested by the application:
/nfs/tmp2/lee218/prefix/stat-travis/bin/STATD
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
I am on a system that allocates nodes via SLURM, and this job runs on 2 nodes with 64 tasks. The application was launched with mpirun/orterun with 4 MPI processes, which might explain why it's trying to fill the nodes with 64-4=60 slots. However, it should only try launching 1 STATD daemon process per node. Let me know if there are more diagnostics that I can gather to help track this down. Note that I am able to attach TotalView to a similarly launched MPI job, so LaunchMON is doing something different from TotalView for process acquisition/daemon launch.
Possibly related: if I instead launch the MPI application using all 64 tasks that were allocated, my attempt to attach a tool results in:
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
Let me know if this should be submitted as a separate issue. I can provide additional diagnostics for this too if need be.
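
For reference, the slot counts described in this report can be sketched as below. This only illustrates the arithmetic in the text; it is not Open MPI's actual mapping logic, and all variable names are hypothetical.

```python
# Illustrative arithmetic only; the names are hypothetical and this does not
# reflect how Open MPI actually maps processes to slots.
nodes = 2           # nodes in the SLURM allocation
total_slots = 64    # tasks allocated across both nodes
app_procs = 4       # MPI processes launched via mpirun/orterun

free_slots = total_slots - app_procs  # slots not used by the application
expected_daemons = nodes              # one STATD daemon per node

print(f"free slots: {free_slots}")
print(f"expected daemons: {expected_daemons}")
```

Setting `app_procs = 64` leaves `free_slots` at 0, which would be consistent with the "already filled" message in the second scenario, while the expected daemon count stays at one per node.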