Replies: 1 comment 1 reply
-
That's a good question that we would like to know the answer to :) Sadly, we don't have a lot of experience with SLURM and we're not sure how best to configure this. Regarding the multinode situation, my understanding is that …
-
First of all: I'm loving HyperQueue and its features. 🙏 Since integrating it with our workflow manager AiiDA, it's been an invaluable tool for partially using nodes on clusters that have an exclusive node-job policy, and for avoiding queueing for the very small jobs in my workflows that need to run on the compute nodes.
I've been having a bit of trouble combining HyperQueue with MPI, however. Below I outline my current approach; it would be great to get some feedback and suggestions!
Running on a single node
For the use cases described above, I've so far been using HQ with an allocation that only uses a single node. Since the calculations I'm running are vastly more efficient with MPI, I typically run an HQ auto-allocation such as:
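Something along these lines (the partition name, time limit and Slurm options are placeholders for what I actually use):

```bash
# Auto-allocation queue that requests single nodes from Slurm;
# everything after "--" is passed through to sbatch.
hq alloc add slurm --time-limit 30m --workers-per-alloc 1 \
    -- --partition=some_partition --nodes=1 --exclusive
```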
And then submit HQ jobs similar to:
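For instance, with a wrapper script along these lines (`job.sh` and `my_code.x` are placeholder names; the exact `srun` flags are the part I keep fiddling with):

```bash
#!/bin/bash
# job.sh (placeholder): run the MPI code on the CPUs that HQ assigned to this job.
# HQ exports $HQ_CPUS as a comma-separated list of the allocated CPU IDs.
NUM_CPUS=$(echo "$HQ_CPUS" | tr ',' ' ' | wc -w)
srun --ntasks="$NUM_CPUS" --overlap --cpu-bind=map_cpu:"$HQ_CPUS" ./my_code.x
```

submitted with something like `hq submit --cpus=128 ./job.sh`.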
This can also be used for submitting multiple jobs in parallel on a single node, but then I typically have to tweak `--oversubscribe`, `--overlap` and `--cpu-bind` (in combination with `$HQ_CPUS`) to make it work. This seems to be cluster-dependent and I can't always get it to work. Is there a better approach I'm missing?
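To be concrete, by "multiple jobs in parallel on a single node" I mean something like this (the 64/64 split is just an example, and `job.sh` is the sketch from above):

```bash
# Two 64-core jobs that should share a single 128-core node; job.sh binds
# the MPI ranks to the CPUs listed in $HQ_CPUS.
hq submit --cpus=64 ./job.sh
hq submit --cpus=64 ./job.sh
```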
Running on multiple nodes

Another use case is when I want to run a multi-node Slurm job and run multiple single-node HQ jobs inside it. This can be useful when I have a lot of small jobs to run but Slurm is configured to only allow a certain number of jobs in the queue per partition.
To test this, I was looking at the documentation for manual submission:
https://it4innovations.github.io/hyperqueue/stable/deployment/worker/#deploying-a-worker-using-pbsslurm
And also trying an auto-allocation with multiple nodes:
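Again with placeholder values, something like:

```bash
# Auto-allocation queue where each Slurm allocation spans 4 nodes,
# i.e. 4 HQ workers are started per allocation.
hq alloc add slurm --time-limit 1h --workers-per-alloc 4 \
    -- --partition=some_partition --ntasks-per-node=128
```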
Both suggest using `mpirun`/`srun` to run the `worker start` command. Below is the submission script generated by HQ for the auto-allocation.
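I won't copy it verbatim here, but it boils down to something of this shape (the exact directives and worker options differ):

```bash
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=01:00:00
# ... further #SBATCH directives filled in by the auto-allocator ...

# The worker is launched through srun (the real command also passes options
# such as the server directory and manager), so anything the HQ job script
# itself runs with srun ends up as an srun within an srun step.
srun hq worker start
```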
When trying this with the HQ job script above, I obviously get into trouble. Calling `srun` within an `srun` step seems dubious, and I'm asking for 128 tasks within a job step which only has 4, so I get an error.
Removing `srun` doesn't have the desired effect either, though. HQ is running 4 workers once the allocation starts, but the calculations don't run in parallel, and the one that is running only uses 4 MPI tasks.

My current "solution"
After quite a bit of trial and error (in lieu of understanding and experience), I've come up with a solution that almost works. Basically, I run the same HQ job script as above for the calculations, but do a manual Slurm submission starting four HQ workers in the background:
Full Slurm submission script
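In essence (module loads, account/partition directives and paths trimmed; the resource numbers are placeholders):

```bash
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=128
#SBATCH --time=02:00:00

# Start one HQ worker per node in the background, each restricted to its own node;
# --overlap lets the job steps launched by the HQ jobs share resources with the worker steps.
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    srun --nodes=1 --ntasks=1 --overlap --nodelist="$node" hq worker start &
done

# Keep the allocation alive while the workers are running.
wait
```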
When running 4 jobs with 128 tasks each, the first three run just fine in parallel, with performance similar to a single run submitted directly to Slurm. However, the fourth one fails with an error in the `stderr`.

Interestingly, if I run on 3 nodes with 3 HQ workers, I don't get this issue at all. I'm still looking into running with more nodes, but am queueing quite a bit at the moment.
Again, any suggestions or tips would be most appreciated!