Futures on HPC SLURM cluster #468
I'm searching for advice on how to efficiently parallelize code based on the future framework on a Slurm HPC cluster. The obvious suggestion seems to be to use […]. I submitted the job using this sbatch script: […]. The call to […]. For the […], the elapsed times are reasonable: a lot of overhead for […]. In all, this picture is missing […]. Any idea, experience or suggestion?
So, this job will get assigned (up to) four compute hosts. In order for R to parallelize on those, you'll have to use `plan(cluster, workers = <hostnames>)`. None of the other future backends mentioned can scale out to multiple machines; they'll only run on the current machine.
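A minimal sketch of what that looks like in practice (assuming the future, future.apply, and parallelly packages are installed; outside a Slurm job, `availableWorkers()` simply falls back to `"localhost"`, so this also runs on a laptop):

```r
library(future)
library(future.apply)

## Inside a Slurm job, availableWorkers() expands the nodelist allotted
## to the job (from SLURM_JOB_NODELIST); outside Slurm it returns "localhost"
hosts <- parallelly::availableWorkers()

## Launch one R worker per allotted host
plan(cluster, workers = hosts)

## Each element is evaluated on one of those remote workers
nodes <- future_lapply(seq_along(hosts), function(i) Sys.info()[["nodename"]])
```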
The sequential/cluster ratio at 10.6/2.8 = 3.8 suggests that the code parallelizes nicely out to 4 workers running on 4 different machines. The other ratios - sequential/multisession = 0.05 and sequential/multicore = 0.3 - clearly indicate that something is not working as you expected. I'm still fairly new to Slurm (I've now got access to such a cluster, so I'm planning to catch up soon), but I suspect that you're ending up over-parallelizing here. It might be due to a bug in […]. Can you add the following to your job script:

```sh
env | grep SLURM
Rscript -e "parallelly::availableCores(which = 'all')"
```

and let me know what it gives? It should help reveal what's going on.
Nice, I didn't know about `availableWorkers()` - so you can do `plan(cluster, workers = availableWorkers())` without having to pass the expanded nodelist as an argument. Even better, the default for `plan(cluster)` is `workers = availableWorkers()`, so `plan(cluster)` alone is enough, and it'll automatically detect that you're running on Slurm and what hostnames you've got allotted to work with.
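In other words, these two calls should behave the same (a sketch relying on the documented default of the cluster backend; run outside Slurm it just uses the local machine):

```r
library(future)

## Equivalent: the cluster backend defaults to
## workers = parallelly::availableWorkers(), which parses the Slurm
## nodelist when running under Slurm and returns "localhost" otherwise
plan(cluster, workers = parallelly::availableWorkers())
plan(cluster)

## Resolve one future on whichever worker is picked
f <- future(Sys.info()[["nodename"]])
host <- value(f)
```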
So, first of all, that Mandelbrot demo is not the best example for benchmarking this, especially since it does not do any map-reduce internally. Here, tools such as […]. Instead, think about your HPC scheduler as a batch system with high throughput at the cost of high latency. That is, you can process lots of tasks over time, but the turnaround time per task is much higher than on a single machine. You wrote:
Yes, this is a known limitation of future.batchtools, or rather a lack of smarts in the core future framework, which cannot automagically predict what you want to do and merge chunks of futures into single ones. That's actually on the roadmap, but quite far ahead. Until then, you want to rely on map-reduce APIs such as future.apply and furrr to chunk up your data and distribute it in larger futures (= more elements per job). There's also a concept of nested parallelism that you can make use of. I use that myself when processing human sequencing data: I use future.batchtools to submit one job per person, and then a second layer where I process the 25 chromosomes in parallel on 25 cores using […].
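A hedged sketch of that nested topology (needs a Slurm cluster to actually run; the `resources` names depend on your batchtools `*.tmpl` file, and `samples` and `process()` are hypothetical placeholders for your own inputs and per-chromosome worker function):

```r
library(future)
library(future.apply)
library(future.batchtools)  # provides the batchtools_slurm backend

## Outer layer: one Slurm job per person.
## Inner layer: up to 25 cores within each job for per-chromosome work.
## NOTE: the 'resources' entries (e.g. ncpus) must match your Slurm template.
plan(list(
  tweak(batchtools_slurm, resources = list(ncpus = 25)),
  multicore
))

results <- future_lapply(samples, function(s) {       # one Slurm job per sample
  future_lapply(1:25, function(chr) process(s, chr))  # 25 chromosomes in parallel
})
```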