
Using sbatch does not work well with Julia JIT #77

@affans

Description

I recently came across this package now that ClusterManagers is throwing a deprecation warning for Slurm. The new package uses sbatch, letting SLURM handle the resource allocation, on top of which Julia then spawns the worker processes (i.e., sbatch -> srun). While this method works well, the workflow differs from ClusterManagers, and I don't think it is well suited to Julia's workflow, especially for prototyping and interactivity. Let me illustrate.

In the old method, my workflow was like this:

using ClusterManagers
using Distributed 
using Revise
addprocs(SlurmManager(10), kwargs) # ask for 10 tasks

@everywhere includet("model.jl")  # includet for Revise, contains functions for long-running scripts 

function run_simulations(params)  # defined on the head node
   results = pmap(1:100) do x 
      run_long_simulation(params) # runs on the worker processes
   end 
   return results  # an array of outputs from run_long_simulation
end

function process_simulations()   # defined/runs on the head node
# process/plot simulation results
end

This workflow was great. After the initial allocation and loading of the scripts via Revise, I could call run_simulations() and process_simulations() (which run on the head node) and generate summary statistics, plots, and so on. If I needed to change parameters, I could simply call run_simulations(params) with a different set, taking advantage of all the compiled code on the worker processes. Using Revise also meant I could go into run_long_simulation(), make my changes, and run_simulations() would pick those changes up across all workers.

The sbatch method does not give you this flexibility. The main issues are:

  • It has to compile the code every time you run sbatch script.jl.
  • You lose interactive flexibility - you have to save data to files, keep another Julia instance open for analysis, etc.
  • There are issues with the project directory, since sbatch runs from a different working directory (yes, there is an env variable set with the working dir, so it's manageable).
  • The initial execution of the script runs on the allocated resources instead of the head/login node.

This really hurts productivity and workflow, and it does not feel very "Julia"-like.
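For contrast, the sbatch-based workflow roughly looks like the following. This is a hypothetical submit script (the filename, flags, and script.jl are assumptions; details vary by cluster):

```shell
#!/bin/bash
#SBATCH --ntasks=10
#SBATCH --time=01:00:00

# Each submission starts a fresh Julia process, so all JIT compilation
# is repeated, and the job runs from SLURM's working directory.
# SLURM_SUBMIT_DIR points back at the directory you submitted from.
julia --project=$SLURM_SUBMIT_DIR script.jl
```

Every change to parameters or code means another round trip through the queue and another cold Julia start.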

Alternative / Going back to the old method
I found a workaround that replicates the old behaviour without using sbatch. From the terminal:

(base) odinuser02@podin:~$ salloc -N 2 --ntasks-per-node=10 bash
salloc: Granted job allocation 495
(base) odinuser02@podin:~$

This throws me into an interactive session (on the head node). Now I launch Julia:

julia> using Distributed, SlurmClusterManager

julia> addprocs(SlurmManager(),
                exeflags="--project=$(ENV["SLURM_SUBMIT_DIR"])")
20-element Vector{Int64}:
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21

This lets me work interactively; working directories/projects are easy (i.e., julia --project=. sets the correct project), I can use Revise, and I can prototype my model. Once my pmap returns data, I can use plotting libraries (which on my cluster are only installed on the head node).
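With the workers attached this way, the interactive loop from the first example carries over unchanged. A sketch, reusing the hypothetical model.jl, run_long_simulation, and params from above (it assumes a live Slurm allocation and Revise available on the workers):

```julia
using Distributed, Revise

# includet (from Revise) tracks the file, so edits to
# run_long_simulation are picked up on every worker.
@everywhere includet("model.jl")

# Re-running this after changing params (or editing model.jl)
# reuses the already-compiled code on the worker processes.
results = pmap(1:100) do i
    run_long_simulation(params)
end
```

No resubmission, no recompilation: the allocation and the worker processes stay alive between runs.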

julia> @everywhere println("hello from $(myid()):$(gethostname())")
hello from 1:podin
hello from 4:ops03
hello from 9:ops03
hello from 14:opsc01
hello from 6:ops03
hello from 2:ops03
hello from 10:ops03
hello from 7:ops03
hello from 21:ops03
hello from 5:ops03
hello from 11:opsc01
hello from 8:ops03

I am mainly filing this issue for awareness and to share a method that replicates the old workflow. I think having an example of using salloc in the README might be useful for a lot of folks.
