Description
What would you like to see added?
Slurm has unexpected (to us) behavior when using resource request flags in varying combinations.
While working on Parabricks, Prema discovered that the default behavior of --ntasks without --nodes is to allocate one node per task, which can be quite expensive.
A bit of diving through the sbatch docs unveiled some semi-implied, intended pathways for allocation strategies, and there are a lot of very different behaviors for controlling how allocations are made.
I can't go into the details just yet because we've only scratched the surface, but here are some examples...
--nodes without --ntasks: Allocates the specified number of nodes; --ntasks defaults to 1, and the request is then treated as in the next case.
--nodes with --ntasks: Allocates the specified number of nodes, and assumes the specified number of tasks per node.
--ntasks without --nodes: Behaves as though a flag --nodes-per-task=1 were specified. No such flag exists. The allocation is made with a number of nodes equal to --ntasks.
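The three cases above can be summarized in a small sketch. The function name and the rules it encodes are our reading of the observed behavior, not anything documented by Slurm:

```python
def effective_nodes(nodes=None, ntasks=None):
    """Model of the node count Slurm appears to allocate for a given
    combination of --nodes and --ntasks (observed behavior, not documented)."""
    if nodes is not None:
        # --nodes given: that many nodes; --ntasks defaults to 1 if absent.
        return nodes
    if ntasks is not None:
        # --ntasks without --nodes: behaves like a (nonexistent)
        # --nodes-per-task=1, i.e. one node per task.
        return ntasks
    return 1  # neither flag: a single node

print(effective_nodes(ntasks=2))  # → 2, one node per task
```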
There is also some inconsistency, either in how resources are named or in how requested vs. allocated resources are reported.
The following job was made with --ntasks=2 with no specification of --nodes.
AllocTRES ReqTRES
------------------------------------------------ ------------------------------------------------
billing=24,cpu=24,gres/gpu=2,mem=200G,node=2 billing=24,cpu=24,gres/gpu=1,mem=100G,node=1
Note that node is doubled relative to the req. The req records the exact input to --nodes (which defaulted to 1 because only --ntasks was specified), and likewise for the other resources. It does NOT record ntasks!
The alloc, however, records the total that was allocated, but only for node, mem, and gres/gpu; cpu is not doubled! It isn't fully clear whether the cpu allocation is recorded per node, or whether the allocated cpus were divided among the nodes. From past experience, I believe it is per node. But then why is mem not per node?
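A quick way to see exactly which fields differ is to parse and compare the two TRES strings (a small sketch; the strings are copied from the sacct output above):

```python
def parse_tres(s):
    """Parse a Slurm TRES string like 'cpu=24,mem=200G' into a dict."""
    return dict(item.split("=", 1) for item in s.split(","))

alloc = parse_tres("billing=24,cpu=24,gres/gpu=2,mem=200G,node=2")
req = parse_tres("billing=24,cpu=24,gres/gpu=1,mem=100G,node=1")

# Fields whose allocated value differs from the requested value:
diff = {k: (req[k], alloc[k]) for k in req if alloc[k] != req[k]}
print(diff)  # gres/gpu, mem, and node doubled; billing and cpu did not
```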
It may be that AllocTRES is not useful here, and that -o ReqMem should be used instead, which yields values like 100Gn or 10Gc that distinguish per-node from per-core memory requests.
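In that older-style sacct output, the trailing letter carries the scope ('n' = per node, 'c' = per core). A minimal sketch of reading such a value, assuming that two-character amount/unit/scope layout:

```python
def parse_reqmem(s):
    """Split a ReqMem value like '100Gn' into (amount, unit, scope).
    Trailing 'n' means per node, 'c' means per core (sacct's older format)."""
    scope = {"n": "per-node", "c": "per-core"}[s[-1]]
    amount, unit = s[:-2], s[-2]
    return int(amount), unit, scope

print(parse_reqmem("100Gn"))  # → (100, 'G', 'per-node')
```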