
Commit bc7c39e

Add SLURM_MAX_ARRAY_SIZE env var to control the maximum number of running slurm jobs for array jobs in cluster_tools (#788)
* add env for slurm max running array jobs
* renamed max_running to max_running_size and SLURM_MAX_RUNNING to SLURM_MAX_RUNNING_SIZE
* add SLURM_MAX_ARRAY_SIZE to documentation
* ran ./format.sh and ./lint.sh
* ran ./lint.sh again with black==21.12b0

Co-authored-by: Philipp Otto <[email protected]>
1 parent 098f367 commit bc7c39e

File tree

2 files changed: +20 −2 lines


cluster_tools/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -26,7 +26,7 @@ if __name__ == '__main__':
 The `cluster_tools` automatically determine the slurm limit for maximum array job size and split up larger job batches into multiple smaller batches.
 Also, the slurm limit for the maximum number of jobs which are allowed to be submitted by a user at the same time is honored by looking up the number of currently submitted jobs and only submitting new batches if they fit within the limit.
 
-If you would like to configure these limits independently, you can do so by setting the `SLURM_MAX_ARRAY_SIZE` and `SLURM_MAX_SUBMIT_JOBS` environment variables.
+If you would like to configure these limits independently, you can do so by setting the `SLURM_MAX_ARRAY_SIZE` and `SLURM_MAX_SUBMIT_JOBS` environment variables. You can also limit the maximum number of simultaneously running tasks within the slurm array job(s) by using the `SLURM_MAX_RUNNING_SIZE` environment variable.
 
 ### Kubernetes
```
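As a quick illustration of the documented behavior, all three limits can be set as environment variables before cluster_tools is used. A minimal sketch; the numeric values are arbitrary examples, not defaults:

```python
import os

# Arbitrary example values (not defaults): split array jobs into chunks of
# at most 1000 tasks, keep at most 2000 jobs submitted per user, and let at
# most 16 tasks of each array job run simultaneously.
os.environ["SLURM_MAX_ARRAY_SIZE"] = "1000"
os.environ["SLURM_MAX_SUBMIT_JOBS"] = "2000"
os.environ["SLURM_MAX_RUNNING_SIZE"] = "16"
```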

cluster_tools/cluster_tools/schedulers/slurm.py

Lines changed: 19 additions & 1 deletion
```diff
@@ -118,6 +118,18 @@ def get_max_array_size():
         )
         return max_array_size
 
+    @staticmethod
+    @cache_in_production
+    def get_max_running_size():
+        max_running_size_env = os.environ.get("SLURM_MAX_RUNNING_SIZE", None)
+        if max_running_size_env is not None:
+            logging.debug(
+                f"SLURM_MAX_RUNNING_SIZE env variable specified which is {max_running_size_env}."
+            )
+            return int(max_running_size_env)
+
+        return 0
+
     @staticmethod
     @cache_in_production
     def get_max_submit_jobs():
```
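The env-var lookup in `get_max_running_size` can be sketched as a standalone function (hypothetical name `get_max_running_size_sketch`; the real method additionally carries the `@staticmethod` and `@cache_in_production` decorators). A return value of 0 means "no limit configured":

```python
import os

def get_max_running_size_sketch(environ=None):
    # Read SLURM_MAX_RUNNING_SIZE from the environment; fall back to 0
    # ("no limit") when the variable is unset.
    environ = os.environ if environ is None else environ
    value = environ.get("SLURM_MAX_RUNNING_SIZE", None)
    if value is not None:
        return int(value)
    return 0
```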
```diff
@@ -219,6 +231,10 @@ def inner_submit(
 
         max_array_size = self.get_max_array_size()
         max_submit_jobs = self.get_max_submit_jobs()
+        max_running_size = self.get_max_running_size()
+        slurm_max_running_size_str = (
+            "%{}".format(max_running_size) if max_running_size > 0 else ""
+        )
         # Only ever submit at most max_submit_jobs and max_array_size jobs at once (but at least one).
         batch_size = max(min(max_array_size, max_submit_jobs), 1)
```
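For context, the batch-size rule from the surrounding context lines can be isolated as a tiny sketch (hypothetical function name): the batch never exceeds either limit, but is always at least one.

```python
def batch_size_sketch(max_array_size, max_submit_jobs):
    # Only ever submit at most max_submit_jobs and max_array_size jobs
    # at once, but at least one.
    return max(min(max_array_size, max_submit_jobs), 1)
```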

```diff
@@ -233,7 +249,9 @@ def inner_submit(
 
         job_array_line = ""
         if job_count is not None:
-            job_array_line = "#SBATCH --array=0-{}".format(array_index_end)
+            job_array_line = "#SBATCH --array=0-{}{}".format(
+                array_index_end, slurm_max_running_size_str
+            )
         script_lines = (
             [
                 "#!/bin/sh",
```
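The resulting directive relies on Slurm's `%` suffix on `--array`, which caps how many tasks of the array may run simultaneously (e.g. `#SBATCH --array=0-99%16` runs at most 16 of the 100 tasks at a time). A minimal sketch of the formatting logic with a hypothetical function name:

```python
def format_job_array_line(array_index_end, max_running_size):
    # Slurm's "%" separator on --array limits simultaneously running tasks;
    # an empty suffix leaves the array unthrottled.
    limiter = "%{}".format(max_running_size) if max_running_size > 0 else ""
    return "#SBATCH --array=0-{}{}".format(array_index_end, limiter)
```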
