Slurm Scheduler doesn't support multi-cluster
I am trying to run a series of tests across multiple clusters in Slurm.

- I have accounting configured and can interact with the clusters using `sbatch`, `sinfo`, `sacct`, etc., using the `-M` flag to specify the cluster.
- I use the `access` property to specify my cluster in the partitions of my ReFrame cluster config, and I can see that jobs are submitted to the correct cluster.
- When running with verbose output, I can see that the `sacct` arguments are pre-defined and there is no way to pass additional values to `sacct`, as seen here:
Snippet of core/schedulers/slurm.py:

```python
completed = _run_strict(
    f'sacct -S {t_start} -P '
    f'-j {",".join(job.jobid for job in jobs)} '
    f'-o jobid,state,exitcode,end,nodelist'
)
```

Am I missing something obvious? If not, could we add something that lets us set additional arguments in the scheduler options per ReFrame system partition, similar to sched_access_in_submit or additional_args?
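To illustrate the idea, here is a minimal sketch of what an extensible poll command could look like. This is not ReFrame's actual API; `build_sacct_cmd` and `extra_opts` are hypothetical names invented for this example, standing in for whatever per-partition scheduler option would carry the extra flags:

```python
def build_sacct_cmd(t_start, job_ids, extra_opts=None):
    """Build the sacct polling command line, optionally injecting
    extra scheduler options (e.g. ['-M', 'clusterB'])."""
    opts = ' '.join(extra_opts or [])
    return (
        f'sacct -S {t_start} -P '
        f'{opts + " " if opts else ""}'
        f'-j {",".join(job_ids)} '
        f'-o jobid,state,exitcode,end,nodelist'
    )

# Without extra options this reproduces the current hard-coded command;
# with ['-M', 'clusterB'] the poll would target the right cluster.
print(build_sacct_cmd('2025-09-30', ['54'], ['-M', 'clusterB']))
```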
Section of cluster.py:

```python
site_configuration = {
    'systems': [
        {
            ...
            'partitions': [
                {
                    'name': 'clusterA',
                    'descr': 'clusterA',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'environs': ['clusterA'],
                    'access': ['-M clusterA']
                },
                {
                    'name': 'clusterB',
                    'descr': 'clusterB',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'environs': ['clusterB'],
                    'access': ['-M clusterB']
                }
            ]
        }
    ],
    'environments': [
        {
            'name': 'clusterA',
            ...
        },
        {
            'name': 'clusterB',
            ...
        }
    ]
}
```

Cluster A is my default cluster, so tests run fine there.
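Since the target cluster is already encoded in each partition's `access` list, one workaround sketch (a hypothetical helper, not part of ReFrame) would be to parse it back out so it can be forwarded to the polling command:

```python
import re

def cluster_from_access(access_opts):
    """Return the cluster name given via '-M <name>' or
    '--clusters=<name>' in a partition's access options, else None."""
    for opt in access_opts:
        m = re.match(r'(?:-M\s+|--clusters=)(\S+)', opt)
        if m:
            return m.group(1)
    return None

print(cluster_from_access(['-M clusterB']))  # clusterB
```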
Output of sacct

No `-M` flag means it is polling the default cluster:

```
Entering stage: run
Entering stage: run
Entering stage: run_wait
[CMD] 'sacct -S 2025-09-30 -P -j 54 -o jobid,state,exitcode,end,nodelist'
```