Commit a069467

Fix displayed number of nodes allocated and used in job view
The number of allocated nodes was inferred from the metrics associated with the job. In some cases a node's slurm-job-exporter reports no usage for a job even though the node was allocated to it. This typically happens when no process for that job has been launched for at least 60 seconds. This PR makes the allocated number reflect the number of nodes allocated by Slurm, and the used number reflect the usage inferred from Prometheus metrics.
1 parent 2522235 commit a069467

2 files changed: +4 -3 lines changed

jobstats/templates/jobstats/job.html

Lines changed: 1 addition & 1 deletion
@@ -322,7 +322,7 @@ <h2>{% translate "Resources" %}</h2>
 <tr>
   <td>{% translate "Nodes" %}</td>
   <td>{{nb_nodes}}</td>
-  <td></td>
+  <td>{{nb_nodes_used}}</td>
 </tr>
 <tr>
   <td>{% translate "CPU cores" %}</td>

jobstats/views.py

Lines changed: 3 additions & 2 deletions
@@ -300,6 +300,7 @@ def job(request, username, job_id):
 
     context['tres_req'] = job.parse_tres_req()
     context['total_mem'] = context['tres_req']['total_mem'] * 1024 * 1024
+    context['nb_nodes'] = job.nodes_alloc
 
     comments = []
     if '--dependency=singleton' in job.submit_line \
@@ -387,10 +388,10 @@ def job(request, username, job_id):
             node_name = node['metric'][settings.PROM_NODE_HOSTNAME_LABEL].split(':')[0]
             cpu_bynode.append({'name': node_name, 'count': int(node['y'][0])})
         context['cpu_bynode'] = cpu_bynode
-        context['nb_nodes'] = len(cpu_bynode)
+        context['nb_nodes_used'] = len(cpu_bynode)
     except ValueError:
         context['cpu_bynode'] = None
-        context['nb_nodes'] = None
+        context['nb_nodes_used'] = 0
 
     try:
         query_mem = 'sum(slurm_job_memory_max{{slurmjobid="{}", {}}})'.format(job_id, prom.get_filter())
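
To illustrate the behaviour this change aims for, here is a minimal standalone sketch, not the project's actual view code. The function name `summarize_node_usage`, the `fetch_cpu_series` callable, and the `hostname_label` default are hypothetical stand-ins for the Django view, the Prometheus query, and `settings.PROM_NODE_HOSTNAME_LABEL`. It assumes the query raises `ValueError` when no series is returned, as the `except ValueError` branch in the diff suggests.

```python
def summarize_node_usage(nodes_alloc, fetch_cpu_series, hostname_label="instance"):
    """Build the node-related context values (hypothetical helper).

    nodes_alloc: node count reported by Slurm (job.nodes_alloc in the diff).
    fetch_cpu_series: callable returning per-node metric series; assumed to
        raise ValueError when Prometheus has no data for the job.
    """
    # Allocated nodes come from Slurm, independent of any metrics.
    context = {"nb_nodes": nodes_alloc}
    try:
        cpu_bynode = []
        for node in fetch_cpu_series():
            # Drop the ":port" suffix from the hostname label.
            node_name = node["metric"][hostname_label].split(":")[0]
            cpu_bynode.append({"name": node_name, "count": int(node["y"][0])})
        context["cpu_bynode"] = cpu_bynode
        # Used nodes are the nodes that actually reported metrics.
        context["nb_nodes_used"] = len(cpu_bynode)
    except ValueError:
        # No metrics at all: the job still has its allocated nodes,
        # but none of them is counted as used.
        context["cpu_bynode"] = None
        context["nb_nodes_used"] = 0
    return context


if __name__ == "__main__":
    # Made-up data: three nodes allocated, only two reported usage.
    series = [
        {"metric": {"instance": "node1:9100"}, "y": [4]},
        {"metric": {"instance": "node2:9100"}, "y": [4]},
    ]
    ctx = summarize_node_usage(3, lambda: series)
    print(ctx["nb_nodes"], ctx["nb_nodes_used"])
```

With the sample data, the allocated count stays at the Slurm value while the used count only covers nodes that reported metrics, which is the distinction the template now displays in separate columns.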
