Commit 776a225

Add support for GRES to ondemand apps (#837)
* wip - bump openhpc role for testing
* remove GresTypes from MIG docs
* enable nvml autoconfiguration for CaaS
* fix linter problems
* add support for GRES to ondemand desktop,matlab,rstudio apps
* support multiple gres per node
* fix linter errors
* fix pyink lint errors
* fix python linting errors
* fix ansible-lint errors
* add partition info to GRES selection
* tidy GRES labels
* fix lint errors
1 parent de31699 commit 776a225

5 files changed, +180 -3 lines changed

ansible/roles/openondemand/README.md

Lines changed: 4 additions & 0 deletions
@@ -73,6 +73,10 @@ This role enables SSL on the Open Ondemand server, using the following self-sign
 - `openondemand_desktop_screensaver`: Optional. Whether to enable screen locking/screensaver. **NB:** Users must have passwords if this is enabled. Bool, default `false`.
 - `openondemand_filesapp_paths`: List of paths (in addition to $HOME, which is always added) to include shortcuts to within the Files dashboard app.
 - `openondemand_jupyter_partition`: Required. Name of Slurm partition to use for Jupyter Notebook servers. Requires a corresponding group named "openondemand_jupyter" and entry in openhpc_partitions.
+- `openondemand_gres_options`: Optional. A list of `[label, value]` items used
+to provide a drop-down for resource/GRES selection in application forms. By
+default the list is built from all GRES definitions reported by `sinfo`, plus a
+`None` entry. See the `options` attribute of the Select Field [form widget](https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/form-widgets.html#form-widgets).
 
 ### Monitoring
 
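When overriding `openondemand_gres_options` by hand, each entry needs the same `[label, value]` shape that the default produces. Because the `gres` select is `required: true` and the submit templates only skip `--gres` when the value is `none`, a `none` entry is also needed for jobs without GRES. The sketch below is a hypothetical sanity check (the name `check_gres_options` and its value rules are assumptions, not part of the role):

```python
import re

# Hypothetical helper, not part of the role. Assumed rules: each item is a
# [label, value] pair of strings and value is "none", "name" or "name:type"
# (no count, because the submit templates append ":<gres_count>" themselves).
_VALUE_RE = re.compile(r"^[A-Za-z0-9_.-]+(:[A-Za-z0-9_.-]+)?$")


def check_gres_options(options):
    """Return a list of problems; an empty list means the options look usable."""
    problems = []
    values = []
    for index, item in enumerate(options):
        if not isinstance(item, (list, tuple)) or len(item) != 2:
            problems.append(f"item {index}: expected a [label, value] pair")
            continue
        label, value = item
        if not isinstance(label, str) or not isinstance(value, str):
            problems.append(f"item {index}: label and value must be strings")
            continue
        values.append(value)
        if value != "none" and not _VALUE_RE.match(value):
            problems.append(f"item {index} ({label}): unexpected value {value!r}")
    # The gres select is required, so users need a way to request no GRES.
    if "none" not in values:
        problems.append("missing a 'none' entry for jobs without GRES")
    return problems


options = [["None", "none"], ["Any gpu", "gpu"], ["H200 gpu", "gpu:H200"]]
print(check_gres_options(options))  # -> []
```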
ansible/roles/openondemand/defaults/main.yml

Lines changed: 3 additions & 0 deletions
@@ -105,6 +105,9 @@ openondemand_osc_ood_defaults:
 # Use repo file provided by dnf_repos by default
 ood_use_existing_repo_file: true
 
+# Apps:
 openondemand_code_server_version: 4.102.2
 openondemand_rstudio_version: 2025.05.1-513
 openondemand_matlab_version: ''
+# Below is automatically calculated during role run:
+openondemand_gres_options: "{{ _openondemand_sinfo_gres.stdout | to_gres_options }}"
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+#!/usr/bin/python
+# pylint: disable=missing-module-docstring
+
+# Copyright: (c) 2025, StackHPC
+# Apache 2 License
+
+
+def to_gres_options(stdout):
+    """Convert sinfo output into a list of GRES options for an Ondemand `select`
+    widget.
+
+    Parameters:
+        stdout: Text from `sinfo --noheader --format "%R %G"`
+
+    Returns a list of [label, value] items. This is the format required for
+    the `options` attribute of a `select` widget [1] where:
+    - value (str) is a valid entry for the srun/sbatch --gres option [2].
+    - label (str) is a user-friendly label with gres name, gres type,
+      maximum gres count and partitions where relevant.
+    The returned list will always include an entry for no GRES request.
+
+    For example, with a single GRES `gpu:H200:8` on partitions standard and
+    long, the following entries are returned:
+    - ['None', 'none']
+    - ['Any gpu (max count=8, partitions=standard,long)', 'gpu']
+    - ['H200 gpu (max count=8, partitions=standard,long)', 'gpu:H200']
+
+    [1] https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/form-widgets.html#form-widgets
+    [2] https://slurm.schedmd.com/srun.html#OPT_gres
+    """  # noqa: E501 pylint: disable=line-too-long
+
+    gres_data = {}
+    # key=gres_opt - 'name' or 'name:type', i.e. what would be passed to --gres
+    # value={label:str, max_count: int, partitions=[]}
+    gres_data["none"] = {"label": "None", "max_count": 0, "partitions": ["all"]}
+
+    for line in stdout.splitlines():
+        # line examples:
+        # 'part1 gpu:H200:8(S:0-1),test:foo:1'
+        # 'part2 (null)'
+        # - First example shows multiple GRES per partition
+        # - Core suffix e.g. '(S:0-1)' only exists for auto-detected gres
+        # - stackhpc.openhpc role guarantees that name:type:count all exist
+        partition, gres_definitions = line.split()
+        for gres in gres_definitions.split(","):
+            if "(null)" in gres:
+                continue
+            gres_name, gres_type, gres_count_cores = gres.split(":", maxsplit=2)
+            gres_count = int(gres_count_cores.split("(")[0])
+            for gres_opt in [gres_name, f"{gres_name}:{gres_type}"]:
+                if gres_opt not in gres_data:
+                    label = (
+                        f"{gres_type} {gres_name}"
+                        if ":" in gres_opt
+                        else f"Any {gres_opt}"
+                    )
+                    gres_data[gres_opt] = {
+                        "label": label,
+                        "max_count": gres_count,
+                        "partitions": [partition],
+                    }
+                else:
+                    gres_data[gres_opt]["partitions"].append(partition)
+                    if gres_count > gres_data[gres_opt]["max_count"]:
+                        gres_data[gres_opt]["max_count"] = gres_count
+
+    gres_options = []
+    for gres_opt in gres_data:  # pylint: disable=consider-using-dict-items
+        max_count = gres_data[gres_opt]["max_count"]
+        partitions = gres_data[gres_opt]["partitions"]
+        label = gres_data[gres_opt]["label"]
+        if gres_opt != "none":
+            label += f" (max count={max_count}, partitions={','.join(partitions)})"
+        gres_options.append((label, gres_opt))
+    return gres_options
+
+
+# pylint: disable=useless-object-inheritance
+# pylint: disable=too-few-public-methods
+class FilterModule(object):
+    """Ansible core jinja2 filters"""
+
+    # pylint: disable=missing-function-docstring
+    def filters(self):
+        return {
+            "to_gres_options": to_gres_options,
+        }
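Each comma-separated entry in the `%G` field is split into name, type and count, and any auto-detected core suffix such as `(S:0-1)` is dropped. The standalone sketch below reproduces just that parsing step (the helper name `parse_gres` is hypothetical) and shows why counts should be compared as integers when tracking a maximum: compared as strings, `"16" > "8"` is false.

```python
def parse_gres(entry):
    """Split one sinfo %G entry, e.g. 'gpu:H200:8(S:0-1)', into (name, type, count).

    Assumes the name:type:count layout that the stackhpc.openhpc role is said
    to guarantee; returns None for partitions reporting '(null)'.
    """
    if "(null)" in entry:
        return None
    name, gres_type, count_cores = entry.split(":", maxsplit=2)
    count = int(count_cores.split("(")[0])  # drop the '(S:0-1)' core suffix
    return name, gres_type, count


field = "gpu:H200:8(S:0-1),test:foo:1"
print([parse_gres(entry) for entry in field.split(",")])
# -> [('gpu', 'H200', 8), ('test', 'foo', 1)]

# Comparing counts as strings picks the wrong maximum:
print("16" > "8", 16 > 8)  # -> False True
```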

ansible/roles/openondemand/tasks/main.yml

Lines changed: 6 additions & 0 deletions
@@ -39,6 +39,12 @@
     vars_from: main.yml
     public: true
 
+- name: Get GRES information
+  ansible.builtin.command:
+    cmd: sinfo --noheader --format "%R %G" # can't use , or : as separator
+  changed_when: true
+  register: _openondemand_sinfo_gres
+
 - ansible.builtin.include_role:
     name: osc.ood
     tasks_from: install-apps.yml
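The task registers the raw output of `sinfo --noheader --format "%R %G"`. A space separates the two fields because GRES strings already contain `,` and `:`, so each line splits cleanly on whitespace into a partition name and its GRES field. A minimal sketch of the same query and split outside Ansible, assuming `sinfo` is on `PATH`; both function names are hypothetical:

```python
import subprocess


def run_sinfo_gres():
    """Run the same sinfo query as the Ansible task and return its stdout."""
    result = subprocess.run(
        ["sinfo", "--noheader", "--format", "%R %G"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def split_partitions(stdout):
    """Map each partition to the list of GRES entries sinfo reported for it.

    sinfo can print a partition on several lines when its nodes report
    different GRES, so entries are accumulated rather than overwritten.
    """
    gres_by_partition = {}
    for line in stdout.splitlines():
        if not line.strip():
            continue
        partition, gres_field = line.split()
        entries = gres_by_partition.setdefault(partition, [])
        if gres_field != "(null)":
            entries.extend(gres_field.split(","))
    return gres_by_partition


sample = "part1 gpu:H200:8(S:0-1),test:foo:1\npart2 (null)\n"
print(split_partitions(sample))
# -> {'part1': ['gpu:H200:8(S:0-1)', 'test:foo:1'], 'part2': []}
```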

environments/common/inventory/group_vars/all/openondemand.yml

Lines changed: 80 additions & 3 deletions
@@ -122,16 +122,30 @@ openondemand_apps_desktop_default:
 - bc_queue
 - bc_num_hours
 - num_cores
+- gres
+- gres_count
 - node
 attributes:
 desktop: xfce
-# bc_account: # i.e. slurm account
-# value: root
 bc_queue:
 value: "{{ openondemand_desktop_partition | default(none) }}"
 num_cores:
 label: Number of cores
 value: 1
+gres:
+label: Resources
+help: Select GPU or other Slurm GRES resources
+required: true
+widget: select
+options: "{{ openondemand_gres_options }}"
+gres_count:
+label: Resource count
+help: Count of GPU or other Slurm GRES resources
+required: false
+widget: number_field
+value: 1
+min: 1
+step: 1
 node:
 label: Node name
 help: Select a particular node or leave empty to let Slurm pick the next available
@@ -144,6 +158,9 @@ openondemand_apps_desktop_default:
 - <%= "--nodes=1" %>
 - <%= "--ntasks=#{num_cores}" %>
 - <%= "--nodelist=#{node}" %>
+<% if gres != 'none' %>
+- <%= "--gres=#{gres}:#{gres_count}" %>
+<% end %>
 openondemand_apps_desktop: "{{ {'bc_desktop':openondemand_apps_desktop_default} if openondemand_desktop_partition | default(none) else {} }}"
 
 # yamllint disable-line rule:line-length
@@ -158,16 +175,35 @@ openondemand_apps_jupyter_default:
 - bc_queue
 - bc_num_hours
 - num_cores
+- gres
+- gres_count
 - node
 attributes: # TODO
 num_cores:
 label: Number of cores
 value: 1
 modules: ""
+gres:
+label: Resources
+help: Select GPU or other Slurm GRES resources
+required: true
+widget: select
+options: "{{ openondemand_gres_options }}"
+gres_count:
+label: Resource count
+help: Count of GPU or other Slurm GRES resources
+required: false
+widget: number_field
+value: 1
+min: 1
+step: 1
+node:
+label: Node name
+help: Select a particular node or leave empty to let Slurm pick the next available
+value: ""
 extra_jupyter_args: ""
 bc_queue:
 value: "{{ openondemand_jupyter_partition | default(none) }}"
-node: ""
 submit: |
 ---
 batch_connect:
@@ -182,6 +218,9 @@ openondemand_apps_jupyter_default:
 - <%= "--nodes=1" %>
 - <%= "--ntasks=#{num_cores}" %>
 - <%= "--nodelist=#{node}" %>
+<% if gres != 'none' %>
+- <%= "--gres=#{gres}:#{gres_count}" %>
+<% end %>
 openondemand_apps_jupyter: "{{ {'jupyter':openondemand_apps_jupyter_default} if openondemand_jupyter_partition | default(none) else {} }}"
 
 openondemand_apps_rstudio_default:
@@ -233,6 +272,20 @@ openondemand_apps_rstudio_default:
 bc_email_on_started: false
 auto_modules_RStudio-Server:
 default: false
+gres:
+label: Resources
+help: Select GPU or other Slurm GRES resources
+required: true
+widget: select
+options: "{{ openondemand_gres_options }}"
+gres_count:
+label: Resource count
+help: Count of GPU or other Slurm GRES resources
+required: false
+widget: number_field
+value: 1
+min: 1
+step: 1
 form:
 - bc_queue
 - rstudio_module
@@ -242,6 +295,8 @@ openondemand_apps_rstudio_default:
 - ram
 - bc_num_hours
 - bc_email_on_started
+- gres
+- gres_count
 submit: |
 ---
 batch_connect:
@@ -261,6 +316,9 @@ openondemand_apps_rstudio_default:
 - "<%= cores.blank? ? 1 : cores.to_i %>"<% if auto_queues.start_with?("gpu") %>
 - "--gpus-per-task"
 - "1"<% end %>
+<% if gres != 'none' %>
+- <%= "--gres=#{gres}:#{gres_count}" %>
+<% end %>
 openondemand_apps_rstudio: "{{ {'rstudio':openondemand_apps_rstudio_default} if openondemand_rstudio_partition | default(none) else {} }}"
 
 openondemand_apps_matlab_default:
@@ -274,6 +332,8 @@ openondemand_apps_matlab_default:
 - matlab_module
 - cores
 - ram
+- gres
+- gres_count
 attributes:
 desktop: xfce
 # bc_account: # i.e. slurm account
@@ -314,6 +374,20 @@ openondemand_apps_matlab_default:
 step: 1
 value: 30
 cachable: true
+gres:
+label: Resources
+help: Select GPU or other Slurm GRES resources
+required: true
+widget: select
+options: "{{ openondemand_gres_options }}"
+gres_count:
+label: Resource count
+help: Count of GPU or other Slurm GRES resources
+required: false
+widget: number_field
+value: 1
+min: 1
+step: 1
 submit: |
 ---
 script:
@@ -327,6 +401,9 @@ openondemand_apps_matlab_default:
 - "<%= ram.blank? ? 4 : ram.to_i %>G"
 - "--cpus-per-task"
 - "<%= cores.blank? ? 1 : cores.to_i %>"
+<% if gres != 'none' %>
+- <%= "--gres=#{gres}:#{gres_count}" %>
+<% end %>
 openondemand_apps_matlab: "{{ {'matlab':openondemand_apps_matlab_default} if openondemand_matlab_partition | default(none) else {} }}"
 
 openondemand_apps_codeserver_default:
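Each app's submit template adds `--gres=<option>:<count>` only when something other than `None` is selected, which is why the option values stop at `name` or `name:type` and leave the count to the `gres_count` field. A Python rendering of the same ERB condition, for illustration only (`gres_submit_args` is a hypothetical name; the real logic is the ERB shown in the hunks above):

```python
def gres_submit_args(gres, gres_count):
    """Mirror the ERB snippet: emit a --gres argument unless 'none' was selected.

    gres is the select widget value ('none', 'gpu' or 'gpu:H200');
    gres_count is the number_field value (the forms default it to 1).
    """
    if gres == "none":
        return []
    return [f"--gres={gres}:{gres_count}"]


print(gres_submit_args("none", 1))      # -> []
print(gres_submit_args("gpu", 2))       # -> ['--gres=gpu:2']
print(gres_submit_args("gpu:H200", 1))  # -> ['--gres=gpu:H200:1']
```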