Commit 683f017

Author: Matt Pryor
Merge pull request #20 from stackhpc/feat/volume-types
Add option to configure high-performance home dirs
2 parents 520b83f + 89186ad commit 683f017

3 files changed: +119 -0 lines changed

roles/cluster_infra/templates/resources.tf.j2

Lines changed: 3 additions & 0 deletions
@@ -82,6 +82,9 @@ resource "openstack_blockstorage_volume_v3" "home" {
     name = "{{ cluster_name }}-home"
     description = "Home for control node"
     size = "{{ home_volume_size }}"
+    {% if use_home_volume_type_fast and home_volume_type_fast is defined %}
+    volume_type = "{{ home_volume_type_fast }}"
+    {% endif %}
 }
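
The condition added in this hunk can be read as: set `volume_type` only when the user opted in and the deployment defines a fast volume type. The sketch below restates that logic in plain Python. It is an illustration, not part of the commit: the function name `home_volume_arguments` is hypothetical, and `None` stands in for an undefined Ansible variable.

```python
from typing import Optional


def home_volume_arguments(
    cluster_name: str,
    home_volume_size: int,
    use_home_volume_type_fast: bool,
    home_volume_type_fast: Optional[str] = None,
) -> dict:
    """Return the arguments the rendered "home" volume resource would carry."""
    arguments = {
        "name": f"{cluster_name}-home",
        "description": "Home for control node",
        "size": str(home_volume_size),
    }
    # Mirrors `use_home_volume_type_fast and home_volume_type_fast is defined`.
    # Assumption of this sketch: None represents an undefined variable.
    if use_home_volume_type_fast and home_volume_type_fast is not None:
        arguments["volume_type"] = home_volume_type_fast
    return arguments
```

With the option enabled but no fast type defined, the result has no `volume_type` key, which matches the parameter description's statement that a standard cloud volume is provisioned in that case.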

slurm-infra-fast-volume-type.yml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+slurm-infra.yml
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+name: "slurm"
+label: "Slurm"
+description: >-
+  Batch cluster running the Slurm workload manager, the Open
+  OnDemand web interface, and custom monitoring.
+logo: https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Slurm_logo.svg/158px-Slurm_logo.svg.png
+
+parameters:
+  - name: cluster_floating_ip
+    label: External IP
+    description: The external IP to use for the login node.
+    kind: cloud.ip
+    immutable: true
+
+  - name: compute_count
+    label: Compute node count
+    description: The number of compute nodes in the cluster.
+    kind: integer
+    options:
+      min: 1
+    default: 3
+
+  - name: compute_flavor
+    label: Compute node size
+    description: The size to use for the compute node.
+    kind: "cloud.size"
+    immutable: true
+    options:
+      min_ram: 2048
+      min_disk: 20
+
+  - name: home_volume_size
+    label: Home volume size (GB)
+    description: The size of the cloud volume to use for home directories
+    kind: integer
+    immutable: true
+    options:
+      min: 10
+    default: 100
+
+  - name: use_home_volume_type_fast
+    label: Provision high-performance storage for home directories
+    description: |
+      If a high-performance storage type is available to the Slurm platform,
+      use it for cluster home directories. If no high-performance storage type
+      is available, this option has no effect and a standard cloud volume will
+      be provisioned for home directories.
+    kind: boolean
+    required: false
+    default: true
+    options:
+      checkboxLabel: Put home directories on high-performance storage?
+
+  - name: metrics_db_maximum_size
+    label: Metrics database size (GB)
+    description: |
+      The oldest metrics records in the [Prometheus](https://prometheus.io/) database will be
+      discarded to ensure that the database does not grow larger than this size.
+
+      **A cloud volume of this size +10GB will be created to hold and persist the metrics
+      database and important Slurm files.**
+    kind: integer
+    immutable: true
+    options:
+      min: 10
+    default: 10
+
+  - name: cluster_run_validation
+    label: Post-configuration validation
+    description: >-
+      If selected, post-configuration jobs will be executed to validate the core functionality
+      of the cluster when it is re-configured.
+    kind: boolean
+    required: false
+    default: true
+    options:
+      checkboxLabel: Run post-configuration validation?
+
+usage_template: |-
+  # Accessing the cluster using Open OnDemand
+
+  [Open OnDemand](https://openondemand.org/) is a web portal for managing HPC jobs, including graphical
+  environments such as [Jupyter Notebooks](https://jupyter.org/).
+
+  {% if cluster.outputs.openondemand_url %}
+  The Open OnDemand portal for this cluster is available at
+  [{{ cluster.outputs.openondemand_url.slice(8) }}]({{ cluster.outputs.openondemand_url }}).
+
+  Enter the username `azimuth` and password `{{ cluster.outputs.azimuth_user_password }}` when prompted.
+  {% else %}
+  The Open OnDemand portal for this cluster can be accessed from the services list.
+  {% endif %}
+
+  # Accessing the cluster using SSH
+
+  The cluster can be accessed over SSH via the external IP. The SSH public key of the user that
+  deployed the cluster is injected into the `azimuth` user:
+
+  ```
+  $ ssh azimuth@{{ cluster.outputs.cluster_access_ip | default('[cluster ip]') }}
+  [azimuth@{{ cluster.name }}-login-0 ~]$ sinfo
+  PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
+  compute* up 60-00:00:0 {{ "%3s" | format(cluster.parameter_values.compute_count) }} idle {{ cluster.name }}-compute-[0-{{ cluster.parameter_values.compute_count - 1 }}]
+  ```
+
+  SSH access can be granted to additional users by placing their SSH public key in `~azimuth/.ssh/authorized_keys`.
+
+services:
+  - name: ood
+    label: Open OnDemand
+    icon_url: https://github.com/stackhpc/caas-slurm-appliance/raw/main/assets/ood-icon.png
+  - name: monitoring
+    label: Monitoring
+    icon_url: https://raw.githubusercontent.com/cncf/artwork/master/projects/prometheus/icon/color/prometheus-icon-color.png
+
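
The `usage_template` in this file uses three small expressions that are easy to misread: `.slice(8)` on the portal URL, the `"%3s" | format(...)` padding of the node count, and the `[0-N]` node range. The Python sketch below reproduces what each one appears to compute. It is an approximation under stated assumptions: the function names are hypothetical, the URL is assumed to start with `https://` (8 characters), and the column spacing follows the template text as shown in this diff rather than real `sinfo` output.

```python
def portal_link_text(openondemand_url: str) -> str:
    """Approximate `.slice(8)`: drop the first 8 characters of the URL."""
    # Assumption: the URL begins with "https://", which is exactly 8 characters.
    return openondemand_url[8:]


def sinfo_row(cluster_name: str, compute_count: int) -> str:
    """Approximate the templated `compute*` row of the sample sinfo output."""
    # `"%3s" | format(n)` right-aligns the count in a field of width 3.
    nodes = "%3s" % compute_count
    # Nodes are numbered from 0, so the last index is compute_count - 1.
    nodelist = f"{cluster_name}-compute-[0-{compute_count - 1}]"
    return f"compute* up 60-00:00:0 {nodes} idle {nodelist}"
```

For example, `portal_link_text("https://ood.example.org")` gives `ood.example.org`. A URL using `http://` would lose one extra character, so the link text relies on the portal being served over HTTPS.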
