Skip to content

Commit 80a0e21

Browse files
committed
Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-slurm-appliance into feature/k3s-monitoring
2 parents cd281f3 + 774f608 commit 80a0e21

File tree

2 files changed

+4
-4
lines changed

2 files changed

+4
-4
lines changed

docs/monitoring-and-logging.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
## Components overview
44

55
### [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
6-
An umbrella Helm chart which the appliance uses to deploy and manages containerised versions of Prometheus, Grafana, Alertmanager and Node Exporter.
6+
An umbrella Helm chart which the appliance uses to deploy and manage containerised versions of Prometheus, Grafana, Alertmanager and Node Exporter.
77

88
### [filebeat](https://www.elastic.co/beats/filebeat)
99

@@ -109,7 +109,7 @@ Note that if Open OnDemand is enabled, Grafana is only accessible through OOD's
109109

110110
### grafana dashboards
111111

112-
In addition to the default set of dashboards that are deployed by kube-prometheus-stack, the appliance ships with a default set of dashboards (listed below). The set of appliance-specific dashboards can be configured via the `grafana_dashboards` variable. The dashboards are either internal to the [grafana-dashboards role](../ansible/roles/grafana-dashboards/files/) or downloaded from grafana.com. If you wish to selectively remove the default dashboards deployed by kube-prometheus-stack, this can be done by overriding the `grafana_exclude_default_dashboards` variable in [environments/common/inventory/group_vars/all/grafana.yml](../environments/common/inventory/group_vars/all/grafana.yml).
112+
In addition to the default set of dashboards that are deployed by kube-prometheus-stack, the appliance ships with additional dashboards listed below. The set of appliance-specific dashboards can be configured via the `grafana_dashboards` variable. The dashboards are either internal to the [grafana-dashboards role](../ansible/roles/grafana-dashboards/files/) or downloaded from grafana.com. If you wish to selectively remove the default dashboards deployed by kube-prometheus-stack, this can be done by overriding the `grafana_exclude_default_dashboards` variable in [environments/common/inventory/group_vars/all/grafana.yml](../environments/common/inventory/group_vars/all/grafana.yml).
113113

114114
#### node exporter slurm
115115

@@ -230,7 +230,7 @@ Note that this service is not password protected, allowing anyone with access to
230230

231231
### Upgrades
232232

233-
The appliance previously used [cloudalchemy.prometheus](https://github.com/cloudalchemy/ansible-prometheus) role to configure Prometheus, but our monitoring stack has since been moved into the [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart running on a k3s cluster. The some of the default Grafana dashboards deployed by kube-prometheus-stack are hardcoded to rely on the `job` label of metrics scraped from Node Exporter to have the value `node-exporter`. By default, the cloudalchemy role scraped these metrics with the `job` label set to `node`. Therefore, if upgrading from previous versions of the appliance which used the cloudalchemy role, pre-upgrade data will not show up by default in Grafana dashboards. The old data can still be viewed in the OpenHPC and Node Exporter Slurm dashboards by selecting the previous `job` value from the Job dropdown.
233+
The appliance previously used [cloudalchemy.prometheus](https://github.com/cloudalchemy/ansible-prometheus) role to configure Prometheus, but our monitoring stack has since been moved into the [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart running on a k3s cluster. Some of the default Grafana dashboards deployed by kube-prometheus-stack are hardcoded to rely on the `job` label of metrics scraped from Node Exporter to have the value `node-exporter`. By default, the cloudalchemy role scraped these metrics with the `job` label set to `node`. Therefore, if upgrading from previous versions of the appliance which used the cloudalchemy role, pre-upgrade data will not show up by default in Grafana dashboards. The old data can still be viewed in the OpenHPC and Node Exporter Slurm dashboards by selecting the previous `job` value from the Job dropdown.
234234

235235
### Alerting and recording rules
236236

environments/common/inventory/group_vars/all/alertmanager.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
alertmanager_image_tag: v0.27.0
1+
alertmanager_image_tag: 'v0.27.0'
22

33
alertmanager_config:
44
route:

0 commit comments

Comments
 (0)