Commit d712f69

updated docs
1 parent b2b673f

2 files changed, +5 -2 lines changed


docs/monitoring-and-logging.README.md

Lines changed: 4 additions & 0 deletions
@@ -228,6 +228,10 @@ The port can be customised by overriding the `prometheus_port` variable.
 
 Note that this service is not password protected, allowing anyone with access to the URL to make queries.
 
+### Upgrades
+
+The appliance previously used the [cloudalchemy.prometheus](https://github.com/cloudalchemy/ansible-prometheus) role to configure Prometheus, but our monitoring stack has since moved to the [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart running on a k3s cluster. Some of the default Grafana dashboards deployed by kube-prometheus-stack are hardcoded to expect metrics scraped from Node Exporter to have the `job` label value `node-exporter`, whereas the cloudalchemy role scraped these metrics with `job` set to `node` by default. Therefore, when upgrading from a previous version of the appliance that used the cloudalchemy role, pre-upgrade data will not show up in these Grafana dashboards by default. The old data can still be viewed in the OpenHPC and Node Exporter Slurm dashboards by selecting the previous `job` value from the Job dropdown.
+
 ### Alerting and recording rules
 
 See the upstream documentation for [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) and [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) rules.
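
As a minimal sketch of the `job` label mismatch described in the Upgrades note, an ad-hoc PromQL query can cover data from both before and after the upgrade by matching either `job` value with a regex matcher. This is not part of the appliance: the helper `cross_upgrade_selector` and the example metric `node_load1` are illustrative assumptions, and the sketch assumes the pre-upgrade data still carries the cloudalchemy default `job="node"`.

```python
import re
from typing import Sequence

# job label values taken from the Upgrades note (assumed unchanged from defaults):
LEGACY_JOB = "node"            # default used by the cloudalchemy.prometheus role
CURRENT_JOB = "node-exporter"  # value expected by kube-prometheus-stack dashboards

# Hypothetical guard: only accept plain label values so that no regex or
# string escaping is needed inside the PromQL matcher.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_-]+")


def cross_upgrade_selector(
    metric: str, jobs: Sequence[str] = (LEGACY_JOB, CURRENT_JOB)
) -> str:
    """Return a PromQL selector matching `metric` under any of the given job values."""
    if not jobs:
        raise ValueError("at least one job value is required")
    for job in jobs:
        if not _SAFE_VALUE.fullmatch(job):
            raise ValueError(f"unsupported job value: {job!r}")
    # PromQL regex matchers (=~) are fully anchored, so "node|node-exporter"
    # matches exactly these two values and not, say, "node-foo".
    pattern = "|".join(jobs)
    return f'{metric}{{job=~"{pattern}"}}'


if __name__ == "__main__":
    # Paste the output into Grafana's Explore view or the Prometheus expression browser.
    print(cross_upgrade_selector("node_load1"))
    # node_load1{job=~"node|node-exporter"}
```

Series with different `job` values remain distinct series, so each instance will appear twice (once per label value) unless the query aggregates the `job` label away, e.g. `max without (job) (node_load1{job=~"node|node-exporter"})`.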

docs/persistent-state.md

Lines changed: 1 addition & 2 deletions
@@ -6,8 +6,7 @@ At present this will affect the following:
 - `slurmctld` state, i.e. the Slurm queue.
 - The MySQL database for `slurmdbd`, i.e. Slurm accounting information as shown by the `sacct` command.
 - Prometheus database
-- Grafana data
-- OpenDistro/elasticsearch data
+- OpenSearch data
 
 If using the `environments/common/layout/everything` Ansible groups template (which is the default for a new cookiecutter-produced environment) then these services will all be on the `control` node and hence only this node requires persistent storage.
 