Node attributes and Systemd units data not showing up in Grafana #184

pvaldria · 2021-02-09T01:45:28Z

pvaldria
Feb 9, 2021

Node attributes and systemd units data are not showing up in Grafana. Please see the attached screenshot. Is this a known issue?
I have a Pacemaker/Corosync NFS HA cluster (active/passive) with a shared disk, using the SBD fencing agent.

I had to add the below to /etc/prometheus/prometheus.yml
```yaml
- job_name: 'nfs-ha'
  scrape_interval: 5s
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664', 'nfs-server-2.storage.nfs.oraclevcn.com:9664', 'qdevice.storage.nfs.oraclevcn.com:9664', 'nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9100', 'qdevice.storage.nfs.oraclevcn.com:9100']
      labels:
        group: 'nfs-ha'
```

I installed ha_cluster_exporter using the steps below.

```shell
yum install -y -q git
curl -O https://objectstorage.us-ashburn-1.oraclecloud.com/xxxxxxxxxxxxxxx/go1.15.8.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.15.8.linux-amd64.tar.gz

# Persist the Go environment; single quotes defer expansion until .bashrc is sourced
echo '
export GOROOT="/usr/local/go"
export GOBIN="$HOME/go/bin"
mkdir -p "$GOBIN"
export PATH=$PATH:$GOROOT/bin:$GOBIN
' >> ~/.bashrc
source ~/.bashrc
go version
go get github.com/golang/mock/mockgen

git clone https://github.com/ClusterLabs/ha_cluster_exporter
cd ha_cluster_exporter
make
make install

# Quote the heredoc delimiter so the shell does not expand $MAINPID while writing the unit file
cat > /lib/systemd/system/ha_cluster_exporter.service << 'EOF'
[Unit]
Description=Prometheus exporter for Pacemaker HA clusters metrics
After=network.target

[Service]
Type=simple
ExecStart=/root/go/bin/ha_cluster_exporter
ExecReload=/bin/kill -HUP $MAINPID
# The original unit also set Restart=always; systemd applies the last assignment, so only this one is kept
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

systemctl start ha_cluster_exporter
```

pvaldria · 2021-02-09T01:56:26Z

pvaldria
Feb 9, 2021
Author

More details:

Feb 7 12:15:10 nfs-server-2 systemd: Started Prometheus exporter for Pacemaker HA clusters metrics.

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=warning msg="Config File "ha_cluster_exporter" Not Found in "[/ /.config /etc /usr/etc]""

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="Default config values will be used"

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=warning msg="Registration failure: could not initialize 'drbd' collector: '/sbin/drbdsetup' does not exist"

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="'pacemaker' collector registered."

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="'corosync' collector registered."

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="'sbd' collector registered."

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="Serving metrics on 0.0.0.0:9664"


diegoakechi · 2021-02-09T08:15:42Z

diegoakechi
Feb 9, 2021
Maintainer

Hi @pvaldria,

Systemd and other OS-related metrics are provided by the Prometheus node_exporter. Do you have it running on your systems too? The ha_cluster_exporter is specialized in providing metrics for the ClusterLabs components.


pvaldria · 2021-02-09T08:20:11Z

pvaldria
Feb 9, 2021
Author

Yes, I have the node_exporter service running on all nodes. On the Grafana/Prometheus server, I have the following configuration.

I added the last job below (`- job_name: 'nfs-ha-cluster'`) to display the HA details, and it lists both port 9664 and port 9100.

```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  external_labels:
    region: region
    monitor: infrastructure
    replica: nfs-20210209-0706

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'nfs_servers'
    scrape_interval: 5s
    static_configs:
      - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9100']
        labels:
          group: 'nfs_servers'

  - job_name: 'quorum'
    scrape_interval: 5s
    static_configs:
      - targets: ['qdevice.storage.nfs.oraclevcn.com:9100']
        labels:
          group: 'quorum'

  - job_name: 'nfs-ha-cluster'
    scrape_interval: 5s
    static_configs:
      - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664', 'nfs-server-2.storage.nfs.oraclevcn.com:9664', 'qdevice.storage.nfs.oraclevcn.com:9664', 'nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9100', 'qdevice.storage.nfs.oraclevcn.com:9100']
        labels:
          group: 'nfs-ha-cluster'
```


diegoakechi · 2021-02-09T10:22:44Z

diegoakechi
Feb 9, 2021
Maintainer

@pvaldria another check: did you enable the systemd collector in your node_exporter configuration? It is disabled by default.

https://github.com/prometheus/node_exporter#disabled-by-default
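One way to confirm whether the systemd collector is active is to look for its series in the exporter's output. The following is a minimal sketch, not an official check: the function name `has_systemd_metrics` is hypothetical, the sample values are made up, and it assumes the collector emits series named `node_systemd_unit_state`.

```shell
# Report whether Prometheus exposition text on stdin contains systemd unit metrics.
# Assumption: the node_exporter systemd collector emits node_systemd_unit_state series.
has_systemd_metrics() {
  if grep -q '^node_systemd_unit_state{'; then
    echo "systemd collector: enabled"
  else
    echo "systemd collector: no metrics found"
    return 1
  fi
}

# Demo with inline sample data (hypothetical values) instead of a live scrape.
sample='node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_systemd_unit_state{name="pacemaker.service",state="active",type="simple"} 1'
printf '%s\n' "$sample" | has_systemd_metrics
```

Against a live node, the same function could be fed from a scrape, for example `curl -s http://nfs-server-1.storage.nfs.oraclevcn.com:9100/metrics | has_systemd_metrics`, assuming the exporter serves metrics on the default `/metrics` path.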


pvaldria · 2021-02-09T11:16:34Z

pvaldria
Feb 9, 2021
Author

Thanks Diego. That helps.

I changed the node_exporter command to `/opt/node_exporter-1.0.1.linux-amd64/node_exporter --collector.systemd` and it works. However, I see duplicate lines in the output (screenshot attached): there are 2 pacemaker entries for each of nfs-server-1 and nfs-server-2, and likewise for the other services.

1 reply

stefanotorresi Feb 9, 2021
Maintainer

I think that's because you're scraping the node_exporter twice; in your prometheus config, I can see duplicate targets in nfs_servers and nfs-ha-cluster jobs.
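A quick way to spot targets that are scraped by more than one job is to extract every quoted `host:port` entry from the config and print the repeated ones. This is a rough sketch: the function name `duplicate_targets` is hypothetical, and it assumes targets are written as single-quoted `'host:port'` strings, as in the config posted above.

```shell
# Print every 'host:port' target that appears more than once in the
# Prometheus config read from stdin (assumes single-quoted targets).
duplicate_targets() {
  grep -oE "'[^']+:[0-9]+'" | tr -d "'" | sort | uniq -d
}

# Demo with a shortened two-job sample using host names from this thread.
duplicate_targets <<'EOF'
- job_name: 'nfs_servers'
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9100']
- job_name: 'nfs-ha-cluster'
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664', 'nfs-server-1.storage.nfs.oraclevcn.com:9100']
EOF
```

Running it against the full file would be `duplicate_targets < /etc/prometheus/prometheus.yml`; any target it prints is listed at least twice across the scrape jobs.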

pvaldria · 2021-02-09T11:49:28Z

pvaldria
Feb 9, 2021
Author

Grafana is new to me, so I appreciate your patience while helping me.

I have 2 dashboards:

1. Infrastructure details (CPU, memory, etc.). I created job_name "nfs_servers" so I can see the metrics available from node_exporter, and listed the 2 NFS server nodes with port 9100.
2. ClusterLabs HA dashboard. I created another job_name "nfs-ha-cluster" so I can display all HA metrics. Since it also requires node_exporter, I assumed I needed to list all nodes with both ports, hence host:9100 and host:9664. If I remove host:9100 from the targets for job_name "nfs-ha-cluster", data is not displayed in the ClusterLabs HA dashboard.

job_name "nfs-ha-cluster"

targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664', 'nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9664', 'nfs-server-2.storage.nfs.oraclevcn.com:9100', 'qdevice.storage.nfs.oraclevcn.com:9664', 'qdevice.storage.nfs.oraclevcn.com:9100']

So are you saying I should completely remove the job below? If yes, will Dashboard #1 still work?

- job_name: 'nfs_servers'
  scrape_interval: 5s
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9100']
      labels:
        group: 'nfs_servers'

2 replies

stefanotorresi Feb 9, 2021
Maintainer

The way we conventionally set the dashboards to work is that they expect one scrape job per cluster.
If all the nodes you're trying to monitor are part of the same logical cluster group, this means that you need to group all the targets under the same job. The Prometheus scrape jobs don't have any relationship with the number of dashboards, but rather the dashboards will use the job label to let the user select which cluster to display data for, in case of multiple clusters handled by the same Prometheus instance.
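To illustrate grouping every target of one cluster under a single job, here is a minimal sketch that prints such a scrape job stanza for a list of hosts. The function name `cluster_job` is hypothetical; ports 9664 (ha_cluster_exporter) and 9100 (node_exporter) and the 5s interval are taken from the configuration posted earlier in this thread, and the output is meant to be pasted under `scrape_configs:` by hand.

```shell
# Emit one scrape job that lists both exporters for every host of a cluster.
# Usage: cluster_job JOB_NAME HOST...
cluster_job() {
  job=$1
  shift
  targets=""
  for host in "$@"; do
    # 9664: ha_cluster_exporter, 9100: node_exporter (ports as used in this thread)
    for port in 9664 9100; do
      targets="${targets:+$targets, }'${host}:${port}'"
    done
  done
  printf "  - job_name: '%s'\n" "$job"
  printf "    scrape_interval: 5s\n"
  printf "    static_configs:\n"
  printf "      - targets: [%s]\n" "$targets"
}

cluster_job nfs-ha-cluster \
  nfs-server-1.storage.nfs.oraclevcn.com \
  nfs-server-2.storage.nfs.oraclevcn.com \
  qdevice.storage.nfs.oraclevcn.com
```

With a layout like this, the dashboards can use the `job` label to select the cluster, so a second cluster on the same Prometheus instance would get its own job rather than extra targets in this one.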

userthirtytwo Sep 12, 2022

This was my breakthrough, with your comment mentioning that all targets need to be under the same job:
- targets: ['192.168.1.10:9664','192.168.1.11:9664','192.168.1.10:9100','192.168.74.11:9100']

Thanks!
