pg_exporter_last_scrape_error returns 1 on replica unit of the standby cluster when async-replication is enabled #1290

@ggouzi

Steps to reproduce

  • Deploy 2 PostgreSQL clusters of 2 units each
  • Enable async-replication with the create-replication action
  • Enable monitoring with grafana-agent subordinates
  • Check alerts in prometheus (COS)

Expected behavior

Alert PostgresqlExporterError is not firing

Actual behavior

Alert PostgresqlExporterError is firing

Indeed, pg_exporter_last_scrape_error returns 1 on the replica unit of the standby cluster:

curl -sS http://127.0.0.1:9187/metrics | grep pg_exporter_last_scrape_error
# HELP pg_exporter_last_scrape_error Whether the last scrape of metrics from PostgreSQL resulted in an error (1 for error, 0 for success).
# TYPE pg_exporter_last_scrape_error gauge
pg_exporter_last_scrape_error 1

The curl output above confirms that the exporter reports a scrape error (value 1) on this unit, which is what triggers the alert.
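
To check the gauge from a script across units, the value can be extracted from the metrics text instead of read by eye. This is a minimal sketch in Python, assuming the exposition text has already been fetched (for example with the curl command above). The function name `last_scrape_error` is hypothetical and is not part of postgres_exporter or the charm.

```python
from typing import Optional

METRIC = "pg_exporter_last_scrape_error"


def last_scrape_error(metrics_text: str, metric: str = METRIC) -> Optional[float]:
    """Return the value of an unlabeled gauge from Prometheus text output, or None if absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Match the sample name exactly so similarly prefixed metrics are ignored.
        if len(parts) >= 2 and parts[0] == metric:
            return float(parts[1])
    return None


# Sample copied from the output reported in this issue.
sample = """\
# HELP pg_exporter_last_scrape_error Whether the last scrape of metrics from PostgreSQL resulted in an error (1 for error, 0 for success).
# TYPE pg_exporter_last_scrape_error gauge
pg_exporter_last_scrape_error 1
"""

if __name__ == "__main__":
    value = last_scrape_error(sample)
    print("scrape error" if value == 1.0 else "ok or metric missing")
```

A return value of 1.0 means the last scrape failed, 0.0 means it succeeded, and None means the metric was not found in the text.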

Versions

Juju 3.6.11
OS 24.04 Noble
PostgreSQL machine charm: 16/stable rev 952

Additional context

This issue appears to be known upstream: prometheus-community/postgres_exporter#957

https://discuss.prometheus.io/t/pg-exporter-last-scrape-error-provides-1-on-crunchy-cluster-replica/1981
