
Problem with scraping metrics from prometheus sink. #1003

@artemsafiyulin

Description


I run pgwatch in Docker with the Prometheus sink:

ExecStart=/usr/bin/docker run --rm \
  --name pgwatch \
  --network=host \
  --user=root:root \
  -v /etc/pgwatch/sources.yaml:/sources.yaml:ro \
  -v /etc/pgwatch/metrics.yaml:/metrics.yaml:ro \
  -v /opt/docker/compose/pgwatch/data:/data \
  -v /opt/docker/compose/pgwatch/data/logs:/logs \
  cybertecpostgresql/pgwatch:latest \
  --web-disable \
  --sources=/sources.yaml \
  --metrics=/metrics.yaml \
  --sink=prometheus://:8080 \
  --log-level=info \
  --log-file=/logs/pgwatch.log \
  --log-file-format=text \
  --log-file-rotate \
  --log-file-size=100 \
  --log-file-age=7 \
  --log-file-number=10
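
Since --sink=prometheus://:8080 makes pgwatch serve metrics over HTTP on port 8080, and the container runs with --network=host, the sink can be checked directly from the host. A minimal sanity check (not part of our setup, just for illustration):

# quick check that the Prometheus sink answers on the port given in --sink
curl -s http://localhost:8080/metrics | grep '^pgwatch_' | head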

In Grafana I see gaps in the metrics. I will show it using one metric, pgwatch_db_stats_numbackends, and one source database.
In the pgwatch logs I see that the metric is collected every 60 seconds, as configured:

2025-10-27 12:23:43.104 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:24:43.746 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:25:44.398 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:26:45.041 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:27:45.592 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:28:46.164 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:29:46.830 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:30:47.455 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:31:48.097 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:32:48.538 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:33:49.078 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:34:49.829 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:35:50.264 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:36:50.791 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:37:51.322 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:38:51.943 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:39:52.477 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:40:53.001 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:41:53.525 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:42:53.975 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:43:54.516 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:44:55.041 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:45:55.577 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:46:56.102 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:47:56.524 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:48:57.053 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:49:57.581 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched
2025-10-27 12:50:58.105 [INFO] [source:prod-vm-cln-pg-001-001-aws_clweb] [metric:db_stats] [rows:1] [cache:false] measurements fetched

But in Grafana I see this: [screenshot of the pgwatch_db_stats_numbackends panel with gaps between data points]

Our VictoriaMetrics scrapes metrics every 15 seconds. To check what is actually available on the /metrics endpoint at that interval, I ran the following script, which appends the metric for our source to a file every 15 seconds over the same time window as the logs above and the Grafana screenshot.

for i in $(seq 1 100); do
  echo "=== $(date '+%Y-%m-%d %H:%M:%S') (iteration $i) ===" >> /tmp/pgwatch_metrics_test.log
  curl -s http://localhost:8080/metrics \
    | grep 'pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb' \
    >> /tmp/pgwatch_metrics_test.log
  echo -e "\n" >> /tmp/pgwatch_metrics_test.log
  sleep 15
done

It shows the same result as Grafana: most of the time the metric is simply not present on the /metrics endpoint (i.e. not in the sink's cache). Each iteration below prints a timestamped header plus whatever grep found, so an empty section means the metric was missing at that moment.


=== 2025-10-27 12:24:26 (iteration 1) ===


=== 2025-10-27 12:24:42 (iteration 2) ===


=== 2025-10-27 12:24:57 (iteration 3) ===


=== 2025-10-27 12:25:12 (iteration 4) ===


=== 2025-10-27 12:25:27 (iteration 5) ===


=== 2025-10-27 12:25:42 (iteration 6) ===


=== 2025-10-27 12:25:58 (iteration 7) ===


=== 2025-10-27 12:26:13 (iteration 8) ===


=== 2025-10-27 12:26:28 (iteration 9) ===


=== 2025-10-27 12:26:43 (iteration 10) ===


=== 2025-10-27 12:26:58 (iteration 11) ===


=== 2025-10-27 12:27:13 (iteration 12) ===


=== 2025-10-27 12:27:28 (iteration 13) ===


=== 2025-10-27 12:27:43 (iteration 14) ===


=== 2025-10-27 12:27:59 (iteration 15) ===


=== 2025-10-27 12:28:14 (iteration 16) ===


=== 2025-10-27 12:28:29 (iteration 17) ===


=== 2025-10-27 12:28:45 (iteration 18) ===


=== 2025-10-27 12:29:00 (iteration 19) ===


=== 2025-10-27 12:29:15 (iteration 20) ===


=== 2025-10-27 12:29:31 (iteration 21) ===


=== 2025-10-27 12:29:46 (iteration 22) ===


=== 2025-10-27 12:30:01 (iteration 23) ===


=== 2025-10-27 12:30:17 (iteration 24) ===


=== 2025-10-27 12:30:32 (iteration 25) ===


=== 2025-10-27 12:30:48 (iteration 26) ===
pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb",real_dbname="clweb",sys_id="7483086316555120797"} 656 1761568246933


=== 2025-10-27 12:31:03 (iteration 27) ===


=== 2025-10-27 12:31:18 (iteration 28) ===


=== 2025-10-27 12:31:33 (iteration 29) ===


=== 2025-10-27 12:31:48 (iteration 30) ===
pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb",real_dbname="clweb",sys_id="7483086316555120797"} 652 1761568307613


=== 2025-10-27 12:32:04 (iteration 31) ===


=== 2025-10-27 12:32:19 (iteration 32) ===


=== 2025-10-27 12:32:34 (iteration 33) ===


=== 2025-10-27 12:32:49 (iteration 34) ===
pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb",real_dbname="clweb",sys_id="7483086316555120797"} 653 1761568368151


=== 2025-10-27 12:33:05 (iteration 35) ===


=== 2025-10-27 12:33:20 (iteration 36) ===


=== 2025-10-27 12:33:35 (iteration 37) ===


=== 2025-10-27 12:33:50 (iteration 38) ===
pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb",real_dbname="clweb",sys_id="7483086316555120797"} 655 1761568428706


=== 2025-10-27 12:34:05 (iteration 39) ===


=== 2025-10-27 12:34:20 (iteration 40) ===


=== 2025-10-27 12:34:35 (iteration 41) ===


=== 2025-10-27 12:34:51 (iteration 42) ===
pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb",real_dbname="clweb",sys_id="7483086316555120797"} 655 1761568489321


=== 2025-10-27 12:35:06 (iteration 43) ===


=== 2025-10-27 12:38:36 (iteration 57) ===


=== 2025-10-27 12:38:51 (iteration 58) ===
pgwatch_db_stats_numbackends{dbname="prod-vm-cln-pg-001-001-aws_clweb",real_dbname="clweb",sys_id="7483086316555120797"} 648 1761568731576


=== 2025-10-27 12:39:06 (iteration 59) ===


=== 2025-10-27 12:48:54 (iteration 98) ===


=== 2025-10-27 12:49:09 (iteration 99) ===


=== 2025-10-27 12:49:24 (iteration 100) ===

I omitted most of the empty iterations above to keep the log readable.

So, this is how I see the problem:

  • pgwatch collects the metric every 60 seconds
  • VictoriaMetrics scrapes metrics from pgwatch every 15 seconds
  • But at scrape time, part of the values are missing from the pgwatch cache

I am not sure whether this is a bug; maybe our configuration is wrong, so I need help figuring it out.
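
For completeness, the scrape job on the VictoriaMetrics side is a standard Prometheus-style config along these lines (the job name and target address here are placeholders, not copied from our setup):

scrape_configs:
  - job_name: pgwatch
    scrape_interval: 15s                   # VictoriaMetrics scrapes every 15 seconds
    static_configs:
      - targets: ['pgwatch-host:8080']     # illustrative address of the pgwatch Prometheus sink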
