Commit 1708dac

fix: use the proper views for pgmonitor-extension queries. doc update. new table stat metric (#412)
* fix: use the proper views for pgmonitor-extension queries. doc update. new table stat metric
* chore: add changelog fragment
1 parent 9008744 commit 1708dac

4 files changed, 74 additions and 86 deletions

changelogs/fragments/412.yml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+bugfixes:
+- sql_exporter - use the new views from pgmonitor-extension instead of full queries
+- docs - add reference links to upstream configuration docs

hugo/content/prometheus/_index.md

Lines changed: 8 additions & 5 deletions
@@ -39,7 +39,10 @@ Or you can also download [Prometheus](https://prometheus.io/) and [Alertmanager]
 
 ##### Minimum Versions
 
-pgMonitor assumes to be using at least Prometheus 2.9.x. We recommend to always use the latest minor version of Prometheus.
+pgMonitor has been tested with the following versions at a minimum. Later versions should generally work. If they do not, please open an issue on our Github.
+
+* Prometheus 2.49.1
+* Alertmanager 0.26.0
 
 ##### User and Configuration Directory Installation
 
@@ -118,10 +121,10 @@ The below files dictate how Prometheus and Alertmanager will behave at runtime f
 
 | File | Instructions |
 |------------------------------------------|--------------|
-| /etc/prometheus/crunchy-prometheus.yml | Modify to set scrape interval if different from the default of 30s. Activate alert rules and Alertmanager by uncommenting lines when set as needed. Activate blackbox_exporter monitoring if desired. Service file provided by pgMonitor expects config file to be named "crunchy-prometheus.yml" |
-| /etc/prometheus/crunchy-alertmanager.yml | Setup alert target (e.g., SMTP, SMS, etc.), receiver and route information. Service file provided by pgMonitor expects config file to be named "crunchy-alertmanager.yml" |
-| /etc/prometheus/alert-ruled.d/crunchy-alert-rules-\*.yml.example | Update rules as needed and remove ".example" suffix. Prometheus config provided by pgmonitor expects ".yml" files to be located in "/etc/prometheus/alert-rules.d/" |
-| /etc/prometheus/auto.d/*.yml | You will need at least one file with a final ".yml" extension. Copy the example files to create as many additional targets as needed. Ensure the configuration files you want to use do not end in ".yml.example" but only with ".yml". Note that in order to use the provided Grafana dashboards, the extra "exp_type" label must be applied to all targets and be set appropriately (pg or node). Also, PostgreSQL targets make use of the "cluster_name" variable and should be given a relevant value so all systems (primary & replicas) can be related to each other when needed (Grafana dashboards, etc). See the example target files provided for how to set the labels for postgres or node exporter targets. |
+| /etc/prometheus/crunchy-prometheus.yml | Main configuration file for prometheus to set things like scrape intervals and alerting. blackbox_exporter monitoring can also be enabled if desired. Service file provided by pgMonitor expects config file to be named "crunchy-prometheus.yml". For full configration options please see the [Prometheus upstream documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) |
+| /etc/prometheus/crunchy-alertmanager.yml | Setup alert target (e.g., SMTP, SMS, etc.), receiver and route information. Service file provided by pgMonitor expects config file to be named "crunchy-alertmanager.yml". For full configuration options please see the [Alertmanager upstream documentation](https://prometheus.io/docs/alerting/latest/configuration/) |
+| /etc/prometheus/alert-ruled.d/crunchy-alert-rules-\*.yml.example | Update rules as needed and remove ".example" suffix. Prometheus config provided by pgmonitor expects ".yml" files to be located in "/etc/prometheus/alert-rules.d/". Additional information on configuring alert rules can be found in the [alert rules upstream documentation](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/). |
+| /etc/prometheus/auto.d/*.yml | You will need at least one file with a final ".yml" extension. Copy the example files to create as many additional targets as needed. Ensure the configuration files you want to use do not end in ".yml.example" but only with ".yml". Note that in order to use the provided Grafana dashboards, the extra "exp_type" label must be applied to all targets and be set appropriately (pg, node, etcd, pgbouncer, etc). Also, PostgreSQL targets make use of the "cluster_name" variable and should be given a relevant value so all systems (primary & replicas) can be related to each other when needed (Grafana dashboards, etc). See the example target files provided for how to set the labels for postgres or node exporter targets. |
 
 #### Blackbox Exporter
 
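
The `/etc/prometheus/auto.d/*.yml` row in this hunk requires an `exp_type` label on every target and a `cluster_name` label on PostgreSQL targets. A minimal sketch of such a target file follows, assuming the directory is read through Prometheus file-based service discovery. The file name, host names, ports, and cluster name are hypothetical, and the example files shipped with pgMonitor may set additional labels.

```yaml
# Hypothetical file: /etc/prometheus/auto.d/db1.yml
# Host names, ports, and label values are illustrative assumptions.
- targets:
    - "db1.example.com:9187"     # PostgreSQL metrics exporter endpoint (assumed port)
  labels:
    exp_type: pg                 # required by the provided Grafana dashboards
    cluster_name: cluster1       # use the same value on the primary and its replicas
- targets:
    - "db1.example.com:9100"     # node exporter endpoint (assumed port)
  labels:
    exp_type: node
```

Keeping the `cluster_name` value identical across a primary and its replicas is what lets the dashboards relate those systems to each other.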

sql_exporter/common/crunchy_global_collector.yml

Lines changed: 52 additions & 81 deletions
@@ -362,32 +362,24 @@ queries:
 
 - query_name: ccp_archive_command_status
 query: |
-SELECT CASE
-WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) IS NULL THEN 0
-WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) < 0 THEN 0
-ELSE EXTRACT(epoch from (last_failed_time - last_archived_time))
-END AS seconds_since_last_fail
-, EXTRACT(epoch from (CURRENT_TIMESTAMP - last_archived_time)) AS seconds_since_last_archive
+SELECT seconds_since_last_fail
+, seconds_since_last_archive
 , archived_count
 , failed_count
-FROM pg_catalog.pg_stat_archiver
+FROM pgmonitor_ext.ccp_archive_command_status
 
 
 - query_name: ccp_connection_stats
 query: |
-SELECT ((total - idle) - idle_in_txn) AS active
+SELECT active
 , total
 , idle
 , idle_in_txn
-, (select coalesce(extract(epoch from (max(clock_timestamp() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') AS max_idle_in_txn_time
-, (select coalesce(extract(epoch from (max(clock_timestamp() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' AND state NOT LIKE 'idle%' ) AS max_query_time
-, (select coalesce(extract(epoch from (max(clock_timestamp() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) AS max_blocked_query_time
+, max_idle_in_txn_time
+, max_query_time
+, max_blocked_query_time
 , max_connections
-FROM (
-SELECT count(*) AS total
-, COALESCE(SUM(CASE WHEN state = 'idle' THEN 1 ELSE 0 END),0) AS idle
-, COALESCE(SUM(CASE WHEN state = 'idle in transaction' THEN 1 ELSE 0 END),0) AS idle_in_txn FROM pg_catalog.pg_stat_activity) x
-JOIN (SELECT setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true)
+FROM pgmonitor_ext.ccp_connection_stats
 
 
 - query_name: ccp_database_size
@@ -399,8 +391,8 @@ queries:
 
 - query_name: ccp_is_in_recovery
 query: |
-SELECT CASE WHEN pg_is_in_recovery = true THEN 1 ELSE 2 END AS status
-FROM pg_is_in_recovery()
+SELECT status
+FROM pgmonitor_ext.ccp_pg_is_in_recovery
 
 
 - query_name: ccp_locks
@@ -419,53 +411,48 @@ queries:
 - query_name: ccp_pg_settings_checksum
 query: |
 SELECT pgmonitor_ext.pg_settings_checksum() AS status
-
+
 
 - query_name: ccp_postgresql_version
 query: |
-SELECT current_setting('server_version_num')::int AS current
+SELECT current
+FROM pgmonitor_ext.ccp_postgresql_version
 
 
 - query_name: ccp_postmaster_runtime
 query: |
-SELECT extract('epoch' from pg_postmaster_start_time) as start_time_seconds from pg_catalog.pg_postmaster_start_time()
+SELECT start_time_seconds
+FROM pgmonitor_ext.ccp_postmaster_runtime
 
 
 - query_name: ccp_postmaster_uptime
 query: |
-SELECT extract(epoch from (clock_timestamp() - pg_postmaster_start_time() )) AS seconds
+SELECT seconds
+FROM pgmonitor_ext.ccp_postmaster_uptime
 
 
 - query_name: ccp_replication_lag
 query: |
-SELECT
-CASE
-WHEN (pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()) OR (pg_is_in_recovery() = false) THEN 0
-ELSE EXTRACT (EPOCH FROM clock_timestamp() - pg_last_xact_replay_timestamp())::INTEGER
-END
-AS replay_time
-, CASE
-WHEN pg_is_in_recovery() = false THEN 0
-ELSE EXTRACT (EPOCH FROM clock_timestamp() - pg_last_xact_replay_timestamp())::INTEGER
-END
-AS received_time
+SELECT replay_time
+, received_time
+FROM pgmonitor_ext.ccp_replication_lag
 
 
 - query_name: ccp_replication_lag_size
 query: |
-SELECT client_addr AS replica
-, client_hostname AS replica_hostname
-, client_port AS replica_port
-, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS bytes
-FROM pg_catalog.pg_stat_replication
+SELECT replica
+, replica_hostname
+, replica_port
+, bytes
+FROM pgmonitor_ext.ccp_replication_lag_size
 
 
 - query_name: ccp_replication_slots
 query: |
 SELECT slot_name
-, active::int
-, pg_wal_lsn_diff(CASE WHEN pg_is_in_recovery() THEN pg_last_wal_replay_lsn() ELSE pg_current_wal_insert_lsn() END, restart_lsn) AS retained_bytes
-FROM pg_catalog.pg_replication_slots
+, active
+, retained_bytes
+FROM pgmonitor_ext.ccp_replication_slots
 
 
 - query_name: ccp_sequence_exhaustion
@@ -475,7 +462,8 @@ queries:
 
 - query_name: ccp_settings_pending_restart
 query: |
-SELECT count(*) AS count FROM pg_catalog.pg_settings WHERE pending_restart = true
+SELECT count
+FROM pgmonitor_ext.ccp_settings_pending_restart
 
 
 - query_name: ccp_stat_bgwriter
@@ -495,50 +483,33 @@ queries:
 
 - query_name: ccp_stat_database
 query: |
-SELECT d.datname AS dbname
-, s.xact_commit
-, s.xact_rollback
-, s.blks_read
-, s.blks_hit
-, s.tup_returned
-, s.tup_fetched
-, s.tup_inserted
-, s.tup_updated
-, s.tup_deleted
-, s.conflicts
-, s.temp_files
-, s.temp_bytes
-, s.deadlocks
-FROM pg_catalog.pg_stat_database s
-JOIN pg_catalog.pg_database d ON d.datname = s.datname
-WHERE d.datistemplate = false
+SELECT dbname
+, xact_commit
+, xact_rollback
+, blks_read
+, blks_hit
+, tup_returned
+, tup_fetched
+, tup_inserted
+, tup_updated
+, tup_deleted
+, conflicts
+, temp_files
+, temp_bytes
+, deadlocks
+FROM pgmonitor_ext.ccp_stat_database
 
 
 - query_name: ccp_transaction_wraparound
 query: |
-WITH max_age AS (
-SELECT 2000000000 as max_old_xid, setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age'
-)
-, per_database_stats AS (
-SELECT datname
-, m.max_old_xid::int
-, m.autovacuum_freeze_max_age::int
-, age(d.datfrozenxid) AS oldest_current_xid
-FROM pg_catalog.pg_database d
-JOIN max_age m ON (true) WHERE d.datallowconn
-)
-SELECT max(oldest_current_xid) AS oldest_current_xid
-, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
-, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
-FROM per_database_stats
-
+SELECT oldest_current_xid
+, percent_towards_wraparound
+, percent_towards_emergency_autovac
+FROM pgmonitor_ext.ccp_transaction_wraparound
+
 
 - query_name: ccp_wal_activity
 query: |
 SELECT last_5_min_size_bytes
-, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes
-FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes
-FROM pg_catalog.pg_ls_waldir()
-WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x
-
-
+, total_size_bytes
+FROM pgmonitor_ext.ccp_wal_activity
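
With the `CASE` expression for `ccp_is_in_recovery` moved out of this file and into a view, the meaning of the `status` value is no longer visible in the collector. One way to keep it documented is in the `help` text of the metric that references the query. The entry below is only a sketch: it assumes the metric schema shown in crunchy_per_db_collector.yml (`metric_name`, `type`, `help`, `values`, `query_ref`), the metric name is illustrative, and the 1/2 mapping assumes the view preserves the behavior of the inline query this commit removes.

```yaml
metrics:
  - metric_name: ccp_is_in_recovery_status   # illustrative name, not taken from this diff
    type: gauge
    # Mapping taken from the removed inline query:
    # CASE WHEN pg_is_in_recovery = true THEN 1 ELSE 2 END
    help: "1 if the instance is in recovery (replica), 2 if it is not (primary)"
    values: [status]
    query_ref: ccp_is_in_recovery
```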

sql_exporter/common/crunchy_per_db_collector.yml

Lines changed: 11 additions & 0 deletions
@@ -92,6 +92,16 @@ metrics:
 - relname
 query_ref: ccp_stat_user_tables
 
+- metric_name: ccp_stat_user_tables_n_tup_newpage_upd
+type: gauge
+help: "Number of rows updated where the successor version goes onto a new heap page, leaving behind an original version with a t_ctid field that points to a different heap page. These are always non-HOT updates."
+values: [n_tup_newpage_upd]
+key_labels:
+- dbname
+- schemaname
+- relname
+query_ref: ccp_stat_user_tables
+
 - metric_name: ccp_stat_user_tables_n_live_tup
 type: gauge
 help: "Estimated number of live rows"
@@ -206,6 +216,7 @@ queries:
 , n_tup_upd
 , n_tup_del
 , n_tup_hot_upd
+, n_tup_newpage_upd
 , n_live_tup
 , n_dead_tup
 , vacuum_count
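
The new `ccp_stat_user_tables_n_tup_newpage_upd` metric is easiest to read relative to total updates. The Prometheus recording rule below is a hedged sketch of that idea, not part of this commit: the group and rule names are hypothetical, and it assumes a companion metric named `ccp_stat_user_tables_n_tup_upd` exists with the same labels (the query selects `n_tup_upd`, but the metric definition is not shown in this diff).

```yaml
groups:
  - name: example_table_update_rules            # hypothetical group name
    rules:
      # Share of updates that placed the new row version on a different heap page,
      # computed from the cumulative values reported by the exporter.
      - record: example:ccp_stat_user_tables:newpage_upd_ratio   # hypothetical rule name
        expr: |
          ccp_stat_user_tables_n_tup_newpage_upd
            / (ccp_stat_user_tables_n_tup_upd > 0)
```

The `> 0` filter drops tables with no recorded updates so the division never produces a result for a zero denominator.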
