PostgreSQL observ lib #1565
base: master
Dasomeone
left a comment
Unfortunately, I don't have the time to give this the attention it deserves!
I did a visual pass based on your screenshots and the sample app over on integration-sample-apps.
I ran into quite a few instances of no data where I know the metrics exist (in Explore). There may be some incompatibility here, but it's worth double-checking yourself!
Overall, structure-wise, I think the dashboards look great. I'm in favour of your adoption of the modular approach, though I think you could make more use of the pre-existing styles in the panels part of the common-lib, rather than overriding generic each time.
Left a couple other comments, but that's all I have time for right now, sorry!
Also please check linting and jsonnet formatting:
Mixtool:
postgres-observ-lib$ mixtool lint mixin.libsonnet
could not unmarshal lint configuration .lint: EOF
[alert-summary-style] Alert PostgreSQLDown annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL is down'
[alert-summary-style] Alert PostgreSQLHighConnectionUsage annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL connection usage is high'
[alert-summary-style] Alert PostgreSQLLowCacheHitRatio annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL cache hit ratio is low'
[alert-summary-style] Alert PostgreSQLReplicationLag annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL replication lag is high'
[alert-summary-style] Alert PostgreSQLDeadlocks annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL deadlocks detected'
[alert-summary-style] Alert PostgreSQLLongRunningQuery annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL has long-running query'
[alert-summary-style] Alert PostgreSQLBlockedQueries annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL has blocked queries'
[alert-summary-style] Alert PostgreSQLWALArchiveFailure annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL WAL archiving is failing'
[alert-summary-style] Alert PostgreSQLHighDeadTuples annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL table needs vacuum'
[alert-summary-style] Alert PostgreSQLVacuumNotRunning annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL table has not been vacuumed'
[alert-summary-style] Alert PostgreSQLTooManyRollbacks annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL has too many rollbacks'
[alert-summary-style] Alert PostgreSQLTooManyLocksAcquired annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL has acquired too many locks'
[alert-summary-style] Alert PostgreSQLInactiveReplicationSlot annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL has inactive replication slot'
[alert-summary-style] Alert PostgreSQLReplicationRoleChanged annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL replication role changed'
[alert-summary-style] Alert PostgreSQLExporterErrors annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL exporter has errors'
[alert-name-camelcase] Alert 'PostgreSQLHighQPS' name is not in camel case
[alert-summary-style] Alert PostgreSQLHighQPS annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL has high QPS'
[alert-summary-style] Alert PostgreSQLReplicationLagCritical annotation 'summary' must start with capital letter and end with period, is currently 'PostgreSQL replication lag exceeds 1 hour'
failed to load the dashboard-linter config file .lint: could not unmarshal lint configuration .lint: EOF
failed to load the dashboard-linter config file .lint: could not unmarshal lint configuration .lint: EOF
failed to load the dashboard-linter config file .lint: could not unmarshal lint configuration .lint: EOF
2025/12/05 17:47:07 failed to lint the file mixin.libsonnet: 22 lint errors found

expr: 'pg_stat_statements_seconds_total{job=~"$job",cluster=~"$cluster",instance=~"$instance"}',
format: 'table',
instant: true,
refId: 'TotalTime',
This is getting cut off in your screenshot, just something to be aware of.
Is the cut-off in the screenshot a concern for smaller windows? It works well on mine, but I have a 34-inch monitor.
It works well on my 14-inch notebook screen as well.
unit: 'rows/s',
sources: {
  postgres_exporter: {
    expr: 'topk(10, rate(pg_stat_statements_rows_total{%(queriesSelector)s}[$__rate_interval]))',
Having a selector for k (template variable) would probably be a good improvement here!
Additionally, since we're looking at up to 10 series here, we should consider a right-aligned table, that way you can also look at mean/max, etc.
Nice idea, will do. I'm not sure what you mean by "that way you can also look at mean/max, etc".
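One possible reading of the suggestion, sketched as plain dashboard JSON written in Jsonnet: a custom template variable feeds `topk`, and the legend is rendered as a table on the right so Grafana can show per-series calculations such as mean and max. The variable name `top_k`, its option list, the panel title, and the value substituted for `queriesSelector` are assumptions for illustration and are not taken from this PR.

```jsonnet
// Hypothetical custom variable that lets the user pick k.
local topKVariable = {
  type: 'custom',
  name: 'top_k',  // assumed name
  label: 'Top K',
  query: '5,10,20',  // assumed options
  current: { text: '10', value: '10' },
};

// Time series panel whose legend is a table placed on the right.
local rowsPanel = {
  type: 'timeseries',
  title: 'Rows per second by query',  // assumed title
  fieldConfig: { defaults: { unit: 'rows/s' } },
  targets: [{
    // Same query as in the diff, with the literal 10 replaced by the variable.
    expr: 'topk($top_k, rate(pg_stat_statements_rows_total{%(queriesSelector)s}[$__rate_interval]))'
          % { queriesSelector: 'job=~"$job"' },  // assumed selector value
  }],
  options: {
    legend: {
      displayMode: 'table',
      placement: 'right',
      calcs: ['mean', 'max'],
    },
  },
};

{
  templating: { list: [topKVariable] },
  panels: [rowsPanel],
}
```

With the legend in table mode, each of the up-to-k series gets its own row with the selected calculations, which is probably what "look at mean/max" refers to.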
{
  // Cluster overview dashboard - Top-level view of the entire cluster
  'postgres-cluster.json':
Testing with the sample-app, I am getting data for about half the panels here.
I've not checked all the metrics, but I know that, for example, pg_up is present but not loading correctly on the overview dashboard with your queries.
That would be because the sample-app is a standalone instance and probably missing the cluster label.
It might also be missing some required collectors which are not enabled by default.
I can look into the sample-app to sync it too.
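To illustrate the label point: the queries in this PR filter on `job`, `cluster`, and `instance`. In PromQL, a matcher such as `cluster=~".+"` does not match series that lack a `cluster` label, so if `$cluster` resolves to a pattern that requires a non-empty value, a standalone instance would show no data even though `pg_up` exists. A minimal sketch of building the selector from a label list, so a standalone set-up could drop `cluster`, might look like this (the helper name `selector` and the two output fields are hypothetical, not part of the lib):

```jsonnet
// Build a PromQL selector from a list of label names, one matcher per label,
// each bound to a Grafana variable of the same name.
local selector(labels) =
  std.join(',', ['%s=~"$%s"' % [l, l] for l in labels]);

{
  // pg_up{job=~"$job",cluster=~"$cluster",instance=~"$instance"}
  clustered: 'pg_up{%s}' % selector(['job', 'cluster', 'instance']),

  // pg_up{job=~"$job",instance=~"$instance"}
  standalone: 'pg_up{%s}' % selector(['job', 'instance']),
}
```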
@Dasomeone I have shifted to using the common-lib panels where appropriate and made some other changes according to your review suggestions. Here is how the dashboards look now.




This adds a PostgreSQL observ lib. It currently implements the Prometheus postgres_exporter.
Composed of these dashboards:
Cluster Overview dashboard for cluster stats at a glance

Overview dashboard for instance drilldown

Query Overview for query analysis

Also packs the following alerts:
| Alert | Condition | Severity |
| --- | --- | --- |
| PostgreSQLDown | pg_up == 0 | critical |
| PostgreSQLHighConnectionUsage | >80% | warning |
| PostgreSQLLowCacheHitRatio | <90% | warning |
| PostgreSQLReplicationLag | >30s | warning |
| PostgreSQLReplicationLagCritical | >1h | critical |
| PostgreSQLDeadlocks | any | warning |
| PostgreSQLLongRunningQuery | >5min | warning |
| PostgreSQLBlockedQueries | any | warning |
| PostgreSQLWALArchiveFailure | any | critical |
| PostgreSQLHighDeadTuples | >10% | warning |
| PostgreSQLVacuumNotRunning | >7 days | warning |
| PostgreSQLTooManyRollbacks | >10% | warning |
| PostgreSQLTooManyLocksAcquired | >20% | warning |
| PostgreSQLInactiveReplicationSlot | any | critical |
| PostgreSQLReplicationRoleChanged | any | warning |
| PostgreSQLExporterErrors | any | critical |
| PostgreSQLHighQPS | >10000 | warning |
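For reference, here is a rough sketch of how one of these rules could be written so that it satisfies the `alert-summary-style` check reported by `mixtool lint` in the review, which requires the summary to start with a capital letter and end with a period. Only the alert name, the `pg_up == 0` condition, and the severity come from the table; the group name and the `for` duration are assumptions for illustration.

```jsonnet
{
  groups: [
    {
      name: 'postgres-alerts',  // assumed group name
      rules: [
        {
          alert: 'PostgreSQLDown',
          expr: 'pg_up == 0',
          // `for` is a Jsonnet keyword, so the field name must be quoted.
          'for': '5m',  // assumed duration
          labels: { severity: 'critical' },
          annotations: {
            // Lint-compliant: capitalised and ends with a period.
            summary: 'PostgreSQL is down.',
          },
        },
      ],
    },
  ],
}
```

The same trailing-period change would apply to the other summaries listed in the lint output.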