[+] add Single Query Details dashboard for Prometheus, relates to #1169 by Bishoywadea · Pull Request #1179 · cybertec-postgresql/pgwatch

Bishoywadea · 2026-02-04T19:03:28Z

Add Single Query Details Dashboard for Prometheus

fixes #1169

Description

This PR introduces the Single Query Details dashboard for Prometheus.

Included Panels:

Avg Runtime
Total Runtime
Calls Rate
Shared Buffers Hit Ratio
Temp Blocks Read/Written
Backend Block Read/Write Time
% of Total Time in Direct I/O
SQL Text
Logo Panel

Screenshot

Bishoywadea · 2026-02-04T19:07:54Z

Hi @0xgouda, I have set up a single-query Prometheus dashboard for review. I only included 2 panels for now to make sure the logic is correct before I build the rest.
If this looks good, I will finish the other panels and apply your suggestions.
I am planning to finish v12 first to move faster, then do all of v11 at once. Is that okay with you?

0xgouda · 2026-02-05T07:51:42Z

Looks good, please continue.

No need to create the dashboard for v11.

coveralls · 2026-02-05T07:53:14Z

Pull Request Test Coverage Report for Build 22117087494

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
58 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.6%) to 77.58%

Files with Coverage Reduction	New Missed Lines	%
internal/metrics/yaml.go	8	93.28%
internal/sources/yaml.go	11	89.57%
internal/sources/resolver.go	17	78.97%
internal/sinks/prometheus.go	22	75.85%

Totals
Change from base Build 21993433024:	0.6%
Covered Lines:	4277
Relevant Lines:	5513

💛 - Coveralls

Bishoywadea · 2026-02-05T14:59:51Z

@0xgouda I think it is ready now, unless you want new panels beyond the ones that already exist in the pg dashboard.

0xgouda · 2026-02-09T08:38:32Z

@Bishoywadea, Thanks for your time!

Please make the Query ID field a text box (this should be fixed in the pg dashboard later as well). There is no need to add extra querying overhead; the user just types (or gets redirected with) the query ID they want to inspect.
Why do you use increase() rather than rate() or irate()? (Just wondering; I haven't evaluated yet which is optimal here.)
There is a problem with the PromQL queries: see below, there are 2 Avg runtime axes.

Use $__rate_interval and remove Aggregation Interval.

Bishoywadea · 2026-02-09T12:48:52Z

Done
From my research, increase() gives the actual change in the counter over the time interval, whereas rate() gives the per-second average over that interval. The latter is not intuitive to read: "the query was executed 0.4568 times per second over the last $aggregated_interval" is less clear than "the query was executed 4 times in the last $aggregated_interval". It is also what the Postgres version of the dashboard does. The image shows the difference.
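A minimal sketch of the difference described above, assuming made-up counter samples for a single series. It ignores the boundary extrapolation and counter-reset handling that Prometheus applies, so it only illustrates the arithmetic, not the exact PromQL result.

```python
# Hypothetical (timestamp_seconds, cumulative_calls) samples inside one window.
samples = [(0, 100), (60, 101), (120, 103), (180, 104)]


def simple_increase(points):
    """Counter growth between the first and last sample in the window."""
    return points[-1][1] - points[0][1]


def simple_rate(points):
    """Per-second growth: the increase divided by the elapsed seconds."""
    elapsed_seconds = points[-1][0] - points[0][0]
    return simple_increase(points) / elapsed_seconds


calls_in_window = simple_increase(samples)
calls_per_second = simple_rate(samples)
print(f"increase: {calls_in_window} calls, rate: {calls_per_second:.4f} calls/s")
```

Both numbers describe the same data; the rate is just the increase divided by the window length, which is why the choice is mostly about readability and units.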

Yes, you are right. I didn't notice that bug because it happens in the demo database only. The demo database was showing two graph lines instead of one (two "Avg runtime" lines that looked almost the same). After debugging, I found that the same queryid can have different query label values: sometimes the actual SQL text and sometimes just "-", as in the image attached below.

To fix this I wrapped the queries with sum by (dbname, queryid) so everything gets merged into a single line.
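The effect of grouping by (dbname, queryid) can be sketched roughly as follows. The label sets and values are invented for illustration, and `sum_by` is a hypothetical helper that only mimics the grouping idea, not Prometheus itself.

```python
from collections import defaultdict

# Hypothetical instant-vector result: one entry per distinct label set.
series = [
    ({"dbname": "demo", "queryid": "42", "query": "SELECT 1"}, 0.30),
    ({"dbname": "demo", "queryid": "42", "query": "-"}, 0.30),
    ({"dbname": "demo", "queryid": "7", "query": "SELECT 2"}, 1.50),
]


def sum_by(entries, keys):
    """Group entries by the given label names and add up their values."""
    grouped = defaultdict(float)
    for labels, value in entries:
        grouped[tuple(labels[k] for k in keys)] += value
    return dict(grouped)


merged = sum_by(series, ("dbname", "queryid"))
print(merged)
```

Note that the two copies of queryid 42 collapse into one entry whose value is their sum. That yields a single line, but the value is doubled unless it is later divided by another quantity that was summed the same way.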

Done

Bishoywadea · 2026-02-09T12:53:13Z

@0xgouda I could make a PR to turn the Query ID field into a text box in the pg dashboard. Also, is replacing Aggregation Interval with $__rate_interval required in the other dashboards or just this one? If so, I could do that as well.

0xgouda · 2026-02-10T10:54:01Z

I could make a PR to turn the Query ID field into a text box in the pg dashboard

I will update the pg dashboard very soon, and then I will fix it.

Is replacing Aggregation Interval with $__rate_interval required in the other dashboards or just this one? If so, I could do that as well

Yeah, we probably should, but this would require careful consideration of the scraping interval and the metric fetch interval, so that rate()/increase()/irate() get enough data points and hence show correct results.
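As a rough illustration of the "enough data points" concern: rate(), increase() and irate() need at least two samples inside the range window. The sketch below uses hypothetical helper names and made-up intervals, and counts only the samples guaranteed to fall in a window regardless of alignment.

```python
def guaranteed_samples(window_seconds: int, fetch_interval_seconds: int) -> int:
    """Samples guaranteed to land in the window (alignment may add one more)."""
    return window_seconds // fetch_interval_seconds


def window_is_usable(window_seconds: int, fetch_interval_seconds: int) -> bool:
    """True when the window always holds the two samples a rate needs."""
    return guaranteed_samples(window_seconds, fetch_interval_seconds) >= 2


# Hypothetical cases: a 5-minute window against different fetch intervals.
for fetch in (60, 180, 10800):
    print(fetch, window_is_usable(300, fetch))
```

A window that suits a metric fetched every minute can be empty for one fetched every few hours, so a single window size does not fit every metric.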

Bishoywadea · 2026-02-10T11:17:46Z

Yeah, we probably should, but this would require careful consideration of the scraping interval and the metric fetch interval, so that rate()/increase()/irate() get enough data points and hence show correct results.

Hmm, yes, I think converting them all will be more complex than I thought. I'll think about it more and propose a solution; then we could open an issue to convert all of them. But first I think we need something global to coordinate the fetch and scrape intervals across the whole project (I still don't know how, just brainstorming with you).

0xgouda · 2026-02-10T11:26:43Z

But first I think we need something global to coordinate the fetch and scrape intervals across the whole project (I still don't know how, just brainstorming with you).

There is no global right answer; each metric has its own fetching interval (real-time critical metrics are fetched more frequently, and heavy ones can be fetched up to every 18 hours), so this needs to be adjusted based on the specific metric.

The old way (current one) was to let users specify the aggregation interval but it's not very good.

Actually, I am in the process of refactoring most of the prom dashboards, and I am considering this.

Bishoywadea · 2026-02-10T11:36:15Z

Okay, good luck with it. I will keep an eye on the issues to see if I can help.

Bishoywadea · 2026-02-13T00:26:57Z

Hi @0xgouda, just checking in on the status of this PR. I addressed all the previous feedback. Is there anything else you need me to do before this can be merged?

0xgouda · 2026-02-13T11:56:01Z

Hi @Bishoywadea

Please remove the ($__rate_interval aggregate) from the panel names.
Set a min step for $__rate_interval, otherwise it won't show any data most of the time (I would suggest 9m), see below.
You don't have to use sum by (dbname, queryid) in all panels; just using increase() or rate() in all of them should resolve the issue.
Update the Query perf analysis dashboard (build on top of the latest updates in [+] improve Query Performance Analysis prom dashboard #1193) to include links to this panel for deeper investigation of a query.
I guess it's better to use rate() instead of increase(), as then we don't have to pay attention to the aggregation interval used by Grafana.
I don't get why there is a + 0.01 in the Shared Buffers Hit Ratio query.
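On the min step suggestion in the list above: as far as I understand Grafana's documentation, $__rate_interval is computed as max($__interval + scrape interval, 4 * scrape interval), where the scrape interval is taken from the query's min step when one is set. The sketch below only reproduces that formula; the 15-second and 30-second inputs are hypothetical, and the function name is mine.

```python
def rate_interval(interval_seconds: float, scrape_interval_seconds: float) -> float:
    """max($__interval + scrape interval, 4 * scrape interval), in seconds."""
    return max(
        interval_seconds + scrape_interval_seconds,
        4 * scrape_interval_seconds,
    )


# Hypothetical: 30s panel interval with a 15s scrape interval.
short_window = rate_interval(30, 15)

# Hypothetical: the suggested 9m min step, used as both interval and scrape interval.
nine_minutes = 9 * 60
long_window = rate_interval(nine_minutes, nine_minutes)

print(short_window, long_window / 60)
```

Under these assumptions a 9m min step stretches the window to 36 minutes, which would explain why panels start showing data for metrics that are fetched only every few minutes.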

Bishoywadea · 2026-02-13T13:12:27Z

I guess it's better to use rate() instead of increase() as then we don't have to pay attention to the aggregation interval used by grafana

OK, no problem, but do you mean converting increase() to rate() in all panels or in a specific one? (If we change increase() to rate() in the Calls panel, the result will differ from the pg dashboard, as I explained earlier in the comments above.)

I don't get why there is a + 0.01 in the Shared buffer Hit Ratio query

This is a guard to prevent division by zero.
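A small sketch of what such a guard does, assuming the ratio has the usual hits / (hits + reads) shape. The exact query in the dashboard is not shown in this thread, so the function below is illustrative only.

```python
def hit_ratio_percent(hits: float, reads: float, guard: float = 0.01) -> float:
    """Hit ratio in percent; the small guard keeps the denominator non-zero."""
    return 100.0 * hits / (hits + reads + guard)


idle_query = hit_ratio_percent(0, 0)      # no activity in the window
busy_query = hit_ratio_percent(990, 10)   # hypothetical block counts

print(idle_query, round(busy_query, 3))
```

The trade-off is a slight downward bias: with 990 hits and 10 reads the result lands just below 99% instead of exactly on it.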

0xgouda · 2026-02-13T16:30:14Z

OK, no problem, but do you mean converting increase() to rate() in all panels or in a specific one?

Yeah, use rate() instead. There is no problem if it's different from the pg dashboard, but we need to explicitly state that the unit we are using is per second (calls/s, time/s, etc.).

Bishoywadea · 2026-02-13T18:34:15Z

You don't have to use sum by (dbname, queryid) in all panels, just using increase() or rate() in all of them should resolve the issue

@0xgouda regarding this comment: the image below shows that querying with queryid and dbname sometimes returns more than one entry, and the entries have almost the same numbers. That is why I have been using sum() and dividing by their count to get the average, which fixes the issue you pointed out where some panels have 2 lines. I could use either sum() or avg(). Is it OK to use avg()? If not, do you have another solution, or do you know why it returns more than one entry?

0xgouda · 2026-02-13T23:45:37Z

Just use rate() and it will be resolved. Displaying the raw value is not very beneficial anyway; we want to know the average runtime over the aggregation interval instead.

Bishoywadea · 2026-02-15T19:56:11Z

Hi @0xgouda, sorry for the late response, I was busy.
Honestly, I don't get how you want to use only rate() without sum by(). I am sure rate() alone won't solve the problem of multiple entries returned by the query, but sum() will.
If we don't sum the valid values from all entries we get wrong results. For example, if one node does 1000 QPS at 10ms and another does 1 QPS at 1000ms, a simple rate logic might give ~505ms, which is wrong.
So I used a weighted average, sum(rate(time)) / sum(rate(calls)), which returns the correct ~11ms, and that is exactly "the average runtime over the aggregation interval".
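The arithmetic behind that example can be checked with a short sketch. The two "nodes" and their numbers come straight from the comment; the variable names are illustrative.

```python
# The two entries from the example: calls per second and average latency.
nodes = [
    {"calls_per_s": 1000.0, "avg_ms": 10.0},
    {"calls_per_s": 1.0, "avg_ms": 1000.0},
]

# Plain average of the per-entry averages ignores how many calls each one served.
naive_avg_ms = sum(n["avg_ms"] for n in nodes) / len(nodes)

# Weighted form: total time spent per second divided by total calls per second,
# mirroring sum(rate(time)) / sum(rate(calls)).
time_ms_per_s = sum(n["calls_per_s"] * n["avg_ms"] for n in nodes)
total_calls_per_s = sum(n["calls_per_s"] for n in nodes)
weighted_avg_ms = time_ms_per_s / total_calls_per_s

print(naive_avg_ms, round(weighted_avg_ms, 2))
```

The weighted form comes out near 11 ms because almost all calls are the fast ones, while the plain average gives the single slow call the same weight as the thousand fast ones.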

Please reconsider, or give me more details on what logic you want the PromQL query to have.

0xgouda · 2026-02-15T21:41:46Z

If we don't sum the valid values from all entries we get wrong results. For example, if one node does 1000 QPS at 10ms and another does 1 QPS at 1000ms, a simple rate logic might give ~505ms, which is wrong.

I mostly don't understand your examples
How are we going to have 2 nodes with different QPS for the same query on the same database? rate(pgwatch_stat_statements_calls{dbname='$dbname', queryid='$queryid'}[$__rate_interval])?

pg_stat_statements already stores these fields as counters. So if we have 2 entries, one with query=....text... and the other with query=-, the above rate() query will put them in a single vector since they have the same dbname and queryid, then order them by timestamp and calculate the per-second average. Since they are counters, they build on each other's values, so what's the problem here?

Bishoywadea · 2026-02-16T11:58:23Z

I tried your approach (using rate() directly without aggregation), shown in the first 3 images, but it doesn't seem to solve the issue of multiple lines.
The attached screenshots show what's happening. In Prometheus, a unique time series is defined by its labels. Because the query label is different (one has the SQL text and the other is just "-"), Prometheus returns them as two separate entries in the vector.
You can see in the graph panel that this results in two overlapping lines for the same query ID and database name. In this case we have two entries reporting different QPS for that same ID; if we don't merge them, the data looks broken on the dashboard.
To fix this I am using sum by (queryid, dbname) (rate(...)).
This lets us use rate() as you suggested but ensures that all entries for that query ID are merged into a single line. It also keeps the "Average Runtime" math correct by using the weighted average sum(rate(time)) / sum(rate(calls)) (the last attached screenshot).

0xgouda · 2026-02-17T21:57:04Z

I tried your approach (using rate() directly without aggregation), shown in the first 3 images, but it doesn't seem to solve the issue of multiple lines.

It works for me; probably there is a problem with your gathered data, let me inspect.

Otherwise, I will just add a couple more panels, and by tomorrow or so this should be ready for merging.

Bishoywadea · 2026-02-17T22:03:29Z

This is how I see the dashboard now on my side.
That is why I keep telling you rate() will not work.
I don't know if this happens only on my machine, but I don't think so, because I didn't touch anything other than this panel.
Overall, thanks for bearing with me these 2 weeks, and Ramadan Kareem 😊

0xgouda · 2026-02-17T22:05:39Z

You probably have both the stat_statements and stat_statements_no_query_text metrics active at the same time; that's why you get 2 versions of each queryid, one with query=- and the other with the actual query text.

stat_statements_no_query_text is the one that returns query=-.
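One way to confirm this kind of overlap is to look for (dbname, queryid) pairs that show up with more than one query label. The sketch below runs on invented label sets and a hypothetical helper name; in practice the label sets would come from whatever the series query returns.

```python
from collections import defaultdict


def duplicated_queryids(label_sets):
    """Map each (dbname, queryid) seen with several `query` labels to those labels."""
    seen = defaultdict(set)
    for labels in label_sets:
        seen[(labels["dbname"], labels["queryid"])].add(labels["query"])
    return {key: sorted(texts) for key, texts in seen.items() if len(texts) > 1}


# Hypothetical label sets resembling what two overlapping metrics would produce.
label_sets = [
    {"dbname": "demo", "queryid": "42", "query": "SELECT 1"},
    {"dbname": "demo", "queryid": "42", "query": "-"},
    {"dbname": "demo", "queryid": "7", "query": "SELECT 2"},
]

duplicates = duplicated_queryids(label_sets)
print(duplicates)
```

An empty result would suggest only one of the two metrics is active for that source.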

are you using the debug preset? or how are you running pgwatch?

Bishoywadea · 2026-02-17T22:14:37Z

are you using the debug preset? or how are you running pgwatch?

Honestly, I didn't pay much attention to the other options and left everything at the defaults. I have now opened the metrics presets and found it set to debug.

Bishoywadea · 2026-02-17T22:19:57Z

But I think everything is clear now. I understand why the other databases don't get duplicated query IDs: when I checked their metrics preset just now, it was set to full.

0xgouda · 2026-02-17T22:25:01Z

ok that is the problem. So yeah, please use the full preset instead.

But also, this is a problem on our side: we shouldn't let the debug preset break some dashboards, so I will probably have to update it to not include both stat_statements and stat_statements_no_query_text.

0xgouda self-assigned this Feb 5, 2026

0xgouda added the dashboards Grafana dashboards related label Feb 5, 2026

0xgouda marked this pull request as draft February 5, 2026 07:52

Bishoywadea marked this pull request as ready for review February 6, 2026 17:47

0xgouda force-pushed the feat/add-single-query-details-prometheus-dashboard branch from 4aff950 to a632cdf Compare February 9, 2026 08:33

Bishoywadea added 9 commits February 13, 2026 13:41

feat(grafana): add Single Query Details dashboard for Prometheus

5aa2b47

feat(grafana): add calls panel for Single Query Details dashboard for Prometheus

8e6e338

feat(grafana): add Total runtime panel for Single Query Details dashboard for Prometheus

be3b09d

feat(grafana): add Shared Buffers Hit Ratio panel for Single Query Details dashboard for Prometheus

d833c49

feat(grafana): add "Temp Blocks Read/Written" panel for Single Query Details dashboard for Prometheus

2f228fe

feat(grafana): add "Backend block Read/Write time" Details dashboard for Prometheus

d179abe

feat(grafana): add "% of total_time spent in direct IO" Details dashboard for Prometheus

e01752c

add logo panel for Single Query Details dashboard for Prometheus

f72c061

refactor: make the dashboard to use $__rate_interval instead of custom `$agg_interval` and correct average runtime calculation.

f1b3af8

0xgouda force-pushed the feat/add-single-query-details-prometheus-dashboard branch from 79c40f3 to f1b3af8 Compare February 13, 2026 11:41

Bishoywadea force-pushed the feat/add-single-query-details-prometheus-dashboard branch from 8a1e257 to f1b3af8 Compare February 16, 2026 05:48

refactor: add links to query details + switch metrics to rate()

5e9cfde

0xgouda added 2 commits February 17, 2026 23:39

Update single query details prom dashboard

34fca76

1. change panels ordering 2. don't use `sum by()` 3. add `Execution time per call` panel

Add Shared Buffers Hit/Read per call panel

b8a2fea
