@michalc michalc commented Feb 14, 2025

This changes the query that fetches roles from calling the pg_catalog.shobj_description function to using an explicit join on pg_catalog.pg_shdescription. In my case, on a production system with over 20k roles, this reduces the query time from about 8 minutes(!) to ~40ms. To rule out a locking issue, I used pg_activity to investigate locks, and nothing was blocked during the 8 minutes.

This fixes a problem that affects only certain users: in my case, the user running the query is itself a member (directly and indirectly) of a large number of roles. When run as a user that is a member of few roles (specifically, the master user in an AWS RDS database), the existing query performs reasonably well.

My best explanation for the performance difference comes from the definition of shobj_description: https://github.com/postgres/postgres/blob/2a8a00674e973dec4d7d74d5e16c7d7cdedf2be5/src/backend/catalog/system_functions.sql#L303-L312. Each call runs a query against pg_shdescription and pg_class, so in my case that query runs over 20k times, compared to the equivalent of once with the explicit join on pg_shdescription. Note also that in my case pg_class is quite big: currently about 500k rows (it has grown to ~5 million in the past, though that was borderline a mistake).
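To make the two query shapes concrete, here is a minimal sketch that runs both against small stand-in tables in an in-memory SQLite database. It is not the pgAdmin query and does not touch PostgreSQL: the tables `authid`, `class` and `shdescription` and their rows are hypothetical stand-ins for the catalogs, and the correlated subquery only mimics a per-row lookup of the kind described above. It shows that the two shapes return the same rows, not how fast either is.

```python
import sqlite3

# Hypothetical stand-ins for pg_authid, pg_class and pg_shdescription.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authid (oid INTEGER PRIMARY KEY, rolname TEXT, rolcanlogin INTEGER);
    CREATE TABLE class (oid INTEGER PRIMARY KEY, relname TEXT);
    CREATE TABLE shdescription (objoid INTEGER, classoid INTEGER, description TEXT);
    INSERT INTO class VALUES (1260, 'pg_authid'), (1262, 'pg_database');
    INSERT INTO authid VALUES (10, 'admin', 1), (11, 'readers', 0), (12, 'app', 1);
    INSERT INTO shdescription VALUES
        (11, 1260, 'read-only group'),
        (11, 1262, 'comment on a different kind of object');
""")

# Shape 1: a description lookup evaluated once per role row.
PER_ROW = """
    SELECT a.oid, a.rolname,
           (SELECT d.description
              FROM shdescription d
             WHERE d.objoid = a.oid
               AND d.classoid = (SELECT c.oid FROM class c
                                  WHERE c.relname = 'pg_authid')) AS description
      FROM authid a
     ORDER BY a.rolcanlogin, a.rolname
"""

# Shape 2: a single explicit left join against the description table.
JOINED = """
    SELECT a.oid, a.rolname, d.description
      FROM authid a
      LEFT JOIN shdescription d
        ON d.objoid = a.oid
       AND d.classoid = (SELECT c.oid FROM class c
                          WHERE c.relname = 'pg_authid')
     ORDER BY a.rolcanlogin, a.rolname
"""

per_row_rows = conn.execute(PER_ROW).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
print(per_row_rows == joined_rows)
```

Roles without a description come back with a NULL description in both shapes, which is why the join has to be a left join rather than an inner join.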

As evidence for the improvement, this is the output of EXPLAIN ANALYZE on the existing query (with no rid):

"QUERY PLAN"
"Sort  (cost=7463.60..7516.83 rows=21292 width=102) (actual time=532858.481..532861.078 rows=21707 loops=1)"
"  Sort Key: pg_authid.rolcanlogin, pg_authid.rolname"
"  Sort Method: quicksort  Memory: 3638kB"
"  ->  Seq Scan on pg_authid  (cost=0.00..5932.92 rows=21292 width=102) (actual time=26.308..532800.095 rows=21707 loops=1)"
"Planning Time: 0.144 ms"
"Execution Time: 532875.380 ms"

And this is the output of EXPLAIN ANALYZE on the updated query:

"QUERY PLAN"
"Sort  (cost=2197.56..2250.79 rows=21292 width=105) (actual time=25.266..27.527 rows=21707 loops=1)"
"  Sort Key: pg_authid.rolcanlogin, pg_authid.rolname"
"  Sort Method: quicksort  Memory: 3638kB"
"  ->  Hash Left Join  (cost=1.05..666.87 rows=21292 width=105) (actual time=0.029..6.835 rows=21707 loops=1)"
"        Hash Cond: (pg_authid.oid = d.objoid)"
"        ->  Seq Scan on pg_authid  (cost=0.00..609.92 rows=21292 width=70) (actual time=0.013..3.813 rows=21707 loops=1)"
"        ->  Hash  (cost=1.04..1.04 rows=1 width=39) (actual time=0.006..0.007 rows=0 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"              ->  Seq Scan on pg_shdescription d  (cost=0.00..1.04 rows=1 width=39) (actual time=0.006..0.006 rows=0 loops=1)"
"                    Filter: (classoid = '12217'::oid)"
"                    Rows Removed by Filter: 3"
"Planning Time: 0.215 ms"
"Execution Time: 41.156 ms"
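As a rough reading of the two plans, the arithmetic below divides the time of the sequential scan in the first plan by its row count, and compares the two total execution times. The numbers are copied from the plans above; attributing the whole scan time to the per-row description lookup is an assumption, not something the plans prove.

```python
# Figures copied from the two EXPLAIN ANALYZE outputs above.
rows = 21707                 # rows returned by the scan on pg_authid
old_scan_ms = 532800.095     # upper "actual time" of the scan in the existing query
old_total_ms = 532875.380    # Execution Time, existing query
new_total_ms = 41.156        # Execution Time, updated query

# Average cost per role row in the existing query, assuming the scan time
# is dominated by work done once per row.
per_row_ms = old_scan_ms / rows

# Overall ratio between the two execution times.
speedup = old_total_ms / new_total_ms

print(f"{per_row_ms:.1f} ms per role, roughly {speedup:.0f}x faster overall")
```

A cost in the tens of milliseconds per row is consistent with a separate lookup being executed for each role, rather than with a one-off cost such as a lock wait.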

@michalc michalc force-pushed the perf/faster-fetching-roles branch 4 times, most recently from 2fb205c to 8d90d1b Compare February 15, 2025 06:16
See pgadmin-org/pgadmin4#8457 (comment) for
EXPLAIN ANALYZE results of the existing query and the updated query.
@michalc michalc force-pushed the perf/faster-fetching-roles branch from 8d90d1b to 322d1c1 Compare February 15, 2025 06:25
@akshay-joshi akshay-joshi merged commit 83ade5d into pgadmin-org:master Feb 17, 2025
27 of 32 checks passed