Update the query that fetches roles to be faster with many roles #8457

michalc · 2025-02-14T08:54:46Z

This changes the query that fetches roles from calling the pg_catalog.shobj_description function to using an explicit join on pg_catalog.pg_shdescription. In my case, a production system with over 20k roles, this reduces the query time from 8 minutes(!) to ~40ms. To rule out a locking issue, I used pg_activity to investigate locks, and nothing was blocked during the 8 minutes.

This fixes a problem that exists only for certain users. In my case, the user that runs the query is a member (directly and indirectly) of a large number of roles. When run as a user that is a member of few roles (specifically, the master user in an AWS RDS database), the existing query performs reasonably well.

My best explanation for the performance difference comes from the definition of shobj_description https://github.com/postgres/postgres/blob/2a8a00674e973dec4d7d74d5e16c7d7cdedf2be5/src/backend/catalog/system_functions.sql#L303-L312. Each time it is called, it runs a query against pg_shdescription and pg_class, so in my case that query runs over 20k times, compared to the equivalent of once with the explicit join on pg_shdescription. Note also that in my case pg_class is quite big: currently about 500k rows (it has grown to ~5 million in the past, but that was borderline a mistake).
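To illustrate the shape of the change, here is a simplified sketch of the two query forms. It is not the exact pgAdmin template: the column list is trimmed, reading from pg_catalog.pg_roles is an assumption, and the optional filter on a single role (rid) is left out.

```sql
-- Before (sketch): shobj_description is evaluated once per role row,
-- and each call runs its own lookup against pg_shdescription and pg_class.
SELECT r.oid, r.rolname, r.rolcanlogin,
       pg_catalog.shobj_description(r.oid, 'pg_authid') AS description
FROM pg_catalog.pg_roles r
ORDER BY r.rolcanlogin, r.rolname;

-- After (sketch): pg_shdescription is joined once for the whole result set.
-- LEFT JOIN keeps roles that have no comment (description is NULL for them).
SELECT r.oid, r.rolname, r.rolcanlogin, d.description
FROM pg_catalog.pg_roles r
LEFT JOIN pg_catalog.pg_shdescription d
       ON d.objoid = r.oid
      AND d.classoid = 'pg_catalog.pg_authid'::regclass
ORDER BY r.rolcanlogin, r.rolname;
```

Both forms should return the same rows. The join form lets the planner read pg_shdescription once instead of once per role.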

As evidence for the improvement, this is the output of EXPLAIN ANALYZE on the existing query (with no rid):

"QUERY PLAN"
"Sort  (cost=7463.60..7516.83 rows=21292 width=102) (actual time=532858.481..532861.078 rows=21707 loops=1)"
"  Sort Key: pg_authid.rolcanlogin, pg_authid.rolname"
"  Sort Method: quicksort  Memory: 3638kB"
"  ->  Seq Scan on pg_authid  (cost=0.00..5932.92 rows=21292 width=102) (actual time=26.308..532800.095 rows=21707 loops=1)"
"Planning Time: 0.144 ms"
"Execution Time: 532875.380 ms"

And this is the output of EXPLAIN ANALYZE on the updated query:

"QUERY PLAN"
"Sort  (cost=2197.56..2250.79 rows=21292 width=105) (actual time=25.266..27.527 rows=21707 loops=1)"
"  Sort Key: pg_authid.rolcanlogin, pg_authid.rolname"
"  Sort Method: quicksort  Memory: 3638kB"
"  ->  Hash Left Join  (cost=1.05..666.87 rows=21292 width=105) (actual time=0.029..6.835 rows=21707 loops=1)"
"        Hash Cond: (pg_authid.oid = d.objoid)"
"        ->  Seq Scan on pg_authid  (cost=0.00..609.92 rows=21292 width=70) (actual time=0.013..3.813 rows=21707 loops=1)"
"        ->  Hash  (cost=1.04..1.04 rows=1 width=39) (actual time=0.006..0.007 rows=0 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"              ->  Seq Scan on pg_shdescription d  (cost=0.00..1.04 rows=1 width=39) (actual time=0.006..0.006 rows=0 loops=1)"
"                    Filter: (classoid = '12217'::oid)"
"                    Rows Removed by Filter: 3"
"Planning Time: 0.215 ms"
"Execution Time: 41.156 ms"


michalc force-pushed the perf/faster-fetching-roles branch 4 times, most recently from 2fb205c to 8d90d1b on February 15, 2025 06:16

michalc force-pushed the perf/faster-fetching-roles branch from 8d90d1b to 322d1c1 on February 15, 2025 06:25

akshay-joshi merged commit 83ade5d into pgadmin-org:master on Feb 17, 2025
27 of 32 checks passed
