Update the query that fetches roles to be faster with many roles #8457
This changes the query that fetches roles from calling the pg_catalog.shobj_description function to using an explicit join on pg_catalog.pg_shdescription. In my specific case, a production system with over 20k roles, this reduces the query time from 8 minutes(!) to ~40ms. In case it was a locking issue, I used pg_activity to investigate locks, and nothing was being blocked during the 8 minutes.
This fixes a problem that exists only for certain users: specifically, as in my case, when the user running the query is themselves a member (directly and indirectly) of a large number of roles. When run as a user that is a member of few roles (specifically, the master user in an AWS RDS database), the existing query is fairly performant.
My best explanation for the performance difference comes from the definition of shobj_description https://github.com/postgres/postgres/blob/2a8a00674e973dec4d7d74d5e16c7d7cdedf2be5/src/backend/catalog/system_functions.sql#L303-L312. Each time it's called, it runs a query against pg_shdescription and pg_class, so in my case it runs 20k times, compared to the equivalent of once with the explicit join on pg_shdescription. Note also that in my case pg_class is quite big: currently about 500k rows (it has grown to ~5 million in the past, but that was borderline a mistake).
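To make the difference concrete, here is a minimal sketch of the two query shapes. It is not the actual query from this PR: the selected columns and the aliases `r` and `d` are assumptions chosen for illustration, and only the way the role comment is fetched reflects the change described above.

```sql
-- Before (sketch): one function call per role row. Each call performs
-- its own lookup in pg_shdescription and resolves the catalog name
-- ('pg_authid') through pg_class.
SELECT r.oid,
       r.rolname,
       pg_catalog.shobj_description(r.oid, 'pg_authid') AS description
FROM pg_catalog.pg_roles r
ORDER BY r.rolname;

-- After (sketch): one explicit join. The catalog OID is resolved once
-- via the regclass cast, and the comments for all roles are fetched
-- as part of the same query.
SELECT r.oid,
       r.rolname,
       d.description
FROM pg_catalog.pg_roles r
LEFT JOIN pg_catalog.pg_shdescription d
       ON d.objoid = r.oid
      AND d.classoid = 'pg_catalog.pg_authid'::pg_catalog.regclass
ORDER BY r.rolname;
```

The LEFT JOIN keeps roles that have no comment, matching the NULL that shobj_description returns for them.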
As evidence for the improvement, this is the output of EXPLAIN ANALYZE on the existing query (with no rid)
And this is the output of EXPLAIN ANALYZE of the updated query