Replies: 8 comments
-
Moved this to discussion
-
Just to clarify a bit up front: pg_partman only uses the built-in partitioning mechanisms of PostgreSQL itself, so it does not by itself add any memory consumption beyond that. How much more memory a partitioned set uses vs. a standard table really depends on the data being pulled out and what you are doing with it. There can be additional overhead from the metadata of multiple tables being involved and joined.

But one of the purposes of partitioning is to reduce the number of pages that have to be pulled into memory for evaluation, via partition pruning. So I would hope that when you run your SELECTs, you set up the conditions for that query so that partition pruning reduces the overall number of tables and pages that need to be evaluated. That should balance out the additional table overhead, or even improve on it. If you're not seeing partition pruning, then either the query isn't written right or partitioning itself may not be a suitable solution for the types of queries you run.

PostgreSQL can tell you whether partition pruning is happening, and give you a much better idea of how much memory a given query uses, via EXPLAIN ANALYZE on that query. For any operations that use work_mem, it reports the exact amounts used for those steps. For shared_buffers, it can show you cache hits vs. misses, which helps you see whether the system has enough memory to hold your average working data set. If you'd like a better idea of how much of a given table is being kept in shared buffers, the pg_buffercache extension can give you some insight: https://www.postgresql.org/docs/current/pgbuffercache.html

Hopefully that helps. If you still have any questions, let me know.
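A minimal sketch of those two checks (the table and column names below are made up for illustration, and pg_buffercache has to be installed with CREATE EXTENSION first):

```sql
-- Does partition pruning kick in, and how much work_mem / how many buffers does the query use?
-- Fewer child tables in the plan (or "Subplans Removed: N") means pruning is working.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM measurements                      -- hypothetical partitioned table
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-02-01';

-- Requires: CREATE EXTENSION pg_buffercache;
-- Rough picture of which relations currently occupy shared_buffers.
SELECT c.relname,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS buffered
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 20;
```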
-
Thanks for the thorough response Keith, much appreciated! To provide some more context: no partition pruning is happening here, because the table is used to run Approximate Nearest Neighbour searches with pgvector, which means all 730 partitions must be examined on each SELECT. The index is kept fully in memory though (otherwise performance would be atrocious), so my first guess is that the only time PG needs to go to disk is to read the selected attributes from the returned rows (these SELECTs typically return 10-20 rows, so pretty much negligible). I'm not very familiar with the internals of Postgres partitioning, so I don't know how much extra memory it can add to a regular SELECT that has to touch all the partitions. Is there a good way to measure that?
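(One thing I might try, as a sketch and only if the server is on PostgreSQL 14 or newer where these exist, is looking at the backend's memory contexts directly:)

```sql
-- PostgreSQL 14+: memory contexts of the current backend.
-- Run in the same session right after one of the ANN SELECTs.
SELECT name, parent,
       pg_size_pretty(total_bytes) AS total,
       pg_size_pretty(used_bytes)  AS used
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 20;

-- PostgreSQL 14+: dump another backend's memory contexts (by pid) into the server log.
SELECT pg_log_backend_memory_contexts(12345);  -- 12345 is a placeholder pid
```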
-
Assuming you're partitioning in order to use the retention system to expire old data, then? If not, there's really no need to partition at all here if you're reading the entire data set on every SELECT anyway.
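(If retention is the goal, this is roughly what that setup looks like; the parent table name below is made up, and the exact part_config columns and function signatures vary a bit between pg_partman versions, so treat it as a sketch:)

```sql
-- Keep ~2 years of data; older child tables are dropped during maintenance runs.
UPDATE partman.part_config
SET retention = '2 years',
    retention_keep_table = false          -- actually drop expired child tables
WHERE parent_table = 'public.embeddings'; -- hypothetical parent table

-- Usually invoked by the pg_partman background worker or a cron job.
SELECT partman.run_maintenance();
```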
-
Unless you're having issues with maintenance and vacuuming. That could possibly be another reason for partitioning, but I'd only suggest it if you're actually seeing that be a problem.
-
There are a couple of reasons why partitioning is very convenient here. The first is recall: for my dataset, recall drops as partition size grows. For instance, recall is ~95% with 100K-row partitions but drops to ~75% with 1M-row partitions (go figure what it would be with a single non-partitioned table...). So overall pg_partman is a really convenient and robust solution for this problem. I just want to understand why the PostgreSQL backend processes have such a large memory footprint, and whether partitioning could be one of the reasons.
-
That's a very good use case I hadn't considered, thank you for sharing! I'd still recommend looking at the full EXPLAIN ANALYZE output of the queries and also investigating your shared_buffers usage. Beyond that, it's a bit outside my skill set to investigate further. I can try asking some of my work colleagues to see if they might know, but you may also have some luck asking on either the PostgreSQL general mailing list or even the hackers list. I'd recommend trying general first.
-
I just emailed the general mailing list; hopefully they can give me some more insight. Thanks a lot for all the assistance, Keith!
-
I would like to understand how much pg_partman can contribute to memory usage in Postgres backend processes.
My application mainly runs SELECT statements against a partitioned table. The table is partitioned by size (100K rows per partition, 730 partitions in total). However, each Postgres backend process running these SELECTs takes ~300 MB of memory, which seems like a lot compared to what I've seen elsewhere.
work_mem is set to 16MB, so nothing too crazy. Also, I am using Pss to measure the memory used by each backend process so that shared_buffers doesn't skew the numbers. Here's the output for a couple of them:
Is this expected? Or should the overhead of pg_partman partitions be smaller in terms of memory consumption?
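(For reference, the backend PIDs I point the Pss measurement at can be listed with the standard pg_stat_activity view; nothing here is specific to my setup:)

```sql
-- PIDs of client backends; these are the per-connection processes being measured.
SELECT pid, application_name, state, query_start
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```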