Replies: 3 comments
-
Hey JelteF, I've been involved in the pg_duckdb community for about six months, and really appreciate all the help along the way. Our main workload reads Parquet files from S3, performs ETL (joins, aggregations), and writes results to Postgres heap tables for ad hoc queries. We rely on:
To reduce S3 latency, we also implemented a cache layer on top of httpfs, leveraging our existing cache infrastructure. Initially, we loaded Parquet files into Postgres for updates and ran ETL on heap tables with DuckDB, but scan performance lagged behind. We also tried pg_mooncake v0.1 (Delta Lake), but it's being re-architected and v0.1 is no longer maintained. Now, the data is sit in S3, and analysis is conducted on it in a manner similar to that of a foreign table. Our pipeline works for daily needs, but we face:
To address these:
In addition to our internal use, we plan to make |
Beta Was this translation helpful? Give feedback.
-
Hi @JelteF , we want to use it to query Iceberg tables through postgres. We have a lot of users already using postgres and don't want to migrate to another engine just to use the iceberg tables. Therefore, we're really interested in at least querying the iceberg tables. |
Beta Was this translation helpful? Give feedback.
-
Hi we are also using it for accessing iceberg on s3 via postgres - we are using postgres with views to restrict access so the user doesnt need direct access to the s3 bucket - also we can then apply restrictions on what data they can view for example only the last months or a particular category etc We use it with postgres fdw so we loopback the view into the same postgres database which then allows us to join to a normal postgres table and then import data back into a normal postgres table or just treat the iceberg table as a normal postgres table with the postgres engine. You can update the view to pushdown queries to the duckdb engine as well -we want to look at caching https layer to speed things up but it does work fairly well even without it. In our setup we have postgres acting as a buffer to collect up changes to then be merged out to iceberg periodically - its possible with the loopback postgres fdw to then join postgres data with the iceberg data to get a single view in postgres of the table in the current state without keeping all the records in postgres - it also allows you to update the records with rules on the view to capture the changes for merging and the join. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
With an open source project it's always hard to know how people are using it. If you're using pg_duckdb in production, could you share here a little bit on how? Also if you're currently evaluating pg_duckdb for some usecase, can you share that? What are the usecases that you're using it for and how well does it work for you?
(Tagging some people that have been active on the repo in the hope that they respond @YuweiXiao @askyx @ggnmstr @saygoodbyye @chestnutsj @sysadminmike @wasd171)
Beta Was this translation helpful? Give feedback.
All reactions