How are you using pg_duckdb? #886

JelteF · 2025-08-14T08:06:32Z

JelteF
Aug 14, 2025
Maintainer

With an open source project, it's always hard to know how people are using it. If you're using pg_duckdb in production, could you share a bit about how? If you're currently evaluating pg_duckdb for some use case, please share that too. What use cases are you using it for, and how well does it work for you?

(Tagging some people that have been active on the repo in the hope that they respond @YuweiXiao @askyx @ggnmstr @saygoodbyye @chestnutsj @sysadminmike @wasd171)

YuweiXiao · 2025-08-15T06:42:32Z

YuweiXiao
Aug 15, 2025

Hey JelteF, I've been involved in the pg_duckdb community for about six months, and really appreciate all the help along the way.

Our main workload reads Parquet files from S3, performs ETL (joins, aggregations), and writes results to Postgres heap tables for ad hoc queries. We rely on:

read_parquet
read_object_store (AWS S3)
write to Postgres tables (Support insert_into_select for Postgres table #688)
force DuckDB execution
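
A rough sketch of what one run of such a pipeline might look like, written as a small Python helper that only builds the SQL text. The bucket path, table name, and column names are hypothetical, and the casts are assumptions about the target schema; only `read_parquet`, the `r['column']` access syntax, and the `duckdb.force_execution` setting come from pg_duckdb itself.

```python
from typing import List


def build_etl_statements(parquet_glob: str, target_table: str) -> List[str]:
    """Build the SQL for one ETL run. Table and column names are hypothetical."""
    return [
        # Route eligible queries through DuckDB instead of the Postgres executor.
        "SET duckdb.force_execution = true",
        # Aggregate the Parquet data and store the result in a Postgres heap table.
        # Columns of read_parquet() are accessed with the r['column'] syntax.
        f"INSERT INTO {target_table} (customer_id, total_amount) "
        f"SELECT r['customer_id']::bigint, sum(r['amount']::numeric) "
        f"FROM read_parquet('{parquet_glob}') r "
        "GROUP BY 1",
    ]


statements = build_etl_statements(
    "s3://example-bucket/orders/*.parquet",  # hypothetical bucket
    "daily_order_totals",                    # hypothetical heap table
)
for statement in statements:
    print(statement + ";")
```

The helper interpolates its arguments directly into the SQL, so it is only suitable for trusted, hard-coded values; the statements would be executed through whatever Postgres client the job already uses.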

To reduce S3 latency, we also implemented a cache layer on top of httpfs, leveraging our existing cache infrastructure.

Initially, we loaded Parquet files into Postgres for updates and ran ETL on heap tables with DuckDB, but scan performance lagged behind. We also tried pg_mooncake v0.1 (Delta Lake), but it's being re-architected and v0.1 is no longer maintained. Now the data sits in S3 and we analyze it in place, much like querying a foreign table.

Our pipeline works for daily needs, but we face:

Instability from memory limits applied per backend, not service-wide.
Suboptimal performance: S3 Parquet files are unsorted, limiting filter effectiveness.
SQL usability: Querying is less user-friendly due to "unknown" schemas and 'duckdb row' syntax.
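
To illustrate why unsorted files limit filter effectiveness: Parquet readers can skip a row group when its min/max statistics show that no row can match the filter. The toy model below (synthetic data, not measurements from a real pipeline) counts how many row groups a range filter has to scan when the same values are stored sorted versus shuffled.

```python
import random
from typing import List, Sequence, Tuple


def row_group_stats(values: Sequence[int], group_size: int) -> List[Tuple[int, int]]:
    """Split values into fixed-size row groups and record (min, max) per group."""
    stats = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        stats.append((min(chunk), max(chunk)))
    return stats


def groups_to_scan(stats: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count row groups whose [min, max] range overlaps the filter [lo, hi]."""
    return sum(1 for group_min, group_max in stats if group_max >= lo and group_min <= hi)


random.seed(0)
sorted_values = list(range(10_000))
shuffled_values = sorted_values[:]
random.shuffle(shuffled_values)

sorted_stats = row_group_stats(sorted_values, 1_000)
shuffled_stats = row_group_stats(shuffled_values, 1_000)

# Filter: 2000 <= value <= 2999
print("sorted:  ", groups_to_scan(sorted_stats, 2_000, 2_999), "of", len(sorted_stats))
print("shuffled:", groups_to_scan(shuffled_stats, 2_000, 2_999), "of", len(shuffled_stats))
```

With sorted data the filter touches a single row group; with shuffled data nearly every group's min/max range spans the filter, so little or nothing can be skipped.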

To address these:

We limit concurrent SQL queries, though this isn't foolproof, especially with manual ETL jobs.
We're exploring embedded columnar store support (e.g., DuckLake) to improve performance on unsorted Parquet files (ducklake integration #830).
We're considering a Postgres foreign table that infers and persists the schema.
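
The concurrency limit can be pictured as a gate in front of the job runner. The sketch below is an assumption about how such a gate might be built on the client side, not a description of the actual setup: the `QueryGate` class and its parameters are hypothetical. It also shows the arithmetic behind the instability: with a per-backend limit, the worst case for the whole service is the limit multiplied by the number of backends running at once.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class QueryGate:
    """Client-side cap on concurrently running analytical queries (hypothetical)."""

    def __init__(self, max_concurrent: int, per_backend_limit_gb: float) -> None:
        self.max_concurrent = max_concurrent
        self.per_backend_limit_gb = per_backend_limit_gb
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @property
    def worst_case_memory_gb(self) -> float:
        # Each backend may use up to its own limit, so the totals add up.
        return self.max_concurrent * self.per_backend_limit_gb

    @contextmanager
    def slot(self, timeout: Optional[float] = None) -> Iterator[None]:
        """Hold one query slot for the duration of the with-block."""
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free query slot")
        try:
            yield
        finally:
            self._slots.release()


gate = QueryGate(max_concurrent=2, per_backend_limit_gb=4.0)
print("worst-case memory (GB):", gate.worst_case_memory_gb)

with gate.slot(timeout=1.0):
    pass  # run one ETL query here
```

A gate like this only protects jobs that go through it, which matches the caveat above: a manually started ETL job that bypasses the gate still adds its own per-backend allowance on top.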

In addition to our internal use, we plan to make pg_duckdb available to our Postgres customers (yes, we also offer Postgres databases directly). To achieve this, we are taking extra steps to ensure both security and usability, such as disabling the LocalFileSystem (while still allowing spill to disk), pre-packaging DuckDB extensions, and disallowing runtime downloads.

0 replies

adeel-ansari · 2025-08-21T13:29:28Z

adeel-ansari
Aug 21, 2025

Hi @JelteF, we want to use it to query Iceberg tables through Postgres. We have a lot of users already on Postgres and don't want to migrate to another engine just to use Iceberg tables, so at a minimum we're very interested in querying them.

0 replies

sysadminmike · 2025-08-21T19:53:06Z

sysadminmike
Aug 21, 2025

Hi, we are also using it to access Iceberg on S3 via Postgres. We use Postgres views to restrict access, so users don't need direct access to the S3 bucket. The views also let us restrict what data they can see, for example only recent months or a particular category.

We use it with postgres_fdw: we loop the view back into the same Postgres database, which lets us join it to a normal Postgres table and then import data back into a normal Postgres table, or just treat the Iceberg table as a normal Postgres table with the Postgres engine. You can also update the view to push queries down to the DuckDB engine. We want to look at an HTTPS caching layer to speed things up, but it works fairly well even without one.

In our setup, Postgres acts as a buffer that collects changes, which are then merged out to Iceberg periodically. With the loopback postgres_fdw it's possible to join the Postgres data with the Iceberg data to get a single view in Postgres of the table in its current state, without keeping all the records in Postgres. It also lets you update records: rules on the view capture the changes for the merge and the join.
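
The "current state" view can be thought of as the Iceberg rows overlaid with the changes still buffered in Postgres. The setup described above does this in SQL with views and postgres_fdw; the Python below is only a model of the merge logic, with a hypothetical `id` key and a made-up change format.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# (operation, key, row): operation is "upsert" or "delete"; row is None for deletes.
Change = Tuple[str, int, Optional[Row]]


def current_state(base_rows: Iterable[Row], buffered_changes: Iterable[Change]) -> Dict[int, Row]:
    """Overlay buffered changes (in arrival order) on top of the base table rows."""
    state: Dict[int, Row] = {row["id"]: row for row in base_rows}
    for operation, key, row in buffered_changes:
        if operation == "delete":
            state.pop(key, None)  # deleting a missing key is a no-op
        else:
            state[key] = row      # insert a new row or replace an existing one
    return state


iceberg_rows = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
postgres_buffer = [
    ("upsert", 2, {"id": 2, "status": "shipped"}),
    ("delete", 1, None),
    ("upsert", 3, {"id": 3, "status": "new"}),
]

merged = current_state(iceberg_rows, postgres_buffer)
print(merged)
```

Once the buffered changes have been merged out to Iceberg, the buffer can be emptied and the overlay yields the same result from the base rows alone.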

0 replies
