pg_ducklake

PostgreSQL Extension for DuckLake


This project is under heavy development and is not yet ready for production use.

pg_ducklake brings a native datalake experience into PostgreSQL, powered by DuckLake (a DuckDB lakehouse format with SQL catalog metadata and open Parquet data files).

Key Features

  • Managed DuckLake tables: Create/Write/Query DuckLake tables in PostgreSQL via SQL (e.g., psql/JDBC).
  • DuckDB compatibility: tables created by pg_ducklake are directly queryable from DuckDB clients.
  • Cloud storage: store data files in AWS S3 (or GCS, R2) to decouple storage and compute for serverless analytics.
  • Fast analytics: columnar storage + DuckDB vectorized execution, with hybrid queries over PostgreSQL heap tables supported.
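The hybrid-query capability above can be sketched as a join across both storage formats in the same database; the table and column names here are illustrative:

```sql
-- A regular PostgreSQL heap table holding reference data.
CREATE TABLE customers (id INT PRIMARY KEY, name TEXT);

-- A DuckLake column-store table holding event data.
CREATE TABLE events (customer_id INT, amount NUMERIC) USING ducklake;

-- Hybrid query: join the heap table with the DuckLake table in one statement.
SELECT c.name, sum(e.amount) AS total
FROM customers c
JOIN events e ON e.customer_id = c.id
GROUP BY c.name;
```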

See it in action

Your first Data Lake in PostgreSQL

-- Or use AWS S3 as data storage.
-- SET ducklake.default_table_path = 's3://my-bucket/prefix/';

CREATE TABLE my_table (
    id INT,
    name TEXT,
    age INT
) USING ducklake;

INSERT INTO my_table VALUES (1, 'Alice', 25), (2, 'Bob', 30);

SELECT * FROM my_table;

Access your data with DuckDB

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=postgres host=localhost' AS my_ducklake (METADATA_SCHEMA 'ducklake');
SELECT * FROM my_ducklake.public.my_table;

For cloud storage (AWS S3 or Azure Blob Storage), see the Secrets Management guide.

Quick Start

Docker

Run PostgreSQL with pg_ducklake pre-installed in a Docker container:

docker run -d -e POSTGRES_PASSWORD=duckdb --name pgducklake pgducklake/pgducklake:18-main

docker exec -it pgducklake psql

Compile from source

Requirements:

  • PostgreSQL: 14, 15, 16, 17, 18
  • Operating Systems: Ubuntu 22.04-24.04, macOS

git clone https://github.com/relytcloud/pg_ducklake
cd pg_ducklake
make install

See compilation guide for detailed instructions.

Use cases

Convert a PostgreSQL heap table into a DuckLake table

This pattern performs a one-time ETL copy from row-store (PostgreSQL heap) tables to DuckLake (column-store) tables for fast analytics, while OLTP continues to use the original heap tables.

-- Create a PostgreSQL row-store (heap) table.
CREATE TABLE row_store_table AS
SELECT i AS id, 'hello pg_ducklake' AS msg
FROM generate_series(1, 10000) AS i;

-- Create a DuckLake column-store table via ETL.
CREATE TABLE col_store_table USING ducklake AS
SELECT *
FROM row_store_table;

-- Run analytics against the converted table.
SELECT max(id) FROM col_store_table;

Load an external dataset

External datasets (e.g., CSV/Parquet) can be ingested with DuckDB readers and materialized as tables for analytics.

CREATE TABLE titanic USING ducklake AS
SELECT * FROM read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv');

SELECT "Pclass", "Sex", COUNT(*), AVG("Survived") AS survival_rate
FROM titanic
GROUP BY "Pclass", "Sex";

Query DuckDB-created DuckLake tables from PostgreSQL

DuckLake tables created by DuckDB clients (CLI, Python, etc.) that use the current PostgreSQL instance as their metadata catalog can be queried directly from PostgreSQL using the Foreign Data Wrapper (FDW). This enables scenarios where data engineers create and manage tables via DuckDB tools, while application developers query them seamlessly through PostgreSQL.

Example scenario: A data pipeline uses DuckDB CLI to create DuckLake tables from Parquet files in S3, storing metadata in PostgreSQL. Your PostgreSQL application can then query these tables using standard SQL without needing DuckDB clients.
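As an illustration only, the setup might look like the sketch below. The wrapper name `ducklake_fdw`, the server name, and the option names are assumptions, not the real API; the DuckLake FDW User Guide has the authoritative syntax.

```sql
-- Hypothetical sketch: all names and options below are illustrative.
-- A DuckDB pipeline has already stored DuckLake metadata in this PostgreSQL
-- instance; expose those tables to PostgreSQL queries via the FDW.
CREATE SERVER ducklake_srv
    FOREIGN DATA WRAPPER ducklake_fdw          -- assumed wrapper name
    OPTIONS (metadata_schema 'ducklake');      -- assumed option name

IMPORT FOREIGN SCHEMA main
    FROM SERVER ducklake_srv
    INTO public;

-- Application code then queries the DuckDB-created table with plain SQL.
SELECT count(*) FROM public.my_table;
```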

See the DuckLake FDW User Guide for setup and usage details.

Documentation

Roadmap

Features

  • CREATE TABLE / CREATE TABLE ... AS for DuckLake tables
  • INSERT / SELECT / DELETE / UPDATE for DuckLake tables
  • Online schema evolution (ADD COLUMN / DROP COLUMN / type promotion)
  • Time-travel queries
  • Partitioned tables
  • Read-only pg_ducklake tables referencing shared DuckLake datasets (e.g., frozen DuckLake)
  • Table maintenance (e.g., compaction / GC) via PostgreSQL (e.g., VACUUM or UDFs) [1]
  • HTAP support for incremental row-store → column-store conversion (PostgreSQL heap → DuckLake)
  • Complex types

[1]: Table maintenance can also be carried out by standalone DuckDB clients (preferable, since it is serverless and avoids burdening the PostgreSQL server); pg_ducklake still plans to expose these operations for ease of use.

Performance

  • Native inlined (heap) table for small writes
  • Better transaction concurrency model (based on PostgreSQL XID)
  • Faster metadata operations via PostgreSQL native functions (e.g., SPI)

Docs

  • Access control behavior for DuckLake tables [2]

[2]: DuckLake tables are exposed via PostgreSQL table access methods (AM), so PostgreSQL table/column privileges may already apply; the current behavior and gaps will be reviewed and documented. DuckLake itself relies on its metadata service for ACL management.

Contributing

We welcome contributions! Please see the contributing guide.

Acknowledgments

This project is forked from pg_duckdb and integrates with DuckLake.
