This file provides guidance to AI coding assistants when working with code in this repository.
pg_ducklake is a PostgreSQL extension that extends pg_duckdb to support DuckLake, an open lakehouse format. It statically links ducklake, the official DuckDB extension, and loads it via DuckDB's LoadStaticExtension<T>().
Its C++ code should use namespace pgducklake.
- header files in include/
- implementation files in src/
  - src/pgducklake.cpp: _PG_init, extension bootstrap
  - src/pgducklake_duckdb.cpp: DuckDB bridge, static extension load
  - src/pgducklake_metadata_manager.cpp: custom DuckLake metadata manager
- regression tests in test/regression/
- isolation tests in test/isolation/
PG_CONFIG is required. A local PostgreSQL is usually installed under the working directory, e.g. PG_CONFIG=$(pwd)/pg-17/bin/pg_config, to avoid conflicts with other worktrees. If neither a local nor a global PostgreSQL is found, stop and ask the user.
git submodule update --init --recursive
PG_CONFIG=<pg_config> make install
PG_CONFIG=<pg_config> make installcheck
# Run single test
PG_CONFIG=<pg_config> make installcheck TEST=basic
Use regression and isolation tests to verify functionality wherever possible.
- Write clean, minimal code; fewer lines is better
- Prioritize simplicity for effective and maintainable software
- Only include comments that are essential to understanding functionality or convey non-obvious information
- ASCII only in all source files, SQL tests, and expected output: no emojis, no Unicode dashes/quotes (use -, --, ', ")
- Avoid using extern "C" to reference symbols from the same library. Instead, declare the symbol in a header file.
- Use extern "C" only when necessary, such as when interfacing with third-party libraries.
- Use namespace pgducklake for C++ extension code (do not use namespace pg_ducklake).
- Use pgducklake:: when qualifying symbols outside the namespace block.
- Use C++ raw string literals (R"(…)") for multiline SQL; never use adjacent-string concatenation for SQL queries.
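The namespace and SQL-literal rules above can be sketched as follows. This is a minimal illustration, not code from the repository: BuildLatestSnapshotQuery and the example_snapshots table are hypothetical names chosen only for the example.

```cpp
#include <string>

namespace pgducklake {

// Hypothetical helper (not part of the repository).
std::string BuildLatestSnapshotQuery();

} // namespace pgducklake

// Outside the namespace block, qualify the symbol with pgducklake::.
std::string pgducklake::BuildLatestSnapshotQuery() {
    // A raw string literal keeps multiline SQL readable and avoids
    // adjacent-string concatenation. example_snapshots is a placeholder table.
    return R"(
SELECT snapshot_id
FROM example_snapshots
ORDER BY snapshot_id DESC
LIMIT 1
)";
}
```

The returned string contains the query exactly as written, including the leading and trailing newlines, which SQL parsers ignore.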
PostgreSQL and DuckDB headers are conflict-prone. Follow strict include order in mixed files:
- DuckDB headers
- DuckLake headers
- Local pgducklake headers
- PostgreSQL headers last, inside extern "C", with <postgres.h> as the first of them.
FATAL macro conflict: PostgreSQL's elog.h defines #define FATAL 22, which clobbers DuckDB's ExceptionType::FATAL enum member in duckdb/common/exception.hpp. Any header that transitively includes both will break. The fix is include order: DuckDB's exception.hpp (or any header that pulls it in, e.g., string_util.hpp, error_data.hpp) must be parsed before postgres.h defines the macro. Once parsed, C++ include guards prevent re-inclusion. Watch for indirect includes -- pgduckdb/pgduckdb_contracts.hpp and pgducklake/utility/cpp_wrapper.hpp both include postgres.h, so any DuckDB header they transitively need must already be included earlier in the translation unit.
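The effect of include order can be reproduced without either code base. The sketch below uses stand-ins: mock_duckdb and its enum values are hypothetical and only mimic the shape of DuckDB's ExceptionType, while the #define copies the value quoted above from elog.h.

```cpp
// Stand-in for duckdb/common/exception.hpp: parsed first, while FATAL is
// still an ordinary identifier. mock_duckdb and its values are hypothetical.
namespace mock_duckdb {
enum class ExceptionType : int { INVALID = 0, FATAL = 1 };
// Capture the member before the macro exists.
constexpr ExceptionType kFatalType = ExceptionType::FATAL;
} // namespace mock_duckdb

// Stand-in for PostgreSQL's elog.h, included afterwards.
#define FATAL 22

// From here on the token FATAL expands to 22, so writing
// mock_duckdb::ExceptionType::FATAL would become ...::22 and fail to compile.
// Declarations parsed before the #define are unaffected.
constexpr int kPgFatalLevel = FATAL;

static_assert(static_cast<int>(mock_duckdb::kFatalType) == 1,
              "enum member was parsed before the macro");
static_assert(kPgFatalLevel == 22, "macro is visible after its definition");
```

Swapping the two halves, so that the #define comes before the enum, turns the enumerator into the literal 22 and the translation unit no longer compiles, which is the failure the include order is meant to prevent.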
Treat third_party/pg_duckdb as upstream:
- Prefer additive hooks, avoid invasive edits.
- Keep diffs minimal and upstream-friendly.
- Ensure zero behavior change when hooks are unused.
- Never change the linkage or signature of upstream functions (e.g., do not remove extern "C" from functions that already exist in duckdb/main). Only our own additions may use C++ linkage.
- Call pg_duckdb hooks via pgduckdb:: from pgduckdb/pgduckdb_contracts.hpp (our contract header). Upstream C-linkage functions (e.g., RegisterDuckdbTableAm) keep extern "C" and are called unqualified.
- Our exported C++ symbols in pg_duckdb must be in namespace pgduckdb to avoid name conflicts. Declare them in include/pgduckdb/pgduckdb_contracts.hpp under namespace pgduckdb.
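One way to read "additive hooks" with "zero behavior change when hooks are unused" is a function pointer that defaults to null. The sketch below is a self-contained mock of that pattern, not the real pg_duckdb API: QueryRewriteHook, query_rewrite_hook, PrepareQuery, TagQuery, and InstallHooks are all hypothetical names.

```cpp
#include <string>

// Mock of the upstream side. All names here are hypothetical.
namespace pgduckdb {

using QueryRewriteHook = std::string (*)(const std::string &);

// Unset by default, so behavior is unchanged unless a hook is installed.
QueryRewriteHook query_rewrite_hook = nullptr;

std::string PrepareQuery(const std::string &query) {
    return query_rewrite_hook ? query_rewrite_hook(query) : query;
}

} // namespace pgduckdb

// Extension side: installs its behavior without editing PrepareQuery.
namespace pgducklake {

std::string TagQuery(const std::string &query) {
    return "/* pgducklake */ " + query;
}

void InstallHooks() {
    pgduckdb::query_rewrite_hook = TagQuery;
}

} // namespace pgducklake
```

Before InstallHooks() runs, PrepareQuery returns its input untouched; afterwards the same call goes through TagQuery. In the repository the hook declaration would live in the contract header rather than next to its caller.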
Documentation follows two axes: AI-oriented and human-oriented.
All docs must be reachable from one of two entrypoints:
CLAUDE.md
+-- .claude/skills/*             AI workflow guidance
+-- src/*.cpp header comments    per-file purpose and usage
+-- test/regression/             self-documenting test cases
+-- test/isolation/              concurrency test cases
To avoid Docs Rot, keep AI docs near the code. Do NOT write separate explanation docs or duplicate what code already says. Maintain header comments after each edit. Inline comments only when logic is non-obvious.
README.md
+-- docs/README.md               index of all human docs
+-- docs/sql_objects.md          all SQL objects, functions, and procedures
+-- docs/settings.md             GUCs and DuckLake options
+-- docs/access_control.md
+-- docs/compilation.md
Every new doc file must be linked from docs/README.md. Keep synced with code:
- When adding, removing, or changing a ducklake.* SQL function or procedure in pg_ducklake--0.1.0.sql, update docs/sql_objects.md.
- In reference docs, order TOC tables alphabetically; keep detailed descriptions in logical order.
- When modifying multiple files, run file modification tasks in parallel whenever possible, instead of processing them sequentially
- Never cd into subdirectories in Bash commands; it changes the working directory for subsequent calls. Use a subshell, e.g. (cd third_party/pg_duckdb && git ...), or pushd/popd, e.g. pushd third_party/pg_duckdb; git ...; popd, to keep the working directory at the project root.