DuckLake support #2709

@adrianbr

Description

Specs from rudolfix

(written by zilto, adapted from: #2709 (comment))

Tasks

Create a new ducklake destination that inherits most of its features from the duckdb destination.

Configuration

To allow configuring the catalog (a database) and the storage (a filesystem), DuckLakeCredentials should derive from the duckdb configuration and have the following signature:

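import dataclasses
from typing import Final

from dlt.common.configuration import configspec
from dlt.common.configuration.specs import ConnectionStringCredentials
from dlt.common.storages.configuration import FilesystemConfiguration
from dlt.common.typing import TSecretStrValue
from dlt.destinations.impl.duckdb.configuration import DuckDbBaseCredentials

@configspec(init=False)
class DuckLakeCredentials(DuckDbBaseCredentials):
    # fixed driver name identifying this destination; not user-configurable
    drivername: Final[str] = dataclasses.field(  # type: ignore
        default="ducklake", init=False, repr=False, compare=False
    )
    username: str
    password: TSecretStrValue = None
    database: str  # the name of the ducklake; required by DuckLakeSqlClient
    catalog: ConnectionStringCredentials  # for the catalog, e.g. postgres
    storage: FilesystemConfiguration  # for the data files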

Users will be able to configure the catalog and the storage in their config.toml and secrets.toml.
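To make the nested layout concrete, here is a minimal sketch of what the configuration could look like. It assumes (not confirmed by this spec) that the credentials live under a `[destination.ducklake.credentials]` section, modeled on how other dlt destinations are configured, with `catalog` and `storage` as nested tables. The TOML tables are written as nested Python dicts, and the hypothetical helper `to_env_vars` shows the equivalent environment variable names using dlt's `SECTION__KEY` convention.

```python
from typing import Any, Dict

# Hypothetical secrets.toml content expressed as nested dicts. Each nesting level
# is a TOML table: CONFIG["destination"]["ducklake"]["credentials"]["catalog"]
# corresponds to [destination.ducklake.credentials.catalog]. The section path is
# an assumption, not a confirmed part of the DuckLake destination.
CONFIG: Dict[str, Any] = {
    "destination": {
        "ducklake": {
            "credentials": {
                "database": "my_lake",  # name of the ducklake
                "catalog": {  # ConnectionStringCredentials fields
                    "drivername": "postgres",
                    "host": "localhost",
                    "port": 5432,
                    "database": "ducklake_catalog",
                    "username": "loader",
                    "password": "change-me",
                },
                "storage": {  # FilesystemConfiguration fields
                    "bucket_url": "s3://my-bucket/lake/",
                },
            }
        }
    }
}


def to_env_vars(tree: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten nested sections into SECTION__SUBSECTION__KEY names (dlt env var style)."""
    out: Dict[str, str] = {}
    for key, value in tree.items():
        name = f"{prefix}__{key.upper()}" if prefix else key.upper()
        if isinstance(value, dict):
            # descend into the nested table, extending the name prefix
            out.update(to_env_vars(value, name))
        else:
            out[name] = str(value)
    return out


if __name__ == "__main__":
    for name, value in sorted(to_env_vars(CONFIG).items()):
        print(f"{name}={value}")
```

Running it prints lines such as `DESTINATION__DUCKLAKE__CREDENTIALS__CATALOG__HOST=localhost`, which is how the same values could be supplied through environment variables instead of secrets.toml.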

Resolve catalog and storage secrets

The DuckDB connection needs credentials for both the catalog (e.g. Postgres) and the storage (any supported storage backend).

  • for storage, the duckdb destination already has a feature to derive secrets from FilesystemConfiguration (i.e., the mechanism that allows querying S3 with duckdb)
  • for the catalog, we need to implement a function that derives secrets from ConnectionStringCredentials
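The missing catalog piece could look roughly like the sketch below. It turns connection-string fields into a DuckDB `CREATE SECRET` statement of `TYPE postgres`, the secret type provided by DuckDB's postgres extension. `CatalogCredentials` is a hypothetical stand-in that mirrors the relevant fields of dlt's ConnectionStringCredentials, and `catalog_secret_sql` plus the default secret name `ducklake_catalog` are illustrative names, not an existing API. Only Postgres catalogs are handled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogCredentials:
    """Hypothetical stand-in for the fields of ConnectionStringCredentials."""
    drivername: str
    database: str
    username: Optional[str] = None
    password: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None


def _quote(value: str) -> str:
    # SQL string literal: wrap in single quotes and double any embedded quote
    return "'" + value.replace("'", "''") + "'"


def catalog_secret_sql(creds: CatalogCredentials, secret_name: str = "ducklake_catalog") -> str:
    """Build a DuckDB CREATE SECRET statement for a Postgres catalog (sketch)."""
    if creds.drivername not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported catalog driver: {creds.drivername}")
    options = ["TYPE postgres", f"DATABASE {_quote(creds.database)}"]
    # optional fields are only emitted when set, so DuckDB defaults apply otherwise
    if creds.host:
        options.append(f"HOST {_quote(creds.host)}")
    if creds.port:
        options.append(f"PORT {int(creds.port)}")
    if creds.username:
        options.append(f"USER {_quote(creds.username)}")
    if creds.password:
        options.append(f"PASSWORD {_quote(creds.password)}")
    return f"CREATE OR REPLACE SECRET {secret_name} ({', '.join(options)})"


if __name__ == "__main__":
    creds = CatalogCredentials(
        drivername="postgres",
        database="ducklake_catalog",
        username="loader",
        password="change-me",
        host="localhost",
        port=5432,
    )
    print(catalog_secret_sql(creds))
```

Generating SQL text keeps the function testable without a live database; the statement would then be executed on the destination's DuckDB connection before the ducklake is attached.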

Configure DuckDB instance to support DuckLake

The class DuckDbBaseCredentials allows setting extensions, pragmas, global config, and local config. This makes it possible to load the ducklake extension, but not to install it.

  • DuckLakeCredentials should inherit from DuckDbBaseCredentials, but enforce installation of the ducklake extension
  • Set the current database to the ducklake name (details here).
  • Many filesystems are supported through DuckDB extensions: httpfs supports all S3-compliant APIs, and DuckDB can also use Python fsspec filesystems. We can add support for these progressively.
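A minimal sketch of the statements a DuckLake connection might run after opening DuckDB, following the steps above: install and load the extension, attach the ducklake using the `ATTACH 'ducklake:<catalog>' AS <name> (DATA_PATH '<path>')` form from the DuckLake documentation, then make it the current database with `USE`. The helper name `ducklake_setup_statements` and the example values are hypothetical, and the sketch assumes catalog and storage secrets already exist in the session. Depending on the catalog, the matching DuckDB extension (e.g. postgres) may also need to be installed.

```python
from typing import List


def _quote_literal(value: str) -> str:
    # SQL string literal with embedded single quotes doubled
    return "'" + value.replace("'", "''") + "'"


def _quote_ident(name: str) -> str:
    # SQL identifier with embedded double quotes doubled
    return '"' + name.replace('"', '""') + '"'


def ducklake_setup_statements(lake_name: str, catalog_dsn: str, data_path: str) -> List[str]:
    """Return the SQL to prepare a DuckDB connection for a ducklake (sketch)."""
    return [
        # installed explicitly, since DuckDbBaseCredentials only loads extensions
        "INSTALL ducklake",
        "LOAD ducklake",
        # attach the catalog under the ducklake name; data files go to data_path
        f"ATTACH {_quote_literal('ducklake:' + catalog_dsn)} AS {_quote_ident(lake_name)} "
        f"(DATA_PATH {_quote_literal(data_path)})",
        # make the ducklake the current database
        f"USE {_quote_ident(lake_name)}",
    ]


if __name__ == "__main__":
    for stmt in ducklake_setup_statements(
        "my_lake",
        "postgres:dbname=ducklake_catalog host=localhost",
        "s3://my-bucket/lake/",
    ):
        print(stmt)
```

With the duckdb package available, each statement could be passed to `con.execute(...)` in order; keeping them as a list makes the enforced install and the `USE` step easy to assert on in tests.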

Out-of-scope

  • ducklake table maintenance; this should be done by the user directly against their ducklake instance

Future work

  • Add ducklake to TTableFormat
  • DuckLakeClient should implement SupportsOpenTables so that users can obtain an authenticated catalog and table relations from the pipeline/destination, for example to perform table maintenance. This is how delta and iceberg work.

Original issue

Feature description

We should support DuckLake.

A user reported that it currently doesn't work with our duckdb destination, so we likely need to make some adjustments.

https://www.linkedin.com/feed/update/urn:li:activity:7335310253997096963?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7335310253997096963%2C7335375393492733952%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287335375393492733952%2Curn%3Ali%3Aactivity%3A7335310253997096963%29

Metadata

Labels

destination (Issue with a specific destination)

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
