DuckLake's behaviour with clients' "batch insert" methods, where a separate Parquet file is written for each incoming record, can be confusing. There are at least two reports on this from users (1, 2).
It is suggested to add a note like the following somewhere in the docs:
### Bulk data import
When the input data is already stored in Parquet files with the same schema as the destination DuckLake table, it is advised to use `ducklake_add_data_files` to import this data.
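A minimal sketch of such a call, assuming a DuckLake catalog attached as `my_ducklake`, a destination table `tab1`, and an existing Parquet file `data.parquet` (all names here are illustrative):

```sql
-- Register an existing Parquet file with a DuckLake table;
-- the file is added to the table's metadata without being rewritten.
CALL ducklake_add_data_files('my_ducklake', 'tab1', 'data.parquet');
```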
For input data in other formats or from other sources, the most performant way to import it into DuckLake is to first insert the records into an in-memory "staging" table (for example, using the Appender interface) and then perform a single bulk insert into the DuckLake table:

```sql
INSERT INTO ducklake.tab1 FROM memory.tab1;
```

**Warning:** the "batch insert" functionality in DuckDB clients can be misleading when used with DuckLake. For example, when the Python DB-API method `executemany` or the JDBC method `executeBatch` is used with DuckLake, the specified INSERT query is prepared and then executed in a loop, once for every parameter tuple in the batch, resulting in a sequence like this:
```sql
PREPARE p AS INSERT INTO tab1 (col1, col2) VALUES (?, ?);
BEGIN TRANSACTION;
EXECUTE p(1, 'foo');
EXECUTE p(2, 'bar');
EXECUTE p(3, 'baz');
...
EXECUTE p(n, 'boo');
COMMIT;
```

From the DuckLake point of view, the incoming INSERT queries are separate and not part of any "batch", so a separate Parquet file is created for every record in the batch. This is unlikely to be desired, so using `executemany()`/`executeBatch()` directly with DuckLake is discouraged. These methods can still be used to insert the records into an in-memory "staging" table before copying it to a DuckLake table in bulk.
TODO: Polars data frame direct copy example.
@guillesd I wonder if you have a minute to look at this note: how to structure it better, and which section would be appropriate for it.