Merged
3 changes: 2 additions & 1 deletion docs/stable/duckdb/advanced_features/partitioning.md
@@ -16,7 +16,8 @@ These keys do not need to be necessarily stored within the files, or in the path

## Examples

> By default, DuckLake supports [Hive partitioning](https://duckdb.org/docs/stable/data/partitioning/hive_partitioning). If you want to avoid this style of partitions, you can opt out via using `CALL my_ducklake.set_option('hive_file_pattern', false)`
> By default, DuckLake uses [Hive partitioning](https://duckdb.org/docs/stable/data/partitioning/hive_partitioning).
> If you want to avoid this style of partitions, you can opt out via using `CALL my_ducklake.set_option('hive_file_pattern', false)`.

Set the partitioning keys of a table, such that new data added to the table is partitioned by these keys.

2 changes: 1 addition & 1 deletion docs/stable/specification/data_types.md
@@ -51,7 +51,7 @@ The following nested types are supported:

## Geometry Types

DuckLake supports geometry types via the [`spatial` extension](https://duckdb.org/docs/stable/core_extensions/spatial/overview#the-geometry-type) and the Parquet `geometry` type. The `geometry` type can store different types of spatial representations called geometry primitives, of which DuckLake supports the following:
DuckLake supports geometry types using the `geometry` type of the Parquet format. The `geometry` type can store different types of spatial representations called geometry primitives, of which DuckLake supports the following:

| Geometry primitive | Description |
| -------------------- | ----------------------------------------------------------------------------------------------- |
12 changes: 8 additions & 4 deletions docs/stable/specification/queries.md
@@ -118,7 +118,7 @@

#### Note on Paths

In DuckLake, paths can be relative to the initially specified data path. Whether path is relative or not is stored in the [`ducklake_data_file`]({% link docs/stable/specification/tables/ducklake_data_file.md %}) and [`ducklake_delete_file`]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) entries (`path_is_relative`) to the `data_path` prefix from [`ducklake_metadata`]({% link docs/stable/specification/tables/ducklake_metadata.md %}).
In DuckLake, paths can be relative to the initially specified data path. Whether a path is relative or not to the `data_path` prefix from [`ducklake_metadata`]({% link docs/stable/specification/tables/ducklake_metadata.md %}), is stored in the [`ducklake_data_file`]({% link docs/stable/specification/tables/ducklake_data_file.md %}) and [`ducklake_delete_file`]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) entries (`path_is_relative`).
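
The path rule in this hunk can be illustrated with a minimal Python sketch. It assumes the simplest reading of the text: a relative `path` is joined directly onto the `data_path` prefix, and an absolute `path` is used unchanged. The function name `resolve_path` is hypothetical and is not part of DuckLake; any intermediate schema-level or table-level path prefixes are deliberately ignored here.

```python
def resolve_path(data_path: str, path: str, path_is_relative: bool) -> str:
    """Return the location to read for a data or delete file entry.

    Hypothetical helper: assumes a relative path is joined directly onto
    the data_path prefix, which is a simplification of the real layout.
    """
    if not path_is_relative:
        # Absolute paths are stored as-is and need no prefix.
        return path
    # Plain string join (not os.path.join) so that object-store prefixes
    # such as "s3://bucket/lake/" are preserved unchanged.
    return data_path.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(resolve_path("s3://bucket/lake/", "my_file.parquet", True))
    print(resolve_path("s3://bucket/lake/", "/mnt/other/my_file.parquet", False))
```

A plain string join is used instead of `os.path.join` because the prefix may be a URL-style location rather than a local directory.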

### `SELECT` with File Pruning

@@ -504,10 +504,14 @@
- `⟨FILE_SIZE_BYTES⟩`{:.language-sql .highlight} is the file size.
- `⟨FOOTER_SIZE⟩`{:.language-sql .highlight} is the position of the Parquet footer. This helps with efficiently reading the file.

> We have omitted some complexity around relative paths and encrypted files in this example. Refer to the [`ducklake_delete_file` table]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) documentation for details.
Notes:

> Please note that `DELETE` operations also do not require updates to table statistics, as the statistics are maintained as upper bounds, and deletions do not violate these bounds.
* We have omitted some complexity around relative paths and encrypted files in this example. Refer to the [`ducklake_delete_file` table]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) documentation for details.

Check failure on line 509 in docs/stable/specification/queries.md (GitHub Actions / markdown): Unordered list style [Expected: dash; Actual: asterisk]

* In DuckLake, the strategy used for `DELETE` operations is **merge-on-read**. Delete files are referenced in the [`ducklake_delete_file` table]({% link docs/stable/specification/tables/ducklake_delete_file.md %}).

Check failure on line 511 in docs/stable/specification/queries.md (GitHub Actions / markdown): Unordered list style [Expected: dash; Actual: asterisk]

* Please note that `DELETE` operations also do not require updates to table statistics, as the statistics are maintained as upper bounds, and deletions do not violate these bounds.

Check failure on line 513 in docs/stable/specification/queries.md (GitHub Actions / markdown): Unordered list style [Expected: dash; Actual: asterisk]

### `UPDATE`

In DuckLake, `UPDATE` operations are internally implemented as a combination of a `DELETE` followed by an `INSERT`. Specifically, the outdated row is marked for deletion, and the updated version of that row is inserted. As a result, the changes to the metadata tables are equivalent to performing a `DELETE` and an `INSERT` operation sequentially within the same transaction.
In DuckLake, `UPDATE` operations are expressed as a combination of a `DELETE` followed by an `INSERT`. Specifically, the outdated row is marked for deletion, and the updated version of that row is inserted. As a result, the changes to the metadata tables are equivalent to performing a `DELETE` and an `INSERT` operation sequentially within the same transaction.
2 changes: 1 addition & 1 deletion docs/stable/specification/tables/ducklake_column.md
@@ -23,7 +23,7 @@ This table describes the columns that are part of a table, including their types
- `begin_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). This version of the column exists *starting with* this snapshot id.
- `end_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). This version of the column exists *up to but not including* this snapshot id. If `end_snapshot` is `NULL`, this version of the column is currently valid.
- `table_id` refers to a `table_id` from the [`ducklake_table` table]({% link docs/stable/specification/tables/ducklake_table.md %}).
- `column_order` is a number that defines the position of the column in the list of columns. It needs to be unique within a snapshot but does not have to be strictly monotonic (gaps are ok).
- `column_order` is a number that defines the position of the column in the list of columns. It needs to be unique within a snapshot but does not have to be contiguous (gaps are ok).
- `column_name` is the name of this version of the column, e.g., `my_column`.
- `column_type` is the type of this version of the column as defined in the list of [data types]({% link docs/stable/specification/data_types.md %}).
- `initial_default` is the *initial* default value as the column is being created, e.g., in `ALTER TABLE`, encoded as a string. Can be `NULL`.
2 changes: 1 addition & 1 deletion docs/stable/specification/tables/ducklake_data_file.md
@@ -28,7 +28,7 @@ Data files contain the actual row data.
- `table_id` refers to a `table_id` from the [`ducklake_table` table]({% link docs/stable/specification/tables/ducklake_table.md %}).
- `begin_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). The file is part of the table *starting with* this snapshot id.
- `end_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). The file is part of the table *up to but not including* this snapshot id. If `end_snapshot` is `NULL`, the file is currently part of the table.
- `file_order` is a number that defines the vertical position of the file in the table. it needs to be unique within a snapshot but does not have to be strictly monotonic (gaps are ok).
- `file_order` is a number that defines the vertical position of the file in the table. It needs to be unique within a snapshot but does not have to be contiguous (gaps are ok).
- `path` is the file path of the data file, e.g., `my_file.parquet` for a relative path.
- `path_is_relative` whether the `path` is relative to the [`path`]({% link docs/stable/specification/tables/ducklake_table.md %}) of the table (true) or an absolute path (false).
- `file_format` is the storage format of the file. Currently, only `parquet` is allowed.
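
The `begin_snapshot` / `end_snapshot` wording in the list above ("starting with" and "up to but not including") describes a half-open interval. The following Python sketch shows one way to express that check. The function name `is_visible` is hypothetical, and `None` stands in for a SQL `NULL` in `end_snapshot`.

```python
from typing import Optional


def is_visible(begin_snapshot: int, end_snapshot: Optional[int], snapshot_id: int) -> bool:
    """Check whether an entry belongs to the table at snapshot_id.

    Hypothetical helper: the entry is valid on the half-open interval
    [begin_snapshot, end_snapshot); end_snapshot=None models SQL NULL,
    meaning the entry is still current.
    """
    if snapshot_id < begin_snapshot:
        # The entry did not exist yet at this snapshot.
        return False
    # Either still current, or ended at a later snapshot (end is exclusive).
    return end_snapshot is None or snapshot_id < end_snapshot


if __name__ == "__main__":
    # A file added in snapshot 3 and removed in snapshot 7.
    for sid in (2, 3, 6, 7):
        print(sid, is_visible(3, 7, sid))
```

The same predicate applies to any entry that carries these two columns, for example the column versions in `ducklake_column`.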