diff --git a/docs/stable/duckdb/advanced_features/partitioning.md b/docs/stable/duckdb/advanced_features/partitioning.md
index 2d256eb..8eab9a0 100644
--- a/docs/stable/duckdb/advanced_features/partitioning.md
+++ b/docs/stable/duckdb/advanced_features/partitioning.md
@@ -16,7 +16,8 @@ These keys do not need to be necessarily stored within the files, or in the path
 
 ## Examples
 
-> By default, DuckLake supports [Hive partitioning](https://duckdb.org/docs/stable/data/partitioning/hive_partitioning). If you want to avoid this style of partitions, you can opt out via using `CALL my_ducklake.set_option('hive_file_pattern', false)`
+> By default, DuckLake writes partitioned data files using the [Hive partitioning](https://duckdb.org/docs/stable/data/partitioning/hive_partitioning) directory layout.
+> If you want to avoid this partitioning style, you can opt out by running `CALL my_ducklake.set_option('hive_file_pattern', false)`.
 
 Set the partitioning keys of a table, such that new data added to the table is partitioned by these keys.
 
diff --git a/docs/stable/specification/data_types.md b/docs/stable/specification/data_types.md
index e8c3ac7..1104f7f 100644
--- a/docs/stable/specification/data_types.md
+++ b/docs/stable/specification/data_types.md
@@ -51,7 +51,7 @@ The following nested types are supported:
 
 ## Geometry Types
 
-DuckLake supports geometry types via the [`spatial` extension](https://duckdb.org/docs/stable/core_extensions/spatial/overview#the-geometry-type) and the Parquet `geometry` type. The `geometry` type can store different types of spatial representations called geometry primitives, of which DuckLake supports the following:
+DuckLake supports geometry types using the Parquet `geometry` type. This type can store different kinds of spatial representations, called geometry primitives, of which DuckLake supports the following:
 
 | Geometry primitive | Description |
 | -------------------- | ----------------------------------------------------------------------------------------------- |
diff --git a/docs/stable/specification/queries.md b/docs/stable/specification/queries.md
index 5d21f16..efc425d 100644
--- a/docs/stable/specification/queries.md
+++ b/docs/stable/specification/queries.md
@@ -118,7 +118,7 @@ Not all files have to contain all the columns currently defined in the table, so
 
 #### Note on Paths
 
-In DuckLake, paths can be relative to the initially specified data path. Whether path is relative or not is stored in the [`ducklake_data_file`]({% link docs/stable/specification/tables/ducklake_data_file.md %}) and [`ducklake_delete_file`]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) entries (`path_is_relative`) to the `data_path` prefix from [`ducklake_metadata`]({% link docs/stable/specification/tables/ducklake_metadata.md %}).
+In DuckLake, paths can be relative to the initially specified data path, i.e., the `data_path` prefix from [`ducklake_metadata`]({% link docs/stable/specification/tables/ducklake_metadata.md %}). Whether a path is relative is stored in the `path_is_relative` column of the [`ducklake_data_file`]({% link docs/stable/specification/tables/ducklake_data_file.md %}) and [`ducklake_delete_file`]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) entries.
 
 ### `SELECT` with File Pruning
 
@@ -504,10 +504,14 @@ where
 - `⟨FILE_SIZE_BYTES⟩`{:.language-sql .highlight} is the file size.
 - `⟨FOOTER_SIZE⟩`{:.language-sql .highlight} is the position of the Parquet footer. This helps with efficiently reading the file.
 
-> We have omitted some complexity around relative paths and encrypted files in this example. Refer to the [`ducklake_delete_file` table]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) documentation for details.
+Notes:
 
-> Please note that `DELETE` operations also do not require updates to table statistics, as the statistics are maintained as upper bounds, and deletions do not violate these bounds.
+* We have omitted some complexity around relative paths and encrypted files in this example. Refer to the [`ducklake_delete_file` table]({% link docs/stable/specification/tables/ducklake_delete_file.md %}) documentation for details.
+
+* DuckLake uses a **merge-on-read** strategy for `DELETE` operations: instead of rewriting data files, it writes delete files that record the deleted rows, and readers filter those rows out at query time. Delete files are referenced in the [`ducklake_delete_file` table]({% link docs/stable/specification/tables/ducklake_delete_file.md %}).
+
+* `DELETE` operations also do not require updates to table statistics, as the statistics are maintained as upper bounds, which deletions cannot violate.
 
 ### `UPDATE`
 
-In DuckLake, `UPDATE` operations are internally implemented as a combination of a `DELETE` followed by an `INSERT`. Specifically, the outdated row is marked for deletion, and the updated version of that row is inserted. As a result, the changes to the metadata tables are equivalent to performing a `DELETE` and an `INSERT` operation sequentially within the same transaction.
+In DuckLake, `UPDATE` operations are expressed as a combination of a `DELETE` followed by an `INSERT`. Specifically, the outdated row is marked for deletion, and the updated version of that row is inserted. As a result, the changes to the metadata tables are equivalent to performing a `DELETE` and an `INSERT` operation sequentially within the same transaction.
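Reviewer note: a minimal in-memory sketch of the merge-on-read behaviour described in the `DELETE` notes and the `UPDATE` section of `queries.md`. Everything in it is hypothetical: `DataFile`, `Table`, `column_bounds`, and the per-file sets of deleted row positions (standing in for delete files) are illustration only, not DuckLake's storage format or API. It shows three points from the text: a delete records row positions instead of rewriting data files, an `UPDATE` is a delete of the old row version followed by an insert of the new one, and min/max bounds written with a file still cover the rows that survive a delete.

```python
from dataclasses import dataclass, field


def column_bounds(rows):
    """Per-column (min, max) bounds, computed once when a file is written."""
    bounds = {}
    for row in rows:
        for col, value in row.items():
            lo, hi = bounds.get(col, (value, value))
            bounds[col] = (min(lo, value), max(hi, value))
    return bounds


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable data file plus its statistics."""
    rows: list
    bounds: dict


@dataclass
class Table:
    """Toy table: data files, plus deleted row positions per data file.

    The position sets play the role of delete files in this sketch.
    """
    files: list = field(default_factory=list)
    deletes: dict = field(default_factory=dict)  # file index -> set of row positions

    def insert(self, rows):
        # An insert only adds a new data file; existing files are never rewritten.
        rows = list(rows)
        if rows:
            self.files.append(DataFile(rows, column_bounds(rows)))

    def delete_where(self, predicate):
        # Merge-on-read: remember which positions are deleted, leave data files as-is.
        # Bounds are not touched: removing rows cannot push values outside them.
        for idx, data_file in enumerate(self.files):
            deleted = self.deletes.setdefault(idx, set())
            for pos, row in enumerate(data_file.rows):
                if pos not in deleted and predicate(row):
                    deleted.add(pos)

    def update_where(self, predicate, change):
        # UPDATE = DELETE of the old row versions followed by INSERT of the new ones.
        old_rows = [row for row in self.scan() if predicate(row)]
        self.delete_where(predicate)
        self.insert(change(row) for row in old_rows)

    def scan(self):
        # Read path: merge each data file with its deleted positions at read time.
        for idx, data_file in enumerate(self.files):
            deleted = self.deletes.get(idx, set())
            for pos, row in enumerate(data_file.rows):
                if pos not in deleted:
                    yield row


if __name__ == "__main__":
    t = Table()
    t.insert([{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 3, "v": 30}])
    t.delete_where(lambda r: r["id"] == 2)
    t.update_where(lambda r: r["id"] == 3, lambda r: {**r, "v": 31})
    print(sorted(r["v"] for r in t.scan()))  # [10, 31]
```

After the update, the first file is unchanged on "disk" (it still holds `v = 30`); only its deleted-position set grew, and the new row version lives in a second file with its own bounds.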
diff --git a/docs/stable/specification/tables/ducklake_column.md b/docs/stable/specification/tables/ducklake_column.md
index c43237f..499d382 100644
--- a/docs/stable/specification/tables/ducklake_column.md
+++ b/docs/stable/specification/tables/ducklake_column.md
@@ -23,7 +23,7 @@ This table describes the columns that are part of a table, including their types
 - `begin_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). This version of the column exists *starting with* this snapshot id.
 - `end_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). This version of the column exists *up to but not including* this snapshot id. If `end_snapshot` is `NULL`, this version of the column is currently valid.
 - `table_id` refers to a `table_id` from the [`ducklake_table` table]({% link docs/stable/specification/tables/ducklake_table.md %}).
-- `column_order` is a number that defines the position of the column in the list of columns. It needs to be unique within a snapshot but does not have to be strictly monotonic (gaps are ok).
+- `column_order` is a number that defines the position of the column in the list of columns. It needs to be unique within a snapshot but does not have to be contiguous (gaps are ok).
 - `column_name` is the name of this version of the column, e.g., `my_column`.
 - `column_type` is the type of this version of the column as defined in the list of [data types]({% link docs/stable/specification/data_types.md %}).
 - `initial_default` is the *initial* default value as the column is being created, e.g., in `ALTER TABLE`, encoded as a string. Can be `NULL`.
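Reviewer note: a small sketch of how a reader could combine the `begin_snapshot`/`end_snapshot` range and `column_order` from `ducklake_column.md` to list a table's columns as of a given snapshot. The helper names (`visible`, `columns_at`) and the sample rows are hypothetical; only the field names come from the spec, and other fields of the real table are omitted. The sample uses `column_order` values 1, 2, 5 to show that gaps are harmless as long as values are unique within a snapshot.

```python
def visible(row, snapshot_id):
    """True if this version is valid at snapshot_id (begin inclusive, end exclusive, None = current)."""
    end = row["end_snapshot"]
    return row["begin_snapshot"] <= snapshot_id and (end is None or snapshot_id < end)


def columns_at(column_rows, table_id, snapshot_id):
    """Column names of one table as of snapshot_id, ordered by column_order."""
    current = [r for r in column_rows if r["table_id"] == table_id and visible(r, snapshot_id)]
    orders = [r["column_order"] for r in current]
    if len(orders) != len(set(orders)):
        raise ValueError("column_order must be unique within a snapshot")
    # Only the relative order matters, so gaps such as 1, 2, 5 are fine.
    return [r["column_name"] for r in sorted(current, key=lambda r: r["column_order"])]


# Hypothetical rows: `name` is replaced by `full_name` at snapshot 3, `score` is added at snapshot 4.
ROWS = [
    {"table_id": 1, "column_name": "id", "column_order": 1, "begin_snapshot": 1, "end_snapshot": None},
    {"table_id": 1, "column_name": "name", "column_order": 2, "begin_snapshot": 1, "end_snapshot": 3},
    {"table_id": 1, "column_name": "full_name", "column_order": 2, "begin_snapshot": 3, "end_snapshot": None},
    {"table_id": 1, "column_name": "score", "column_order": 5, "begin_snapshot": 4, "end_snapshot": None},
]

if __name__ == "__main__":
    print(columns_at(ROWS, 1, 2))  # ['id', 'name']
    print(columns_at(ROWS, 1, 4))  # ['id', 'full_name', 'score']
```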
diff --git a/docs/stable/specification/tables/ducklake_data_file.md b/docs/stable/specification/tables/ducklake_data_file.md
index 811b2e4..b941c6a 100644
--- a/docs/stable/specification/tables/ducklake_data_file.md
+++ b/docs/stable/specification/tables/ducklake_data_file.md
@@ -28,7 +28,7 @@ Data files contain the actual row data.
 - `table_id` refers to a `table_id` from the [`ducklake_table` table]({% link docs/stable/specification/tables/ducklake_table.md %}).
 - `begin_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). The file is part of the table *starting with* this snapshot id.
 - `end_snapshot` refers to a `snapshot_id` from the [`ducklake_snapshot` table]({% link docs/stable/specification/tables/ducklake_snapshot.md %}). The file is part of the table *up to but not including* this snapshot id. If `end_snapshot` is `NULL`, the file is currently part of the table.
-- `file_order` is a number that defines the vertical position of the file in the table. it needs to be unique within a snapshot but does not have to be strictly monotonic (gaps are ok).
+- `file_order` is a number that defines the vertical position of the file in the table. It needs to be unique within a snapshot but does not have to be contiguous (gaps are ok).
 - `path` is the file path of the data file, e.g., `my_file.parquet` for a relative path.
 - `path_is_relative` whether the `path` is relative to the [`path`]({% link docs/stable/specification/tables/ducklake_table.md %}) of the table (true) or an absolute path (false).
 - `file_format` is the storage format of the file. Currently, only `parquet` is allowed.
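Reviewer note: a sketch of how the `ducklake_data_file` fields above fit together when listing a table's files at a snapshot: filter by the snapshot range, order by `file_order` (gaps allowed, duplicates rejected), and resolve `path` against the table's path when `path_is_relative` is true, following the `path_is_relative` description in this file. The helpers `resolve_path` and `files_at`, the bucket URIs, and the assumption that the table path is already resolved to an absolute prefix are all hypothetical, not part of the spec.

```python
def resolve_path(file_row, table_path):
    """Absolute location of a data file.

    Assumes table_path is itself already resolved to an absolute prefix.
    """
    if not file_row["path_is_relative"]:
        return file_row["path"]
    prefix = table_path if table_path.endswith("/") else table_path + "/"
    return prefix + file_row["path"]


def files_at(file_rows, table_id, table_path, snapshot_id):
    """Resolved paths of one table's data files at snapshot_id, ordered by file_order."""
    current = [
        r for r in file_rows
        if r["table_id"] == table_id
        and r["begin_snapshot"] <= snapshot_id
        and (r["end_snapshot"] is None or snapshot_id < r["end_snapshot"])
    ]
    orders = [r["file_order"] for r in current]
    if len(orders) != len(set(orders)):
        raise ValueError("file_order must be unique within a snapshot")
    current.sort(key=lambda r: r["file_order"])
    return [resolve_path(r, table_path) for r in current]


# Hypothetical rows: old.parquet leaves the table at snapshot 2, b.parquet is an absolute path.
FILES = [
    {"table_id": 1, "file_order": 10, "path": "a.parquet", "path_is_relative": True,
     "begin_snapshot": 1, "end_snapshot": None},
    {"table_id": 1, "file_order": 30, "path": "s3://other-bucket/b.parquet", "path_is_relative": False,
     "begin_snapshot": 2, "end_snapshot": None},
    {"table_id": 1, "file_order": 20, "path": "old.parquet", "path_is_relative": True,
     "begin_snapshot": 1, "end_snapshot": 2},
]

if __name__ == "__main__":
    print(files_at(FILES, 1, "s3://bucket/lake/tbl", 2))
    # ['s3://bucket/lake/tbl/a.parquet', 's3://other-bucket/b.parquet']
```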