
Commit dc45f60

alamb and comphead authored
Improve the DML / DDL Documentation (#16115)
* Update documentation about DDL and DML
* Improve the DML Documentation
* Apply suggestions from code review

Co-authored-by: Oleks V <[email protected]>

* Fix docs
* Fix docs

---------

Co-authored-by: Oleks V <[email protected]>
1 parent ce3e387 commit dc45f60

5 files changed: +45 -10 lines changed
benchmarks/README.md

Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ benchmarks that compare performance with other engines. For example:
 - [ClickBench] scripts are in the [ClickBench repo](https://github.com/ClickHouse/ClickBench/tree/main/datafusion)
 - [H2o.ai `db-benchmark`] scripts are in [db-benchmark](https://github.com/apache/datafusion/tree/main/benchmarks/src/h2o.rs)

-[ClickBench]: https://github.com/ClickHouse/ClickBench/tree/main
-[H2o.ai `db-benchmark`]: https://github.com/h2oai/db-benchmark
+[clickbench]: https://github.com/ClickHouse/ClickBench/tree/main
+[h2o.ai `db-benchmark`]: https://github.com/h2oai/db-benchmark

 # Running the benchmarks

benchmarks/queries/clickbench/README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ ClickBench is focused on aggregation and filtering performance (though it has no
 - `queries.sql` - Actual ClickBench queries, downloaded from the [ClickBench repository]
 - `extended.sql` - "Extended" DataFusion specific queries.

-[ClickBench repository]: https://github.com/ClickHouse/ClickBench/blob/main/datafusion/queries.sql
+[clickbench repository]: https://github.com/ClickHouse/ClickBench/blob/main/datafusion/queries.sql

 ## "Extended" Queries

datafusion/expr/src/logical_plan/dml.rs

Lines changed: 29 additions & 2 deletions
@@ -89,8 +89,28 @@ impl Hash for CopyTo {
     }
 }

-/// The operator that modifies the content of a database (adapted from
-/// substrait WriteRel)
+/// Modifies the content of a database
+///
+/// This operator is used to perform DML operations such as INSERT, DELETE,
+/// UPDATE, and CTAS (CREATE TABLE AS SELECT).
+///
+/// * `INSERT` - Appends new rows to the existing table. Calls
+///   [`TableProvider::insert_into`]
+///
+/// * `DELETE` - Removes rows from the table. Currently NOT supported by the
+///   [`TableProvider`] trait or builtin sources.
+///
+/// * `UPDATE` - Modifies existing rows in the table. Currently NOT supported by
+///   the [`TableProvider`] trait or builtin sources.
+///
+/// * `CREATE TABLE AS SELECT` - Creates a new table and populates it with data
+///   from a query. This is similar to the `INSERT` operation, but it creates a new
+///   table instead of modifying an existing one.
+///
+/// Note that the structure is adapted from substrait WriteRel)
+///
+/// [`TableProvider`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html
+/// [`TableProvider::insert_into`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#method.insert_into
 #[derive(Clone)]
 pub struct DmlStatement {
     /// The table name
@@ -177,11 +197,18 @@ impl PartialOrd for DmlStatement {
     }
 }

+/// The type of DML operation to perform.
+///
+/// See [`DmlStatement`] for more details.
 #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
 pub enum WriteOp {
+    /// `INSERT INTO` operation
     Insert(InsertOp),
+    /// `DELETE` operation
     Delete,
+    /// `UPDATE` operation
     Update,
+    /// `CREATE TABLE AS SELECT` operation
     Ctas,
 }
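
To make the new `WriteOp` doc comments concrete, here is a minimal standalone sketch of how each variant maps to a SQL statement and which ones the docs describe as unsupported by `TableProvider`. It does not use the `datafusion` crate: the enums below are simplified stand-ins (the `InsertOp` variants in particular are an assumption), and `sql_statement` and `unsupported_by_table_provider` are hypothetical helpers written only for illustration.

```rust
/// Simplified stand-in for DataFusion's `InsertOp`.
/// Assumption: the variants here are illustrative, not the crate's definition.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InsertOp {
    Append,
    Overwrite,
}

/// Simplified stand-in mirroring the four `WriteOp` variants shown in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriteOp {
    Insert(InsertOp),
    Delete,
    Update,
    Ctas,
}

/// Hypothetical helper: the SQL statement each variant corresponds to,
/// following the doc comments added in this commit.
fn sql_statement(op: WriteOp) -> &'static str {
    match op {
        WriteOp::Insert(_) => "INSERT INTO",
        WriteOp::Delete => "DELETE",
        WriteOp::Update => "UPDATE",
        WriteOp::Ctas => "CREATE TABLE AS SELECT",
    }
}

/// Hypothetical helper: true for the operations the `DmlStatement` docs
/// describe as currently NOT supported by the `TableProvider` trait.
fn unsupported_by_table_provider(op: WriteOp) -> bool {
    matches!(op, WriteOp::Delete | WriteOp::Update)
}
```

Matching exhaustively on the enum means a newly added variant would force both helpers to be revisited at compile time.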

docs/source/library-user-guide/catalogs.md

Lines changed: 4 additions & 1 deletion
@@ -23,11 +23,14 @@ This section describes how to create and manage catalogs, schemas, and tables in

 ## General Concepts

-CatalogProviderList, Catalogs, schemas, and tables are organized in a hierarchy. A CatalogProviderList contains catalog providers, a catalog provider contains schemas and a schema contains tables.
+Catalog providers, catalogs, schemas, and tables are organized in a hierarchy. A `CatalogProviderList` contains `CatalogProvider`s, a `CatalogProvider` contains `SchemaProviders` and a `SchemaProvider` contains `TableProvider`s.

 DataFusion comes with a basic in memory catalog functionality in the [`catalog` module]. You can use these in memory implementations as is, or extend DataFusion with your own catalog implementations, for example based on local files or files on remote object storage.

+DataFusion supports DDL queries (e.g. `CREATE TABLE`) using the catalog API described in this section. See the [TableProvider] section for information on DML queries (e.g. `INSERT INTO`).
+
 [`catalog` module]: https://docs.rs/datafusion/latest/datafusion/catalog/index.html
+[tableprovider]: ./custom-table-providers.md

 Similarly to other concepts in DataFusion, you'll implement various traits to create your own catalogs, schemas, and tables. The following sections describe the traits you'll need to implement.
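
The hierarchy described in the revised paragraph (catalog list, then catalog, then schema, then table) can be sketched with nested maps. This is a toy model only: `CatalogList`, `Catalog`, `Schema`, and `Table` are hypothetical structs, whereas the real `CatalogProviderList`, `CatalogProvider`, `SchemaProvider`, and `TableProvider` are traits with richer interfaces.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table; the real `TableProvider` is a trait.
#[derive(Debug, PartialEq)]
struct Table {
    columns: Vec<String>,
}

/// Hypothetical stand-in for a `SchemaProvider`: a schema contains tables.
#[derive(Default)]
struct Schema {
    tables: HashMap<String, Table>,
}

/// Hypothetical stand-in for a `CatalogProvider`: a catalog contains schemas.
#[derive(Default)]
struct Catalog {
    schemas: HashMap<String, Schema>,
}

/// Hypothetical stand-in for a `CatalogProviderList`: the top of the hierarchy.
#[derive(Default)]
struct CatalogList {
    catalogs: HashMap<String, Catalog>,
}

impl CatalogList {
    /// Registers a table, creating the catalog and schema levels on demand.
    fn register_table(&mut self, catalog: &str, schema: &str, name: &str, table: Table) {
        self.catalogs
            .entry(catalog.to_string())
            .or_default()
            .schemas
            .entry(schema.to_string())
            .or_default()
            .tables
            .insert(name.to_string(), table);
    }

    /// Resolves a table by walking the hierarchy one level at a time.
    fn table(&self, catalog: &str, schema: &str, name: &str) -> Option<&Table> {
        self.catalogs.get(catalog)?.schemas.get(schema)?.tables.get(name)
    }
}
```

A lookup fails with `None` as soon as any level is missing, which is the behavior one would expect when resolving a three-part name such as `catalog.schema.table`.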

docs/source/library-user-guide/custom-table-providers.md

Lines changed: 9 additions & 4 deletions
@@ -19,17 +19,22 @@

 # Custom Table Provider

-Like other areas of DataFusion, you extend DataFusion's functionality by implementing a trait. The `TableProvider` and associated traits, have methods that allow you to implement a custom table provider, i.e. use DataFusion's other functionality with your custom data source.
+Like other areas of DataFusion, you extend DataFusion's functionality by implementing a trait. The [`TableProvider`] and associated traits allow you to implement a custom table provider, i.e. use DataFusion's other functionality with your custom data source.

-This section will also touch on how to have DataFusion use the new `TableProvider` implementation.
+This section describes how to create a [`TableProvider`] and how to configure DataFusion to use it for reading.

 ## Table Provider and Scan

-The `scan` method on the `TableProvider` is likely its most important. It returns an `ExecutionPlan` that DataFusion will use to read the actual data during execution of the query.
+The [`TableProvider::scan`] method reads data from the table and is likely the most important. It returns an [`ExecutionPlan`] that DataFusion will use to read the actual data during execution of the query. The [`TableProvider::insert_into`] method is used to `INSERT` data into the table.

 ### Scan

-As mentioned, `scan` returns an execution plan, and in particular a `Result<Arc<dyn ExecutionPlan>>`. The core of this is returning something that can be dynamically dispatched to an `ExecutionPlan`. And as per the general DataFusion idea, we'll need to implement it.
+As mentioned, [`TableProvider::scan`] returns an execution plan, and in particular a `Result<Arc<dyn ExecutionPlan>>`. The core of this is returning something that can be dynamically dispatched to an `ExecutionPlan`. And as per the general DataFusion idea, we'll need to implement it.
+
+[`tableprovider`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html
+[`tableprovider::scan`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#tymethod.scan
+[`tableprovider::insert_into`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#tymethod.insert_into
+[`executionplan`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html

 #### Execution Plan
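
The "something that can be dynamically dispatched to an `ExecutionPlan`" idea from the Scan paragraph can be illustrated without the `datafusion` crate. In this sketch the `ExecutionPlan` trait is a two-method stand-in, `VecScan` and `VecTable` are hypothetical types, and `scan` is synchronous with a single `limit` parameter and a `String` error. All of these are assumptions for illustration; the real `TableProvider::scan` has a different, richer signature.

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `ExecutionPlan` trait (the real trait is much larger).
trait ExecutionPlan {
    /// Human-readable name of the plan node.
    fn name(&self) -> &str;
    /// Produces the rows this plan reads (toy model: a vector of integers).
    fn rows(&self) -> Vec<i64>;
}

/// Hypothetical plan node that reads from an in-memory vector.
struct VecScan {
    data: Vec<i64>,
}

impl ExecutionPlan for VecScan {
    fn name(&self) -> &str {
        "VecScan"
    }

    fn rows(&self) -> Vec<i64> {
        self.data.clone()
    }
}

/// Hypothetical table backed by an in-memory vector.
struct VecTable {
    data: Vec<i64>,
}

impl VecTable {
    /// Mirrors the shape of `scan`: the caller receives a trait object behind
    /// an `Arc`, so it never needs to know the concrete plan type.
    fn scan(&self, limit: Option<usize>) -> Result<Arc<dyn ExecutionPlan>, String> {
        // Clamp the optional limit to the number of available rows.
        let n = limit.unwrap_or(self.data.len()).min(self.data.len());
        Ok(Arc::new(VecScan {
            data: self.data[..n].to_vec(),
        }))
    }
}
```

Because the return type is `Arc<dyn ExecutionPlan>`, a different table could return a different plan type from the same method signature, which is the extension point the guide goes on to implement.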
