
Commit 9d4fe15

Add library user guide for extending SQL syntax (#19265)
## Which issue does this PR close?

- Closes #19087.

## Rationale for this change

As noted in the issue, the `RelationPlanner` API (added in #17843) along with `ExprPlanner` and `TypePlanner` allows DataFusion users to extend SQL in powerful ways. However, these extension points are not well documented and may be difficult for users to discover.

## What changes are included in this PR?

Adds a new Library User Guide page (`extending-sql.md`) that documents how to extend DataFusion's SQL syntax:

1. **Introduction** explaining why SQL extensibility matters (different SQL dialects, custom operators)
2. **Architecture overview** showing how SQL flows through the planner extension points
3. **Extension points** with examples for each:
   - `ExprPlanner`: Custom expressions and operators (e.g., `->` for JSON)
   - `TypePlanner`: Custom SQL data types (e.g., `DATETIME` with precision)
   - `RelationPlanner`: Custom FROM clause elements (TABLESAMPLE, PIVOT/UNPIVOT)
4. **Implementation strategies** for RelationPlanner:
   - Rewrite to standard SQL (PIVOT/UNPIVOT example)
   - Custom logical and physical nodes (TABLESAMPLE example)
5. **Links to complete examples** in `datafusion-examples/examples/relation_planner/`

Also adds a cross-reference from the existing `ExprPlanner` section in `adding-udfs.md` to the new comprehensive guide.

## Are these changes tested?

This is documentation only. The code examples reference existing tested examples in `datafusion-examples/`.

## Are there any user-facing changes?

Yes, new documentation is added to help users extend DataFusion's SQL syntax.
1 parent 7c05b20 commit 9d4fe15

3 files changed

+343
-1
lines changed


docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -136,6 +136,7 @@ To get started, see
136136
library-user-guide/upgrading
137137
library-user-guide/extensions
138138
library-user-guide/using-the-sql-api
139+
library-user-guide/extending-sql
139140
library-user-guide/working-with-exprs
140141
library-user-guide/using-the-dataframe-api
141142
library-user-guide/building-logical-plans
docs/source/library-user-guide/extending-sql.md

Lines changed: 339 additions & 0 deletions
@@ -0,0 +1,339 @@
1+
<!---
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Extending SQL Syntax
21+
22+
DataFusion provides a flexible extension system that allows you to customize SQL
23+
parsing and planning without modifying the core codebase. This is useful when you
24+
need to:
25+
26+
- Support custom operators from other SQL dialects (e.g., PostgreSQL's `->` for JSON)
27+
- Add custom data types not natively supported
28+
- Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE`
29+
30+
## Architecture Overview
31+
32+
When DataFusion processes a SQL query, it goes through these stages:
33+
34+
```text
35+
┌─────────────┐ ┌─────────┐ ┌──────────────────────┐ ┌─────────────┐
36+
│ SQL String │───▶│ Parser │───▶│ SqlToRel │───▶│ LogicalPlan │
37+
└─────────────┘ └─────────┘ │ (SQL to LogicalPlan) │ └─────────────┘
38+
└──────────────────────┘
39+
40+
│ uses
41+
42+
┌───────────────────────┐
43+
│ Extension Planners │
44+
│ • ExprPlanner │
45+
│ • TypePlanner │
46+
│ • RelationPlanner │
47+
└───────────────────────┘
48+
```
49+
50+
The extension planners intercept specific parts of the SQL AST during the
51+
`SqlToRel` phase and allow you to customize how they are converted to DataFusion's
52+
logical plan.
53+
54+
## Extension Points
55+
56+
DataFusion provides three planner traits for extending SQL:
57+
58+
| Trait | Purpose | Registration Method |
59+
| ------------------- | --------------------------------------- | ------------------------------------------ |
60+
| [`ExprPlanner`] | Custom expressions and operators | `ctx.register_expr_planner()` |
61+
| [`TypePlanner`] | Custom SQL data types | `SessionStateBuilder::with_type_planner()` |
62+
| [`RelationPlanner`] | Custom FROM clause elements (relations) | `ctx.register_relation_planner()` |
63+
64+
**Planner Precedence**: Multiple [`ExprPlanner`]s and [`RelationPlanner`]s can be
65+
registered; they are invoked in reverse registration order (last registered wins).
66+
Return `Original(...)` to delegate to the next planner. Only one `TypePlanner`
67+
can be active at a time.
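
The precedence rule can be illustrated with a small, dependency-free model. This is only a sketch of the dispatch order described above, not DataFusion's implementation: `Outcome`, `Registry`, and `build_registry` are hypothetical names, and plain strings stand in for SQL AST nodes and logical expressions.

```rust
/// Toy stand-in for a planner result: either the planner produced a plan,
/// or it hands the input back unchanged so the next planner can try.
#[derive(Debug)]
enum Outcome {
    Planned(String),
    Original(String),
}

/// A planner is modeled as a boxed function from an operator to an `Outcome`.
type Planner = Box<dyn Fn(String) -> Outcome>;

/// Hypothetical registry that keeps planners in registration order.
struct Registry {
    planners: Vec<Planner>,
}

impl Registry {
    fn new() -> Self {
        Registry { planners: Vec::new() }
    }

    fn register(&mut self, planner: Planner) {
        self.planners.push(planner);
    }

    /// Try planners newest-first; the first `Planned` result wins.
    /// `None` means no planner claimed the input.
    fn plan(&self, mut op: String) -> Option<String> {
        for planner in self.planners.iter().rev() {
            match planner(op) {
                Outcome::Planned(plan) => return Some(plan),
                Outcome::Original(unchanged) => op = unchanged,
            }
        }
        None
    }
}

/// Registers two planners: the first handles `->` and `@>`,
/// the second (registered later) handles only `->`.
fn build_registry() -> Registry {
    let mut registry = Registry::new();
    registry.register(Box::new(|op: String| {
        if op == "->" || op == "@>" {
            Outcome::Planned(format!("first planner handled {op}"))
        } else {
            Outcome::Original(op)
        }
    }));
    registry.register(Box::new(|op: String| {
        if op == "->" {
            Outcome::Planned(format!("second planner handled {op}"))
        } else {
            Outcome::Original(op)
        }
    }));
    registry
}

fn main() {
    let registry = build_registry();
    for op in ["->", "@>", "?"] {
        println!("{op}: {:?}", registry.plan(op.to_string()));
    }
}
```

In this model the later-registered planner claims `->`, the earlier one still receives `@>` because the later one returned `Original`, and `?` falls through every planner, which corresponds to falling back to default handling.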
68+
69+
### ExprPlanner: Custom Expressions and Operators
70+
71+
Use [`ExprPlanner`] to customize how SQL expressions are converted to DataFusion
72+
logical expressions. This is useful for:
73+
74+
- Custom binary operators (e.g., `->`, `->>`, `@>`, `?`)
75+
- Custom field access patterns
76+
- Custom aggregate or window function handling
77+
78+
#### Available Methods
79+
80+
| Category | Methods |
81+
| ------------------ | ---------------------------------------------------------------------------------- |
82+
| Operators | `plan_binary_op`, `plan_any` |
83+
| Literals | `plan_array_literal`, `plan_dictionary_literal`, `plan_struct_literal` |
84+
| Functions | `plan_extract`, `plan_substring`, `plan_overlay`, `plan_position`, `plan_make_map` |
85+
| Identifiers | `plan_field_access`, `plan_compound_identifier` |
86+
| Aggregates/Windows | `plan_aggregate`, `plan_window` |
87+
88+
See the [ExprPlanner API documentation] for full method signatures.
89+
90+
#### Example: Custom Arrow Operator
91+
92+
This example maps the `->` operator to string concatenation:
93+
94+
```rust
95+
# use std::sync::Arc;
96+
# use datafusion::common::DFSchema;
97+
# use datafusion::error::Result;
98+
# use datafusion::logical_expr::Operator;
99+
# use datafusion::prelude::*;
100+
# use datafusion::sql::sqlparser::ast::BinaryOperator;
101+
use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr};
102+
# use datafusion_expr::BinaryExpr;
103+
104+
#[derive(Debug)]
105+
struct MyCustomPlanner;
106+
107+
impl ExprPlanner for MyCustomPlanner {
108+
fn plan_binary_op(
109+
&self,
110+
expr: RawBinaryExpr,
111+
_schema: &DFSchema,
112+
) -> Result<PlannerResult<RawBinaryExpr>> {
113+
match &expr.op {
114+
// Map `->` to string concatenation
115+
BinaryOperator::Arrow => {
116+
Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr {
117+
left: Box::new(expr.left.clone()),
118+
right: Box::new(expr.right.clone()),
119+
op: Operator::StringConcat,
120+
})))
121+
}
122+
_ => Ok(PlannerResult::Original(expr)),
123+
}
124+
}
125+
}
126+
127+
#[tokio::main]
128+
async fn main() -> Result<()> {
129+
// Use postgres dialect to enable `->` operator parsing
130+
let config = SessionConfig::new()
131+
.set_str("datafusion.sql_parser.dialect", "postgres");
132+
let mut ctx = SessionContext::new_with_config(config);
133+
134+
// Register the custom planner
135+
ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;
136+
137+
// Now `->` works as string concatenation
138+
let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?;
139+
// Returns: "helloworld"
140+
Ok(())
141+
}
142+
```
143+
144+
For more details, see the [ExprPlanner API documentation] and the
145+
[expr_planner test examples].
146+
147+
### TypePlanner: Custom Data Types
148+
149+
Use [`TypePlanner`] to map SQL data types to Arrow/DataFusion types. This is useful
150+
when you need to support SQL types that aren't natively recognized.
151+
152+
#### Example: Custom DATETIME Type
153+
154+
```rust
155+
# use std::sync::Arc;
156+
# use arrow::datatypes::{DataType, TimeUnit};
157+
# use datafusion::error::Result;
158+
# use datafusion::prelude::*;
159+
# use datafusion::execution::SessionStateBuilder;
160+
use datafusion_expr::planner::TypePlanner;
161+
# use sqlparser::ast;
162+
163+
#[derive(Debug)]
164+
struct MyTypePlanner;
165+
166+
impl TypePlanner for MyTypePlanner {
167+
fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> {
168+
match sql_type {
169+
// Map DATETIME(precision) to Arrow Timestamp
170+
ast::DataType::Datetime(precision) => {
171+
let time_unit = match precision {
172+
Some(0) => TimeUnit::Second,
173+
Some(3) => TimeUnit::Millisecond,
174+
Some(6) => TimeUnit::Microsecond,
175+
None | Some(9) => TimeUnit::Nanosecond,
176+
_ => return Ok(None), // Let default handling take over
177+
};
178+
Ok(Some(DataType::Timestamp(time_unit, None)))
179+
}
180+
_ => Ok(None), // Return None for types we don't handle
181+
}
182+
}
183+
}
184+
185+
#[tokio::main]
186+
async fn main() -> Result<()> {
187+
let state = SessionStateBuilder::new()
188+
.with_default_features()
189+
.with_type_planner(Arc::new(MyTypePlanner))
190+
.build();
191+
192+
let ctx = SessionContext::new_with_state(state);
193+
194+
// Now DATETIME type is recognized
195+
ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?;
196+
Ok(())
197+
}
198+
```
199+
200+
For more details, see the [TypePlanner API documentation].
201+
202+
### RelationPlanner: Custom FROM Clause Elements
203+
204+
Use [`RelationPlanner`] to handle custom relations in the FROM clause. This
205+
enables you to implement SQL constructs like:
206+
207+
- `TABLESAMPLE` for sampling data
208+
- `PIVOT` / `UNPIVOT` for data reshaping
209+
- `MATCH_RECOGNIZE` for pattern matching
210+
- Any custom relation syntax parsed by sqlparser
211+
212+
#### The RelationPlannerContext
213+
214+
When implementing [`RelationPlanner`], you receive a [`RelationPlannerContext`] that
215+
provides utilities for planning:
216+
217+
| Method | Purpose |
218+
| --------------------------- | ----------------------------------------------- |
219+
| `plan(relation)` | Recursively plan a nested relation |
220+
| `sql_to_expr(expr, schema)` | Convert SQL expression to DataFusion Expr |
221+
| `context_provider()` | Access session configuration, tables, functions |
222+
223+
See the [RelationPlanner API documentation] for additional methods like
224+
`normalize_ident()` and `object_name_to_table_reference()`.
225+
226+
#### Implementation Strategies
227+
228+
There are two main approaches when implementing a [`RelationPlanner`]:
229+
230+
1. **Rewrite to Standard SQL**: Transform custom syntax into equivalent standard
231+
operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY
232+
with CASE expressions). This is the simplest approach when possible.
233+
234+
2. **Custom Logical and Physical Nodes**: Create a [`UserDefinedLogicalNode`] to
235+
represent the operation in the logical plan, along with a custom [`ExecutionPlan`]
236+
to execute it. Both are required for end-to-end execution.
237+
238+
#### Example: Basic RelationPlanner Structure
239+
240+
```rust
241+
# use std::sync::Arc;
242+
# use datafusion::error::Result;
243+
# use datafusion::prelude::*;
244+
use datafusion_expr::planner::{
245+
PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning,
246+
};
247+
use datafusion_sql::sqlparser::ast::TableFactor;
248+
249+
#[derive(Debug)]
250+
struct MyRelationPlanner;
251+
252+
impl RelationPlanner for MyRelationPlanner {
253+
fn plan_relation(
254+
&self,
255+
relation: TableFactor,
256+
ctx: &mut dyn RelationPlannerContext,
257+
) -> Result<RelationPlanning> {
258+
match relation {
259+
// Handle your custom relation
260+
TableFactor::Pivot { table, alias, .. } => {
261+
// Plan the input table
262+
let input = ctx.plan(*table)?;
263+
264+
// Transform or wrap the plan as needed
265+
// ...
266+
267+
Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias)))
268+
}
269+
270+
// Return Original for relations you don't handle
271+
other => Ok(RelationPlanning::Original(other)),
272+
}
273+
}
274+
}
275+
276+
#[tokio::main]
277+
async fn main() -> Result<()> {
278+
let ctx = SessionContext::new();
279+
280+
// Register the custom planner
281+
ctx.register_relation_planner(Arc::new(MyRelationPlanner))?;
282+
283+
Ok(())
284+
}
285+
```
286+
287+
## Complete Examples
288+
289+
The DataFusion repository includes comprehensive examples demonstrating each
290+
approach:
291+
292+
### TABLESAMPLE (Custom Logical and Physical Nodes)
293+
294+
The [table_sample.rs] example shows a complete end-to-end implementation of how to
295+
support queries such as:
296+
297+
```sql
298+
SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)
299+
```
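
To make the intended semantics of `BERNOULLI(10 PERCENT) REPEATABLE(42)` concrete, here is a minimal, dependency-free sketch: each row is kept independently with the given probability, and a fixed seed makes the result reproducible. It is not taken from `table_sample.rs`; the `SplitMix64` generator and the `bernoulli_sample` function are assumptions chosen only to keep the sketch self-contained.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here only so the sketch
/// needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Keep each row independently with probability `percent / 100`.
/// The same `seed` always selects the same rows (the REPEATABLE part).
fn bernoulli_sample<T: Clone>(rows: &[T], percent: f64, seed: u64) -> Vec<T> {
    let p = percent / 100.0;
    let mut rng = SplitMix64(seed);
    rows.iter()
        .filter(|_| rng.next_f64() < p)
        .cloned()
        .collect()
}

fn main() {
    let rows: Vec<u32> = (0..1000).collect();
    let sample = bernoulli_sample(&rows, 10.0, 42);
    println!("kept {} of {} rows", sample.len(), rows.len());
}
```

Because every row is decided independently, the sample size varies around the requested percentage rather than matching it exactly.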
300+
301+
### PIVOT/UNPIVOT (Rewrite Strategy)
302+
303+
The [pivot_unpivot.rs] example demonstrates rewriting custom syntax to standard SQL
304+
for queries such as:
305+
306+
```sql
307+
SELECT * FROM sales
308+
PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
309+
```
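
The rewrite idea (PIVOT → GROUP BY with CASE expressions) can be sketched at the string level. This is an illustration only and says nothing about how `pivot_unpivot.rs` builds its plan: `pivot_to_group_by` is a hypothetical helper, the `region` grouping column is an assumption about the `sales` table, and identifiers and values are not escaped.

```rust
/// Build a GROUP BY / CASE query equivalent to
/// `SELECT * FROM <table> PIVOT (<agg>(<value_col>) FOR <pivot_col> IN (<values>))`.
/// `group_cols` are the remaining columns that PIVOT groups by implicitly.
fn pivot_to_group_by(
    table: &str,
    group_cols: &[&str],
    agg: &str,
    value_col: &str,
    pivot_col: &str,
    values: &[&str],
) -> String {
    // Start the projection with the grouping columns.
    let mut select: Vec<String> = group_cols.iter().map(|c| c.to_string()).collect();

    // One conditional aggregate per pivoted value.
    for v in values {
        select.push(format!(
            "{agg}(CASE WHEN {pivot_col} = '{v}' THEN {value_col} END) AS \"{v}\""
        ));
    }

    let mut sql = format!("SELECT {} FROM {table}", select.join(", "));
    if !group_cols.is_empty() {
        sql.push_str(&format!(" GROUP BY {}", group_cols.join(", ")));
    }
    sql
}

fn main() {
    let sql = pivot_to_group_by(
        "sales",
        &["region"], // assumed grouping column, for illustration
        "SUM",
        "amount",
        "quarter",
        &["Q1", "Q2", "Q3", "Q4"],
    );
    println!("{sql}");
}
```

A real `RelationPlanner` would more likely build the equivalent aggregate directly as a logical plan instead of generating SQL text.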
310+
311+
## Recap
312+
313+
1. Use [`ExprPlanner`] for custom operators and expression handling
314+
2. Use [`TypePlanner`] for custom SQL data types
315+
3. Use [`RelationPlanner`] for custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.)
316+
4. Register planners via [`SessionContext`] or [`SessionStateBuilder`]
317+
318+
## See Also
319+
320+
- API Documentation: [`ExprPlanner`], [`TypePlanner`], [`RelationPlanner`]
321+
- [relation_planner examples] - Complete TABLESAMPLE, PIVOT/UNPIVOT implementations
322+
- [expr_planner test examples] - Custom operator examples
323+
- [Custom Expression Planning](functions/adding-udfs.md#custom-expression-planning) in the UDF guide
324+
325+
[`exprplanner`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html
326+
[`typeplanner`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.TypePlanner.html
327+
[`relationplanner`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.RelationPlanner.html
328+
[`userdefinedlogicalnode`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.UserDefinedLogicalNode.html
329+
[`executionplan`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html
330+
[`sessioncontext`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html
331+
[`sessionstatebuilder`]: https://docs.rs/datafusion/latest/datafusion/execution/session_state/struct.SessionStateBuilder.html
332+
[`relationplannercontext`]: https://docs.rs/datafusion/latest/datafusion/sql/planner/trait.RelationPlannerContext.html
333+
[exprplanner api documentation]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html
334+
[typeplanner api documentation]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.TypePlanner.html
335+
[relationplanner api documentation]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.RelationPlanner.html
336+
[expr_planner test examples]: https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/expr_planner.rs
337+
[relation_planner examples]: https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/relation_planner
338+
[table_sample.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/relation_planner/table_sample.rs
339+
[pivot_unpivot.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/relation_planner/pivot_unpivot.rs

docs/source/library-user-guide/functions/adding-udfs.md

Lines changed: 3 additions & 1 deletion
@@ -1492,7 +1492,9 @@ async fn main() -> Result<()> {
14921492

14931493
## Custom Expression Planning
14941494

1495-
DataFusion provides native support for common SQL operators by default such as `+`, `-`, `||`. However it does not provide support for other operators such as `@>`. To override DataFusion's default handling or support unsupported operators, developers can extend DataFusion by implementing custom expression planning, a core feature of DataFusion
1495+
By default, DataFusion natively supports common SQL operators and constructs such as `+`, `-`, and `||`. However, it does not support other operators such as `@>` or constructs like `TABLESAMPLE`, which are less common or vary more between SQL dialects. To override DataFusion's default handling or to support these features, developers can extend DataFusion by implementing custom expression planning, a core feature of DataFusion.
1496+
1497+
For a comprehensive guide to extending SQL syntax, including `ExprPlanner`, `TypePlanner`, and `RelationPlanner`, see [Extending DataFusion's SQL Syntax](../extending-sql.md).
14961498

14971499
### Implementing Custom Expression Planning
14981500
