|
| 1 | +<!--- |
| 2 | + Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + or more contributor license agreements. See the NOTICE file |
| 4 | + distributed with this work for additional information |
| 5 | + regarding copyright ownership. The ASF licenses this file |
| 6 | + to you under the Apache License, Version 2.0 (the |
| 7 | + "License"); you may not use this file except in compliance |
| 8 | + with the License. You may obtain a copy of the License at |
| 9 | +
|
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | +
|
| 12 | + Unless required by applicable law or agreed to in writing, |
| 13 | + software distributed under the License is distributed on an |
| 14 | + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | + KIND, either express or implied. See the License for the |
| 16 | + specific language governing permissions and limitations |
| 17 | + under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +# Extending SQL Syntax |
| 21 | + |
| 22 | +DataFusion provides a flexible extension system that allows you to customize SQL |
| 23 | +parsing and planning without modifying the core codebase. This is useful when you |
| 24 | +need to: |
| 25 | + |
| 26 | +- Support custom operators from other SQL dialects (e.g., PostgreSQL's `->` for JSON) |
| 27 | +- Add custom data types not natively supported |
| 28 | +- Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE` |
| 29 | + |
| 30 | +## Architecture Overview |
| 31 | + |
| 32 | +When DataFusion processes a SQL query, it goes through these stages: |
| 33 | + |
| 34 | +```text |
| 35 | +┌─────────────┐ ┌─────────┐ ┌──────────────────────┐ ┌─────────────┐ |
| 36 | +│ SQL String │───▶│ Parser │───▶│ SqlToRel │───▶│ LogicalPlan │ |
| 37 | +└─────────────┘ └─────────┘ │ (SQL to LogicalPlan) │ └─────────────┘ |
| 38 | + └──────────────────────┘ |
| 39 | + │ |
| 40 | + │ uses |
| 41 | + ▼ |
| 42 | + ┌───────────────────────┐ |
| 43 | + │ Extension Planners │ |
| 44 | + │ • ExprPlanner │ |
| 45 | + │ • TypePlanner │ |
| 46 | + │ • RelationPlanner │ |
| 47 | + └───────────────────────┘ |
| 48 | +``` |
| 49 | + |
| 50 | +The extension planners intercept specific parts of the SQL AST during the |
| 51 | +`SqlToRel` phase and allow you to customize how they are converted to DataFusion's |
| 52 | +logical plan. |
| 53 | + |
| 54 | +## Extension Points |
| 55 | + |
| 56 | +DataFusion provides three planner traits for extending SQL: |
| 57 | + |
| 58 | +| Trait | Purpose | Registration Method | |
| 59 | +| ------------------- | --------------------------------------- | ------------------------------------------ | |
| 60 | +| [`ExprPlanner`] | Custom expressions and operators | `ctx.register_expr_planner()` | |
| 61 | +| [`TypePlanner`] | Custom SQL data types | `SessionStateBuilder::with_type_planner()` | |
| 62 | +| [`RelationPlanner`] | Custom FROM clause elements (relations) | `ctx.register_relation_planner()` | |
| 63 | + |
| 64 | +**Planner Precedence**: Multiple [`ExprPlanner`]s and [`RelationPlanner`]s can be |
| 65 | +registered; they are invoked in reverse registration order (last registered wins). |
| 66 | +Return `Original(...)` to delegate to the next planner. Only one `TypePlanner` |
| 67 | +can be active at a time. |
| 68 | + |
| 69 | +### ExprPlanner: Custom Expressions and Operators |
| 70 | + |
| 71 | +Use [`ExprPlanner`] to customize how SQL expressions are converted to DataFusion |
| 72 | +logical expressions. This is useful for: |
| 73 | + |
| 74 | +- Custom binary operators (e.g., `->`, `->>`, `@>`, `?`) |
| 75 | +- Custom field access patterns |
| 76 | +- Custom aggregate or window function handling |
| 77 | + |
| 78 | +#### Available Methods |
| 79 | + |
| 80 | +| Category | Methods | |
| 81 | +| ------------------ | ---------------------------------------------------------------------------------- | |
| 82 | +| Operators | `plan_binary_op`, `plan_any` | |
| 83 | +| Literals | `plan_array_literal`, `plan_dictionary_literal`, `plan_struct_literal` | |
| 84 | +| Functions | `plan_extract`, `plan_substring`, `plan_overlay`, `plan_position`, `plan_make_map` | |
| 85 | +| Identifiers | `plan_field_access`, `plan_compound_identifier` | |
| 86 | +| Aggregates/Windows | `plan_aggregate`, `plan_window` | |
| 87 | + |
| 88 | +See the [ExprPlanner API documentation] for full method signatures. |
| 89 | + |
| 90 | +#### Example: Custom Arrow Operator |
| 91 | + |
| 92 | +This example maps the `->` operator to string concatenation: |
| 93 | + |
| 94 | +```rust |
| 95 | +# use std::sync::Arc; |
| 96 | +# use datafusion::common::DFSchema; |
| 97 | +# use datafusion::error::Result; |
| 98 | +# use datafusion::logical_expr::Operator; |
| 99 | +# use datafusion::prelude::*; |
| 100 | +# use datafusion::sql::sqlparser::ast::BinaryOperator; |
| 101 | +use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr}; |
| 102 | +# use datafusion_expr::BinaryExpr; |
| 103 | + |
| 104 | +#[derive(Debug)] |
| 105 | +struct MyCustomPlanner; |
| 106 | + |
| 107 | +impl ExprPlanner for MyCustomPlanner { |
| 108 | + fn plan_binary_op( |
| 109 | + &self, |
| 110 | + expr: RawBinaryExpr, |
| 111 | + _schema: &DFSchema, |
| 112 | + ) -> Result<PlannerResult<RawBinaryExpr>> { |
| 113 | + match &expr.op { |
| 114 | + // Map `->` to string concatenation |
| 115 | + BinaryOperator::Arrow => { |
| 116 | + Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr { |
| 117 | + left: Box::new(expr.left.clone()), |
| 118 | + right: Box::new(expr.right.clone()), |
| 119 | + op: Operator::StringConcat, |
| 120 | + }))) |
| 121 | + } |
| 122 | + _ => Ok(PlannerResult::Original(expr)), |
| 123 | + } |
| 124 | + } |
| 125 | +} |
| 126 | + |
| 127 | +#[tokio::main] |
| 128 | +async fn main() -> Result<()> { |
| 129 | + // Use postgres dialect to enable `->` operator parsing |
| 130 | + let config = SessionConfig::new() |
| 131 | + .set_str("datafusion.sql_parser.dialect", "postgres"); |
| 132 | + let mut ctx = SessionContext::new_with_config(config); |
| 133 | + |
| 134 | + // Register the custom planner |
| 135 | + ctx.register_expr_planner(Arc::new(MyCustomPlanner))?; |
| 136 | + |
| 137 | + // Now `->` works as string concatenation |
| 138 | + let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?; |
| 139 | + // Returns: "helloworld" |
| 140 | + Ok(()) |
| 141 | +} |
| 142 | +``` |
| 143 | + |
| 144 | +For more details, see the [ExprPlanner API documentation] and the |
| 145 | +[expr_planner test examples]. |
| 146 | + |
| 147 | +### TypePlanner: Custom Data Types |
| 148 | + |
| 149 | +Use [`TypePlanner`] to map SQL data types to Arrow/DataFusion types. This is useful |
| 150 | +when you need to support SQL types that aren't natively recognized. |
| 151 | + |
| 152 | +#### Example: Custom DATETIME Type |
| 153 | + |
| 154 | +```rust |
| 155 | +# use std::sync::Arc; |
| 156 | +# use arrow::datatypes::{DataType, TimeUnit}; |
| 157 | +# use datafusion::error::Result; |
| 158 | +# use datafusion::prelude::*; |
| 159 | +# use datafusion::execution::SessionStateBuilder; |
| 160 | +use datafusion_expr::planner::TypePlanner; |
| 161 | +# use sqlparser::ast; |
| 162 | + |
| 163 | +#[derive(Debug)] |
| 164 | +struct MyTypePlanner; |
| 165 | + |
| 166 | +impl TypePlanner for MyTypePlanner { |
| 167 | + fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> { |
| 168 | + match sql_type { |
| 169 | + // Map DATETIME(precision) to Arrow Timestamp |
| 170 | + ast::DataType::Datetime(precision) => { |
| 171 | + let time_unit = match precision { |
| 172 | + Some(0) => TimeUnit::Second, |
| 173 | + Some(3) => TimeUnit::Millisecond, |
| 174 | + Some(6) => TimeUnit::Microsecond, |
| 175 | + None | Some(9) => TimeUnit::Nanosecond, |
| 176 | + _ => return Ok(None), // Let default handling take over |
| 177 | + }; |
| 178 | + Ok(Some(DataType::Timestamp(time_unit, None))) |
| 179 | + } |
| 180 | + _ => Ok(None), // Return None for types we don't handle |
| 181 | + } |
| 182 | + } |
| 183 | +} |
| 184 | + |
| 185 | +#[tokio::main] |
| 186 | +async fn main() -> Result<()> { |
| 187 | + let state = SessionStateBuilder::new() |
| 188 | + .with_default_features() |
| 189 | + .with_type_planner(Arc::new(MyTypePlanner)) |
| 190 | + .build(); |
| 191 | + |
| 192 | + let ctx = SessionContext::new_with_state(state); |
| 193 | + |
| 194 | + // Now DATETIME type is recognized |
| 195 | + ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?; |
| 196 | + Ok(()) |
| 197 | +} |
| 198 | +``` |
| 199 | + |
| 200 | +For more details, see the [TypePlanner API documentation]. |
| 201 | + |
| 202 | +### RelationPlanner: Custom FROM Clause Elements |
| 203 | + |
| 204 | +Use [`RelationPlanner`] to handle custom relations in the FROM clause. This |
| 205 | +enables you to implement SQL constructs like: |
| 206 | + |
| 207 | +- `TABLESAMPLE` for sampling data |
| 208 | +- `PIVOT` / `UNPIVOT` for data reshaping |
| 209 | +- `MATCH_RECOGNIZE` for pattern matching |
| 210 | +- Any custom relation syntax parsed by sqlparser |
| 211 | + |
| 212 | +#### The RelationPlannerContext |
| 213 | + |
| 214 | +When implementing [`RelationPlanner`], you receive a [`RelationPlannerContext`] that |
| 215 | +provides utilities for planning: |
| 216 | + |
| 217 | +| Method | Purpose | |
| 218 | +| --------------------------- | ----------------------------------------------- | |
| 219 | +| `plan(relation)` | Recursively plan a nested relation | |
| 220 | +| `sql_to_expr(expr, schema)` | Convert SQL expression to DataFusion Expr | |
| 221 | +| `context_provider()` | Access session configuration, tables, functions | |
| 222 | + |
| 223 | +See the [RelationPlanner API documentation] for additional methods like |
| 224 | +`normalize_ident()` and `object_name_to_table_reference()`. |
| 225 | + |
| 226 | +#### Implementation Strategies |
| 227 | + |
| 228 | +There are two main approaches when implementing a [`RelationPlanner`]: |
| 229 | + |
| 230 | +1. **Rewrite to Standard SQL**: Transform custom syntax into equivalent standard |
| 231 | + operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY |
| 232 | + with CASE expressions). This is the simplest approach when possible. |
| 233 | + |
| 234 | +2. **Custom Logical and Physical Nodes**: Create a [`UserDefinedLogicalNode`] to |
| 235 | + represent the operation in the logical plan, along with a custom [`ExecutionPlan`] |
| 236 | + to execute it. Both are required for end-to-end execution. |
| 237 | + |
| 238 | +#### Example: Basic RelationPlanner Structure |
| 239 | + |
| 240 | +```rust |
| 241 | +# use std::sync::Arc; |
| 242 | +# use datafusion::error::Result; |
| 243 | +# use datafusion::prelude::*; |
| 244 | +use datafusion_expr::planner::{ |
| 245 | + PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning, |
| 246 | +}; |
| 247 | +use datafusion_sql::sqlparser::ast::TableFactor; |
| 248 | + |
| 249 | +#[derive(Debug)] |
| 250 | +struct MyRelationPlanner; |
| 251 | + |
| 252 | +impl RelationPlanner for MyRelationPlanner { |
| 253 | + fn plan_relation( |
| 254 | + &self, |
| 255 | + relation: TableFactor, |
| 256 | + ctx: &mut dyn RelationPlannerContext, |
| 257 | + ) -> Result<RelationPlanning> { |
| 258 | + match relation { |
| 259 | + // Handle your custom relation |
| 260 | + TableFactor::Pivot { table, alias, .. } => { |
| 261 | + // Plan the input table |
| 262 | + let input = ctx.plan(*table)?; |
| 263 | + |
| 264 | + // Transform or wrap the plan as needed |
| 265 | + // ... |
| 266 | + |
| 267 | + Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias))) |
| 268 | + } |
| 269 | + |
| 270 | + // Return Original for relations you don't handle |
| 271 | + other => Ok(RelationPlanning::Original(other)), |
| 272 | + } |
| 273 | + } |
| 274 | +} |
| 275 | + |
| 276 | +#[tokio::main] |
| 277 | +async fn main() -> Result<()> { |
| 278 | + let ctx = SessionContext::new(); |
| 279 | + |
| 280 | + // Register the custom planner |
| 281 | + ctx.register_relation_planner(Arc::new(MyRelationPlanner))?; |
| 282 | + |
| 283 | + Ok(()) |
| 284 | +} |
| 285 | +``` |
| 286 | + |
| 287 | +## Complete Examples |
| 288 | + |
| 289 | +The DataFusion repository includes comprehensive examples demonstrating each |
| 290 | +approach: |
| 291 | + |
| 292 | +### TABLESAMPLE (Custom Logical and Physical Nodes) |
| 293 | + |
| 294 | +The [table_sample.rs] example shows a complete end-to-end implementation of how to |
| 295 | +support queries such as: |
| 296 | + |
| 297 | +```sql |
| 298 | +SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42) |
| 299 | +``` |
| 300 | + |
| 301 | +### PIVOT/UNPIVOT (Rewrite Strategy) |
| 302 | + |
| 303 | +The [pivot_unpivot.rs] example demonstrates rewriting custom syntax to standard SQL |
| 304 | +for queries such as: |
| 305 | + |
| 306 | +```sql |
| 307 | +SELECT * FROM sales |
| 308 | + PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')) |
| 309 | +``` |
| 310 | + |
| 311 | +## Recap |
| 312 | + |
| 313 | +1. Use [`ExprPlanner`] for custom operators and expression handling |
| 314 | +2. Use [`TypePlanner` for custom SQL data types |
| 315 | +3. Use [`RelationPlanner`] for custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.) |
| 316 | +4. Register planners via [`SessionContext`] or [`SessionStateBuilder`] |
| 317 | + |
| 318 | +## See Also |
| 319 | + |
| 320 | +- API Documentation: [`ExprPlanner`], [`TypePlanner`], [`RelationPlanner`] |
| 321 | +- [relation_planner examples] - Complete TABLESAMPLE, PIVOT/UNPIVOT implementations |
| 322 | +- [expr_planner test examples] - Custom operator examples |
| 323 | +- [Custom Expression Planning](functions/adding-udfs.md#custom-expression-planning) in the UDF guide |
| 324 | + |
| 325 | +[`exprplanner`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html |
| 326 | +[`typeplanner`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.TypePlanner.html |
| 327 | +[`relationplanner`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.RelationPlanner.html |
| 328 | +[`userdefinedlogicalnode`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.UserDefinedLogicalNode.html |
| 329 | +[`executionplan`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html |
| 330 | +[`sessioncontext`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html |
| 331 | +[`sessionstatebuilder`]: https://docs.rs/datafusion/latest/datafusion/execution/session_state/struct.SessionStateBuilder.html |
| 332 | +[`relationplannercontext`]: https://docs.rs/datafusion/latest/datafusion/sql/planner/trait.RelationPlannerContext.html |
| 333 | +[exprplanner api documentation]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html |
| 334 | +[typeplanner api documentation]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.TypePlanner.html |
| 335 | +[relationplanner api documentation]: https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.RelationPlanner.html |
| 336 | +[expr_planner test examples]: https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/expr_planner.rs |
| 337 | +[relation_planner examples]: https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/relation_planner |
| 338 | +[table_sample.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/relation_planner/table_sample.rs |
| 339 | +[pivot_unpivot.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/relation_planner/pivot_unpivot.rs |
0 commit comments