
Commit a405d3f

adriangb and claude authored
Support nested field access in get_field with multiple path arguments (#19389)
## Summary

This PR extends `get_field` to accept multiple field name arguments for nested struct/map access, so that `get_field(col, 'a', 'b', 'c')` is equivalent to `col['a']['b']['c']`.

**The primary motivation is to make it easier for downstream optimizations to match on and optimize struct/map field access patterns.** By representing `col['a']['b']['c']` as a single `get_field(col, 'a', 'b', 'c')` call rather than as nested `get_field(get_field(get_field(col, 'a'), 'b'), 'c')` calls, optimization rules can more easily identify and transform field access patterns.

This is related to, and possibly prep work for, #19387, but I think it is a good improvement in its own right.

## Changes

- **Variadic signature**: `get_field` now accepts 2+ arguments (a base plus one or more field names)
- **Type validation at planning time**: Accessing a field on a non-struct/map type (e.g., `get_field({a: 1}, 'a', 'b')`) fails during planning with a clear error message indicating which argument position caused the failure
- **Bracket syntax optimization**: The `FieldAccessPlanner` now merges consecutive bracket accesses into a single `get_field` call (e.g., `s['a']['b']` → `get_field(s, 'a', 'b')`)
- **Mixed access handling**: Array index access correctly breaks the batching (e.g., `s['a'][0]['b']` → `get_field(array_element(get_field(s, 'a'), 0), 'b')`)

## Example

```sql
-- Direct function call with nested access
SELECT get_field(my_struct, 'outer', 'inner', 'value');

-- Equivalent bracket syntax (now optimized to a single get_field)
SELECT my_struct['outer']['inner']['value'];

-- EXPLAIN shows a single get_field call
EXPLAIN SELECT s['a']['b'] FROM t;
-- Projection: get_field(t.s, Utf8("a"), Utf8("b"))
```

## Backwards Compatibility

- The original 2-argument form `get_field(struct, 'field')` continues to work unchanged
- Existing queries using bracket syntax automatically benefit from the optimization

## Test plan

- [x] Backwards compatibility test for the 2-argument form
- [x] Multi-level `get_field` with 2, 3, and 5 levels of nesting
- [x] Type validation error tests at argument positions 2, 3, and 4
- [x] Non-existent field error tests
- [x] Null handling (null at the base, null in the middle of the chain)
- [x] Mixed array/struct access (verifies that an array index breaks batching)
- [x] Nullable parent propagation
- [x] EXPLAIN test verifying a single `get_field` call for bracket syntax
- [x] Minimum argument validation (0- and 1-argument cases)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>
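The planning-time type validation listed under Changes can be pictured with a small toy model. This is a hypothetical sketch, not DataFusion or Arrow code: `DataType`, `resolve_path`, and the error strings are illustrative stand-ins. It walks the field-name arguments in order and reports the argument position at which a lookup lands on a non-struct/map type, assuming 1-based positions where the base expression is argument 1.

```rust
// Hypothetical toy type system; NOT Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<(String, DataType)>),
    // Only the value type is modeled; keys are assumed to be strings.
    Map(Box<DataType>),
}

/// Resolve the result type of a hypothetical `get_field(base, path[0], path[1], ...)`.
/// Positions are 1-based: the base is argument 1, the first field name is argument 2.
fn resolve_path(base: &DataType, path: &[&str]) -> Result<DataType, String> {
    if path.is_empty() {
        return Err("get_field requires at least one field name".to_string());
    }
    let mut current = base.clone();
    for (i, name) in path.iter().enumerate() {
        let position = i + 2;
        current = match current {
            // Struct: look the field up by name.
            DataType::Struct(fields) => fields
                .into_iter()
                .find(|(field_name, _)| field_name.as_str() == *name)
                .map(|(_, ty)| ty)
                .ok_or_else(|| format!("argument {position}: field '{name}' not found"))?,
            // Map: any key yields the value type in this toy model.
            DataType::Map(value_type) => *value_type,
            // Anything else cannot be indexed by a field name.
            other => {
                return Err(format!(
                    "argument {position}: cannot access field '{name}' on non-struct/map type {other:?}"
                ))
            }
        };
    }
    Ok(current)
}

fn main() {
    // Roughly the shape of `{a: 1}` from the commit message.
    let s = DataType::Struct(vec![("a".to_string(), DataType::Int64)]);
    println!("{:?}", resolve_path(&s, &["a"])); // Ok(Int64)
    // 'b' (argument 3) is applied to an Int64, so resolution fails there.
    println!("{:?}", resolve_path(&s, &["a", "b"]));

    let inner = DataType::Struct(vec![("value".to_string(), DataType::Utf8)]);
    let outer = DataType::Struct(vec![("inner".to_string(), inner)]);
    println!("{:?}", resolve_path(&outer, &["inner", "value"])); // Ok(Utf8)
}
```

Checking the whole path up front is what lets a bad access fail at planning time rather than partway through execution, and carrying the position through the loop is one simple way to point the error at the offending argument.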
1 parent 47ddd50 commit a405d3f


6 files changed: +665 -264 lines changed


datafusion/functions-nested/src/planner.rs

Lines changed: 3 additions & 0 deletions
```diff
@@ -148,6 +148,9 @@ impl ExprPlanner for FieldAccessPlanner {
 
         match field_access {
             // expr["field"] => get_field(expr, "field")
+            // Nested accesses like expr["a"]["b"] create nested get_field calls,
+            // which are then merged by the SimplifyExpressions optimizer pass via
+            // the GetFieldFunc::simplify() method.
             GetFieldAccess::NamedStructField { name } => {
                 Ok(PlannerResult::Planned(get_field(expr, name)))
             }
```
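The comment added in this diff says that nested bracket accesses are first planned as nested `get_field` calls and later merged during the `SimplifyExpressions` pass by `GetFieldFunc::simplify()`. The sketch below is a toy model of that flattening only, not DataFusion's implementation: `Expr`, `simplify`, `col`, `get_field`, and `array_element` here are hypothetical stand-ins for DataFusion's logical expressions and functions.

```rust
use std::fmt;

// Hypothetical toy expression tree; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    GetField { base: Box<Expr>, path: Vec<String> },
    ArrayElement { base: Box<Expr>, index: Box<Expr> },
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Single-field access, the shape the planner emits for `expr["name"]`.
fn get_field(base: Expr, name: &str) -> Expr {
    Expr::GetField { base: Box::new(base), path: vec![name.to_string()] }
}

fn array_element(base: Expr, index: i64) -> Expr {
    Expr::ArrayElement { base: Box::new(base), index: Box::new(Expr::Literal(index)) }
}

/// Bottom-up rewrite: get_field(get_field(x, p1..), p2..) => get_field(x, p1.., p2..).
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::GetField { base, path } => match simplify(*base) {
            // The base is itself a (already flattened) get_field: append our path to it.
            Expr::GetField { base: inner, path: mut inner_path } => {
                inner_path.extend(path);
                Expr::GetField { base: inner, path: inner_path }
            }
            // Any other base (column, array_element, ...) stops the merge.
            other => Expr::GetField { base: Box::new(other), path },
        },
        Expr::ArrayElement { base, index } => Expr::ArrayElement {
            base: Box::new(simplify(*base)),
            index: Box::new(simplify(*index)),
        },
        other => other,
    }
}

// Render in a SQL-like form resembling the EXPLAIN output in the commit message.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{name}"),
            Expr::Literal(v) => write!(f, "{v}"),
            Expr::GetField { base, path } => {
                write!(f, "get_field({base}")?;
                for name in path {
                    write!(f, ", '{name}'")?;
                }
                write!(f, ")")
            }
            Expr::ArrayElement { base, index } => write!(f, "array_element({base}, {index})"),
        }
    }
}

fn main() {
    // s['a']['b']['c'] as planned: three nested single-field calls.
    let nested = get_field(get_field(get_field(col("s"), "a"), "b"), "c");
    println!("{}", simplify(nested)); // get_field(s, 'a', 'b', 'c')

    // s['a'][0]['b']: the array index sits between the two field accesses.
    let mixed = get_field(array_element(get_field(col("s"), "a"), 0), "b");
    println!("{}", simplify(mixed)); // get_field(array_element(get_field(s, 'a'), 0), 'b')
}
```

Simplifying the base before inspecting it means any inner chain is already flattened, so a single merge per level is enough. Because an `array_element` node is not a `GetField`, it halts the merge, which matches the mixed-access behavior described in the commit message.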
