-
Notifications
You must be signed in to change notification settings - Fork 268
feat: Add support for explode and explode_outer for array inputs
#2836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
explode and explode_outer [WIP]explode and explode_outer
explode and explode_outerexplode and explode_outer for array inputs
|
Awesome stuff @andygrove! DataFusion has |
explode and explode_outer for array inputsexplode and explode_outer for array inputs [WIP]
Thanks! I am trying this out now. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2836 +/- ##
============================================
+ Coverage 56.12% 59.19% +3.06%
- Complexity 976 1473 +497
============================================
Files 119 167 +48
Lines 11743 15307 +3564
Branches 2251 2530 +279
============================================
+ Hits 6591 9061 +2470
- Misses 4012 4952 +940
- Partials 1140 1294 +154 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There does appear to be a different in the handling of empty arrays in in the |
I filed an issue in DataFusion: apache/datafusion#19053 |
explode and explode_outer for array inputs [WIP]explode and explode_outer for array inputs
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andygrove it mostly looks good
| @@ -0,0 +1,4 @@ | |||
| SELECT i_item_sk, explode(array(i_brand_id, i_class_id, i_category_id, i_manufact_id, i_manager_id)) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also have explode_outer?
| )); | ||
| }; | ||
|
|
||
| // Create projection expressions for other columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other columns? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, as in SELECT a, b, c, explode(d) FROM ...
native/core/src/execution/planner.rs
Outdated
| .collect(); | ||
|
|
||
| // Add the array column as the last column | ||
| let array_col_name = format!("col_{}", projections.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this name cause a conflict or issue if original dataset has col_* cols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this now and preserve original names
| output_fields.push(Field::new( | ||
| array_field.name(), | ||
| element_type, | ||
| true, // Element is nullable after unnesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
One Spark SQL test needs rewriting to work with Comet. I am working on it. |
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andygrove just one nit on: if we need to have a microbenchmark on explode_outer but it can be done in follow up PR
Thanks. I added this to the scope of #2838. |
|
The supported list is not updated
|
Thanks @bjornjorgensen this is autogenerated doc thats being triggered manually, I'm actually thinking we can deprecate it |
|
ohh.. ok. I just have a look at this project and read "You may have a specific expression in mind that you’d like to add, but if not, you can review the expression coverage document to see which expressions are not yet supported." https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html |
Thanks for reporting this, the correct doc would be https://github.com/apache/datafusion-comet/blob/main/docs/spark_expressions_support.md I updated the manual #2854 |
Which issue does this PR close?
Closes #1927
Rationale for this change
explodeis widely used in Spark jobs that work with arrays.What changes are included in this PR?
Add support for
explodefor arrays, but not maps yet. I filed #2837 for adding support for maps.High level changes:
Explodeoperator in protobufCometExplodeExeccontains serde code for JVMUnnestExecCometGenerateExecSuiteHow are these changes tested?