feat: Add support for `explode` and `explode_outer` for array inputs #2836

andygrove · 2025-12-02T16:08:18Z

Which issue does this PR close?

Closes #1927

Rationale for this change

explode is widely used in Spark jobs that work with arrays.

What changes are included in this PR?

Add support for explode for arrays, but not maps yet. I filed #2837 for adding support for maps.

High-level changes:

- New Explode operator in protobuf
- CometExplodeExec contains serde code for JVM
- Native code delegates to UnnestExec
- Tests are in new CometGenerateExecSuite

How are these changes tested?

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-87-generic
AMD Ryzen 9 7950X3D 16-Core Processor
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
explode                                              65             68           2          3.2         316.9       1.0X
explode: Comet                                       48             52           2          4.2         236.6       1.3X

mbutrovich · 2025-12-02T16:53:34Z

Awesome stuff @andygrove! DataFusion has unnest for arrays and maps, but I don't know if the semantics perfectly match Spark explode behavior https://datafusion.apache.org/user-guide/sql/special_functions.html

andygrove · 2025-12-02T17:32:52Z

Thanks! I am trying this out now.

codecov-commenter · 2025-12-02T17:32:56Z

Codecov Report

❌ Patch coverage is 68.05556% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.19%. Comparing base (f09f8af) to head (a65a432).
⚠️ Report is 736 commits behind head on main.

Files with missing lines	Patch %	Lines
...n/scala/org/apache/spark/sql/comet/operators.scala	66.66%	10 Missing and 13 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2836      +/-   ##
============================================
+ Coverage     56.12%   59.19%   +3.06%     
- Complexity      976     1473     +497     
============================================
  Files           119      167      +48     
  Lines         11743    15307    +3564     
  Branches       2251     2530     +279     
============================================
+ Hits           6591     9061    +2470     
- Misses         4012     4952     +940     
- Partials       1140     1294     +154

andygrove · 2025-12-02T18:50:23Z

There does appear to be a difference in the handling of empty arrays in the explode_outer case, but we can fall back to Spark for that case for now, until we can get a fix into DataFusion core.

andygrove · 2025-12-02T20:53:29Z

I filed an issue in DataFusion: apache/datafusion#19053
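
For context, the Spark-side behavior that the native operator has to match can be sketched as below. This is a minimal model of Spark's documented semantics only: `explode` emits nothing for a null or empty array, while `explode_outer` emits a single null row in both cases. The function names and the `Option<Vec<Option<i32>>>` row encoding are illustrative assumptions; this is not Comet or DataFusion code, and it does not model how `UnnestExec` behaves (see the linked issue for that).

```rust
/// Spark-style `explode`: one output row per array element.
/// Null arrays (`None`) and empty arrays produce no rows.
fn explode(rows: &[Option<Vec<Option<i32>>>]) -> Vec<Option<i32>> {
    rows.iter()
        .flatten() // skip null arrays
        .flat_map(|arr| arr.iter().copied())
        .collect()
}

/// Spark-style `explode_outer`: like `explode`, but a null or empty
/// array still yields exactly one row whose value is null.
fn explode_outer(rows: &[Option<Vec<Option<i32>>>]) -> Vec<Option<i32>> {
    let mut out = Vec::new();
    for row in rows {
        match row {
            Some(arr) if !arr.is_empty() => out.extend(arr.iter().copied()),
            _ => out.push(None), // null or empty array becomes a single null row
        }
    }
    out
}
```

With the input rows `[1, 2]`, `[]`, `NULL`, and `[NULL]`, the first function returns three values and the second returns five. The two extra rows are the ones an engine must reproduce to be compatible with `explode_outer`.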

comphead

Thanks @andygrove, it mostly looks good.

comphead · 2025-12-02T23:20:36Z

spark/src/test/resources/tpcds-micro-benchmarks/explode.sql

@@ -0,0 +1,4 @@
+SELECT i_item_sk, explode(array(i_brand_id, i_class_id, i_category_id, i_manufact_id, i_manager_id))


Should we also have a benchmark for explode_outer?

comphead · 2025-12-02T23:53:12Z

native/core/src/execution/planner.rs

+                    ));
+                };
+
+                // Create projection expressions for other columns


other columns? 🤔

yes, as in SELECT a, b, c, explode(d) FROM ...
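
To make the "other columns" point concrete, here is a small sketch of what a query shaped like `SELECT a, b, c, explode(d)` produces: the pass-through columns are repeated once per element of the array column. The `Row` struct and `explode_with_columns` function are hypothetical names used only for illustration; the real planner builds projection expressions over Arrow batches rather than working row by row.

```rust
/// Hypothetical input row: three pass-through columns and one array column.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    a: i32,
    b: i32,
    c: i32,
    d: Vec<i32>,
}

/// Emits one output tuple per element of `d`, copying `a`, `b`, and `c`
/// alongside each element. Rows with an empty `d` produce no output.
fn explode_with_columns(rows: &[Row]) -> Vec<(i32, i32, i32, i32)> {
    rows.iter()
        .flat_map(|r| r.d.iter().map(move |&elem| (r.a, r.b, r.c, elem)))
        .collect()
}
```

A row `(1, 2, 3, [10, 20])` becomes `(1, 2, 3, 10)` and `(1, 2, 3, 20)`, which is why the native plan needs projections for the non-array columns in addition to the array being unnested.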

comphead · 2025-12-02T23:54:37Z

native/core/src/execution/planner.rs

+                    .collect();
+
+                // Add the array column as the last column
+                let array_col_name = format!("col_{}", projections.len());


Can this name cause a conflict or other issue if the original dataset has col_* columns?

I have removed this and now preserve the original names.
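
The concern can be illustrated with a small sketch. It assumes, hypothetically, that the pass-through columns keep their names while the array column is given the positional name `col_{n}`, as in the reviewed snippet. Both helper functions are invented for this example, and the thread does not say whether a duplicate name would actually have caused a failure in the planner.

```rust
use std::collections::HashSet;

/// Hypothetical reconstruction of the naming scheme under review:
/// keep the pass-through names and append a synthetic `col_{n}` name,
/// where `n` is the number of pass-through columns.
fn output_names_with_synthetic(passthrough: &[&str]) -> Vec<String> {
    let mut names: Vec<String> = passthrough.iter().map(|s| s.to_string()).collect();
    names.push(format!("col_{}", passthrough.len()));
    names
}

/// Returns true if any column name appears more than once.
fn has_duplicate_names(names: &[String]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false when the name was already present.
    names.iter().any(|n| !seen.insert(n.as_str()))
}
```

For an input with columns `id` and `col_2`, the synthetic name is also `col_2`, so the output schema has two columns with the same name. Preserving the original names avoids introducing that clash.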

comphead · 2025-12-02T23:55:27Z

native/core/src/execution/planner.rs

+                output_fields.push(Field::new(
+                    array_field.name(),
+                    element_type,
+                    true, // Element is nullable after unnesting


andygrove · 2025-12-04T18:56:51Z

One Spark SQL test needs rewriting to work with Comet. I am working on it.

comphead

Thanks @andygrove. Just one nit: we may want a microbenchmark for explode_outer as well, but that can be done in a follow-up PR.

andygrove · 2025-12-05T18:12:50Z

Thanks. I added this to the scope of #2838.

bjornjorgensen · 2025-12-05T21:24:43Z

The supported list is not updated

datafusion-comet/doc/spark_builtin_expr_coverage.txt

Line 141 in f08fcad

    
|explode |{FAILED, [{SELECT explode(array(10, 20));, Unsupported}]} |

comphead · 2025-12-05T22:14:44Z

Thanks @bjornjorgensen. This is an autogenerated doc that is triggered manually; I'm actually thinking we can deprecate it.

bjornjorgensen · 2025-12-05T22:21:18Z

Oh, OK. I was just having a look at this project and read "You may have a specific expression in mind that you’d like to add, but if not, you can review the expression coverage document to see which expressions are not yet supported." https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html

comphead · 2025-12-05T22:57:36Z

Thanks for reporting this, the correct doc would be https://github.com/apache/datafusion-comet/blob/main/docs/spark_expressions_support.md

I updated the manual in #2854.

andygrove added 4 commits December 2, 2025 08:42

initial implementation

b50829b

add tests and fix bugs

21e9bf2

clippy

0ea6dce

fix failing test

36cb3bc

andygrove changed the title ~~feat: Add support for explode and explode_outer [WIP]~~ feat: Add support for explode and explode_outer Dec 2, 2025

andygrove changed the title ~~feat: Add support for explode and explode_outer~~ feat: Add support for explode and explode_outer for array inputs Dec 2, 2025

andygrove added 2 commits December 2, 2025 09:44

improve fallback rules

de1f012

add fallback test for map input

0aa8048

andygrove mentioned this pull request Dec 2, 2025

Add support for map inputs to explode #2837

Open

Revert a change

f5fc1dc

andygrove changed the title ~~feat: Add support for explode and explode_outer for array inputs~~ feat: Add support for explode and explode_outer for array inputs [WIP] Dec 2, 2025

andygrove added 2 commits December 2, 2025 10:09

more tests

6105dc1

fix null support

a3cfffe

switch to UnnestExec

28a1752

andygrove added 3 commits December 2, 2025 11:56

fall back for explode_outer

c1a564c

clippy

234c231

metrics

a595a18

add benchmark

d0b90d7

andygrove mentioned this pull request Dec 2, 2025

Add support for explode_outer #2838

Open

link to issue

3b00224

andygrove changed the title ~~feat: Add support for explode and explode_outer for array inputs [WIP]~~ feat: Add support for explode and explode_outer for array inputs Dec 2, 2025

andygrove marked this pull request as ready for review December 2, 2025 21:32

andygrove added 3 commits December 2, 2025 14:34

remove outdated comment

876e7a1

Revert

f9dc3e5

Revert

96336ab

andygrove requested review from comphead, mbutrovich and parthchandra December 2, 2025 21:57

comphead reviewed Dec 2, 2025

andygrove added 2 commits December 3, 2025 16:32

stop renaming columns

311e044

update 3.5.7 diff

28b7e37

andygrove added 4 commits December 4, 2025 11:58

fix test for 3.5.7

0929551

update 3.4.3 diff

c7d2385

update 4.0.1

0eb2016

update 3.4.3 diff

a65a432

comphead approved these changes Dec 5, 2025

andygrove merged commit 0bda9d2 into apache:main Dec 5, 2025
115 checks passed

andygrove deleted the explode branch December 5, 2025 18:13
