feat(spark): support Struct type and literal, and include names for struct fields (#342)
Conversation
```scala
ImmutableRoot
  .builder()
  .input(rel)
  .addAllNames(ToSubstraitType.toNamedStruct(p.schema).names())
```
This changes the produced plans to include DFS names in the root
```diff
 private def buildNamedScan(schema: StructType, tableNames: List[String]): relation.NamedScan = {
-  val namedStruct = toNamedStruct(schema)
+  val namedStruct = ToSubstraitType.toNamedStruct(schema)
```
Just being more explicit here
```scala
  NamedStruct.of(names, struct)
}

def toStructType(namedStruct: NamedStruct): StructType = {
```
These were previously inside ToSubstraitType, which is confusing given they return Spark types. They're now moved into the ToSparkType object above, and also rewritten to use the newly-available struct type code.
(We should also split this file, but I'd rather do that as a follow-up to keep the diff clean.)
```scala
  typeExpression.accept(new ToSparkType(Seq.empty))
}

def toStructType(namedStruct: NamedStruct): StructType = {
```
moved from below, see comment below
```scala
    .map { case ((t, d), name) => StructField(name, d, t.nullable()) }
  )

def toNamedStruct(schema: StructType): NamedStruct = {
  val dfsNames = JavaConverters.seqAsJavaList(fieldNamesDfs(schema))
```
This changes anything using toNamedStruct to produce the full, depth-first list of names rather than only the top-level field names.
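For context, a minimal self-contained sketch of depth-first name flattening. It uses hypothetical stand-in types rather than Spark's StructType (so the behavior can be shown in isolation); the names and shapes here are illustrative, not the PR's actual code:

```scala
// Hypothetical stand-in for Spark's data types, to keep the sketch
// self-contained; the real code operates on StructType/StructField.
sealed trait DType
case object Primitive extends DType
final case class Struct(fields: Seq[(String, DType)]) extends DType

// Depth-first flattening: each struct field's name is followed
// immediately by the names of its nested fields.
def fieldNamesDfs(struct: Struct): Seq[String] =
  struct.fields.flatMap {
    case (name, nested: Struct) => name +: fieldNamesDfs(nested)
    case (name, _)              => Seq(name)
  }

val schema = Struct(Seq(
  "id"    -> Primitive,
  "point" -> Struct(Seq("x" -> Primitive, "y" -> Primitive))))

fieldNamesDfs(schema) // Seq("id", "point", "x", "y")
```

With the old behavior, callers saw only the top-level names (here, "id" and "point"); the DFS version inserts the nested names right after their parent.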
```diff
 def seqToOptionHelper(s: Seq[Option[T]], accum: Seq[T] = Seq[T]()): Option[Seq[T]] = {
   s match {
-    case Some(head) :: Nil =>
+    case Seq(Some(head)) =>
```
I'm not sure why, but the earlier code just wasn't working.
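A plausible explanation (my guess, not confirmed in the thread): the cons pattern `x :: xs` only matches when the runtime collection is a scala.collection.immutable.List, while a value typed as Seq may be backed by another implementation, in which case the old pattern silently falls through. The Seq extractor matches any sequence:

```scala
// Cons pattern: matches only when the runtime class is a List.
def matchesCons(s: Seq[Option[Int]]): Boolean = s match {
  case Some(_) :: Nil => true
  case _              => false
}

// Seq extractor: matches any one-element Seq, regardless of implementation.
def matchesSeq(s: Seq[Option[Int]]): Boolean = s match {
  case Seq(Some(_)) => true
  case _            => false
}

matchesCons(List(Some(1)))   // true: backed by a List
matchesCons(Vector(Some(1))) // false: a Vector never matches ::
matchesSeq(Vector(Some(1)))  // true
```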
@andrew-coleman @vbarua looking for your review for this :) Note that this is a breaking change: the currently produced plans aren't valid Substrait, since they lack the inner names. This adds those, but it changes the meaning of the names fields. For Andrew: I saw you had added a bit of struct handling in your PR; this is compatible with that, but adds the rest of the handling as well.
```scala
require(nameIdx < dfsNames.size)
val n = dfsNames(nameIdx)
nameIdx += 1
n
```

Suggested change:

```diff
-require(nameIdx < dfsNames.size)
-val n = dfsNames(nameIdx)
-nameIdx += 1
-n
+dfsNames(i)
```
Won't nameIdx always be the same as i? If so, there's no need to increment a separate counter.
Incrementing an object-scoped counter assumes that the .map behaves like a foreach, i.e. with a guaranteed order of execution. In theory, at least, the map could execute its functions concurrently (I'm not sure it does in practice), just assembling the results back into the original sequence.
Looking at this further, I guess it does this because it's performing a depth-first traversal of a potentially nested struct, so nameIdx runs ahead of i whenever a field has nested children.
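To sketch why some sequential cursor is needed here: reconstructing a nested struct from a flat DFS name list means consuming names in order, each parent name followed by its children's names, so a per-element index like i can't address the right name. A hypothetical alternative (stand-in types, not the PR's code) that makes the sequential consumption explicit with an Iterator instead of an object-scoped counter:

```scala
// Hypothetical stand-in shapes: a type tree without names (as decoded
// from Substrait) and the same tree with names attached.
sealed trait Shape
case object Leaf extends Shape
final case class Node(children: Seq[Shape]) extends Shape

sealed trait Named
final case class NamedLeaf(name: String) extends Named
final case class NamedNode(name: String, children: Seq[Named]) extends Named

// Consume the DFS-ordered names with an Iterator: Seq.map runs left to
// right on standard sequential collections, so names.next() is called in
// field order, parents before their children.
def assignNames(fields: Seq[Shape], names: Iterator[String]): Seq[Named] =
  fields.map {
    case Leaf => NamedLeaf(names.next())
    case Node(children) =>
      val parent = names.next()
      NamedNode(parent, assignNames(children, names))
  }

assignNames(Seq(Leaf, Node(Seq(Leaf, Leaf))), Iterator("id", "point", "x", "y"))
// Seq(NamedLeaf("id"), NamedNode("point", Seq(NamedLeaf("x"), NamedLeaf("y"))))
```

The same ordering assumption applies as with the counter; the Iterator just makes the dependence on sequential evaluation visible at the call site.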
spark/src/test/scala/io/substrait/spark/SubstraitPlanTestBase.scala (outdated; resolved)
vbarua left a comment
Made a brief skim, and overall this looks reasonable to me.
Thanks for looking at this as well @andrew-coleman, I think you're more familiar with the Spark side.
BREAKING CHANGE: The plan root's "names" field now includes nested names in addition to the top-level field names. The new behavior is how the Substrait spec defines the names, but this change breaks compatibility with existing plans containing struct fields, since the names list will no longer match.