
Commit ad44ab5

gatorsmile authored and cloud-fan committed
[SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct
### What changes were proposed in this pull request?

```SQL
CREATE TABLE `tab1`
(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
USING parquet

INSERT INTO `tab1`
SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))

SELECT custom_fields.id, custom_fields.value FROM tab1
```

The query above always returns the last struct of the array, because the rule `SimplifyCasts` incorrectly rewrites the query. The underlying cause is that the same `GenericInternalRow` object is reused for every struct when doing the cast.

### How was this patch tested?

Added a test case to `InsertSuite`.

Author: gatorsmile <[email protected]>

Closes apache#18412 from gatorsmile/castStruct.

(cherry picked from commit 2e1586f)

Signed-off-by: Wenchen Fan <[email protected]>
1 parent 96c04f1 commit ad44ab5
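To make the failure mode described in the commit message concrete, here is a minimal, illustrative sketch of what sharing one mutable row buffer across array elements does. The object and variable names are invented for this example; this is not the actual `Cast` code path:

```scala
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow

// Illustrative sketch only, not Spark's actual Cast implementation: one mutable
// row buffer is shared across all converted elements, so every element of the
// result aliases the same object, which finally holds the values written last.
object SharedRowBufferDemo extends App {
  val shared = new GenericInternalRow(1)

  val castedElements = Seq(1L, 2L).map { id =>
    shared.update(0, id) // overwrite the shared buffer in place
    shared               // returned without copy(): every element aliases it
  }

  // Both entries now report id = 2, mirroring the "always the last struct" bug.
  println(castedElements.map(_.getLong(0))) // List(2, 2)
}
```

Allocating a fresh row per element (or copying it before it is stored) removes the aliasing; the change below takes the per-element allocation route.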

File tree

2 files changed: 23 additions & 2 deletions
  • sql
    • catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
    • core/src/test/scala/org/apache/spark/sql/sources


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala

Lines changed: 2 additions & 2 deletions
@@ -482,15 +482,15 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
       case (fromField, toField) => cast(fromField.dataType, toField.dataType)
     }
     // TODO: Could be faster?
-    val newRow = new GenericInternalRow(from.fields.length)
     buildCast[InternalRow](_, row => {
+      val newRow = new GenericInternalRow(from.fields.length)
       var i = 0
       while (i < row.numFields) {
         newRow.update(i,
           if (row.isNullAt(i)) null else castFuncs(i)(row.get(i, from.apply(i).dataType)))
         i += 1
       }
-      newRow.copy()
+      newRow
     })
   }
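In short, the diff above moves the allocation of `newRow` inside the `buildCast` closure, so each struct being cast gets its own `GenericInternalRow`, and it drops the trailing `newRow.copy()`, which is no longer needed once rows are not shared. A consolidated sketch of the resulting shape, using simplified, illustrative names rather than the real `Cast` internals:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types.StructType

object StructCastSketch {
  // Hypothetical helper (the real logic lives inside Cast's struct-cast builder):
  // returns a per-row conversion function that allocates a fresh result row for
  // each input, so converted structs never alias one another.
  def structConverter(from: StructType, castFuncs: Array[Any => Any]): InternalRow => InternalRow =
    row => {
      val newRow = new GenericInternalRow(from.fields.length) // fresh row per input (the fix)
      var i = 0
      while (i < row.numFields) {
        newRow.update(i,
          if (row.isNullAt(i)) null else castFuncs(i)(row.get(i, from(i).dataType)))
        i += 1
      }
      newRow // safe to return directly; no defensive copy() required
    }
}
```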

sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala

Lines changed: 21 additions & 0 deletions
@@ -345,4 +345,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       )
     }
   }
+
+  test("SPARK-21203 wrong results of insertion of Array of Struct") {
+    val tabName = "tab1"
+    withTable(tabName) {
+      spark.sql(
+        """
+          |CREATE TABLE `tab1`
+          |(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
+          |USING parquet
+        """.stripMargin)
+      spark.sql(
+        """
+          |INSERT INTO `tab1`
+          |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))
+        """.stripMargin)
+
+      checkAnswer(
+        spark.sql("SELECT custom_fields.id, custom_fields.value FROM tab1"),
+        Row(Array(1, 2), Array("a", "b")))
+    }
+  }
 }
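The added test reproduces exactly the scenario from the commit message: before the fix, the final `SELECT` yielded the last struct's values for every element (ids `[2, 2]`, values `["b", "b"]`), whereas `checkAnswer` asserts the correct `Row(Array(1, 2), Array("a", "b"))` that the patched cast now produces.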
